ArcGIS API for Python

Reconstructing 3D buildings from Aerial LiDAR with Deep Learning

  • 🔬 Data Science
  • 🥠 Deep Learning and Instance Segmentation

Introduction

The workflow traditionally used to reconstruct 3D building models from aerial LiDAR is relatively straightforward: the LiDAR point cloud is transformed into a Digital Surface Model (DSM) raster, which human editors then inspect for buildings. If a building is found, one or more polygons describing the roof form of the building are manually digitized; e.g., if it is a large hip roof with two gable outlets, the editor draws three polygons (one hip and two gables on top). Once all the roofs are described that way, a set of ArcGIS Procedural rules is applied to extrude the building models using the manually digitized roof segments, with heights and ridge directions computed from the DSM.

Figure 1. 3D building reconstruction from Lidar example: a building with complex roof shape and its representation in visible spectrum (RGB), Aerial LiDAR, and corresponding roof segments digitized by a human editor. The last one is a 3D reconstruction of the same building using manually digitized masks and ArcGIS Procedural rules.

The most time-consuming and expensive step in the above workflow is the manual search and digitization of the roof segment polygons from a DSM raster. In this notebook, we are going to focus on this challenging step and demonstrate how to detect instances of roof segments of various types using instance segmentation to make the process more efficient. The workflow consists of four major steps: (1) extract training data, (2) train a deep learning instance segmentation model, (3) deploy the model and detect roof segments, and (4) 3D enable the detected segments.

Prerequisites

The complete data required to run this sample is packaged in a project package and can be downloaded from here. You will also need to download the rule package used in Part 4 of this notebook from here.

Below are the items present in the shared project package:

  • D1_D2_D3_Buildings_1: labelled feature data for training data preparation
  • R7_nDSM_TestVal: raster image for training data preparation
  • DSM_AOI_Clip: DSM raster for area of interest, required during model inferencing
  • DTM_AOI_Clip: DTM raster for area of interest, required during model inferencing
  • DSM_AOI_Clip_DetectObjects_26032020_t4_220e: sample results obtained from the trained MaskRCNN model inferenced on area of interest obtained after performing part 3 of the notebook
  • DSM_AOI_Clip_DetectObjects_26032020_t4_220e_selection_3dEnabling: sample 3D enabled roof segments obtained after performing part 4 of the notebook

    Moreover, the 'Toolboxes' section of the project contains a toolbox (3d_workflow.tbx) with the script (3dEnabling) used to perform Part 4 of the notebook.

Part 1 - Data Preparation

We started with two inputs:

  • A single-band raster layer (R7_nDSM_TestVal) with 2.25 square feet per pixel resolution, converted from a LiDAR point cloud using the “LAS Dataset to Raster” geoprocessing tool (see the sketch below)
  • A feature class (D1_D2_D3_Buildings_1) that defines the location and label (i.e. flat, gable, hip, shed, mansard, vault, dome) of each roof segment.

We are using single-band LiDAR data, which is essentially elevation, to train our deep learning MaskRCNN model.
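
The nDSM raster above was derived from the LiDAR point cloud. Below is a minimal sketch of that conversion; the LAS dataset path and parameter values are illustrative assumptions, not the exact settings used to produce R7_nDSM_TestVal. (Note that a normalized DSM additionally subtracts the ground elevation, e.g. a DTM, from the surface raster.)

import arcpy

# Rasterize point elevations from a LAS dataset into a single-band raster.
# 'buildings.lasd' is a placeholder -- substitute your own LAS dataset.
arcpy.conversion.LasDatasetToRaster(
    in_las_dataset="buildings.lasd",
    out_raster="R7_nDSM_TestVal",
    value_field="ELEVATION",                      # rasterize elevation values
    interpolation_type="BINNING AVERAGE LINEAR",  # average the points in each cell
    data_type="FLOAT",
    sampling_type="CELLSIZE",
    sampling_value=0.2)                           # match the cell size used for export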

Figure 2. Example of different roof types (flat not shown).

Export training data

Export training data using the 'Export Training Data For Deep Learning' tool; detailed documentation is available here.

  • Input Raster: R7_nDSM_TestVal
  • Output Folder: Set a location where you want to export the training data; it can be an existing folder, or the tool will create one for you.
  • Input Feature Class Or Classified Raster: D1_D2_D3_Buildings_1
  • Image Format: TIFF format
  • Tile Size X & Tile Size Y can be set to 256
  • Stride X & Stride Y: 128
  • Meta Data Format: Select 'RCNN Masks' as the data format because we are training a MaskRCNN model.
  • In the Environments tab, set an optimal Cell Size. For this example, as we perform the analysis on the LiDAR imagery, we used a cell size of 0.2.

Figure 3. Export Training data screenshot from ArcGIS Pro.

arcpy.ia.ExportTrainingDataForDeepLearning(
    in_raster="R7_nDSM_TestVal",
    out_folder=r"\Documents\PCNN\Only_nDSM",
    in_class_data="D1_D2_D3_Buildings_1",
    image_chip_format="TIFF",
    tile_size_x=256,
    tile_size_y=256,
    stride_x=128,
    stride_y=128,
    output_nofeature_tiles="ONLY_TILES_WITH_FEATURES",
    metadata_format="RCNN_Masks",
    start_index=0,
    class_value_field="None",
    buffer_radius=0,
    in_mask_polygons=None,
    rotation_angle=0,
    reference_system="MAP_SPACE",
    processing_mode="PROCESS_AS_MOSAICKED_IMAGE",
    blacken_around_feature="NO_BLACKEN",
    crop_mode="FIXED_SIZE")

After filling in all the details and running the Export Training Data For Deep Learning tool, code like the above will be generated and executed. This creates all the files needed for the next step in the 'Output Folder'; we will refer to them as our training data.
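
To confirm the export succeeded, you can list the contents of the output folder; with the 'RCNN Masks' metadata format you should see the 'images' and 'labels' folders referenced in Part 2 among the outputs. A minimal sketch, reusing the placeholder path from the tool call above:

import os

export_folder = r"\Documents\PCNN\Only_nDSM"  # placeholder path from the call above
print(sorted(os.listdir(export_folder)))      # expect 'images' and 'labels' among the entries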

Part 2 - Model Training

You should already have the training chips exported from ArcGIS Pro. Please change the path to your own exported training data folder that contains the "images" and "labels" folders. Note that we set a relatively small batch_size here on purpose, as instance segmentation is a more computationally intensive task than object detection and pixel-based classification. If you run into an "insufficient memory" issue during training, you can come back and adjust it to meet your own needs.

Necessary Imports

In [1]:
import os
from pathlib import Path

from arcgis.learn import prepare_data, MaskRCNN
In [2]:
#connect to GIS
from arcgis.gis import GIS
gis = GIS('home')

Prepare data

We will now use the prepare_data() function to apply various types of transformations and augmentations to the training data. These augmentations enable us to train a better model with limited data and also prevent the model from overfitting. Here, prepare_data() takes three parameters:

  • path: path of the folder containing training data.
  • batch_size: Number of images the model trains on in each step within an epoch; it depends directly on the memory of your graphics card. 8 worked for us on an 11GB GPU.
  • imagery_type: A mandatory input to enable a model for multispectral data processing. It can be "landsat8", "sentinel2", "naip", "ms" or "multispectral".
In [3]:
training_data = gis.content.get('fa74a15b6dae4e43bda2fb8d33d36299')
training_data
Out[3]:
building_reconstruction_using_mask_rcnn
Image Collection by api_data_owner
Last Modified: August 28, 2020
0 comments, 1 views
In [4]:
filepath = training_data.download(file_name=training_data.name)
In [5]:
import zipfile
with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)
In [6]:
data_path = Path(os.path.join(filepath.split('.')[0]))
In [4]:
data = prepare_data(data_path, batch_size=8, imagery_type='ms')
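
As a quick sanity check, we can inspect the class names prepare_data() read from the exported labels; the exact list depends on your training data, and the attribute below follows the usual arcgis.learn data object conventions:

In [ ]:
# List the roof-segment classes found in the exported training data.
data.classes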

Visualize a few samples from your training data

To get a sense of what the training data looks like, the data object's show_batch() method randomly picks a few training chips and visualizes them. Note that the masks representing different roof segments are overlaid on the original images in red and pink.

rows: Number of rows for which we want to see the results.

In [7]:
data.show_batch(rows=2)

Load model architecture

Here we use Mask R-CNN [1], a well-recognized instance segmentation algorithm, to detect roof segments (Figure 3). The Mask R-CNN model architecture and a pretrained model are already predefined in arcgis.learn, so we can define the model with a single line. Please refer to the guide on our developers' site for more information.

The idea of Mask R-CNN is to detect objects in an image while simultaneously generating a high-quality segmentation mask for each instance. In other words, it is like a combination of UNet and SSD, doing two jobs in one go. This is also why it is relatively more computationally intensive.

In [8]:
model = MaskRCNN(data)
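
With the model architecture defined, the natural next steps are to find a suitable learning rate and train the model. A minimal sketch follows; the epoch count and checkpoint name below are illustrative (the sample results in this notebook came from a longer run, as the "220e" suffix of the result layer's name suggests):

In [ ]:
model.lr_find()  # plot loss against learning rate to pick a good value
model.fit(10)    # illustrative epoch count; if lr is not passed, an optimal one is chosen internally
model.save('roof_segments_maskrcnn_10e')  # illustrative checkpoint name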