Reconstructing 3D buildings from Aerial LiDAR with Deep Learning¶
- 🔬 Data Science
- 🥠 Deep Learning and Instance Segmentation
Table of Contents¶
- Part 1 - Data Preparation
- Part 2 - Model Training
- Part 3 - Deploy Model and Detect Roof Types
- Part 4 - 3D enabling the MaskRCNN results
The workflow traditionally used to reconstruct 3D building models from aerial LiDAR is relatively straightforward: the LiDAR point cloud is transformed into a Digital Surface Model (DSM) raster, which human editors then inspect for buildings. When a building is found, one or more polygons describing its roof form are manually digitized; for example, a large hip roof with two gable outlets requires three polygons (one hip and two gables on top) drawn by the editor. Once all the roofs are described this way, a set of ArcGIS Procedural rules is applied to extrude the building models from the manually digitized roof segments, with heights and ridge directions computed from the DSM.
The most time-consuming and expensive step in the above workflow is the manual search and digitization of the roof segment polygons from a DSM raster. In this notebook, we focus on this challenging step and demonstrate how to detect instances of roof segments of various types using instance segmentation, making the process more efficient. The workflow consists of four major steps: (1) extract training data, (2) train a deep learning instance segmentation model, (3) deploy the model and detect roof segments, and (4) 3D enable the detected segments.
Complete data required to run this sample is packaged together in a project package and can be downloaded from here. You are also required to download the rule package used in Part 4 of this notebook from here.
The project package contains the following items:
- D1_D2_D3_Buildings_1: labelled feature data for training data preparation
- R7_nDSM_TestVal: raster image for training data preparation
- DSM_AOI_Clip: DSM raster for area of interest, required during model inferencing
- DTM_AOI_Clip: DTM raster for area of interest, required during model inferencing
- DSM_AOI_Clip_DetectObjects_26032020_t4_220e: sample results obtained from the trained MaskRCNN model inferenced on area of interest obtained after performing part 3 of the notebook
- DSM_AOI_Clip_DetectObjects_26032020_t4_220e_selection_3dEnabling: sample 3D enabled roof segments obtained after performing part 4 of the notebook
Moreover, there is a toolbox (3d_workflow.tbx) in the 'Toolboxes' section of the project containing the script (3dEnabling) used to perform part 4 of the notebook.
Part 1 - Data Preparation¶
We start with two inputs:
- A single-band raster layer (R7_nDSM_TestVal) with a resolution of 2.25 square feet per pixel, converted from the LiDAR point cloud using the "LAS Dataset to Raster" geoprocessing tool
- A feature class (D1_D2_D3_Buildings_1) that defines the location and label (i.e. flat, gable, hip, shed, mansard, vault, dome) of each roof segment.
We use this single-band LiDAR-derived raster, which is essentially elevation data, to train our deep learning MaskRCNN model.
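As a quick sanity check on the stated resolution, a pixel covering 2.25 square feet has a side length of 1.5 feet, since 1.5 × 1.5 = 2.25:

```python
import math

# A square pixel covering 2.25 sq ft has a side of sqrt(2.25) ft.
pixel_area_sq_ft = 2.25
cell_size_ft = math.sqrt(pixel_area_sq_ft)
print(cell_size_ft)  # 1.5
```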
Export the training data using the 'Export Training Data For Deep Learning' tool, detailed documentation here.

- Input Raster: R7_nDSM_TestVal
- Output Folder: Set a location where you want to export the training data; it can be an existing folder, or the tool will create it for you.
- Input Feature Class Or Classified Raster: D1_D2_D3_Buildings_1
- Image Format: TIFF format
- Tile Size X & Tile Size Y: can be set to 256
- Stride X & Stride Y: 128
- Meta Data Format: Select 'RCNN Masks' as the data format because we are training a MaskRCNN model.
- In the Environments tab, set an optimum Cell Size. For this example, as we have to perform the analysis on the LiDAR imagery, we used a cell size of 0.2.
arcpy.ia.ExportTrainingDataForDeepLearning(
    in_raster="R7_nDSM_TestVal",
    out_folder=r"\Documents\PCNN\Only_nDSM",
    in_class_data="D1_D2_D3_Buildings_1",
    image_chip_format="TIFF",
    tile_size_x=256,
    tile_size_y=256,
    stride_x=128,
    stride_y=128,
    output_nofeature_tiles="ONLY_TILES_WITH_FEATURES",
    metadata_format="RCNN_Masks",
    start_index=0,
    class_value_field="None",
    buffer_radius=0,
    in_mask_polygons=None,
    rotation_angle=0,
    reference_system="MAP_SPACE",
    processing_mode="PROCESS_AS_MOSAICKED_IMAGE",
    blacken_around_feature="NO_BLACKEN",
    crop_mode="FIXED_SIZE")
After filling in all the details and running the Export Training Data For Deep Learning tool, code like the above will be generated and executed. This creates all the files needed for the next step in the 'Output Folder', which we will now call our training data.
You should already have the training chips exported from ArcGIS Pro. Please change the path to your own exported training data folder containing the "images" and "labels" folders. Note that we set a relatively small batch_size here on purpose, as instance segmentation is a more computationally intensive task than object detection or pixel-based classification. If you run into an "insufficient memory" issue during training, you can come back and adjust it to meet your needs.
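One generic way to handle such out-of-memory failures is to halve the batch size and retry. The helper below is a hypothetical sketch (fit_with_fallback is not part of arcgis.learn); it wraps any training callable that raises MemoryError when a batch does not fit on the GPU:

```python
def fit_with_fallback(fit_fn, batch_size=8, min_batch=1):
    """Call fit_fn(batch_size), halving the batch size on MemoryError.

    fit_fn is any callable that trains with the given batch size and
    raises MemoryError when the batch does not fit in GPU memory.
    """
    while batch_size >= min_batch:
        try:
            fit_fn(batch_size)
            return batch_size  # this batch size fit in memory
        except MemoryError:
            batch_size //= 2  # retry with a smaller batch
    raise MemoryError("training failed even at the minimum batch size")
```

For example, if training succeeds only at a batch size of 4 or below, `fit_with_fallback(fit_fn, batch_size=8)` returns 4 after one retry.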
from arcgis.learn import prepare_data, MaskRCNN
# connect to GIS
from arcgis.gis import GIS
gis = GIS('home')
We will now use the prepare_data() function to apply various types of transformations and augmentations to the training data. These augmentations enable us to train a better model with limited data and also prevent the model from overfitting.
prepare_data() takes 3 parameters:

- path: path of the folder containing the training data.
- batch_size: number of images your model trains on in each step within an epoch; it depends directly on the memory of your graphics card. 8 worked for us on an 11 GB GPU.
- imagery_type: a mandatory input to enable the model for multispectral data processing. It can be "landsat8", "sentinel2", "naip", "ms" or "multispectral".
data_path = r'Documents\PCNN\Only_nDSM\Data_AOI2'
data = prepare_data(data_path, batch_size=8, imagery_type='ms')
Visualize a few samples from your training data¶
To get a sense of what the training data looks like, the arcgis.learn.show_batch() method randomly picks a few training chips and visualizes them. Note that the masks representing different roof segments are overlaid on the original images in red and pink.

- rows: number of rows we want to see the results for.
Here we use Mask R-CNN, a well-recognized instance segmentation algorithm, to detect roof segments (Figure 3). The Mask R-CNN model architecture and a pretrained model are already predefined in arcgis.learn, so we can define the model with a single line. Please refer to the guide on our developers' site for more information.
The idea of Mask R-CNN is to detect objects in an image while simultaneously generating a high-quality segmentation mask for each instance. In other words, it is like a combination of UNet and SSD and does two jobs in one go. This is also why it is comparatively more computationally intensive.
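To make the "detection plus per-instance mask" idea concrete, the toy function below (a hypothetical illustration, not part of arcgis.learn or the Mask R-CNN implementation) derives the detection-style bounding box implied by a binary instance mask:

```python
def mask_to_box(mask):
    """Return the (x1, y1, x2, y2) bounding box of a binary instance mask.

    mask is a list of rows; truthy cells belong to the instance. Each
    instance mask implies one detection box, which is the sense in which
    Mask R-CNN does detection and segmentation in one go.
    """
    points = [(x, y)
              for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# A tiny 4x5 mask with one instance occupying columns 1-3, rows 1-2.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_box(mask))  # (1, 1, 3, 2)
```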
model = MaskRCNN(data)