
Reconstructing 3D buildings from Aerial LiDAR with Deep Learning

  • 🔬 Data Science
  • 🥠 Deep Learning and Instance Segmentation


The workflow traditionally used to reconstruct 3D building models from aerial LiDAR is relatively straightforward: the LiDAR point cloud is transformed into a Digital Surface Model (DSM) raster, which human editors then inspect for buildings. When a building is found, one or more polygons describing the roof form of the building are manually digitized; for example, a large hip roof with two gable outlets requires three polygons (one hip and two gables on top) drawn by the editor. Once all the roofs are described that way, a set of ArcGIS Procedural rules is applied to extrude the building models from the manually digitized roof segments, with heights and ridge directions computed from the DSM.

Figure 1. 3D building reconstruction from LiDAR: a building with a complex roof shape and its representation in the visible spectrum (RGB), in aerial LiDAR, and as roof segments digitized by a human editor. The last image shows a 3D reconstruction of the same building using the manually digitized masks and ArcGIS Procedural rules.

The most time-consuming and expensive step in the above workflow is the manual search and digitization of the roof segment polygons from a DSM raster. In this notebook, we are going to focus on this challenging step and demonstrate how to detect instances of roof segments of various types using instance segmentation, making the process more efficient. The workflow consists of four major steps: (1) extract training data, (2) train a deep learning instance segmentation model, (3) deploy the model and detect roof segments, and (4) 3D-enable the detected segments.


Complete data required to run this sample is packaged together in a project package and can be downloaded from here. You are also required to download the rule package used in Part 4 of this notebook from here.

The shared project package contains the following items:

  • D1_D2_D3_Buildings_1: labelled feature data for training data preparation
  • R7_nDSM_TestVal: raster image for training data preparation
  • DSM_AOI_Clip: DSM raster for area of interest, required during model inferencing
  • DTM_AOI_Clip: DTM raster for area of interest, required during model inferencing
  • DSM_AOI_Clip_DetectObjects_26032020_t4_220e: sample results from the trained MaskRCNN model inferenced on the area of interest, obtained after performing Part 3 of the notebook
  • DSM_AOI_Clip_DetectObjects_26032020_t4_220e_selection_3dEnabling: sample 3D-enabled roof segments obtained after performing Part 4 of the notebook

    Moreover, there is a toolbox (3d_workflow.tbx) in the 'Toolboxes' section of the project, which contains the script (3dEnabling) used to perform Part 4 of the notebook.

Part 1 - Data Preparation

We started with two inputs:

  • A single-band raster layer (R7_nDSM_TestVal) with a resolution of 2.25 square feet per pixel, converted from a LiDAR point cloud using the “LAS Dataset to Raster” geoprocessing tool
  • A feature class (D1_D2_D3_Buildings_1) that defines the location and label (i.e. flat, gable, hip, shed, mansard, vault, dome) of each roof segment.

We are using single-band LiDAR-derived data, which is essentially elevation, to train our deep learning MaskRCNN model.
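The "n" in nDSM stands for normalized: heights above ground rather than absolute elevations. The exact preprocessing used to produce R7_nDSM_TestVal is not shown in this notebook, but as an illustrative sketch, an nDSM is typically computed by subtracting the terrain model (DTM) from the surface model (DSM); the values below are made up:

```python
import numpy as np

# Toy 2x2 rasters; values are elevations in feet. In practice these would be
# read from rasters such as DSM_AOI_Clip and DTM_AOI_Clip.
dsm = np.array([[102.0, 110.5],
                [101.0, 103.2]])  # surface: ground + buildings/vegetation
dtm = np.array([[100.0, 100.5],
                [100.8, 100.2]])  # bare-earth terrain

# Normalized DSM: height above ground, clipped so sensor noise cannot
# produce negative heights.
ndsm = np.clip(dsm - dtm, 0.0, None)
```

The resulting raster encodes only what sits on top of the terrain, which is why a single elevation band is enough for the model to learn roof shapes.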

Figure 2. Example of different roof types (flat not shown).

Export training data

Export training data using the 'Export Training Data For Deep Learning' tool; detailed documentation here.

  • Input Raster: R7_nDSM_TestVal
  • Output Folder: Set a location where you want to export the training data; it can be an existing folder, or the tool will create it for you.
  • Input Feature Class Or Classified Raster: D1_D2_D3_Buildings_1
  • Image Format: TIFF format
  • Tile Size X & Tile Size Y can be set to 256
  • Stride X & Stride Y: 128
  • Meta Data Format: Select 'RCNN Masks' as the data format because we are training a MaskRCNN model.
  • In the Environments tab, set an optimum Cell Size. For this example, since the analysis is performed on the LiDAR imagery, we used a cell size of 0.2.
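As a rough sanity check on the tile and stride settings above, the number of chips exported along each raster axis follows the usual sliding-window formula (a sketch; the tool's handling of raster edges and empty tiles may differ):

```python
def chips_along_axis(raster_size, tile=256, stride=128):
    # Sliding-window count: windows start at 0, stride, 2*stride, ...
    # and the last window must still fit entirely inside the raster.
    if raster_size < tile:
        return 0
    return (raster_size - tile) // stride + 1

# e.g. a 1024-pixel-wide raster with 256-pixel tiles and 128-pixel stride
# yields 7 window positions per axis, so up to 7 x 7 chips in total.
```

A stride of half the tile size means adjacent chips overlap by 50%, which roughly quadruples the number of training chips compared with non-overlapping tiles.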

Figure 3. Export Training data screenshot from ArcGIS Pro.

After filling in all the details and running the Export Training Data For Deep Learning tool, the corresponding geoprocessing code is generated and executed. This creates all the files needed for the next step in the 'Output Folder', which we will from now on call our training data.

Part 2 - Model Training

You should already have the training chips exported from ArcGIS Pro. Please change the path to your own exported training data folder, which contains the "images" and "labels" folders. Note that we set a relatively small batch_size here on purpose, as instance segmentation is a more computationally intensive task than object detection or pixel-based classification. If you run into an "insufficient memory" issue during training, you can come back and adjust it to meet your own needs.
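One way to handle such memory errors, sketched below with a hypothetical fit_fn callback rather than the actual arcgis.learn API, is to retry with progressively smaller batch sizes:

```python
def fit_with_fallback(fit_fn, batch_sizes=(8, 4, 2, 1)):
    """Call fit_fn(batch_size), halving the batch on out-of-memory errors.

    fit_fn is a placeholder for your own wrapper around prepare_data()
    plus model training; it is assumed to raise MemoryError when the
    batch does not fit on the GPU.
    """
    for bs in batch_sizes:
        try:
            return fit_fn(bs)
        except MemoryError:
            print(f"batch_size={bs} did not fit, trying a smaller one")
    raise MemoryError("training does not fit even with batch_size=1")
```

Halving is a common heuristic because GPU memory use scales roughly linearly with batch size for a fixed tile size.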

Necessary Imports

In [1]:
import os
from pathlib import Path

from arcgis.learn import prepare_data, MaskRCNN
In [2]:
#connect to GIS
from arcgis.gis import GIS
gis = GIS('home')

Prepare data

We will now use the prepare_data() function to apply various types of transformations and augmentations to the training data. These augmentations enable us to train a better model with limited data and also prevent the model from overfitting. Here, prepare_data() takes three parameters:

  • path: path of the folder containing training data.
  • batch_size: number of images the model trains on in each step within an epoch; it depends directly on the memory of your graphics card. A batch size of 8 worked for us on an 11 GB GPU.
  • imagery_type: a mandatory input for enabling multispectral data processing. It can be "landsat8", "sentinel2", "naip", "ms" or "multispectral".
In [3]:
training_data = gis.content.get('807425fa74f34d7695ec024eb934456c')
Image Collection by api_data_owner
Last Modified: September 18, 2020
0 comments, 5 views
In [4]:
filepath = training_data.download(file_name=training_data.name)
In [5]:
import zipfile
with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)
In [6]:
data_path = Path(os.path.splitext(filepath)[0])
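Before calling prepare_data(), it can help to confirm that the extracted folder actually has the layout described above; a minimal check (assuming the "images"/"labels" folder convention mentioned earlier):

```python
from pathlib import Path

def missing_subfolders(data_path, required=("images", "labels")):
    # Return the names of required subfolders absent from the export folder.
    root = Path(data_path)
    return [name for name in required if not (root / name).is_dir()]

# An empty list means the exported training data looks complete.
```

If this returns a non-empty list, the path likely points at the wrong folder or the export did not finish.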
In [7]:
data = prepare_data(data_path, batch_size=8, imagery_type='ms')

Visualize a few samples from your training data

To get a sense of what the training data looks like, arcgis.learn.show_batch() method randomly picks a few training chips and visualizes them. Note that the masks representing different roof segments are overlaid upon the original images with red and pink colors.

rows: number of rows of results to display.

In [8]:
data.show_batch(rows=2)

Load model architecture

Here we use Mask R-CNN [1], a well-recognized instance segmentation algorithm, to detect roof segments (Figure 3). A Mask R-CNN model architecture with a pretrained model has already been predefined in arcgis.learn, so we can define it in a single line. Please refer to the guide on our developers' site for more information.

The idea of Mask R-CNN is to detect objects in an image while simultaneously generating a high-quality segmentation mask for each instance. In other words, it is conceptually like a combination of U-Net and SSD, doing both jobs in one go. This is also why it is relatively more computationally intensive.
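To make that distinction concrete, here is a toy, framework-free sketch of the two output shapes: semantic segmentation labels every pixel with a class (so touching roof segments of the same type merge), while Mask R-CNN-style instance segmentation returns a separate box, score, and binary mask per detected segment. All values below are made up for illustration:

```python
import numpy as np

# Semantic view: one class id per pixel; the two roof segments share id 1.
semantic = np.zeros((8, 8), dtype=int)
semantic[1:4, 1:4] = 1          # first "gable" segment
semantic[5:7, 5:8] = 1          # second "gable" segment, same class id

# Instance view: each detection carries its own binary mask.
mask_a = np.zeros((8, 8), dtype=int); mask_a[1:4, 1:4] = 1
mask_b = np.zeros((8, 8), dtype=int); mask_b[5:7, 5:8] = 1
instances = [
    {"class": "gable", "score": 0.97, "box": (1, 1, 4, 4), "mask": mask_a},
    {"class": "gable", "score": 0.91, "box": (5, 5, 7, 8), "mask": mask_b},
]
```

The per-instance masks are disjoint and together reproduce the semantic map, but only the instance view lets us digitize each roof segment as its own polygon.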

In [9]:
model = MaskRCNN(data)