Automate Road Surface Investigation Using Deep Learning

  • 🔬 Data Science
  • 🥠 Deep Learning and Object Detection

Introduction and objective

Road surfaces deteriorate because of vehicle overloading, poor construction quality, ageing, natural disasters, and other climatic conditions, and this deterioration may lead to road pavement failure. Cracks slow traffic, cause jams, and damage vehicles. They also create problems for civic authorities, who need to identify these cracks accurately and carry out the repair work. If cracks are not repaired at an early stage, the cost of repair gradually increases, placing an unnecessary burden on the exchequer.

Traditionally, road surfaces are inspected by humans, either visually or with sophisticated machines that are also expensive. The manual approach is not only time consuming but also ineffective, since detecting damage requires consistent help from subject matter experts who can identify and differentiate the types of pavement failure. Artificial intelligence supported by deep learning comes to the rescue: deep learning integrated with ArcGIS automates the inspection process.

In this notebook, we use a labeled dataset of asphalt distress images from the 2018 IEEE Bigdata Cup Challenge to train a model that detects road cracks and classifies their type. The training and test data consist of 9,053 photographs, collected from smartphone cameras and hand-labeled with the presence or absence of 8 road damage categories [1].

The table below shows sample images of the dataset corresponding to each of the 8 categories of damage type.

Class Name Class Description Image
D00 Linear crack, longitudinal, wheel mark part
D01 Linear crack, longitudinal, construction joint part
D10 Linear crack, lateral, equal interval
D11 Linear crack, lateral, construction joint part
D20 Alligator crack
D40 Rutting, bump, pothole, separation
D43 White line blur
D44 Cross walk blur
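
For use in later code, the damage codes in the table can be kept as a small lookup. This is a minimal sketch; the names `DAMAGE_CLASSES` and `describe` are illustrative and not part of arcgis.learn.

```python
# Illustrative lookup from damage code to description (names are hypothetical).
DAMAGE_CLASSES = {
    "D00": "Linear crack, longitudinal, wheel mark part",
    "D01": "Linear crack, longitudinal, construction joint part",
    "D10": "Linear crack, lateral, equal interval",
    "D11": "Linear crack, lateral, construction joint part",
    "D20": "Alligator crack",
    "D40": "Rutting, bump, pothole, separation",
    "D43": "White line blur",
    "D44": "Cross walk blur",
}


def describe(code):
    """Return the description for a damage code, or 'Unknown' if it is not listed."""
    return DAMAGE_CLASSES.get(code, "Unknown")


print(describe("D20"))
```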

Through this sample, we walk step by step through building a robust deep learning solution that identifies road pavement failures and can eventually be integrated with ArcGIS as a reusable tool.

Necessary imports

# Restart the kernel after installation is complete
!pip install opencv-python==
import pandas as pd
import os
import shutil
from pathlib import Path

from arcgis.gis import GIS
from arcgis.features import GeoAccessor
from arcgis.learn import SingleShotDetector, prepare_data

Prepare data that will be used for training

You can download pavement cracks data from the following link: Extract the downloaded file and run the code below to prepare data in a format that deep learning models expect.

# Please uncomment the following code to prepare your training data.

# input_path = Path(input("Enter the path where you extracted data: "))
# output_path = Path(input("Enter the path where you want to create training data: "))
# # Create the output folders if they do not already exist
# os.makedirs(output_path/'images', exist_ok=True)
# os.makedirs(output_path/'labels', exist_ok=True)
# for fl in os.listdir(input_path):
#     if not(fl.startswith(".")):
#         for f in os.listdir(input_path/fl/'Annotations'):
#             if not(f.startswith(".")):
#                 # Copy each annotation together with the .jpg that shares its base name
#                 img_name = f.split('.')[0] + '.jpg'
#                 shutil.copyfile(input_path/fl/'JPEGImages'/img_name, output_path/'images'/img_name)
#                 shutil.copyfile(input_path/fl/'Annotations'/f, output_path/'labels'/f)
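
After copying, it can help to confirm that every image has a matching label file before training. The sketch below is one way to do that with the standard library. The helper name `find_unpaired` is hypothetical, and the `.xml` label extension is an assumption based on the `Annotations`/`JPEGImages` folder layout; adjust it if your labels use another format.

```python
from pathlib import Path


def find_unpaired(images_dir, labels_dir):
    """Return sorted base names that have an image but no label, or a label but no image."""
    image_stems = {p.stem for p in Path(images_dir).glob("*.jpg")}
    # Assumption: label files are XML annotations sharing the image's base name.
    label_stems = {p.stem for p in Path(labels_dir).glob("*.xml")}
    return sorted(image_stems ^ label_stems)


# Example usage (uncomment once output_path is defined):
# print(find_unpaired(output_path / "images", output_path / "labels"))
```

An empty list means the two folders are consistent.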

Model training

You can change the path to your own training data folder, which must contain the "images" and "labels" folders.

gis = GIS('home')
training_data = gis.content.get('9c7274bbfac343f3aef33f2dc1ff4baf')
Image Collection by api_data_owner
Last Modified: August 25, 2020
0 comments, 4 views
filepath =
import zipfile
with zipfile.ZipFile(filepath, 'r') as zip_ref:
data_path = Path(os.path.join(os.path.splitext(filepath)[0]))
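
Unpacking a downloaded zip archive with the standard library can be sketched as follows. The helper name `extract_beside_archive` and the choice to extract into the archive's parent folder are assumptions for illustration. The returned path only points at the extracted data if the archive contains a top-level folder named after the archive itself.

```python
import zipfile
from pathlib import Path


def extract_beside_archive(zip_path):
    """Extract zip_path into its parent folder and return the archive path without '.zip'."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(zip_path.parent)
    # Assumption: the archive holds one top-level folder with the archive's base name.
    return zip_path.with_suffix("")


# Example usage (uncomment once filepath is defined):
# data_path = extract_beside_archive(filepath)
```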

The prepare_data function takes the path to the training data and creates a fastai databunch with the specified transformations, batch size, split percentage, etc.

data = prepare_data(data_path,

We can use the classes attribute of the data object to get information about the number of classes.

['background', 'D00', 'D01', 'D10', 'D11', 'D20', 'D30', 'D40', 'D43', 'D44']

Visualize training data

To get a sense of what the training data looks like, the arcgis.learn.show_batch() method randomly picks a few training chips and visualizes them.


Load model architecture

arcgis.learn provides the SingleShotDetector (SSD) model for object detection tasks. It is based on a pretrained convnet, such as ResNet, that acts as the 'backbone'. More details about SSD can be found here.

We will use the SingleShotDetector with a resnet101 backbone to train the damage detection model.

ssd = SingleShotDetector(data, backbone='resnet101', focal_loss=True)
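
The `focal_loss=True` flag presumably switches the classification loss to focal loss, which down-weights examples the model already classifies confidently so that hard examples dominate training. The sketch below shows the basic per-example form. The value `gamma=2.0` and the absence of class weighting are assumptions; the exact settings used inside arcgis.learn are not stated here.

```python
import math


def focal_loss(p_true, gamma=2.0):
    """Focal loss for one example, given the predicted probability of its true class."""
    # With gamma=0 this reduces to ordinary cross-entropy, -log(p_true).
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


# Compare an easy example (p=0.9) with a hard one (p=0.1).
for p in (0.9, 0.1):
    print(f"p={p}: cross-entropy={-math.log(p):.4f}, focal={focal_loss(p):.4f}")
```

The easy example's loss shrinks far more than the hard example's, which is the intended effect when most candidate boxes are background.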

Let us have a look at the results of the untrained model.


We see that the untrained model detects road cracks at random. To give good results, the model needs to be trained.

Learning rate is one of the most important hyperparameters in model training. We will use the lr_find() method to find an optimum learning rate at which we can train a robust model fast enough.

lr = ssd.lr_find()
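
A learning rate finder typically trains briefly at increasing learning rates and records the loss at each one. One common heuristic, sketched below, picks the rate where the loss falls fastest. This is an illustration only: the readings are made up, `suggest_lr` is a hypothetical helper, and arcgis.learn may use a different selection rule.

```python
import math


def suggest_lr(lrs, losses):
    """Return the learning rate at the start of the steepest loss drop per unit log10(lr)."""
    best_lr, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        step = math.log10(lrs[i]) - math.log10(lrs[i - 1])
        slope = (losses[i] - losses[i - 1]) / step
        if slope < best_slope:
            best_slope, best_lr = slope, lrs[i - 1]
    return best_lr


# Made-up range-test readings: the loss drops, flattens, then diverges.
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
losses = [4.0, 3.8, 2.5, 2.4, 6.0]
print(suggest_lr(lrs, losses))
```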

Train a model

Based on the suggested learning rate above, we will train the model for 30 epochs, a number kept modest for the sake of time.

ssd.fit(30, lr=lr)
epoch train_loss valid_loss time
0 3.916756 3.534273 07:47
1 2.189290 1.960024 07:48
2 1.913575 1.737066 07:45
3 1.727438 1.574788 07:44
4 1.587650 2.134769 07:31
5 1.508131 1.415902 07:46
6 1.400807 5.037269 07:48
7 1.382145 1.719041 07:48
8 1.375488 4.048904 07:43
9 1.303755 1.848563 07:50
10 1.280773 1.222865 07:44
11 1.252260 1.214416 07:47
12 1.217753 1.236139 07:30
13 1.239035 1.161670 07:39
14 1.237716 1.127153 07:29
15 1.147980 1.103687 07:47
16 1.161228 1.105242 07:43
17 1.159945 1.075735 07:45
18 1.071214 1.058415 07:44
19 1.093338 1.065908 07:31
20 1.099237 1.042938 07:45
21 1.114819 1.041307 05:45
22 1.060352 1.031148 04:09
23 1.021770 1.024204 04:10
24 1.056092 1.101342 04:09
25 1.022077 1.014639 04:10
26 1.018347 1.020852 04:10
27 1.035899 1.017190 04:10
28 1.017030 1.005037 04:10
29 1.007083 1.005612 04:12
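
A training log like the one above can be scanned programmatically for the best epoch and for validation spikes. The sketch below parses a few rows copied from the log. The helper name `parse_log` and the spike rule (validation loss more than 1.3 times the training loss) are arbitrary choices for illustration.

```python
# A few rows copied from the training log above.
log = """\
epoch train_loss valid_loss time
4 1.587650 2.134769 07:31
5 1.508131 1.415902 07:46
6 1.400807 5.037269 07:48
27 1.035899 1.017190 04:10
28 1.017030 1.005037 04:10
29 1.007083 1.005612 04:12
"""


def parse_log(text):
    """Parse whitespace-separated log rows into (epoch, train_loss, valid_loss) tuples."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        epoch, train_loss, valid_loss, _time = line.split()
        rows.append((int(epoch), float(train_loss), float(valid_loss)))
    return rows


rows = parse_log(log)
best = min(rows, key=lambda r: r[2])  # row with the lowest validation loss
spikes = [epoch for epoch, train, valid in rows if valid > 1.3 * train]
print("best epoch:", best[0], "valid_loss:", best[2])
print("epochs with validation spikes:", spikes)
```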

The graph below plots training and validation losses.


The average_precision_score method computes the average precision on the validation set for each class.

{'D00': 0.5585359730352724,
 'D01': 0.7302843487881194,
 'D10': 0.2577634234076642,
 'D11': 0.14445632230490446,
 'D20': 0.7623061137618858,
 'D30': 0.0,
 'D40': 0.16982323703158553,
 'D43': 0.813513353090408,
 'D44': 0.6490994066172426}
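
The per-class scores above can be summarized with a mean and a ranking of the weakest classes. This is a minimal sketch in plain Python using the values printed above; the variable names are illustrative.

```python
# Per-class average precision, copied from the output above.
scores = {
    "D00": 0.5585359730352724,
    "D01": 0.7302843487881194,
    "D10": 0.2577634234076642,
    "D11": 0.14445632230490446,
    "D20": 0.7623061137618858,
    "D30": 0.0,
    "D40": 0.16982323703158553,
    "D43": 0.813513353090408,
    "D44": 0.6490994066172426,
}

mean_ap = sum(scores.values()) / len(scores)  # unweighted mean over classes
weakest = sorted(scores, key=scores.get)[:3]  # three lowest-scoring classes
print(f"mean AP: {mean_ap:.3f}")
print("weakest classes:", weakest)
```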

These are the average precision scores for each class on our validation data, and they vary widely. Let's dig deeper to find out why the model performs better on some classes than on others. This will also help us understand why the D30 class has an average precision score of zero.

# Calculate the number of images of each class in training data
all_classes = []
for i, bb in enumerate(data.train_ds.y):
    # bb.data[1] is assumed to hold the class indices of the boxes in one image
    all_classes += bb.data[1].tolist()
df = pd.Series(all_classes).value_counts(sort=False)
df.index = [data.classes[i] for i in df.index]
D43     753
D00    2477
D44    3369
D01    3418
D10     677
D11     574
D20    2290
D30      22
D40     369
dtype: int64

We have only 22 images of class D30 to train on, which is very few. This is why the model scores poorly on this class.
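
To make the imbalance concrete, the counts above can be turned into shares of the training data. This is a minimal sketch using the printed counts; the variable names are illustrative.

```python
# Class counts copied from the output above.
counts = {"D43": 753, "D00": 2477, "D44": 3369, "D01": 3418, "D10": 677,
          "D11": 574, "D20": 2290, "D30": 22, "D40": 369}

total = sum(counts.values())
# List classes from rarest to most common with their share of the total.
for name, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"{name}: {n:5d} ({100 * n / total:.2f}%)")

rarest = min(counts, key=counts.get)
print("rarest class:", rarest)
```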

Detect and visualize pavement cracks in validation set

ssd.show_results(rows=10, thresh=0.2, nms_overlap=0.5)
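
The two numeric arguments can be read as a confidence cutoff and an overlap limit for non-maximum suppression. The sketch below illustrates that reading in plain Python. It assumes `thresh` is the minimum score a detection needs and `nms_overlap` is the intersection-over-union above which the lower-scoring of two boxes is dropped. The helper names and boxes are made up, and the library's own implementation may differ, for example by applying suppression per class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, thresh=0.2, nms_overlap=0.5):
    """Drop detections scoring below thresh, then apply greedy non-maximum suppression."""
    confident = [d for d in dets if d[1] >= thresh]
    kept = []
    for box, score in sorted(confident, key=lambda d: d[1], reverse=True):
        # Keep a box only if it does not overlap too much with a higher-scoring kept box.
        if all(iou(box, kept_box) <= nms_overlap for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Made-up detections as (box, score) pairs.
dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.6),
        ((20, 20, 30, 30), 0.3), ((50, 50, 60, 60), 0.1)]
kept = filter_detections(dets)
print(kept)
```

Here the second box is suppressed because it overlaps heavily with the first, and the last box is dropped for its low score.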

Save the model

As we can see, with 30 epochs, we are already seeing reasonable results. Further improvement can be achieved through more sophisticated hyperparameter tuning. Let's save the model for further training or inference later. By default, the model is saved into a models folder inside the data_path that you specified at the very beginning of this notebook.

ssd.save(str(data_path / 'pavement-cracks-model-resnet101'))

Model inference

We will do model inference using two methods: predict and predict_video. Let's get the data required to predict on an image and a video.

inference_data = gis.content.get('92a75cec191e4dbbb53067761287b977')
Image Collection by api_data_owner
Last Modified: November 23, 2020
0 comments, 0 views
inf_data_path =
import zipfile
with zipfile.ZipFile(inf_data_path, 'r') as zip_ref:
img_file = os.path.join(os.path.splitext(inf_data_path)[0], 'test_img.jpg')
video_file = os.path.join(os.path.splitext(inf_data_path)[0], 'test_video.mp4')
metadata_file = os.path.join(os.path.splitext(inf_data_path)[0], 'metadata.csv')

Detecting pavement cracks on an image

bbox_data = ssd.predict(img_file, threshold=0.1, visualize=True)

Detecting pavement cracks from video feed

100.00% [11295/11295 13:43<00:00]

Publish results to your GIS

The predict_video function also updates the metadata file, provided in CSV format, with the detections at each frame. We will now read this CSV using pandas and publish it as a layer on our GIS.

import pandas as pd
df = pd.read_csv(metadata_file)
UNIX Time Stamp Sensor Latitude Sensor Longitude Sensor True Altitude Frame Center Latitude Frame Center Longitude Frame Center Elevation vmtilocaldataset
0 1.564889e+15 28.412995 77.162906 294.736633 28.412995 77.162906 294.736633 \n
1 1.564889e+15 28.412995 77.162904 294.729656 28.412995 77.162904 294.729656 \n
2 1.564889e+15 28.412996 77.162902 294.722680 28.412996 77.162902 294.722680 \n
3 1.564889e+15 28.412997 77.162901 294.715703 28.412997 77.162901 294.715703 \n
4 1.564889e+15 28.412997 77.162899 294.708546 28.412997 77.162899 294.708546 \n
... ... ... ... ... ... ... ... ...
1794 1.564889e+15 28.414831 77.159583 278.085715 28.414831 77.159583 278.085715 NaN
1795 1.564889e+15 28.414831 77.159581 278.076278 28.414831 77.159581 278.076278 NaN
1796 1.564889e+15 28.414830 77.159578 278.066841 28.414830 77.159578 278.066841 NaN
1797 1.564889e+15 28.414830 77.159576 278.057561 28.414830 77.159576 278.057561 NaN
1798 1.564889e+15 28.414829 77.159574 278.047585 28.414829 77.159574 278.047585 NaN

1799 rows x 8 columns

The code below counts the detections at each frame in a new column, keeps the frame with the most detections in each second of video, and removes rows with no detections.

# Handle NaN and '\n' values
df.vmtilocaldataset = df.vmtilocaldataset.str.strip()
# Count detections per frame as the number of ';' separators (NaN rows become 0)
df['count'] = (df['vmtilocaldataset'].str.split(';').str.len().fillna(1) - 1)
# Group frames by the second of video they fall in
fps = 60
df['group'] = df.index // fps
# Get index of row with max detections in each group
max_detection_idxes = df.groupby('group')['count'].idxmax().values
# Extract rows for the indexes and drop those with no detections
df_flt = df.loc[max_detection_idxes].copy()
df_flt = df_flt[df_flt['count'] > 0]
sdf = GeoAccessor.from_xy(df_flt, 'Sensor Longitude', 'Sensor Latitude')
cracks_lyr = gis.content.import_data(sdf, title='crack points')
crack points
Feature Layer Collection by demos_deldev
Last Modified: November 20, 2019
0 comments, 0 views
m1 = gis.map('Haryana, India')
m1.basemap = "satellite"
m1.add_layer(cracks_lyr, {"renderer":"ClassedSizeRenderer",
                                      "field_name": "count_"})
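
The per-second filtering applied to the metadata before publishing can also be written without pandas, which makes the logic easier to follow. This is a sketch under the same assumption as the pandas code, namely that the number of `;` characters in a `vmtilocaldataset` cell equals the number of detections in that frame. The helper names and the toy cells are made up.

```python
def count_detections(cell):
    """Count detections as the number of ';' separators; missing cells count as 0."""
    if not isinstance(cell, str):
        return 0  # NaN or None
    return len(cell.strip().split(";")) - 1


def best_frame_per_second(cells, fps=60):
    """Map each second to the index of its frame with the most detections.

    Seconds without any detection are left out; ties keep the earliest frame.
    """
    best = {}
    for idx, cell in enumerate(cells):
        second = idx // fps
        n = count_detections(cell)
        if n > 0 and n > best.get(second, (None, 0))[1]:
            best[second] = (idx, n)
    return {second: idx for second, (idx, _) in best.items()}


# Toy example at 2 frames per second; the cell contents are made up.
cells = ["\n", "a;b;", "c;", float("nan"), "\n", "\n"]
print(best_frame_per_second(cells, fps=2))
```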


In this notebook, we learned how civic authorities can automate road surface investigation using deep learning in order to make policy decisions. This will not only help in repairing existing cracks but may also prevent pavement failures in the future.


[1] Hiroya Maeda, et al. "Road Damage Detection Using Deep Neural Networks with Images Captured Through a Smartphone", arXiv:1801.09454, 2018.
