ArcGIS API for Python

Vehicle detection and tracking using deep learning

  • 🔬 Data Science
  • 🥠 Deep Learning and Object Detection
  • 🛤️ Tracking

Introduction and objective

Vehicle detection and tracking is a common problem with multiple use cases. Government authorities and private establishments might want to understand the traffic flowing through a place so that they can better develop its infrastructure for the ease and convenience of everyone. A road widening project, the timing of traffic signals, and the construction of parking spaces are a few examples where analysing the traffic is integral to the project.

Traditionally, identification and tracking have been carried out manually: a person stands at a point and notes the count of the vehicles and their types. More recently, sensors have been put into use, but they solve only the counting problem; they cannot detect the type of vehicle.

In this notebook, we'll demonstrate how we can use deep learning to detect vehicles and then track them in a video. We'll use a short video taken from a live traffic camera feed.

Necessary imports

In [1]:
import os
import pandas as pd
from pathlib import Path

from arcgis.gis import GIS
from arcgis.learn import RetinaNet, prepare_data
In [2]:
gis = GIS('home')

Prepare data that will be used for training

You can download vehicle training data from here. Extract the downloaded file to get your training data.

Model training

Let's download the training data item, extract it, and set a path to the folder that contains the training images and their corresponding labels.

In [3]:
training_data = gis.content.get('ccaa060897e24b379a4ed2cfd263c15f')
training_data
Out[3]:
vehicle_detection_and_tracking
Image Collection by api_data_owner
Last Modified: August 26, 2020
0 comments, 10 views
In [4]:
filepath = training_data.download(file_name=training_data.name)
In [5]:
import zipfile
with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)
In [6]:
# Drop only the final extension so that dots elsewhere in the path are preserved
data_path = Path(filepath).with_suffix('')

We'll use the prepare_data function to create a fastai databunch with the necessary parameters, such as batch_size and chip_size. A complete list of parameters can be found in the API reference.

The given dataset has 235 images of size 854x480 pixels. We will define a chip_size of 480 pixels, which creates random 480x480 crops from the given images. This maintains the aspect ratios of the objects, but because each crop covers only part of a frame, the model can miss some objects when it is trained for fewer epochs. To avoid cropping, we can set resize_to=480 so that every chip is an entire frame and no object is missed, but there is a risk of poor detection of smaller objects.
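
To make the cropping trade-off concrete, here is a minimal sketch of the arithmetic. It assumes crop origins are drawn uniformly over all integer positions, which is an assumption about the sampling and not something the arcgis.learn documentation is quoted on here. The function name and the example boxes are hypothetical.

```python
# Sketch only: uniform sampling of crop origins is an assumption,
# and the boxes used below are hypothetical examples.
FRAME_W, FRAME_H, CHIP = 854, 480, 480


def fully_visible_fraction(box, frame_w=FRAME_W, frame_h=FRAME_H, chip=CHIP):
    """Fraction of possible crop origins for which `box` lies entirely inside the crop.

    `box` is (xmin, ymin, xmax, ymax) in pixel coordinates of the full frame.
    """
    xmin, ymin, xmax, ymax = box
    # A crop with origin (x0, y0) contains the box when
    # x0 <= xmin and x0 + chip >= xmax (and likewise for y).
    lo_x, hi_x = max(0, xmax - chip), min(frame_w - chip, xmin)
    lo_y, hi_y = max(0, ymax - chip), min(frame_h - chip, ymin)
    valid_x = max(0, hi_x - lo_x + 1)
    valid_y = max(0, hi_y - lo_y + 1)
    total = (frame_w - chip + 1) * (frame_h - chip + 1)
    return valid_x * valid_y / total


# Share of the frame area that a single 480x480 crop covers.
crop_coverage = (CHIP * CHIP) / (FRAME_W * FRAME_H)

centre_box = (400, 200, 450, 250)  # hypothetical object near the middle of the frame
edge_box = (0, 200, 50, 250)       # hypothetical object touching the left edge

print(f"one crop covers {crop_coverage:.1%} of the frame")
print(f"centre box fully visible in {fully_visible_fraction(centre_box):.1%} of crops")
print(f"edge box fully visible in {fully_visible_fraction(edge_box):.2%} of crops")
```

Under this assumption, an object near the centre appears in every crop, while an object at the edge of the frame is fully visible only rarely. This is why more epochs are needed before the model has seen every object.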

In [8]:
data = prepare_data(data_path, 
                    batch_size=4, 
                    dataset_type="PASCAL_VOC_rectangles", 
                    chip_size=480)
Please check your dataset. 9 images dont have the corresponding label files.

We see the warning above because a few images in our dataset are missing their corresponding label files. These images will be ignored while loading the data. If the number of such images is significant, we might want to fix the issue by adding the label files for those images or by removing the images.
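
To find out which images triggered the warning, a small check like the one below can help. It is a sketch that assumes the extracted folder contains an `images` subfolder and a `labels` subfolder with one `.xml` file per image, matched by file name. That layout is an assumption, so adjust the folder names to the actual dataset. The function name `images_without_labels` is hypothetical.

```python
from pathlib import Path

# Assumed image extensions; extend this set if the dataset uses other formats.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def images_without_labels(root, image_dir="images", label_dir="labels"):
    """Return the sorted names of images that have no matching .xml label file.

    Assumes `root/image_dir` holds the images and `root/label_dir` holds
    PASCAL VOC style .xml files sharing the image's file stem.
    """
    root = Path(root)
    label_stems = {p.stem for p in (root / label_dir).glob("*.xml")}
    return sorted(
        p.name
        for p in (root / image_dir).iterdir()
        if p.suffix.lower() in IMAGE_EXTS and p.stem not in label_stems
    )
```

Calling `images_without_labels(data_path)` would then list the unlabelled images, which can be labelled or removed before running prepare_data again.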

We can use the classes attribute of the data object to list the classes present in the dataset.

In [4]:
data.classes
Out[4]:
['background',
 'bicycle',
 'bus',
 'car',
 'motorcycle',
 'person',
 'scooter',
 'tempo',
 'tractor',
 'truck',
 'van']

Visualize training data

To visualize and get a sense of the training data, we can use the data.show_batch method.

In [14]:
data.show_batch()

In the previous cell, we see a sample of the dataset. We can observe in the given chips that the most common vehicles are cars and bicycles. We can also notice that different instances of the vehicles have varying scales.
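
A visual sample gives only an impression of the class balance. To check it across the whole dataset, the label files can be counted directly. The sketch below assumes the labels are standard PASCAL VOC `.xml` files, where each `object` element carries a `name`, stored in a single folder. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_objects_per_class(label_dir):
    """Count labelled objects per class across all PASCAL VOC .xml files in a folder."""
    counts = Counter()
    for xml_file in sorted(Path(label_dir).glob("*.xml")):
        root = ET.parse(xml_file).getroot()
        # Each annotated bounding box is an <object> element with a <name> child.
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name:
                counts[name.strip()] += 1
    return counts
```

For example, `count_objects_per_class(data_path / "labels").most_common()` would list the classes from most to least frequent, assuming the labels live in a `labels` subfolder.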

Load model architecture

arcgis.learn provides object detection models that are based on pretrained convnets, such as ResNet, which act as the backbones. We will use RetinaNet with the default parameters to create our vehicle detection model. For more details on RetinaNet, check out How RetinaNet works? and the API reference.

In [6]:
retinanet = RetinaNet(data)

We will use the lr_find() method to find an optimum learning rate. It is important to set a learning rate at which we can train a model with good accuracy and speed.

In [7]:
retinanet.lr_find()
Out[7]:
4.365158322401661e-05
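
For intuition about what a learning rate finder does, the sketch below runs a generic learning rate range test on a toy quadratic loss. The learning rate is swept exponentially, the loss is recorded after each step, and the rate at which the loss falls fastest is suggested. This is an illustration of the general technique only. The heuristic that lr_find() uses internally may differ, and all function names here are hypothetical.

```python
import math


def lr_range_test(loss_fn, grad_fn, w0, lr_min=1e-5, lr_max=10.0, steps=100):
    """Take one gradient step per learning rate while sweeping lr exponentially."""
    w, best = w0, float("inf")
    lrs, losses = [], []
    for i in range(steps):
        lr = lr_min * (lr_max / lr_min) ** (i / (steps - 1))
        w = w - lr * grad_fn(w)
        loss = loss_fn(w)
        # Stop once the loss blows up relative to the best value seen so far.
        if not math.isfinite(loss) or loss > 4 * best:
            break
        best = min(best, loss)
        lrs.append(lr)
        losses.append(loss)
    return lrs, losses


def suggest_lr(lrs, losses):
    """Pick the learning rate at the start of the steepest drop in loss per unit log10(lr)."""
    slopes = [
        (losses[i + 1] - losses[i]) / (math.log10(lrs[i + 1]) - math.log10(lrs[i]))
        for i in range(len(lrs) - 1)
    ]
    steepest = min(range(len(slopes)), key=slopes.__getitem__)
    return lrs[steepest]


# Toy problem (an assumption for illustration): loss(w) = 0.5 * w**2, gradient = w.
lrs, losses = lr_range_test(lambda w: 0.5 * w * w, lambda w: w, w0=10.0)
suggested = suggest_lr(lrs, losses)
print(f"suggested learning rate: {suggested:.3g}")
```

Rates that are too small barely change the loss and rates that are too large make it diverge, so the suggestion falls between the two extremes.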

Train the model

We will now train the RetinaNet model using the suggested learning rate from the previous step. We can specify how many epochs we want to train for; let's train the model for 100 epochs. We can also set tensorboard=True if we want to visualize the training process in TensorBoard.

In [8]:
retinanet.fit(100, lr=4.365158322401661e-05, tensorboard=True) 
epoch train_loss valid_loss time
0 2.651160 3.122699 00:33
1 2.727485 3.089710 00:32
2 2.744920 3.015922 00:32
3 2.671797 2.851994 00:31
4 2.457554 2.497410 00:31
5 2.381740 2.328834 00:31
6 2.060174 4.138567 00:31
7 1.792403 21.451857 00:31
8 1.712977 4.193508 00:31
9 1.608706 4.876813 00:32
10 1.496329 4.955950 00:32
11 1.575526 2.124239 00:33
12 1.448479 2.765982 00:31
13 1.356783 2.739088 00:31
14 1.296036 1.941170 00:32
15 1.235588 3.042969 00:32
16 1.177469 2.916740 00:32
17 1.163151 2.462182 00:32
18 1.124477 1.952319 00:32
19 1.055723 2.639346 00:32
20 0.976554 1.884056 00:32
21 0.865862 1.545389 00:32
22 0.885476 1.693674 00:32
23 0.861983 1.386624 00:32
24 0.812286 1.257245 00:33
25 0.794138 1.578588 00:32
26 0.765640 1.208835 00:34
27 0.702818 1.117395 00:32
28 0.669110 1.213653 00:33
29 0.674798 1.130191 00:32
30 0.675300 1.154881 00:32
31 0.680791 1.257907 00:33
32 0.655586 1.072347 00:32
33 0.586407 1.009210 00:32
34 0.570755 1.220290 00:33
35 0.590223 0.982790 00:34
36 0.575041 0.997690 00:33
37 0.585412 1.035814 00:33
38 0.572887 1.015082 00:33
39 0.552126 0.949728 00:32
40 0.535455 1.195224 00:33
41 0.499169 0.946746 00:33
42 0.527345 1.009812 00:34
43 0.547029 0.991675 00:33
44 0.515441 0.906661 00:33
45 0.547948 0.986166 00:33
46 0.517109 0.943002 00:33
47 0.474826 0.894875 00:33
48 0.440434 0.909886 00:33
49 0.441918 0.819840 00:33
50 0.433040 0.837711 00:33
51 0.424501 0.834161 00:33
52 0.442397 0.825194 00:33
53 0.438501 0.778577 00:34
54 0.425794 0.790809 00:33
55 0.405544 0.774125 00:34
56 0.397529 0.751094 00:34
57 0.386021 0.756899 00:33
58 0.395799 0.763772 00:33
59 0.385372 0.785581 00:35
60 0.379765 0.767338 00:34
61 0.369503 0.720050 00:33
62 0.367806 0.720712 00:35
63 0.378731 0.734859 00:34
64 0.368838 0.729135 00:33
65 0.344555 0.700024 00:35
66 0.340411 0.743908 00:35
67 0.350800 0.718764 00:34
68 0.364890 0.715524 00:35
69 0.337952 0.688673 00:34
70 0.348077 0.719215 00:35
71 0.323196 0.700020 00:34
72 0.361027 0.719423 00:35
73 0.367712 0.719814 00:35
74 0.367507 0.693808 00:35
75 0.347651 0.708264 00:35
76 0.345269 0.705601 00:34
77 0.341163 0.719633 00:34
78 0.321359 0.719021 00:34
79 0.325086 0.710695 00:34
80 0.307621 0.709985 00:34
81 0.312010 0.695209 00:34
82 0.308455 0.723050 00:34
83 0.333749 0.721235 00:34
84 0.323337 0.718696 00:33
85 0.330353 0.709316 00:34
86 0.337785 0.728645 00:33
87 0.299953 0.732279 00:33
88 0.309058 0.723001 00:33
89 0.341413 0.749138 00:33
90 0.332262 0.734328 00:33
91 0.306863 0.716808 00:33
92 0.300803 0.737754 00:33
93 0.313041 0.714918 00:33
94 0.329477 0.711772 00:33
95 0.321354 0.714558 00:33
96 0.321379 0.701373 00:34
97 0.301340 0.726296 00:33
98 0.297174 0.726158 00:33
99 0.310064 0.736690 00:33

After the training is complete, we can view the plot with training and validation losses.

In [21]:
retinanet.learn.recorder.plot_losses()
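
Besides the plot, the printed training log can be inspected numerically. The sketch below parses a few rows copied from the log above and picks the epoch with the lowest validation loss. The helper name `parse_log` is hypothetical, and only an excerpt of the log is included to keep the example short.

```python
# A few rows copied verbatim from the training output above.
LOG_EXCERPT = """\
epoch train_loss valid_loss time
0 2.651160 3.122699 00:33
7 1.792403 21.451857 00:31
50 0.433040 0.837711 00:33
69 0.337952 0.688673 00:34
99 0.310064 0.736690 00:33
"""


def parse_log(text):
    """Parse 'epoch train_loss valid_loss time' rows into (epoch, train, valid) tuples."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        epoch, train_loss, valid_loss, _time = line.split()
        rows.append((int(epoch), float(train_loss), float(valid_loss)))
    return rows


rows = parse_log(LOG_EXCERPT)
best_epoch, best_train, best_valid = min(rows, key=lambda r: r[2])
last_epoch, last_train, last_valid = rows[-1]

print(f"lowest validation loss {best_valid} at epoch {best_epoch}")
print(f"final epoch {last_epoch}: train {last_train}, valid {last_valid}")
```

In the full log, the lowest validation loss is 0.688673 at epoch 69, while the final epoch ends at 0.736690 with a slightly lower training loss. This suggests that most of the improvement on the validation set happened in roughly the first 70 epochs.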

Visualize results on validation set

To see sample results we can use the show_results method. This method displays the chips from the validation dataset with ground truth (left) and predictions (right). We can also specify the threshold to view predictions at different confidence levels. This visual analysis helps in assessing the qualitative results of the trained model.

In [18]:
retinanet.show_results(thresh=0.4)
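
The effect of the threshold can be illustrated without the model. The sketch below filters a list of detections by confidence score. The dictionary layout and the scores are hypothetical and are not the output format of arcgis.learn. Whether the library keeps scores exactly equal to the threshold is also an assumption here.

```python
def filter_by_confidence(detections, thresh):
    """Keep only detections whose confidence score is at least `thresh`."""
    return [d for d in detections if d["score"] >= thresh]


# Hypothetical detections for a single frame: label, score and (xmin, ymin, xmax, ymax).
detections = [
    {"label": "car", "score": 0.91, "box": (120, 200, 260, 310)},
    {"label": "bicycle", "score": 0.45, "box": (300, 220, 340, 300)},
    {"label": "truck", "score": 0.32, "box": (500, 150, 700, 330)},
]

for thresh in (0.2, 0.4, 0.6):
    kept = [d["label"] for d in filter_by_confidence(detections, thresh)]
    print(f"thresh={thresh}: {kept}")
```

A lower threshold shows more boxes, including more false positives, while a higher threshold shows fewer but more confident predictions.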