- 🔬 Data Science
- 🥠 Deep Learning and Object Detection
- 🛤️ Tracking
Vehicle detection and tracking is a common problem with multiple use cases. Government authorities and private establishments may want to understand the traffic flowing through a place in order to develop its infrastructure for everyone's ease and convenience. Road widening projects, timing of traffic signals, and construction of parking spaces are a few examples where analysing traffic is integral to the project.
Traditionally, identification and tracking have been carried out manually: a person stands at a point and notes the count and types of passing vehicles. More recently, sensors have been put to use, but they only solve the counting problem; sensors cannot detect the type of vehicle.
In this notebook, we'll demonstrate how we can use deep learning to detect vehicles and then track them in a video. We'll use a short video taken from live traffic camera feed.
```python
import os
import pandas as pd
from pathlib import Path

from arcgis.gis import GIS
from arcgis.learn import RetinaNet, prepare_data
```
```python
gis = GIS('home')
```
You can download vehicle training data from here. Extract the downloaded file to get your training data.
Let's set a path to the folder that contains training images and their corresponding labels.
```python
training_data = gis.content.get('ccaa060897e24b379a4ed2cfd263c15f')
training_data
```
```python
filepath = training_data.download(file_name=training_data.name)
```
```python
import zipfile

with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)
```
```python
# os.path.splitext returns a (root, ext) tuple; we want the root,
# i.e. the path of the extracted folder next to the downloaded zip
data_path = Path(os.path.splitext(filepath)[0])
```
We'll use the `prepare_data` function to create a fastai databunch with the necessary parameters, such as `chip_size`. A complete list of parameters can be found in the API reference.
The given dataset has 235 images of size 854x480 pixels. We will set a `chip_size` of 480 pixels, which creates random 480x480 crops from the given images. This maintains the aspect ratios of the objects, but crops can miss objects when the model is trained for only a few epochs. To avoid cropping, we can instead set `resize_to=480`, so that every chip is an entire resized frame and no object is missed, but there is a risk of poorer detection of smaller-sized objects.
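To make the trade-off concrete, here is a minimal sketch (plain Python, independent of `arcgis.learn`) contrasting a random 480x480 crop from an 854x480 frame with a `resize_to`-style rescale of the whole frame:

```python
import random

FRAME_W, FRAME_H = 854, 480   # size of the images in this dataset
CHIP = 480                    # chip_size passed to prepare_data

def random_chip_box(frame_w, frame_h, chip):
    """Pick a random chip x chip window inside the frame.

    Aspect ratios of objects are preserved, but anything outside
    the window is lost for this particular training sample.
    """
    x0 = random.randint(0, frame_w - chip)
    y0 = random.randint(0, frame_h - chip)
    return x0, y0, x0 + chip, y0 + chip

def resize_scale(frame_w, frame_h, target):
    """With resize_to, the whole frame is scaled instead of cropped:
    no object is dropped, but small objects shrink further."""
    return target / frame_w, target / frame_h

random.seed(0)
print(random_chip_box(FRAME_W, FRAME_H, CHIP))   # a 480x480 window
print(resize_scale(FRAME_W, FRAME_H, CHIP))      # per-axis scale factors
```

Since the frame is already 480 pixels tall, the random crop only slides horizontally; the resize variant squeezes the width by roughly 480/854 while leaving the height unchanged.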
```python
data = prepare_data(data_path,
                    batch_size=4,
                    dataset_type="PASCAL_VOC_rectangles",
                    chip_size=480)
```
Please check your dataset. 9 images dont have the corresponding label files.
We see the warning above because a few images in our dataset are missing their corresponding label files. These images are ignored while loading the data. If the number were significant, we would want to fix the issue by adding label files for those images or removing the images.
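If you want to locate the offending images yourself, a check along these lines can help. This is a hedged sketch, assuming the usual PASCAL VOC layout where images and label files share the same file stem:

```python
from pathlib import Path

def images_without_labels(image_names, label_names):
    """Return stems of images that have no corresponding label file."""
    image_stems = {Path(n).stem for n in image_names}
    label_stems = {Path(n).stem for n in label_names}
    return sorted(image_stems - label_stems)

# Tiny synthetic example in place of the real folder listings:
images = ['frame_001.jpg', 'frame_002.jpg', 'frame_003.jpg']
labels = ['frame_001.xml', 'frame_003.xml']
print(images_without_labels(images, labels))  # → ['frame_002']
```

On the real dataset, `image_names` and `label_names` would come from listing the image and label folders, and the result should contain the 9 stems mentioned in the warning.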
We can use the `classes` attribute of the data object to get information about the number of classes.
['background', 'bicycle', 'bus', 'car', 'motorcycle', 'person', 'scooter', 'tempo', 'tractor', 'truck', 'van']
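As a quick sanity check, the list above contains one `background` entry plus ten object classes; a minimal sketch:

```python
# Class list as printed by data.classes above
classes = ['background', 'bicycle', 'bus', 'car', 'motorcycle',
           'person', 'scooter', 'tempo', 'tractor', 'truck', 'van']

# 'background' is not a detectable object class
object_classes = [c for c in classes if c != 'background']
print(len(classes), len(object_classes))  # → 11 10
```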
To visualize and get a sense of the training data, we can use the `show_batch()` method on the data object.
In the previous cell, we see a sample of the dataset. We can observe that, in the given chips, the most common vehicles are cars and bicycles. It can also be noticed that the different instances of the vehicles have varying scales.
`arcgis.learn` provides object detection models based on pretrained convnets, such as ResNet, that act as backbones. We will use `RetinaNet` with the default parameters to create our vehicle detection model. For more details on `RetinaNet`, check out How RetinaNet works? and the API reference.
```python
retinanet = RetinaNet(data)
```
We will use the `lr_find()` method to find an optimum learning rate. It is important to set a learning rate at which we can train the model with good accuracy and speed.
```python
lr = retinanet.lr_find()
```
We will now train the `RetinaNet` model using the suggested learning rate from the previous step. We can specify how many epochs to train for; let's train the model for 100 epochs. We can also set `tensorboard=True` if we want to visualize the training process in TensorBoard.
```python
retinanet.fit(100, lr=lr, tensorboard=True)
```
After the training is complete, we can view the plot with training and validation losses.
To see sample results, we can use the `show_results` method. This method displays chips from the validation dataset with ground truth (left) and predictions (right). We can also specify a threshold to view predictions at different confidence levels. This visual analysis helps in assessing the qualitative results of the trained model.
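Conceptually, the threshold simply filters out low-scoring predictions before they are drawn. A minimal sketch with hypothetical detection tuples (not the `arcgis.learn` internals):

```python
def filter_detections(detections, thresh=0.5):
    """Keep only predictions whose confidence meets the threshold.

    Each detection is a (label, score, box) tuple,
    where box is (x0, y0, x1, y1) in pixels.
    """
    return [d for d in detections if d[1] >= thresh]

# Hypothetical raw model output for one validation chip
detections = [
    ('car',     0.92, (10, 20, 120, 90)),
    ('bicycle', 0.35, (200, 40, 240, 110)),
    ('truck',   0.61, (300, 15, 460, 140)),
]

print([d[0] for d in filter_detections(detections, thresh=0.5)])
# → ['car', 'truck']
```

Raising the threshold suppresses uncertain detections (fewer false positives, more misses); lowering it shows more candidate boxes at the cost of noise.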