Traffic Light Detection In Oriented Imagery Using ArcGIS Pretrained Model

  • 🔬 Data Science
  • 🥠 Deep Learning and Object classification


Object detection is usually applied to imagery taken looking straight down at the ground (nadir), such as traditional satellite imagery, because the predictions can be visualized on a map and incorporated into your GIS. Other imagery is harder to visualize and incorporate into a GIS. Such non-nadir oriented imagery includes oblique, bubble, 360-degree, street-side, and inspection imagery, among others. This sample demonstrates how an object detection model can detect objects in oriented imagery using the ArcGIS API for Python.

The arcgis.learn module supports a number of object detection models, such as SingleShotDetector, RetinaNet, FasterRCNN, and YOLOv3. In this notebook, we will use the YOLOv3 model to detect traffic lights in oriented imagery. The biggest advantage of YOLOv3 in arcgis.learn is that it comes preloaded with weights pretrained on the COCO dataset, which makes it ready to use for the 80 common object classes (car, truck, person, etc.) in that dataset.

Necessary imports

import os, json, cv2
from math import *
import numpy as np
import itertools
import pandas as pd
import zipfile
from pathlib import Path
import arcgis, arcpy
from arcgis import GIS
from arcgis.geometry import Point
from arcgis.learn import YOLOv3

Download & setup data

We need the oriented imagery and its metadata file for inferencing and for plotting the detected points. Sample images are hosted on ArcGIS Online; we will download that item below and use it in this workflow.

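gis = GIS()  # anonymous connection to ArcGIS Online; pass credentials here if your setup requires them
# Sample data can be downloaded directly by clicking on the link below
oriented_imagery_data = gis.content.get("d606e6827c8746e383de96d8718be9a8")
oriented_imagery_data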
Oriented Imagery Sample Data
oriented imagery sample notebook data
Image Collection by api_data_owner
Last Modified: January 03, 2023
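# Download the zip item into the working directory; download() returns the local file path
filepath = oriented_imagery_data.download(save_path=os.getcwd())
with zipfile.ZipFile(filepath, 'r') as zip_ref:
    # Assumption: extract into a folder named after the zip, matching the paths used below
    zip_ref.extractall(os.path.splitext(filepath)[0])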

After extracting the zip file, we set the paths of the extracted items that are used in this workflow.

  • data_path: Folder containing all the oriented imagery.
  • image_meta_data: File containing the metadata for all the oriented images in data_path.
  • depth_image_path: Folder containing the relative estimated depth images of the oriented imagery.
data_path = Path(os.path.join(os.path.splitext(filepath)[0]), "street_view_data")
image_meta_data = Path(os.path.join(os.path.splitext(filepath)[0]), "oriented_imagery_meta_data.csv")
depth_image_path = Path(os.path.join(os.path.splitext(filepath)[0]), "saved_depth_image")
image_path_list = [os.path.join(data_path, image) for image in os.listdir(data_path)]
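
Because os.listdir returns every entry in the folder in arbitrary order, a stray non-image file would make cv2.imread return None later, and the frames may not be processed in sequence. A minimal sketch of a stricter listing follows; the list_image_paths helper and the extension set are illustrative assumptions, not part of the sample, and sorting only keeps consecutive frames adjacent if the file names sort in capture order.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_paths(folder):
    """Return the sorted paths of image files directly inside `folder`."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

With this helper, image_path_list = list_image_paths(data_path) could replace the list comprehension above.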

Model training

Since we are using the pretrained YOLOv3 model, we pass pretrained_backbone as True. When the model is initialized this way, the YOLOv3 weights pretrained on the COCO dataset are downloaded. We will use these weights later to detect traffic lights.

yolo = YOLOv3(pretrained_backbone=True)

Model inferencing

Once the model is loaded and ready for inferencing, we will create a function named traffic_light_finder that takes an oriented image as input and returns two things:

  • A JSON-style dictionary containing the traffic light coordinates
  • The image annotated with the detected traffic lights

We will save all the annotated images into a folder named traffic_light_marked and save all the annotations in a combined JSON file on disk.

def traffic_light_finder(oriented_image_path):
    flag = 0
    coordlist = []
    temp_list = {}
    # Depending upon your GPU capability, the batch_size can be changed.
    out = yolo.predict(oriented_image_path, threshold=0.5, batch_size=4)
    test_img = cv2.imread(oriented_image_path)
    if len(out[0]) == 0:
        temp_list["object"] = False
    else:
        for index, (value, label, confidence) in enumerate(zip(out[0], out[1], out[2])):
            if label == "traffic light":
                flag = 1
                # value is used below as a box in [x, y, width, height] pixel format
                coordlist.append(
                    [int(value[0]), int(value[1]), int(value[2]), int(value[3])]
                )
                test_img = cv2.rectangle(
                    test_img,
                    (int(value[0]), int(value[1]), int(value[2]), int(value[3])),
                    (0, 0, 255),
                    10,  # line thickness (illustrative value)
                )
                textvalue = label + "_" + str(confidence)
                cv2.putText(
                    test_img,
                    textvalue,
                    (int(value[0]), int(value[1]) - 10),
                    cv2.FONT_HERSHEY_SIMPLEX,  # font, scale and thickness are illustrative
                    1.5,
                    (0, 0, 255),
                    2,
                )
        if flag == 1:
            temp_list["object"] = True
            temp_list["coords"] = coordlist
            temp_list["assetname"] = "traffic light"
    return temp_list, test_img

Here we will create a folder named traffic_light_marked which will contain all the images with traffic lights detected on them. We can use these images to check the output of the model. Later we can use them for our use case.

marked_image_saved_folder = os.path.join(os.getcwd(), "traffic_light_marked")
os.makedirs(marked_image_saved_folder, exist_ok=True)
print("Path created for saving the images with traffic light detected on them : - ", marked_image_saved_folder)
Path created for saving the images with traffic light detected on them : -  C:\Users\roh12004\Documents\arcgis-python-api\samples\04_gis_analysts_data_scientists\traffic_light_marked
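detections = {}
for image in image_path_list:
    try:
        val_dict, out_image = traffic_light_finder(image)
        # keep only images with at least one traffic light, so every entry has "coords"
        if val_dict.get("object"):
            detections[os.path.basename(image)] = val_dict
            cv2.imwrite(os.path.join(marked_image_saved_folder, os.path.basename(image)), out_image)
    except Exception as err:
        # report the failing image and continue with the rest
        print("Skipping", image, ":", err)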

Here we also save the coordinates of the traffic lights in a JSON file. These coordinates can be used to create a web map or in other use cases.

with open("traffic_light_data_sample.json", "w") as f:
    json.dump(detections, f)
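
The saved file maps each image name to a record with the keys "object", "coords", and "assetname", as built by traffic_light_finder. As a quick sanity check, the file can be read back and summarized. The following is a minimal sketch; summarize_detections is a hypothetical helper, not part of the sample.

```python
import json


def summarize_detections(json_path):
    """Return (images with a traffic light, total boxes) for a detections file."""
    with open(json_path) as f:
        detections = json.load(f)
    # An image counts only if its record has "object" set to true.
    n_images = sum(1 for record in detections.values() if record.get("object"))
    # Records without "coords" contribute zero boxes.
    n_boxes = sum(len(record.get("coords", [])) for record in detections.values())
    return n_images, n_boxes
```

Calling summarize_detections("traffic_light_data_sample.json") returns the two counts for the file written above.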

Below are some of the images showcasing how the pretrained YOLOv3 model performs on the oriented imagery.

Relative depth estimation model

We have now run the pretrained YOLOv3 model on all the oriented images and obtained the coordinates of the traffic lights detected in them.

Next, we calculate the relative estimated depth of the objects in the oriented imagery. For that, we use a pretrained model from the open-source Bts-PyTorch project, which is based on From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation.

We have packaged the model as a dlpk file that can be used with ArcGIS Pro to calculate the relative estimated depth of the objects in the oriented imagery.

For this sample notebook, the output of this model on all the sample images is already provided in the folder saved_depth_image, which is part of the Oriented Imagery Sample Data downloaded earlier.

depth_model_item = gis.content.get("c19f7ce733cd4811b5609566fa4cf5bb")
Relative Depth Estimation Model
A Deep Learning Model to estimate the relative depth of objects in street-level imagery.
Deep Learning Package by api_data_owner
Last Modified: January 03, 2023

Once we have downloaded the dlpk file we will use it for calculating the estimated depth of the oriented images.

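# Assumption: the truncated call is arcpy.ia.ClassifyPixelsUsingDeepLearning and its input path was lost
in_raster = r"<path to the oriented image to process>"  # placeholder, set before running
with arcpy.EnvManager(processorType='cpu'):
    out_classified_raster = arcpy.ia.ClassifyPixelsUsingDeepLearning(
        in_raster,
        r"D:\sample\relative_depth_estimation.dlpk", None,
        'PROCESS_AS_MOSAICKED_IMAGE', None)
out_classified_raster.save(r"D:\sample\samplename.png")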

Below is the image showcasing how the pretrained relative depth estimation model performs on the oriented imagery.

Extract location of traffic lights on map

We have now run the pretrained YOLOv3 model on all the oriented images and obtained the coordinates of the traffic lights detected in them. We also have the relative estimated depth of the objects in the oriented imagery.

In addition, we have an oriented imagery metadata file in CSV format (downloaded above), which contains metadata such as the coordinate at which each image was taken, AvgHtAG, CamHeading, CamOri, HFOV, VFOV, etc. You can learn more about these fields from this document.

Using this data, we will now try to find the location of the traffic lights on the map.

camera_df = pd.read_csv(image_meta_data)
   Unnamed: 0  AcquisitionDate  AvgHtAG  CamHeading  CamOri                                              CamPitch  CamRoll   FarDist  HFOV   ...  SHAPE                                              VFOV
0  0           NaN              2.5      320.86540   3|3346||582715.827|6063651.438|111.212|-38.134...  88.64669  -1.58360  50.0     360.0  ...  {'x': 2814531.6957999994, 'y': 7304332.8588000...  180.0
1  1           NaN              2.5      358.02590   3|3346||582717.142|6063646.62|111.26|-0.97405|...  88.97281  -2.20641  50.0     360.0  ...  {'x': 2814533.810899999, 'y': 7304324.47829999...  180.0
2  2           NaN              2.5      6.04420     3|3346||582716.637|6063641.632|111.262|7.0442|...  88.89201  -3.11810  50.0     360.0  ...  {'x': 2814532.786800001, 'y': 7304315.84749999...  180.0
3  3           NaN              2.5      6.31495     3|3346||582716.017|6063636.642|111.284|7.31495...  88.85389  -3.14057  50.0     360.0  ...  {'x': 2814531.5621999986, 'y': 7304307.2359, '...  180.0
4  4           NaN              2.5      6.27997     3|3346||582715.377|6063631.674|111.302|7.27997...  88.77918  -3.34327  50.0     360.0  ...  {'x': 2814530.293200001, 'y': 7304298.6629, 's...  180.0
(Values for the Image, Name, NearDist, OBJECTID, and OIType columns are not shown.)
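
The SHAPE column stores each camera position as a dictionary-like string. The workflow below converts it with eval, which executes arbitrary code. A safer option is ast.literal_eval, which only accepts Python literals. The following is a minimal sketch; parse_shape is a hypothetical helper, and the example string is an illustrative value in the same format as the column, not an actual row.

```python
import ast


def parse_shape(shape_text):
    """Parse a SHAPE cell such as "{'x': 1.0, 'y': 2.0}" into an (x, y) tuple."""
    shape = ast.literal_eval(shape_text)  # evaluates literals only, never code
    return float(shape["x"]), float(shape["y"])


# Illustrative value (assumed format and spatial reference, not a real row).
example = "{'x': 2814531.7, 'y': 7304332.9, 'spatialReference': {'wkid': 102100, 'latestWkid': 3857}}"
x, y = parse_shape(example)
```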
dets = list(detections.keys())
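def find_intersection(x1, y1, x2, y2, x3, y3, x4, y4):
    # intersection of the line through (x1, y1)-(x2, y2) with the line through (x3, y3)-(x4, y4)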
    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3
          * x4)) / ((x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4))
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3
          * x4)) / ((x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4))
    return [px, py]

def process(input_list, threshold=(10, 10)):
    combos = itertools.combinations(input_list, 2)
    points_to_remove = [point2 for (point1, point2) in combos
                        if abs(point1[0] - point2[0]) <= threshold[0]
                        and abs(point1[1] - point2[1]) <= threshold[1]]
    points_to_keep = [point for point in input_list if point
                      not in points_to_remove]
    return points_to_keep
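
The find_intersection function above divides by a determinant that is zero when the two lines are parallel, which raises ZeroDivisionError. The following is a minimal sketch of the same line-line intersection formula with a guard; line_intersection and its eps tolerance are illustrative assumptions, not part of the sample.

```python
def line_intersection(p1, p2, p3, p4, eps=1e-9):
    """Intersect line p1-p2 with line p3-p4; return None if they are (near) parallel."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < eps:
        return None  # parallel or coincident lines have no unique intersection
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / denom
    py = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return [px, py]
```

For example, the diagonals of the unit square, line_intersection((0, 0), (1, 1), (0, 1), (1, 0)), meet at the centre of the square.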
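(H, W, _) = cv2.imread(image_path_list[0]).shape
points = []
# ASSUMPTION: approximate real-world height of a traffic light in meters.
# The value is not given in this text; tune it for your data.
OBJECT_HEIGHT_IN_WORLD = 1.0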

for i in range(len(dets) - 1):  # check coordinates of two consecutive images

    # load data of image1

    img1 = (dets[i])[:-4]
    cam1 = camera_df[camera_df['Name'] == img1].to_dict('records')[0]
    bboxes1 = detections[img1 + '.jpg']['coords']

    # load data of image2

    img2 = (dets[i + 1])[:-4]
    cam2 = camera_df[camera_df['Name'] == img2].to_dict('records')[0]
    bboxes2 = detections[img2 + '.jpg']['coords']

    for bbox1 in bboxes1:  # loop over all the bbox in image1
        if bbox1[3] > 50:  # ignore small bboxes

            # calculate the angle of the object in image1

            direction_angle1 = cam1['CamHeading'] + cam1['HFOV'] / 2. \
                * (bbox1[0] + bbox1[2] / 2 - W / 2.) / (W / 2.)
            angle_subtended_by_object1 = cam1['VFOV'] * bbox1[3] / H

            # calculate the distance of the object in image1 from the camera

            dist1 = OBJECT_HEIGHT_IN_WORLD \
                / tan(radians(angle_subtended_by_object1))
            dist1 = dist1 * pi

            # find coordinate of object in image1

            x12 = Point(eval(cam1['SHAPE']))['x'] + dist1 * cos(pi / 2
                    - radians(direction_angle1))
            y12 = Point(eval(cam1['SHAPE']))['y'] + dist1 * sin(pi / 2
                    - radians(direction_angle1))
            x11 = Point(eval(cam1['SHAPE']))['x']
            y11 = Point(eval(cam1['SHAPE']))['y']

            for bbox2 in bboxes2:  # loop over all the bbox in image2
                if bbox2[3] > 50:  # ignore small bboxes

                    # calculate the angle of the object in image2

                    direction_angle2 = cam2['CamHeading'] + cam2['HFOV'
                            ] / 2. * (bbox2[0] + bbox2[2] / 2 - W / 2.) \
                        / (W / 2.)
                    angle_subtended_by_object2 = cam2['VFOV'] \
                        * bbox2[3] / H

                    # calculate the distance of the object in image2 from the camera

                    dist2 = OBJECT_HEIGHT_IN_WORLD \
                        / tan(radians(angle_subtended_by_object2))
                    dist2 = dist2 * pi

                    # find coordinate of object in image2

                    x22 = Point(eval(cam2['SHAPE']))['x'] + dist2 \
                        * cos(pi / 2 - radians(direction_angle2))
                    y22 = Point(eval(cam2['SHAPE']))['y'] + dist2 \
                        * sin(pi / 2 - radians(direction_angle2))
                    x21 = Point(eval(cam2['SHAPE']))['x']
                    y21 = Point(eval(cam2['SHAPE']))['y']

                    # find the point where the rays from image1 and image2 intersect

                    pointval = find_intersection(
                        x11, y11, x12, y12, x21, y21, x22, y22)

                    # load the estimated depth image and select the minimum depth in the area where the object was detected

                    (xmin, ymin, xmax, ymax) = (bbox2[0], bbox2[1],
                            bbox2[0] + bbox2[2], bbox2[1] + bbox2[3])
                    depth_image = \
                        cv2.imread(os.path.join(depth_image_path, img2
                                   + '.jpg'))
                    cropped_depth_image = depth_image[ymin:ymax, xmin:xmax]

                    # take the estimated depth as distance from the center

                    DIST = np.min(cropped_depth_image[:, :, 0])
                    DIST = DIST + 7

                    # find coordinate of object using estimated depth as distance

                    x22_1 = Point(eval(cam2['SHAPE']))['x'] + DIST \
                        * cos(pi / 2 - radians(direction_angle2))
                    y22_1 = Point(eval(cam2['SHAPE']))['y'] + DIST \
                        * sin(pi / 2 - radians(direction_angle2))

                    point0 = np.array([float(pointval[0]), float(pointval[1])])
                    point1 = np.array([float(x22_1), float(y22_1)])

                    # calculate the Euclidean distance between the intersection point and the point calculated from the estimated depth

                    dist_points = np.linalg.norm(point0 - point1)

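                    # if the two estimates are less than 5 map units apart, take the point

                    if dist_points < 5:
                        # assumption: the intersection point is the one that is kept
                        points.append(pointval)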
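
The direction computation in the loop above can be isolated into two small functions: one converts the horizontal position of a bounding box into a compass bearing, and the other moves a given distance from the camera along that bearing. The following is a minimal sketch using the same formulas; bbox_bearing and project are hypothetical helper names, and boxes are assumed to be [x, y, width, height] in pixels.

```python
from math import cos, pi, radians, sin


def bbox_bearing(cam_heading, hfov, bbox, image_width):
    """Compass bearing in degrees of the centre of bbox = [x, y, w, h]."""
    centre_x = bbox[0] + bbox[2] / 2
    # Offset of the box centre from the image centre, scaled to the range -1 .. 1.
    offset = (centre_x - image_width / 2) / (image_width / 2)
    return cam_heading + hfov / 2 * offset


def project(x, y, bearing_deg, distance):
    """Move `distance` map units from (x, y) along a bearing measured clockwise from north."""
    theta = pi / 2 - radians(bearing_deg)  # convert the bearing to a math angle
    return x + distance * cos(theta), y + distance * sin(theta)
```

A box centred in the image keeps the camera heading, while a box centred three quarters of the way across a 360-degree image is rotated by a further 90 degrees.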

The process above yields coordinates where traffic lights are likely located. Because one traffic light can be detected in multiple images, we cluster the points and keep only one traffic light per cluster.

This removes the redundant traffic lights near a point.
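
The process function defined earlier removes the second point of every close pair, so a chain of points can remove a point that is not close to any point that is kept. A greedy variant avoids this by comparing each point only against the points already kept. The following is a minimal sketch of that alternative, not the method used in this notebook; dedupe_points is a hypothetical name.

```python
def dedupe_points(points, threshold=(10, 10)):
    """Keep a point only if no already-kept point lies within the x/y thresholds."""
    kept = []
    for x, y in points:
        near_kept = any(
            abs(x - kx) <= threshold[0] and abs(y - ky) <= threshold[1]
            for kx, ky in kept
        )
        if not near_kept:
            kept.append([x, y])
    return kept
```

For the chain [[0, 0], [8, 0], [16, 0]], this variant keeps [0, 0] and [16, 0], whereas process keeps only [0, 0].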

print('Number of traffic lights extracted - {}'.format(len(points)))
outpoints = process(points)
print('Number of traffic lights extracted after clustering and removing redundant traffic light - {}'.format(len(outpoints)))
Number of traffic lights extracted - 302
Number of traffic lights extracted after clustering and removing redundant traffic light - 40


We will load a map and draw the final selected coordinates on it. These coordinates are the places where there are traffic lights.

m = gis.map('Vilnius City')
m.center = {'x': 25.28489583988743, 'y': 54.70681816057357,
            'spatialReference': {'wkid': 4326, 'latestWkid': 4326}}
m.zoom = 19
m.basemap = 'satellite'
for point in outpoints:
    intpoint = {'x': point[0], 'y': point[1],
                'spatialReference': {'wkid': 102100,
                'latestWkid': 3857}}
    m.draw(arcgis.geometry.Point(intpoint), symbol={
        'type': 'simple-marker',
        'style': 'square',
        'color': 'red',
        'size': '8px',
    })
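
The points are drawn in Web Mercator (wkid 102100, latestWkid 3857), while the map center is given in longitude and latitude (wkid 4326). To check that a point falls near the expected area, its coordinates can be converted with the standard spherical Web Mercator inverse. The following is a minimal sketch; web_mercator_to_lonlat is a hypothetical helper, and the sample coordinates are rounded from the first row of the metadata table.

```python
from math import atan, degrees, exp, pi

EARTH_RADIUS = 6378137.0  # WGS84 semi-major axis in meters, as used by Web Mercator


def web_mercator_to_lonlat(x, y):
    """Convert Web Mercator meters to (longitude, latitude) in degrees."""
    lon = degrees(x / EARTH_RADIUS)
    lat = degrees(2 * atan(exp(y / EARTH_RADIUS)) - pi / 2)
    return lon, lat


# Camera position rounded from the first metadata row.
lon, lat = web_mercator_to_lonlat(2814531.7, 7304332.9)
```

The result lies close to the map center used above, which is a quick way to confirm that both coordinate systems refer to the same place.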


In this notebook, we performed object detection on imagery taken at any angle, known as oriented imagery. We used the YOLOv3 model with pretrained weights to detect traffic lights and located them on a map using the ArcGIS API for Python.


