Skip To Content ArcGIS for Developers Sign In Dashboard

ArcGIS API for Python

Guide to Network Analysis (Part 5 - Generate OD Cost Matrix)

Introduction

Now we have learned about Network Datasets and Network Analysis Layer (NA Layer) in Part 1, how to find routes from one point to another, and among multiple points in Part 2, how to generate service area in Part 3, and how to find closest facility in Part 4, let's move onto the fifth topic - how to create an Origin Destination Cost Matrix. Please refer to the road map below if you want to revisit the previous topics or jump to the next topic -

  • Network Dataset and Network Analysis services (Part 1)
  • Find Routes (Part 2)
  • Generate Service Area (Part 3)
  • Find Closest Facility (Part 4)
  • Generate Origin Destination Cost Matrix (You are here!)
  • Solve Location Allocation (Part 6)
  • Vehicle Routing Problem Service (Part 7)

An origin-destination (OD) cost matrix from multiple origins to multiple destinations, is a table that contains the cost, such as the travel time or travel distance, from each origin to each destination. Additionally, it ranks the destinations that each origin connects to in ascending order based on the minimum cost required to travel from that origin to each destination. When generating an OD cost matrix, you can optionally specify the maximum number of destinations to find for each origin and the maximum time or distance to travel when searching for destinations [1].

The API of Generate Origin Destination Cost Matrix is useful when you need to find and measure the least-cost paths along the network from multiple origins to multiple destinations, and you can specify the number of destinations to find and a maximum distance to search. Note that, even though the OD cost matrix solver doesn't output lines that follow the network, the values stored in the Lines attribute table reflect the network distance, not the straight-line distance [2].

Fig 1. The OD cost matrix found the least-cost paths from each origin to the four nearest destinations. The output shape type was set to produce straight lines (source: [2]).

Fig 2. Even though the OD cost matrix solver doesn't output lines that follow the network, the values stored in the Lines attribute table reflect the network distance, not the straight-line distance. This is because by default the displaying option for the output lines in the OD Cost Matrix analysis layer is the generalized straight lines from one point to another. In doing such, the server side can save computation resources and network bandwidth in transferring. However, their cost attributes always report the least-cost network path (source: [3]).

This part of guide to Networking Analysis will help you measure the cost of traveling between origins and destinations. When you are ready, let's make your hands dirty with some real implementations!

Differences between Closest Facility and OD cost Matrix

The closest facility and OD cost matrix solvers perform very similar analyses; the main difference, however, is in the output and the computation speed.

OD cost matrix generates results more quickly but cannot return the true shapes of routes or their driving directions. It is designed to quickly solve large M x N problems and, as a result, does not internally contain the information required to generate route shapes and driving directions.

Alternatively, the closest facility solver returns routes and directions but performs the analysis more slowly than the OD cost matrix solver. If you need driving directions or true shapes of routes, use the closest facility solver; otherwise, use the OD cost matrix solver to reduce the computation time [2].

Methods

The ArcGIS API for Python provides two ways to solve a OD Cost matrix problem, which as shown in the table below:

Operation network.analysis features.use_proximity
Route find_routes plan_routes
ServiceArea generate_service_areas create_drive_time_areas
ClosestFacility find_closest_facilities find_nearest
OD Cost Matrix generate_origin_destination_cost_matrix connect_origins_to_destinations

These two methods are defined in different modules of the arcgis package, and will make distinct REST calls in the back end. A key separation from network.analysis to features.use_proximity is that the former provides full capabilities of solvers and runs faster, and the latter is workflow-driven and provides service-to-service I/O approach.

We will walk through each one of these implementations and further explore the differences in the process. Getting started, let's first connect to a GIS instance.

Connect to GIS object

In [1]:
from arcgis.gis import GIS
import arcgis.network as network
from arcgis.features import FeatureLayer, Feature, FeatureSet, FeatureCollection, use_proximity
import pandas as pd
import time
import datetime as dt

If you have already set up a profile to connect to your ArcGIS Online organization, execute the cell below to load the profile and create the GIS class object. If not, use a traditional username/password log-in e.g. my_gis = GIS('https://www.arcgis.com', 'arcgis_python', 'P@ssword123', verify_cert=False, set_active=True)

In [2]:
my_gis = GIS(profile="your_online_profile")

Method 1 - using arcgis.network.analysis.generate_origin_destination_cost_matrix

Problem statement

Part 5 focuses on how to measure the cost of traveling from facilities to the incidents. Let's assume the first user story is like this:

Jim is an operator working for Medical Emergency Response Office, and needs to dispatch ambulance from nearby hospitals to the spot of incident. Now given the three incidents reported in Redlands and Loma Linda, is Jim able to provide an origin-destination cost matrix?

Now that Jim's objectives are defined, we can go onto break down the problem:

  • Data: where to access the input datasets
  • Methods: what tools can be used to build the network model and perform OD Cost Matrix analysis
  • Tables and maps: deliverables in which directions and routes are visualized.

Let's first access and/or explore the input datasets (in this case, the origins and destinations feature class).

Define Origins and Destinations Feature Class

Generate Origin Destination Cost Matrix can be associated with a local network dataset or a network service hosted in ArcGIS Online or ArcGIS Enterprise. Here, we will be using an existing feature layer that contains hospitals derived from various sources (refer SOURCE field) for the Homeland Infrastructure Foundation-Level Data (HIFLD) database (https://gii.dhs.gov/HIFLD), which can be accessed from esri_livingatlas.

If you do not have access to the hospital layer provided by Esri_livingatlas as referenced in the cell below, an alternative approach is for you to download the hospital listing of San Bernadino County from this source and publish the csv to the organization before proceeding forward.

In [3]:
""" This try-except block will help you download the CSV and publish to current GIS object, if 
    "You do not have permissions to access this resource or perform this operation."
"""
try:
    hospital_item = my_gis.content.get("a2817bf9632a43f5ad1c6b0c153b0fab")
except RuntimeError as ne:
    try:
        print("Trying from an alternative source...")
        hospital_item = my_gis.content.get("50fb63f303304835a048d16bd86c3024")
    except RuntimeError as ne:
        print("Trying to publish from csv...")
        import requests
        import csv
        import os

        out_file_name = 'hospitals_SB_county.csv'
        url = "https://data.chhs.ca.gov/datastore/dump/641c5557-7d65-4379-8fea-6b7dedbda40b?q=&sort=_id+asc&fields=OSHPD_ID%2CFACILITY_NAME%2CLICENSE_NUM%2CFACILITY_LEVEL_DESC%2CDBA_ADDRESS1%2CDBA_CITY%2CDBA_ZIP_CODE%2CCOUNTY_CODE%2CCOUNTY_NAME%2CER_SERVICE_LEVEL_DESC%2CTOTAL_NUMBER_BEDS%2CFACILITY_STATUS_DESC%2CFACILITY_STATUS_DATE%2CLICENSE_TYPE_DESC%2CLICENSE_CATEGORY_DESC%2CLATITUDE%2CLONGITUDE&filters=%7B%22COUNTY_CODE%22%3A+%5B%2236%22%5D%7D&format=csv"
        download = requests.get(url)

        with open(out_file_name, 'w') as out_file:
            out_file.writelines(download.text)
            print(out_file_name)
        csv_item = my_gis.content.add({}, out_file_name)
        hospital_item = csv_item.publish()
display(hospital_item)
You do not have permissions to access this resource or perform this operation.
Trying from an alternative source...
hospitals_SB_county
Feature Layer Collection by arcgis_python
Last Modified: September 27, 2019
0 comments, 10 views
In [4]:
hospital_fl = FeatureLayer(hospital_item.url + "/0")

""" If you are using the exisiting layer from Esri_LivngAtlas, there is a "County" column in the dataset;
    or else, the feature layer collection published from the downloaded CSV file is already targetted at SB County.
"""
try:
    facilities = hospital_fl.query(where="County='SAN BERNARDINO' AND State='CA'", as_df=False)
except RuntimeError as re:
    """ when seeing 'Invalid field: County' parameter is invalid
    """
    print("Trying from an alternative approach...")
    facilities = hospital_fl.query(where="Dba_city='REDLANDS' or Dba_city='LOMA LINDA'", as_df=False)
display(facilities)
'Invalid field: County' parameter is invalid
Trying from an alternative approach...
<FeatureSet> 33 features

Now we have the facilities layer ready, we can go onto define the incidents layer. Here, we randomly picked six locations in Redlands, CA.

In [5]:
incidents_json = {
                    "features": [{"attributes": {"CurbApproach": 0,
                                                 "ID": "C100045",
                                                 "Name": "Incident at Esri"},
                                  "geometry": {"x": -117.19569523299998, "y": 34.05608640000003}},
                                 {"attributes": {"CurbApproach": 0,
                                                 "ID": "F100086",
                                                 "Name": "Incident at APT"},
                                  "geometry": {"x": -117.20520037855628, "y": 34.04472649163186}},
                                 {"attributes": {"CurbApproach": 0,
                                                 "ID": "C100097",
                                                 "Name": "Incident at Walmart"},
                                  "geometry": {"x": -117.222253, "y": 34.065378}},
                                 {"attributes": {"CurbApproach": 0,
                                                 "ID": "C100097",
                                                 "Name": "Incident at High school"},
                                  "geometry": {"x": -117.192296, "y": 34.060759}},
                                 {"attributes": {"CurbApproach": 0,
                                                 "ID": "C100097",
                                                 "Name": "Incident at Bowling"},
                                  "geometry": {"x": -117.194699, "y": 34.063568}},
                                 {"attributes": {"CurbApproach": 0,
                                                 "ID": "C100097",
                                                 "Name": "Incident at Burger"},
                                  "geometry": {"x": -117.201201, "y": 34.063364}}],
                    "spatialReference": {"wkid": 4326, "latestWkid": 4326},
                    "geometryType": "esriGeometryPoint",
                    "fields" : [
                        {"name" : "ID", "type" : "esriFieldTypeString", "alias" : "ID", "length" : "50"},
                        {"name" : "Name", "type" : "esriFieldTypeString", "alias" : "Name", "length" : "50"},
                        {"name" : "CurbApproach", "type" : "esriFieldTypeInteger", "alias" : "CurbApproach"}
                    ]
                }
incidents = FeatureSet.from_dict(incidents_json)

Let's have the two layers visualized in the map view.

In [82]:
map1 = my_gis.map('Redlands, CA', zoomlevel=12)
map1
In [11]:
map1.clear_graphics()

hospital_symbol = {"type":"esriPMS",
                   "url":"http://static.arcgis.com/images/Symbols/SafetyHealth/Hospital.png",
                   "contentType": "image/png", "width":20, "height":20}

map1.draw(facilities, symbol=hospital_symbol)
In [12]:
traffic_accident_symbol = {"type":"esriPMS",
                           "url":"http://static.arcgis.com/images/Symbols/Transportation/TrafficAccident.png",
                           "contentType": "image/png", "width":20, "height":20}

map1.draw(incidents, symbol=traffic_accident_symbol)

Solving Problem

What's worth mentioning is that, in using `generate_origin_destination_cost_matrix`, 0.0005 credits per input origin and destination pair will be charged. For example, if there are 100 origins and 200 destinations, the cost will be 10 credits. If you specify a cutoff or limit the number of destinations, for instance, to find only 5 closest destinations within 10 minutes of every origin, the cost will still be 10 credits, as the credits depend on the number of input origin destination pairs.

  • TargetDestinationCount- The maximum number of destinations that must be found for the origin. If a value is not specified, the value from the Number of Destinations to Find parameter is used.

  • Cutoff- Specify the travel time or travel distance value at which to stop searching for destinations from the origin. Any destination beyond the cutoff value will not be considered. The value needs to be in the units specified by the Time Units parameter if the impedance attribute in your travel mode is time based or in the units specified by the Distance Units parameter if the impedance attribute in your travel mode is distance based. If a value is not specified, the tool will not enforce any travel time or travel distance limit when searching for destinations.

  • Specify origin_destination_line_shape to see the output in map. The resulting lines of an OD cost matrix can be represented with either straight-line geometry or no geometry at all. Even though the lines are straight for performance reasons, they always store the travel time and travel distance along the street network, not straight-line distance.

Note: You can set save_output_network_analysis_layer to True if you want to output the resulting NA layer as Layer file, though this process can take up more computation time [4].

In [6]:
%%time

current_time= dt.datetime.now()

result1 = network.analysis.generate_origin_destination_cost_matrix(origins=incidents, destinations=facilities, 
                                                                   cutoff=10, time_of_day=current_time, 
                                                                   number_of_destinations_to_find=4,
                                                                   origin_destination_line_shape='Straight Line',
                                                                   gis=my_gis)
Wall time: 8.35 s

Now, check if the tool is run successfully, and what are types of the elements in the returned result set, and also get the output network analysis layer's url.

In [7]:
print('Analysis succeeded? {}'.format(result1.solve_succeeded))
Analysis succeeded? True
In [8]:
display(result1.output_origin_destination_lines,result1.output_origins, result1.output_destinations)
<FeatureSet> 24 features
<FeatureSet> 6 features
<FeatureSet> 33 features

Tabularizing the response from generate_origin_destination_cost_matrix

Now, let's see the output lines table.

In [9]:
result1.output_origin_destination_lines.sdf.tail()
Out[9]:
DestinationName DestinationOID DestinationRank ObjectID OriginName OriginOID SHAPE Shape_Length Total_Distance Total_Time
19 Location 4 23 4 20 Incident at Bowling 5 {'paths': [[[-117.19469899999996, 34.063568000... 0.014748 1.909752 4.016058
20 Location 31 357 1 21 Incident at Burger 6 {'paths': [[[-117.20120099999997, 34.063364000... 0.001528 0.202159 0.665024
21 Location 27 245 2 22 Incident at Burger 6 {'paths': [[[-117.20120099999997, 34.063364000... 0.005566 0.684698 1.734902
22 Location 28 256 3 23 Incident at Burger 6 {'paths': [[[-117.20120099999997, 34.063364000... 0.005566 0.684698 1.734902
23 Location 5 41 4 24 Incident at Burger 6 {'paths': [[[-117.20120099999997, 34.063364000... 0.016599 2.150643 3.899969
In [10]:
# filter only the required columns
od_df = result1.output_origin_destination_lines.sdf[['OriginName', 'OriginOID','DestinationOID','Total_Distance','Total_Time']]
display(od_df)
OriginName OriginOID DestinationOID Total_Distance Total_Time
0 Incident at Esri 1 245 1.103517 2.071802
1 Incident at Esri 1 256 1.103517 2.071802
2 Incident at Esri 1 357 1.229031 2.671727
3 Incident at Esri 1 23 1.710835 3.592510
4 Incident at APT 2 65 0.692399 1.274009
5 Incident at APT 2 41 0.431708 1.310646
6 Incident at APT 2 90 1.093573 1.947980
7 Incident at APT 2 98 1.332103 2.174232
8 Incident at Walmart 3 121 1.269695 3.007999
9 Incident at Walmart 3 207 1.413950 3.194782
10 Incident at Walmart 3 357 2.099291 4.401875
11 Incident at Walmart 3 48 1.978585 4.644457
12 Incident at High school 4 245 1.295173 2.218085
13 Incident at High school 4 256 1.295173 2.218085
14 Incident at High school 4 357 1.285388 2.442592
15 Incident at High school 4 23 1.269365 2.524285
16 Incident at Bowling 5 357 0.819273 2.058579
17 Incident at Bowling 5 245 1.137488 2.551424
18 Incident at Bowling 5 256 1.137488 2.551424
19 Incident at Bowling 5 23 1.909752 4.016058
20 Incident at Burger 6 357 0.202159 0.665024
21 Incident at Burger 6 245 0.684698 1.734902
22 Incident at Burger 6 256 0.684698 1.734902
23 Incident at Burger 6 41 2.150643 3.899969

We need to change the format to get a matrix with rows as origins and columns as destinations, with impedance value as travel time or travel distance. We will use the pivot_table feature of Pandas to accomplish that.

Note that number_of_destinations_to_find is set to 4, so for the destinations that are not within the top 4 closest to the origins, the total_distance will shows as NaN in the output table below:

In [11]:
# user pivot_table
od_pivot = od_df.pivot_table(index='OriginOID', columns='DestinationOID')
od_pivot
Out[11]:
Total_Distance ... Total_Time
DestinationOID 23 41 48 65 90 98 121 207 245 256 ... 41 48 65 90 98 121 207 245 256 357
OriginOID
1 1.710835 NaN NaN NaN NaN NaN NaN NaN 1.103517 1.103517 ... NaN NaN NaN NaN NaN NaN NaN 2.071802 2.071802 2.671727
2 NaN 0.431708 NaN 0.692399 1.093573 1.332103 NaN NaN NaN NaN ... 1.310646 NaN 1.274009 1.94798 2.174232 NaN NaN NaN NaN NaN
3 NaN NaN 1.978585 NaN NaN NaN 1.269695 1.41395 NaN NaN ... NaN 4.644457 NaN NaN NaN 3.007999 3.194782 NaN NaN 4.401875
4 1.269365 NaN NaN NaN NaN NaN NaN NaN 1.295173 1.295173 ... NaN NaN NaN NaN NaN NaN NaN 2.218085 2.218085 2.442592
5 1.909752 NaN NaN NaN NaN NaN NaN NaN 1.137488 1.137488 ... NaN NaN NaN NaN NaN NaN NaN 2.551424 2.551424 2.058579
6 NaN 2.150643 NaN NaN NaN NaN NaN NaN 0.684698 0.684698 ... 3.899969 NaN NaN NaN NaN NaN NaN 1.734902 1.734902 0.665024

6 rows × 22 columns

If you want to export the pivot table into a file, use to_csv to save as file-based csv output.

In [13]:
od_pivot.to_csv('data/OD_Matrix.csv')

Visualizing the response from generate_origin_destination_cost_matrix

With the customized symbologies defined here, output_origin_destination_lines can also be rendered on the map view:

In [14]:
""" Define a list of colors/styles for routes;
    alternatively, use "'type': 'simple-line', 'style': 'solid'"
"""
allocation_line_symbol = [{'type': 'esriSLS', 'style': 'esriSLSSolid',
                            'color': [237, 17, 189, 100], 'width': 6},
                          {'type': 'esriSLS', 'style': 'esriSLSSolid',
                            'color': [30, 139, 235, 90], 'width': 6},
                          {'type': 'esriSLS', 'style': 'esriSLSSolid',
                            'color': [66, 240, 43, 80], 'width': 6},
                          {'type': 'esriSLS', 'style': 'esriSLSSolid',
                            'color': [240, 234, 67, 70], 'width': 6}]

names_list = ["Incident at Esri",
              "Incident at APT",
              "Incident at Walmart",
              "Incident at High school",
              "Incident at Bowling",
              "Incident at Burger"]
In [5]:
map2 = my_gis.map('Redlands, CA', zoomlevel=13)
map2.basemap = "dark-gray"
map2