Generate OD matrix

Generate OD Matrix creates an origin-destination cost matrix from multiple origins to multiple destinations. It returns a table that contains the travel cost, including travel time and travel distance from each origin to each destination within the specified impedance cutoff.

Usage notes

  • Generate OD Matrix requires two point DataFrames representing the origins and destinations.

  • A network dataset is required to run any network analysis tool. It must be locally accessible to all nodes in your Spark cluster. Use setNetwork() to load the network dataset from a mobile map package or a mobile geodatabase.

  • A travel mode refers to the mode of transportation, such as driving or walking. Use setTravelMode() to choose a mode defined in the network datasource, or a custom travel mode defined in a JSON string. By default, the tool uses the default travel mode in the network datasource.

  • You must set the impedance cutoff using setCutoff(), which accepts a single cutoff value. The impedance cutoff sets the maximum travel distance or travel time from an origin to a destination. Its unit should match the travel mode. For example, if the travel mode uses a distance-based impedance, the cutoff must be specified as a distance. Generate OD Matrix supports two types of cutoff values:

    • Distance cutoff—Specify the maximum travel distance between origins and destinations. For example, when analyzing walking distance, a cutoff value of 1 mile (e.g. setCutoff(1, "miles")) means that the tool will search for destinations within 1 mile of walking distance from the origin.

    • Time cutoff—Specify the maximum travel time between origins and destinations. For example, when analyzing driving time, a cutoff value of 15 minutes (e.g. setCutoff(15, "minutes")) means that the tool will search for destinations within 15 minutes of driving time from the origin.

    The cutoff value must be positive. If the unit is omitted, the tool uses the distance or time unit defined in the travel mode.

  • By default, the Generate OD Matrix tool finds all destinations within the impedance cutoff. If you are interested in returning the travel cost from each origin to all records in the destination DataFrame, the impedance cutoff value must be larger than the travel cost from every origin to every destination.

  • Use setNumDestinations() to specify the number of destinations to find per origin. If multiple destinations have an equal travel cost to an origin, Generate OD Matrix breaks ties by randomly selecting one or more records from the tied destinations so that the specified number of closest destinations is returned. For example, if you are interested in finding the two closest destinations and three destinations are equidistant from the origin, two of the three records will be randomly selected and returned in the output.

  • The impedance cutoff can result in fewer destinations being returned than the specified number of destinations. For example, if you are interested in finding the three closest destinations within a specified travel distance and only two records are within that distance, only those two destinations will be returned in the output. When no destinations are found within the specified cutoff, the tool returns a null value in the output to indicate that no destination was found.

  • Generate OD Matrix does not output the true shape of routes, although the travel time and travel distance are calculated along the network. Use setRouteGeometry() to choose between StraightLines and NoLines:

    • StraightLines—A straight line from the origin to the destination.

    • NoLines—No line geometry will be returned.

    When the output route geometry is set to StraightLines, the primary geometry field of the output DataFrame is the route geometry field. When it is set to NoLines, the output DataFrame has no primary geometry field.

  • Use the setter accumulateAttributes() to specify cost attributes to accumulate along the network. The cost attributes are defined in the network dataset. One or more Total_[Cost] columns will be returned, where Cost is the name of the cost attribute. For example, if the available cost attributes in the network dataset are Kilometers, Minutes, and WalkTime, you can accumulate all attributes by calling accumulateAttributes("Kilometers", "Minutes", "WalkTime"). In this case, three output fields (Total_Kilometers, Total_Minutes, and Total_WalkTime) will be returned, representing the cost along the network between the associated origin and destination.

  • When the travel mode is configured with traffic data, you can specify the departure time from the origins using setTime(day_of_week, time, time_zone = "UTC"). GeoAnalytics Engine supports setting the departure time as a specific time of day on a generic day of the week.

    day_of_week is a string representing the day of the week. Acceptable values are:

    • Sunday
    • Monday
    • Tuesday
    • Wednesday
    • Thursday
    • Friday
    • Saturday

    time is the time of day for which traffic information is modeled. It can be provided in two formats:

    • A string in the format "HH:mm:ss", for example, "14:30:00".
    • A datetime.time object.

    time_zone is an optional string representing the time zone. The default option is Coordinated Universal Time (UTC). You can specify a time zone ID in the following formats to use local time.

    • UTC offset—a fixed offset from UTC. For example "UTC-05:00" represents the time zone that is five hours behind UTC. GeoAnalytics Engine does not account for Daylight Saving Time (DST). Only Standard Time (SDT) is used for UTC offset.
    • Time zone identifier—a standardized string that uniquely identifies a time zone region (e.g., "America/New_York"). For a comprehensive list of time zone identifiers, refer to this list of tz database time zones on Wikipedia.

    For example, to set the time for Friday at 8:15 AM in the "America/New_York" time zone, you can use setTime("Friday", "08:15:00", "America/New_York"), setTime("Friday", datetime.time(8,15), "America/New_York"), or setTime("Friday", "13:15:00").

    Traffic information will not be used in network analysis when setTime() is not used, or when the setter is used but no traffic data is configured with the travel mode.

  • The analysis will always be completed in the coordinate system of the network dataset. If the origin DataFrame or destination DataFrame is in a different coordinate system from the network dataset, it will be automatically transformed to the coordinate system of the network dataset.

    Learn more about coordinate systems and transformations
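
As a quick check on the departure-time example in the setTime() note above, the following sketch uses only the Python standard library to convert 08:15 on a Friday at a fixed UTC-05:00 offset into UTC. It does not call GeoAnalytics Engine. The helper name local_time_to_utc and the calendar date are illustrative assumptions, and the fixed offset mirrors the note that only standard time is applied to UTC offsets.

```python
from datetime import date, datetime, time, timedelta, timezone

# Fixed standard-time offset for US Eastern (UTC-05:00); no DST adjustment.
EASTERN_STANDARD = timezone(timedelta(hours=-5))


def local_time_to_utc(local_time, tz, on_date):
    """Convert a local wall-clock time on a given date to a UTC time of day."""
    local_dt = datetime.combine(on_date, local_time, tzinfo=tz)
    return local_dt.astimezone(timezone.utc).time()


# An arbitrary Friday, used only to anchor the conversion (assumption).
some_friday = date(2024, 1, 5)
assert some_friday.weekday() == 4  # Monday is 0, so Friday is 4

utc_time = local_time_to_utc(time(8, 15), EASTERN_STANDARD, some_friday)
print(utc_time.strftime("%H:%M:%S"))  # 13:15:00
```

This is the arithmetic behind the "13:15:00" value in the example: 08:15 at five hours behind UTC is 13:15 UTC.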

Limitations

  • Network analysis requires a network dataset from a mobile map package or a mobile geodatabase. Loading network data from a file geodatabase is not supported. Using a network service, such as the ArcGIS Online network analysis service, is not supported.

  • GeoAnalytics Engine does not support adding barriers in network analysis.

  • Generate OD Matrix doesn't output the routes along the network. You can use Find Closest Facilities if you are interested in the true shape of the routes.

Results

The following fields are included in the output DataFrame:

  • All fields from the origin DataFrame
  • All fields from the destination DataFrame

In addition, three fields describing the result are returned:

| Field | Description |
| --- | --- |
| Rank | The rank of the destinations. The rank is given according to ascending-order travel distance or time. |
| TravelTime | The travel time in minutes from the origin to the destination. |
| TravelDistance | The travel distance in meters from the origin to the destination. |

Travel time and travel distance are calculated along the network. If you specify cost attributes to accumulate, your output will have one or more fields named Total_[Cost] representing the accumulated travel cost along the network between the associated origin and destination.

If you set the output route geometry to StraightLines, your output will have a field named route_geometry representing the straight line from the origin to the destination.
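
To make the Rank field and its interaction with the cutoff and the destination count concrete, here is a minimal pure-Python sketch of the behavior described in the usage notes. It is not the tool's implementation: rank_destinations, its parameters, and the sample costs are hypothetical, and treating a cost exactly equal to the cutoff as within range is an assumption.

```python
import random


def rank_destinations(costs, cutoff, num_destinations=None, rng=None):
    """Model one origin's rows: keep costs within the cutoff, rank ascending.

    costs maps a destination name to its travel cost from the origin.
    Returns a list of (rank, destination, cost) tuples.
    """
    if rng is None:
        rng = random.Random()
    # Keep only destinations within the cutoff (inclusive by assumption).
    candidates = [(dest, cost) for dest, cost in costs.items() if cost <= cutoff]
    if not candidates:
        # Mirror the documented null result when nothing is within the cutoff.
        return [(None, None, None)]
    # Shuffle first so the stable sort leaves equal-cost destinations in random order.
    rng.shuffle(candidates)
    candidates.sort(key=lambda pair: pair[1])
    if num_destinations is not None:
        candidates = candidates[:num_destinations]
    return [(rank, dest, cost) for rank, (dest, cost) in enumerate(candidates, start=1)]


# Hypothetical travel times in minutes from a single origin.
travel_minutes = {"A": 4.0, "B": 2.5, "C": 2.5, "D": 9.0}

# Three destinations requested within 5 minutes: B and C tie for ranks 1 and 2.
within_five = rank_destinations(travel_minutes, cutoff=5, num_destinations=3)
# Three requested within 3 minutes: only two qualify, so only two rows come back.
within_three = rank_destinations(travel_minutes, cutoff=3, num_destinations=3)
print(within_five)
print(within_three)
```

Because ties are broken randomly, the order of B and C can differ between runs unless a seeded random.Random instance is passed as rng.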

Performance notes

Improve the performance of the Generate OD Matrix tool by doing one or more of the following:

  • Only analyze the records in your area of interest. You can pick the records of interest by using one of the following SQL functions:

    • ST_Intersection—Clip to an area of interest represented by a polygon. This will modify your input records.
    • ST_BboxIntersects—Select records that intersect an envelope.
    • ST_EnvIntersects—Select records having an envelope that intersects the envelope of another geometry.
    • ST_Intersects—Select records that intersect another dataset or area of interest represented by a polygon.
  • Set the route geometry to NoLines instead of StraightLines if you are only interested in determining the total travel time or travel distance between the origins and destinations.
  • Use smaller values for setCutoff() and setNumDestinations().

Syntax

For more details, go to the GeoAnalytics Engine API reference for Generate OD Matrix.

| Setter | Description | Required |
| --- | --- | --- |
| run(origins_df, destinations_df) | Runs the Generate OD Matrix tool using the provided DataFrames. | Yes |
| setNetwork(path) | Sets the network data source from a mobile map package or a mobile geodatabase. | Yes |
| setTravelMode(travel_mode) | Sets the travel mode. By default, the tool uses the default travel mode in the network datasource. | No |
| setCutoff(cutoff, unit=None) | Sets the maximum travel distance or travel time to search for destinations from each origin. By default, it is in the unit of the impedance attribute used by the travel mode. | Yes |
| setNumDestinations(count) | Sets the number of destinations to find. By default, all destinations within the impedance cutoff are returned. | No |
| setRouteGeometry(route_geometry) | Sets whether to return the straight line between the origins and the destinations. Choose from 'StraightLines' (default) or 'NoLines'. | No |
| setTime(day_of_week, time, time_zone = "UTC") | Sets the departure time for the origins. | No |
| accumulateAttributes(*attributes) | Accumulates cost attributes along the network between the associated origin and destination. No accumulated cost is returned by default. | No |
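
The setters listed above can be chained, as in the following sketch. It uses only the setter names and arguments documented on this page. The configure_od_matrix helper, the travel mode name, and the cost attribute names are placeholders that depend on your network dataset. The tool object is passed in as an argument so the helper can be read and exercised without a Spark session.

```python
import datetime


def configure_od_matrix(tool, network_path):
    """Apply a typical set of Generate OD Matrix settings to a tool instance."""
    return (
        tool
        .setNetwork(network_path)        # required
        .setTravelMode("Driving Time")   # placeholder travel mode name
        .setCutoff(15, "minutes")        # required; unit matches a time-based travel mode
        .setNumDestinations(3)           # at most three destinations per origin
        .setRouteGeometry("NoLines")     # skip line geometry in the output
        .setTime("Friday", datetime.time(8, 15), "America/New_York")
        .accumulateAttributes("Kilometers", "Minutes")  # placeholder cost attributes
    )


# Usage with GeoAnalytics Engine (not executed here; needs an authorized session):
# from geoanalytics.tools import GenerateODMatrix
# tool = configure_od_matrix(GenerateODMatrix(), "/data/California.mmpk")
# result = tool.run(origins_df, destinations_df)
```

With these settings the output would be expected to include Total_Kilometers and Total_Minutes fields and no route_geometry field, provided the network dataset defines cost attributes with those names.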

Examples

Run Generate OD Matrix

Python
# Log in
import geoanalytics
geoanalytics.auth(username="myusername", password="mypassword")

# Imports
from geoanalytics.tools import GenerateODMatrix
from pyspark.sql import functions as F

# Create an origins DataFrame
origins_url = r"https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/SDFireStations/FeatureServer/0"
origins_df = spark.read.format("feature-service").load(origins_url) \
                  .select("FACILITYID", "FULLADDR", "PHONE", "ACTIVE", "shape")

# Create a destinations DataFrame
destinations_url = r"https://services1.arcgis.com/Ua5sjt3LWTPigjyD/arcgis/rest/services/Public_School_Location_201819/FeatureServer/0"
destinations_df = spark.read.format("feature-service").load(destinations_url) \
                       .filter(F.col("NMCNTY") == 'San Diego County') \
                       .select("NAME", "STREET", "CITY", "STATE", "ZIP", "shape")

# Access the Network Dataset
# This needs to be accessible to the machine that is running the Generate OD Matrix tool.
# If running on a cluster, it needs to be accessible to all nodes in the cluster.
california_network = r"/data/California.mmpk"

# Run the Generate OD Matrix tool
result = GenerateODMatrix() \
        .setNetwork(california_network) \
        .setTravelMode("trucking time") \
        .setCutoff(5, "minutes") \
        .setRouteGeometry("StraightLines") \
        .run(origins_df, destinations_df) \
        .where(F.col("TravelTime").isNotNull())
result.sort("FACILITYID","Rank").show(5)
Result
+----------+-----------------+------------+------+--------------------+----+------------------+------------------+--------------------+---------------+---------+-----+-----+--------------------+--------------------+
|FACILITYID|         FULLADDR|       PHONE|ACTIVE|               shape|Rank|        TravelTime|    TravelDistance|                NAME|         STREET|     CITY|STATE|  ZIP|              shape1|      route_geometry|
+----------+-----------------+------------+------+--------------------+----+------------------+------------------+--------------------+---------------+---------+-----+-----+--------------------+--------------------+
|         1|1222 First Avenue|619-533-4300|   Yes|{"x":-117.1644945...|   1|1.1340757666544818| 334.0548843626827|King-Chavez Commu...|      201 A St.|San Diego|   CA|92101|{"x":-117.1626419...|{"paths":[[[-117....|
|         1|1222 First Avenue|619-533-4300|   Yes|{"x":-117.1644945...|   2| 3.650845392071322| 797.4849532250146|Washington Elemen...| 1789 State St.|San Diego|   CA|92101|{"x":-117.1661579...|{"paths":[[[-117....|
|         1|1222 First Avenue|619-533-4300|   Yes|{"x":-117.1644945...|   3|4.1088793455192265|1338.0258726513907|East Village Midd...|1313 Park Blvd.|San Diego|   CA|92101|{"x":-117.1530408...|{"paths":[[[-117....|
|         1|1222 First Avenue|619-533-4300|   Yes|{"x":-117.1644945...|   4| 4.157779919668917| 2768.796215745292|       Garfield High|  1255 16th St.|San Diego|   CA|92101|{"x":-117.1492168...|{"paths":[[[-117....|
|         1|1222 First Avenue|619-533-4300|   Yes|{"x":-117.1644945...|   5| 4.418464951462468|1466.4997960703163|San Diego Interna...|1405 Park Blvd.|San Diego|   CA|92101|{"x":-117.1532308...|{"paths":[[[-117....|
+----------+-----------------+------------+------+--------------------+----+------------------+------------------+--------------------+---------------+---------+-----+-----+--------------------+--------------------+
only showing top 5 rows

Plot results

Python
# Plot the straight lines between origins and destinations in black
result_plot = result.st.plot(basemap="light",figsize=(15,15));
# Plot the fire stations (origins) in orange
result.st.plot(geometry="shape", facecolor = "orange", marker_size=30, ax=result_plot)
# Plot the public schools (destinations) in green
result.st.plot(geometry="shape1", facecolor = "green", marker_size=30, ax=result_plot)
result_plot.set_title("Straight lines between fire stations and public schools within 5 minute trucking time in San Diego")
result_plot.set_xlabel("X (Degrees)")
result_plot.set_ylabel("Y (Degrees)");
Plotting example for a Generate OD Matrix result.

Version table

| Release | Notes |
| --- | --- |
| 1.3.0 | Python tool introduced. |
| 1.4.0 | Added support for setting start time. |
| 1.5.0 | Added support for loading the network dataset using SparkContext.addFile. |
