GeoAnalytics Engine supports geocoding and network analysis tools. Before using the tools, set up the locator and network dataset to be accessible to your Spark environment and configure the runtime data licenses if needed.

Accessing locators and network datasets

The locator or network dataset can be stored on local or cloud storage such as Amazon S3. The data must be locally accessible to all nodes in your Spark cluster. There are two supported workflows for setting up the data:

  • The data is copied to each node's local file system at the initialization of the Spark cluster.
  • The data is distributed across the cluster using SparkContext.addFile after the Spark session has started.

Set up data during Spark cluster initialization

You can first upload the locator or network dataset to cloud storage such as Amazon S3, then mount or copy it to each node's local file system. The target location on each node must have enough free disk space to hold the locator or network dataset.
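
The disk space requirement can be checked before copying. The following is a minimal sketch using only the Python standard library; the helper names `dataset_size_bytes` and `has_room_for` and the 10% headroom are illustrative assumptions, not part of GeoAnalytics Engine.

```python
import os
import shutil


def dataset_size_bytes(path):
    """Return the size of a file, or the total size of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def has_room_for(source, target_dir, headroom=1.1):
    """Return True if the file system holding target_dir can fit source.

    headroom is a hypothetical safety factor (1.1 means 10% extra space).
    """
    needed = dataset_size_bytes(source) * headroom
    free = shutil.disk_usage(target_dir).free
    return free >= needed
```

A call such as `has_room_for("/dbfs/mnt/locators", "/databricks")` would compare the mounted data against the free space on the node where it runs. It reports on that node only, so it would need to run on every node to cover the whole cluster.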

Below is an example of setting up the locator and network dataset using an init script in Databricks:

  1. Install GeoAnalytics Engine on Databricks.
  2. Upload the locator or network dataset to a cloud file system like S3.
  3. In a notebook, mount the locator or network dataset to DBFS using the dbutils.fs.mount command.
    dbutils.fs.mount("s3://bucket_name/path_to_locator", "/mnt/locators")
    dbutils.fs.mount("s3://bucket_name/path_to_network_datasets", "/mnt/network_datasets")
  4. Update the cluster-scoped init script to copy the files from the mounted location to /databricks/.
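    # Create the target directories first so the copies do not fail on a fresh node.
    mkdir -p /databricks/locators /databricks/network_datasets
    cp -r /dbfs/mnt/locators/. /databricks/locators/
    cp -r /dbfs/mnt/network_datasets/. /databricks/network_datasets/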
  5. After the files are copied, reference their local paths when running geocoding or network analysis tools. You can use SparkFiles.get to look up the local path of a file on each node, then pass the locator or network dataset path to the tool. For example:
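    from geoanalytics.tools import CreateServiceAreas

    # facilities is assumed to be an existing DataFrame of facility locations.
    # The network path matches the directory populated by the init script.
    result = CreateServiceAreas() \
             .setNetwork("/databricks/network_datasets/example.mmpk") \
             .setCutoffs(5, "minutes") \
             .run(facilities)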
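
A missing or mistyped local path is easier to diagnose before the tool runs. The sketch below is one possible guard, assuming the files were copied under /databricks/ as in the steps above; `require_local_dataset` is a hypothetical helper, not a GeoAnalytics Engine function.

```python
import os


def require_local_dataset(base_dir, file_name):
    """Return the full local path to a dataset, or raise if it is not present.

    This only inspects the file system of the node it runs on (typically the driver).
    """
    path = os.path.join(base_dir, file_name)
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path} not found; check that the init script copied it to this node"
        )
    return path
```

The returned value, for example from `require_local_dataset("/databricks/network_datasets", "example.mmpk")`, could then be passed to setNetwork.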

Set up data using SparkContext.addFile

ArcGIS GeoAnalytics Engine 1.5.x and above supports loading locators and network datasets using SparkContext.addFile. This allows the files to be distributed across the cluster after the Spark session has already started.

Below is an example of setting up the locator and network dataset using SparkContext.addFile in Databricks:

  1. Install GeoAnalytics Engine on Databricks.
  2. In a notebook cell, load the locator or network dataset using SparkContext.addFile:
    sc.addFile("s3://bucket_name/example.mmpk")
  3. After the file is added, reference the file name directly in geocoding or network analysis tools. For example:
    from geoanalytics.tools import CreateServiceAreas
    
    result = CreateServiceAreas() \
             .setNetwork("example.mmpk") \
             .setCutoffs(5, "minutes") \
             .run(facilities)
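
The file name to reference is the last path component of the location passed to SparkContext.addFile. As a small illustration, the hypothetical helper below derives that name from a URI with the standard library; it assumes the added file keeps its base name, which matches the example above.

```python
import posixpath
from urllib.parse import urlparse


def added_file_name(uri):
    """Return the last path component of a URI or plain path."""
    path = urlparse(uri).path
    return posixpath.basename(path.rstrip("/"))


# The name derived here is what the example above passes to setNetwork.
network_name = added_file_name("s3://bucket_name/example.mmpk")
```

Deriving the name from the same string used in the addFile call keeps the two from drifting apart when the dataset location changes.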

Using StreetMap Premium data

A valid data license is required to use any ArcGIS StreetMap Premium locator or network dataset with GeoAnalytics Engine tools. StreetMap Premium data licenses can be provided either during Spark startup through Spark configuration properties or after the Spark session has been started using the add_data_licenses() method. Learn more about StreetMap Premium data in Accessing and working with StreetMap Premium data in GeoAnalytics Engine.

Configure licenses during Spark startup

To configure a StreetMap Premium data license during Spark startup, save the runtime licensing string to a text file and store the file somewhere accessible to all nodes in your Spark cluster. In the Spark configuration, set the spark.geoanalytics.smp.license.file property to the path of the file containing the licensing string, for example:

spark.geoanalytics.smp.license.file /data/engine/smp_license.txt
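
Saving the licensing string and producing the matching configuration entry can be scripted. This is a minimal sketch with the standard library only; `write_license_file` and `as_conf_lines` are hypothetical helper names, and writing the string without a trailing newline is an assumption about the expected file contents.

```python
import os

# Property name taken from the configuration example above.
LICENSE_CONF_KEY = "spark.geoanalytics.smp.license.file"


def write_license_file(license_string, path):
    """Write the licensing string to path and return the matching Spark property."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(license_string.strip())
    return {LICENSE_CONF_KEY: path}


def as_conf_lines(conf):
    """Format properties as 'key value' lines, as in the example above."""
    return [f"{key} {value}" for key, value in conf.items()]
```

The file still has to be readable at that path from every node in the cluster. When the session is built in code, the returned key and value could instead be passed to `SparkSession.builder.config(key, value)`.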

Add licenses using add_data_licenses()

Starting with GeoAnalytics Engine 2.1.0, StreetMap Premium data licenses can be added after the Spark session has started by calling add_data_licenses(). Licenses can be loaded from a local or cloud-hosted license file, or provided directly as runtime license strings. For example:

import geoanalytics

geoanalytics.add_data_licenses(license_file="/data/engine/smp_license.txt")

