GeoAnalytics Engine supports geocoding and network analysis tools. Before using the tools, set up the locator and network dataset to be accessible to your Spark environment and configure the runtime data licenses if needed.
Accessing locators and network datasets
The locator or network dataset can be stored on local or cloud storage such as Amazon S3. The data must be locally accessible to all nodes in your Spark cluster. There are two supported workflows for setting up the data:
- The data is copied to each node's local file system at the initialization of the Spark cluster.
- The data is distributed across the cluster using SparkContext.addFile after the Spark session has started.
Set up data during Spark cluster initialization
You can first upload the locator or network dataset to cloud storage such as Amazon S3, then mount or copy it to each node's local file system. The target location on each node must have enough free disk space to hold the locator or network dataset.
Below is an example of setting up the locator and network dataset using an init script in Databricks:
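The disk space requirement can be checked before copying. The following is a minimal sketch using only the Python standard library; the helper names `dir_size_bytes` and `has_room_for` and the 10% headroom factor are illustrative assumptions, not part of GeoAnalytics Engine.

```python
import os
import shutil


def dir_size_bytes(path):
    """Return the total size in bytes of a file, or of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def has_room_for(source_path, target_dir, headroom=1.1):
    """Return True if target_dir's file system has at least
    headroom times the size of source_path available.

    headroom is a hypothetical safety margin, not a documented requirement.
    """
    needed = dir_size_bytes(source_path) * headroom
    free = shutil.disk_usage(target_dir).free
    return free >= needed
```

On a node, a call such as `has_room_for("/dbfs/mnt/locators", "/databricks")` would compare the mounted data against the free space at the copy destination; both paths are taken from the Databricks example in this section.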
- Install GeoAnalytics Engine on Databricks.
- Upload the locator or network dataset to a cloud file system like S3.
- In a notebook, mount the locator or network dataset to DBFS using the dbutils.fs.mount command.
    dbutils.fs.mount("s3://bucket_name/path_to_locator", "/mnt/locators")
    dbutils.fs.mount("s3://bucket_name/path_to_network_datasets", "/mnt/network_datasets")
- Update the cluster-scoped init script to copy files from the mounted location to /databricks/.
    cp -r /dbfs/mnt/locators/. /databricks/locators/
    cp -r /dbfs/mnt/network_datasets/. /databricks/network_datasets/
- After the files are copied locally, reference the local paths when running geocoding or network analysis tools.
You can use SparkFiles.get to get the local path of the file on each node. Then run the geocoding or network analysis tools with the locator or network dataset file. For example, in Python:
    from geoanalytics.tools import CreateServiceAreas
    result = CreateServiceAreas() \
        .setNetwork("/network_datasets/example.mmpk") \
        .setCutoffs(5, "minutes") \
        .run(facilities)
Set up data using SparkContext.addFile
ArcGIS GeoAnalytics Engine 1.5.x and above supports loading locators and network datasets using SparkContext.addFile.
This allows the files to be distributed across the cluster after the Spark session has already started.
Below is an example of setting up the locator and network dataset using SparkContext.addFile in Databricks:
- Install GeoAnalytics Engine on Databricks.
- In a notebook cell, load the locator or network dataset using SparkContext.addFile:
    sc.addFile("s3://bucket_name/example.mmpk")
- After the file is added, reference the file name directly in geocoding or network analysis tools. For example, in Python:
    from geoanalytics.tools import CreateServiceAreas
    result = CreateServiceAreas() \
        .setNetwork("example.mmpk") \
        .setCutoffs(5, "minutes") \
        .run(facilities)
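The relationship between the URI passed to SparkContext.addFile and the file name referenced afterwards can be sketched as follows. This is an illustrative sketch, not the product's own code: `distributed_name` and `add_and_resolve` are hypothetical helper names, and `sc` is assumed to be an active SparkContext, as provided in a Databricks notebook. `SparkFiles.get` is the standard PySpark call for resolving the local path of a file added this way.

```python
import posixpath
from urllib.parse import urlparse


def distributed_name(uri):
    """Return the file name under which an added file is referenced,
    i.e. the last path component of the URI."""
    return posixpath.basename(urlparse(uri).path)


def add_and_resolve(sc, uri):
    """Distribute a file with SparkContext.addFile and return its local path
    on the current node.

    Assumes `sc` is an active SparkContext; pyspark is imported lazily so the
    helper above can be used without a Spark installation.
    """
    from pyspark import SparkFiles

    sc.addFile(uri)
    return SparkFiles.get(distributed_name(uri))
```

For the example above, `distributed_name("s3://bucket_name/example.mmpk")` yields the `example.mmpk` name that is passed to `setNetwork`.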
Using StreetMap Premium data
A valid data license is required to use any ArcGIS StreetMap Premium locator or
network dataset with GeoAnalytics Engine tools. StreetMap Premium data licenses can be provided either during Spark
startup through Spark configuration properties or after the Spark session has been started using the
add_data_licenses method. Learn more about StreetMap Premium data in
Accessing and working with StreetMap Premium data in GeoAnalytics Engine.
Configure licenses during Spark startup
To configure a StreetMap Premium data license during Spark startup, save the runtime licensing string to a text file
and store the file somewhere accessible to all nodes in your Spark cluster. In the Spark configuration, set the
spark.geoanalytics.smp.license.file property to the path of the file containing the licensing string, for example:
spark.geoanalytics.smp.license.file /data/engine/smp_license.txt
Add licenses using add_data_licenses()
Starting with GeoAnalytics Engine 2.1.0, StreetMap Premium data licenses can be added after the Spark session has been
started using add_data_licenses(). Licenses can be loaded from a local or cloud-hosted license file or
provided directly as runtime license strings. For example:
import geoanalytics
geoanalytics.add_data_licenses(license_file="/data/engine/smp_license.txt")
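Both licensing approaches in this section read the runtime licensing string from a text file. The following is a minimal sketch of preparing that file and the matching Spark configuration entry. The helper names `write_license_file` and `startup_conf` are hypothetical, the license string is a placeholder, and the sketch assumes the target path is on storage that every node in the cluster can read.

```python
import os

# Property name as given in the section on configuring licenses during Spark startup.
LICENSE_CONF_KEY = "spark.geoanalytics.smp.license.file"


def write_license_file(license_string, path):
    """Save a runtime licensing string to a text file and return the path.

    The path is assumed to be accessible to all nodes in the Spark cluster.
    """
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(license_string.strip())
    return path


def startup_conf(path):
    """Return the Spark configuration entry that points at the license file."""
    return {LICENSE_CONF_KEY: path}


# Usage sketch (requires Spark and GeoAnalytics Engine, so left as comments):
#   path = write_license_file("<runtime licensing string>", "/data/engine/smp_license.txt")
#   At startup:    SparkSession.builder.config(LICENSE_CONF_KEY, path)
#   After startup: geoanalytics.add_data_licenses(license_file=path)
```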