GeoAnalytics Engine supports geocoding and network analysis tools. To use these tools, you need to set up the required components described below.
The geocoding tools require a locator and the network analysis tools require a network dataset. The locator or network dataset must be locally accessible to all nodes in your Spark cluster. In a cloud environment, you can first upload the locator or network dataset to a file system like Amazon S3 and then mount or copy it to each node's local file system. This location on each node's file system needs enough disk space to store the locator or network dataset. In GeoAnalytics Engine 1.5.x and later, you can also load the locator or network dataset using `SparkContext.addFile`. This allows you to distribute files across the cluster and access them on each node.
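As a minimal sketch of that workflow, the snippet below distributes a file with `SparkContext.addFile` and then resolves its local path with `SparkFiles.get`. The bucket and file names are placeholders, and `sc` is assumed to be the active `SparkContext` in your notebook.

```python
from pyspark import SparkFiles

# Distribute the staged file to every node in the cluster.
# The S3 path is a placeholder; point it at your own locator or network dataset.
sc.addFile("s3://my-bucket/network_datasets/example.mmpk")

# Resolve the local copy by file name; this works on the driver and on executors.
local_path = SparkFiles.get("example.mmpk")
print(local_path)
```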
Here is an example of how to stage the locator or network dataset in Databricks:
- Upload the locator or network dataset to a cloud file system like Azure Blob Storage.
- Install GeoAnalytics Engine on Databricks.
- In a notebook, mount the locator or network dataset to DBFS using the `dbutils.fs.mount` command (a sketch of the mount call follows this list).
- Update the cluster-scoped init script to copy files from the mounted location to `/databricks/`:

  ```sh
  cp -r /dbfs/mnt/locators/. /databricks/locators/
  cp -r /dbfs/mnt/network_datasets/. /databricks/network_datasets/
  ```
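For example, the mount step might look like the following in a notebook cell. This is a sketch only: the storage account, container, mount point, and secret scope names are placeholders rather than values from this walkthrough.

```python
# Mount an Azure Blob Storage container to DBFS so the init script can copy
# its contents to each node's local file system. All names are placeholders.
storage_account = "mystorageaccount"
container = "locators"

dbutils.fs.mount(
    source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
    mount_point="/mnt/locators",
    extra_configs={
        f"fs.azure.account.key.{storage_account}.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-account-key")
    }
)
```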
Here is an example of how to access the locator or network dataset using `SparkContext.addFile`:
- Install GeoAnalytics Engine on Databricks.
- In a notebook cell, load the locator or network dataset using `SparkContext.addFile`:

  ```python
  sc.addFile("s3://data/example.mmpk")
  ```
- Run the geocoding or network analysis tools with the locator or network dataset file:

  ```python
  # Import path assumes the standard geoanalytics.tools module.
  from geoanalytics.tools import CreateServiceAreas

  result = CreateServiceAreas() \
      .setNetwork("example.mmpk") \
      .setCutoffs(5, "minutes") \
      .run(facilities)
  ```
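A geocoding run follows the same pattern: add the locator with `SparkContext.addFile`, then reference it by file name. The sketch below is hypothetical; the `Geocode` tool name and its `setLocator` setter are illustrative assumptions, so check the GeoAnalytics Engine tool reference for the exact API.

```python
# Hypothetical geocoding counterpart; the tool name and setter below are
# assumptions, not confirmed API. The locator file name is a placeholder.
sc.addFile("s3://data/example_locator.loc")

from geoanalytics.tools import Geocode  # assumed import path

# addresses_df is a placeholder for a DataFrame of addresses to geocode.
geocoded = Geocode() \
    .setLocator("example_locator.loc") \
    .run(addresses_df)
```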