ArcGIS GeoAnalytics Engine includes two tools for GeoEnrichment: GeoEnrich and GeoEnrichVariables. Before using these tools, you must make a GeoEnrichment dataset accessible to your Spark environment.

Accessing GeoEnrichment datasets

The GeoEnrichment dataset can be stored on local or cloud storage such as Amazon S3. The data must be locally accessible to all nodes in your Spark cluster. There are two supported workflows for setting up the data:

  • The data is copied to each node's local file system at the initialization of the Spark cluster.
  • The data is distributed across the cluster using SparkContext.addFile after the Spark session has started.

GeoEnrichment Essentials dataset

Starting with version 2.1.0, GeoAnalytics Engine includes a GeoEnrichment Essentials dataset, GeoEnrich_Essentials_for_ArcGIS_GeoAnalytics_Engine.biz. This dataset is a curated subset of the Esri United States demographics dataset. It is packaged and made available for download with the ArcGIS GeoAnalytics Engine distribution. Each ArcGIS GeoAnalytics Engine release is validated against its corresponding GeoEnrichment Essentials dataset version. Compatibility with earlier or later dataset versions is not guaranteed.
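As a rough illustration of the 2.1.0 requirement, the sketch below compares an engine version string against that minimum. The names parse_version and supports_essentials_dataset are hypothetical helpers and are not part of GeoAnalytics Engine. How you obtain the installed version string is left as an assumption, and the check says nothing about whether a particular dataset version matches a particular release.

```python
import re

# Minimum engine version that ships the GeoEnrichment Essentials dataset,
# as stated in the paragraph above.
MIN_ESSENTIALS_VERSION = (2, 1, 0)


def parse_version(version: str) -> tuple:
    """Turn a string such as '2.1.0' into a tuple of three integers.

    Hypothetical helper: only the leading digits of the first three
    dot-separated parts are used; missing parts are treated as 0.
    """
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def supports_essentials_dataset(engine_version: str) -> bool:
    """Return True if the engine version is 2.1.0 or later."""
    return parse_version(engine_version) >= MIN_ESSENTIALS_VERSION
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "2.10.0" as older than "2.9.0".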

The GeoAnalytics Engine GeoEnrichment Essentials dataset contains a focused set of demographic variables, including core variables from American Community Survey (ACS) estimates and U.S. Census counts, along with supporting Margin of Error (MOE) and Reliability (REL) fields for ACS-based variables used for validation. For more information about these source datasets, see the American Community Survey (ACS) data documentation and the 2020 Census data documentation.

Set up data during Spark cluster initialization

First upload the GeoEnrichment dataset to cloud storage such as Amazon S3, then mount or copy it to each node's local file system. Each local file system must have enough disk space to store the GeoEnrichment dataset.
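The disk space requirement can be checked before copying. The following is a minimal sketch using only the Python standard library. The function names and the 10% headroom factor are assumptions made for illustration, not part of GeoAnalytics Engine or Databricks.

```python
import os
import shutil


def dataset_size(path: str) -> int:
    """Return the size in bytes of a dataset file, or of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def has_room_for_dataset(dataset_path: str, target_dir: str, headroom: float = 1.1) -> bool:
    """Return True if target_dir's file system has enough free space for the dataset.

    headroom is an assumed safety factor (10% extra by default).
    """
    required = dataset_size(dataset_path) * headroom
    free = shutil.disk_usage(target_dir).free
    return free >= required
```

For example, calling has_room_for_dataset with the mounted dataset location and the node's local target directory returns False when the copy is likely to fail for lack of space. The check only covers the node it runs on, so it would need to run on every node, for instance from the init script.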

Below is an example of setting up the GeoEnrichment dataset using an init script in Databricks:

  1. Install GeoAnalytics Engine on Databricks.
  2. Upload the GeoEnrichment dataset to a cloud file system like S3.
  3. In a notebook, mount the GeoEnrichment dataset to DBFS using the dbutils.fs.mount command.
    dbutils.fs.mount("s3://bucket_name/path_to_geoenrichment_dataset", "/mnt/geoenrichment_datasets")
  4. Update the cluster-scoped init script to copy the files from the mounted location to /databricks/geoenrichment_datasets/.
    cp -r /dbfs/mnt/geoenrichment_datasets/. /databricks/geoenrichment_datasets/
  5. After the files are copied locally, reference the local path when running the GeoEnrichment tools. You can use SparkFiles.get to get the local path of the file on each node. For example:
    result = GeoEnrichVariables() \
            .setDataPath("/databricks/geoenrichment_datasets/GeoEnrich_Essentials_for_ArcGIS_GeoAnalytics_Engine.biz") \
            .run()
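The copy step in the init script can be sketched as a small Python helper that generates the script text. This is only an illustration: build_init_script is a hypothetical function, and the mkdir -p line is an added precaution that does not appear in the steps above. The two paths are the ones used in steps 3 and 4.

```python
import shlex


def build_init_script(mounted_dir: str, local_dir: str) -> str:
    """Return the text of a shell init script that copies the dataset to local disk.

    Hypothetical helper: it only builds a string and does not register
    the script with a cluster.
    """
    src = shlex.quote(mounted_dir.rstrip("/") + "/.")
    dst = shlex.quote(local_dir.rstrip("/") + "/")
    lines = [
        "#!/bin/bash",
        "set -e",             # stop if any command fails
        f"mkdir -p {dst}",    # assumed precaution: create the target directory
        f"cp -r {src} {dst}",
        "",
    ]
    return "\n".join(lines)


script = build_init_script(
    "/dbfs/mnt/geoenrichment_datasets",
    "/databricks/geoenrichment_datasets",
)
print(script)
```

The generated text still has to be saved wherever your workspace stores cluster-scoped init scripts and attached to the cluster configuration. That part depends on your Databricks setup and is not shown here.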

Set up data using SparkContext.addFile

A GeoEnrichment dataset can also be loaded into a Spark environment using SparkContext.addFile. This allows the files to be distributed across the cluster after the Spark session has already started.

Below is an example of setting up the GeoEnrichment dataset using SparkContext.addFile in Databricks:

  1. Install GeoAnalytics Engine on Databricks.
  2. In a notebook cell, load the GeoEnrichment dataset using SparkContext.addFile:
    sc.addFile("s3://bucket_name/GeoEnrich_Essentials_for_ArcGIS_GeoAnalytics_Engine.biz")
  3. After the file is added, reference the file name directly in GeoEnrichment tools. For example:
    result = GeoEnrichVariables() \
            .setDataPath("GeoEnrich_Essentials_for_ArcGIS_GeoAnalytics_Engine.biz") \
            .run()
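Because files added with SparkContext.addFile are referenced by file name, it can help to derive that name from the URI rather than typing it twice. The sketch below uses only the Python standard library. The helper name added_file_name is hypothetical, and the URI is the placeholder from step 2.

```python
import posixpath
from urllib.parse import urlparse


def added_file_name(uri: str) -> str:
    """Return the final path component of a URI passed to SparkContext.addFile."""
    path = urlparse(uri).path.rstrip("/")
    return posixpath.basename(path)


name = added_file_name(
    "s3://bucket_name/GeoEnrich_Essentials_for_ArcGIS_GeoAnalytics_Engine.biz"
)
print(name)
```

The resulting name is the value passed to setDataPath in step 3. On a running cluster, SparkFiles.get(name) from the pyspark package returns the absolute local path of the added file on the current node, which is useful for confirming that the file was distributed.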

