Performing big data analysis

The ArcGIS API for Python allows GIS analysts and data scientists to query, visualize, analyze, and transform their spatial data using the powerful GeoAnalytics Tools available in their organization. Learn more about the analysis capabilities of the API at the documentation site.

The big data analysis tools can be accessed via the arcgis.geoanalytics module.

Tools Overview

The GeoAnalytics tools are presented through a set of submodules within the arcgis.geoanalytics module. To view the list of available tools, refer to the page titled Working with big data. On this page, we will learn how to execute big data tools.

Get started

The arcgis.geoanalytics module provides types and functions for distributed analysis of large datasets. These GeoAnalytics tools work with big data registered in the GIS's datastores as well as with feature layers.

Use arcgis.geoanalytics.is_analysis_supported(gis) to check whether GeoAnalytics is supported in your GIS.

Feature Input

You can run the GeoAnalytics Tools on the following:

Feature Output

The output from running GeoAnalytics Tools can be one of two options:

  • A hosted feature layer with data stored in ArcGIS Data Store registered with the portal's hosting server.
  • A dataset stored to a big data file share (a folder, cloud store, HDFS location) that you have registered with your GeoAnalytics Server.

Refer to this page for detailed information about feature layers and features.

Next, we will specify where the GeoAnalytics results will be saved. If the value is set to None, arcgis.env.output_datastore is reset to its default. Allowed string values are spatiotemporal and relational.

Input
import arcgis
arcgis.geoanalytics.define_output_datastore(datastore='relational')
Output
True
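
Because only two string values (or None) are accepted, it can help to validate the choice before handing it to the API. The following is a minimal sketch and not part of the ArcGIS API for Python: normalize_datastore is a hypothetical helper introduced here for illustration, and the actual call to define_output_datastore appears only as a comment because it needs a connected GIS.

```python
from typing import Optional

# Values listed above as allowed for the output data store.
ALLOWED_DATASTORES = ("spatiotemporal", "relational")


def normalize_datastore(name: Optional[str]) -> Optional[str]:
    """Return a cleaned data store name, or None to request the default.

    Hypothetical helper; it is not provided by the ArcGIS API for Python.
    """
    if name is None:
        return None
    cleaned = name.strip().lower()
    if cleaned not in ALLOWED_DATASTORES:
        raise ValueError(
            f"datastore must be one of {ALLOWED_DATASTORES} or None, got {name!r}"
        )
    return cleaned


# Usage with a connected GIS (not run here):
# import arcgis
# arcgis.geoanalytics.define_output_datastore(
#     datastore=normalize_datastore("Relational"))
```

Failing early with a ValueError keeps a misspelled data store name from reaching the server.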

Environment settings

The arcgis.env module provides a shared environment used by the different modules. It stores globals, such as the currently active GIS, the default geocoder, and more. It also stores environment settings that are common among all tools, such as the output spatial reference, cell size, etc.

Set spatial reference

The GeoAnalytics Tools use a process spatial reference during execution. Analyses with square or hexagon bins require a projected coordinate system. We'll use the World Cylindrical Equal Area projection (WKID 54034) below, because it is the default used when running tools in ArcGIS Online. Results written to the spatiotemporal data store of the Enterprise are stored in the WGS 84 spatial reference.

See the GeoAnalytics Documentation for a full explanation of analysis environment settings.

Input
arcgis.env.process_spatial_reference = 54034
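
Because arcgis.env is shared by all tools, a value assigned as in the example above stays in effect for every later analysis in the session. One way to limit a setting to a single analysis is a small context manager. This is a minimal sketch: temporary_setting is a hypothetical helper, and fake_env is a stand-in object used so the example runs without a GIS connection. It assumes the setting can be read and assigned as a plain attribute, as the example above does.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_setting(env, name, value):
    """Set env.<name> to value inside a with-block, then restore the old value."""
    previous = getattr(env, name, None)
    setattr(env, name, value)
    try:
        yield env
    finally:
        # Restore even if the analysis inside the block raises.
        setattr(env, name, previous)


# Stand-in for arcgis.env (assumption: settings are plain attributes).
fake_env = SimpleNamespace(process_spatial_reference=None)

with temporary_setting(fake_env, "process_spatial_reference", 54034):
    inside = fake_env.process_spatial_reference  # value seen while tools would run

after = fake_env.process_spatial_reference  # previous value is back in place
```

With a real session, the same helper would be called as temporary_setting(arcgis.env, "process_spatial_reference", 54034).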

Verbosity of messages

The ArcGIS Platform, including the ArcGIS API for Python, manages and transforms geographic data with a large suite of tools and functions collectively known as geoprocessing. The GeoAnalytics Tools in the ArcGIS API for Python are a subset of geoprocessing tools that operate in the context of a geoprocessing environment. You can set various aspects of this environment to control how tools are executed and what messages you receive during and after the execution. See the Logging and error handling section in the API for Python Geoprocessing Guide's Advanced concepts for ways to control messaging, including the arcgis.env.verbose setting.

Input
arcgis.env.verbose = True

Context Parameter

ArcGIS GeoAnalytics Server tasks that have the outSR property in their Context parameter will save results in the specified spatial reference. If you are saving the results to the spatiotemporal data store, all results are projected to World Geodetic System 1984 (WGS 84) after analysis for storage, and outSR is not used. To set the spatial reference in which the analysis itself is performed, use the Process Spatial Reference property.
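
The rule above can be sketched as a small function that assembles a context dictionary and leaves out outSR when the results go to the spatiotemporal data store. This is an illustration under stated assumptions, not the API's own implementation: build_context is a hypothetical helper, and the key names processSR and dataStore, as well as the {"wkid": ...} value shape, are assumptions modeled on the outSR property named above.

```python
import json
from typing import Optional


def build_context(process_wkid: Optional[int] = None,
                  out_wkid: Optional[int] = None,
                  datastore: Optional[str] = None) -> dict:
    """Assemble a context dictionary (hypothetical helper, assumed key names)."""
    context = {}
    if process_wkid is not None:
        # Spatial reference in which the analysis is performed.
        context["processSR"] = {"wkid": process_wkid}
    if datastore is not None:
        context["dataStore"] = datastore
    # outSR is not used for spatiotemporal results, so it is omitted there.
    if out_wkid is not None and datastore != "spatiotemporal":
        context["outSR"] = {"wkid": out_wkid}
    return context


relational = build_context(process_wkid=54034, out_wkid=3857,
                           datastore="relational")
spatiotemporal = build_context(process_wkid=54034, out_wkid=3857,
                               datastore="spatiotemporal")

print(json.dumps(relational))
print(json.dumps(spatiotemporal))
```

Dropping outSR for the spatiotemporal case makes it visible in the request itself that the requested output spatial reference would have no effect.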

GeoAnalytics operations use the following context parameters defined in the arcgis.env module:
