GeoAnalytics Engine includes a toolset that operates on Spark DataFrames to manage, enrich, summarize, or analyze entire datasets. In contrast to SQL functions, which operate on a row-by-row basis using one or two columns, GeoAnalytics Engine tools are aware of all columns in a DataFrame and use all rows to compute a result if required.
Each analysis tool is represented as a tool class with setter methods for configuring tool parameters. A run method on each tool class can be called with an input DataFrame to run the tool and return either a result DataFrame or a named tuple of results.
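The setter-and-run pattern described above can be sketched in plain Python. This is only an illustration of the calling convention: `SummarizeByKey`, its setters, and the list-of-dictionaries input are hypothetical stand-ins and are not part of GeoAnalytics Engine. The sketch also assumes that setters return the tool instance so that calls can be chained.

```python
from collections import namedtuple


# Hypothetical stand-in for a tool class; NOT part of GeoAnalytics Engine.
class SummarizeByKey:
    """Toy tool: counts rows per key and optionally sums a numeric column."""

    # A named tuple of results, similar in spirit to what some tools return
    Result = namedtuple("Result", ["summary", "row_count"])

    def __init__(self):
        self._key = None
        self._sum_column = None

    def set_key(self, column):
        # Each setter stores one parameter and returns self (assumed) for chaining
        self._key = column
        return self

    def set_sum_column(self, column):
        self._sum_column = column
        return self

    def run(self, rows):
        # Unlike a row-by-row function, run() sees every row of the input
        if self._key is None:
            raise ValueError("set_key must be called before run")
        summary = {}
        for row in rows:
            entry = summary.setdefault(row[self._key], {"count": 0, "total": 0})
            entry["count"] += 1
            if self._sum_column is not None:
                entry["total"] += row[self._sum_column]
        return self.Result(summary=summary, row_count=len(rows))


# A list of dictionaries stands in for the input DataFrame
rows = [
    {"city": "Redlands", "sales": 10},
    {"city": "Redlands", "sales": 5},
    {"city": "Denver", "sales": 7},
]
result = SummarizeByKey().set_key("city").set_sum_column("sales").run(rows)
```

Configuration is kept separate from execution, so the same configured tool object could be run against more than one input.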
Most tools require that the input DataFrame has a geometry column. The geometry type required depends on the tool. If there is one geometry column, it will be used as the geometry data for that DataFrame. If there are multiple geometry columns in a DataFrame, one must be specified as the primary geometry column using set. Some tools require the input geometries to be in a projected coordinate system before running analysis. For more information on projected coordinate systems, see Coordinate Systems and Transformations.
GeoAnalytics Engine tools recognize two time types:
- Instant: A single moment in time represented with a single timestamp column.
- Interval: A duration of time represented with a start timestamp column and an end timestamp column.
If time is required by a tool and there is one timestamp column, it will be used as the instant time field for the DataFrame. If there is more than one timestamp column, either one must be explicitly specified as the instant time field or two must be specified as the interval time fields using set_time_fields. Learn more about setting time on your DataFrame.
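The selection rule for time fields can be expressed as a small function. This is a minimal sketch of the rule as described here, not the library's implementation: `resolve_time_fields`, its parameters, and the dictionary-based schema are hypothetical names chosen for illustration.

```python
def resolve_time_fields(schema, instant=None, interval=None):
    """Pick the time fields for a schema, following the rule described above.

    schema maps column name -> type name, e.g. {"t": "timestamp"}.
    Returns ("instant", column) or ("interval", start_column, end_column).
    Hypothetical helper; not part of GeoAnalytics Engine.
    """
    timestamp_cols = [name for name, dtype in schema.items() if dtype == "timestamp"]

    if instant is not None and interval is not None:
        raise ValueError("specify either an instant field or an interval, not both")

    if interval is not None:
        # Interval time: an explicit start column and end column
        start, end = interval
        for col in (start, end):
            if col not in timestamp_cols:
                raise ValueError(f"{col!r} is not a timestamp column")
        return ("interval", start, end)

    if instant is not None:
        # Instant time: one explicitly chosen timestamp column
        if instant not in timestamp_cols:
            raise ValueError(f"{instant!r} is not a timestamp column")
        return ("instant", instant)

    if len(timestamp_cols) == 1:
        # Exactly one timestamp column: it is used as the instant time field
        return ("instant", timestamp_cols[0])

    if not timestamp_cols:
        raise ValueError("no timestamp column available")
    raise ValueError("multiple timestamp columns; specify instant or interval fields")


single = resolve_time_fields({"id": "long", "observed": "timestamp"})
trip = resolve_time_fields(
    {"pickup": "timestamp", "dropoff": "timestamp"},
    interval=("pickup", "dropoff"),
)
```

With a single timestamp column nothing needs to be specified. With two, the call fails unless the caller names either one instant column or a start and end pair.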
What's next?
Learn more about how to set up your data and run tools and SQL functions: