2.0.0 Release notes

GeoAnalytics Engine 2.0.0 includes a new suite of functions and tools for working with raster data, so that you can use both raster and vector data together in your big data spatial analysis workflows. GeoAnalytics Engine 2.0.0 also includes a variety of enhancements to previously released tools and functions, as well as performance and usability improvements throughout the API. Support has also been added for new Apache Spark versions and cloud runtimes, including new Databricks and AWS EMR runtime versions.

Version 2.0.0 is a major release of GeoAnalytics Engine and includes a small number of breaking changes that may impact some workflows established with GeoAnalytics Engine 1.x. For more information, see the Breaking API changes section below.

Added support for raster data and analysis

Read, write, and visualize raster data

GeoAnalytics Engine 2.0.0 includes a new raster data type for storing both raster values and raster references in Spark DataFrames. When reading from most raster data sources, rasters are tiled and each tile is stored as a separate record in a raster column, enabling scalable distributed analysis.
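
To make the tiling idea concrete, the following is a minimal conceptual sketch in plain Python. It does not use the GeoAnalytics Engine API; the tile_raster helper and the record field names (row_offset, col_offset, tile) are hypothetical and only illustrate how one large grid can become many independent records that a cluster could process in parallel.

```python
def tile_raster(pixels, tile_size):
    """Split a 2D grid (a list of rows) into tiles, one record per tile.

    Hypothetical helper for illustration only; not part of GeoAnalytics Engine.
    """
    n_rows = len(pixels)
    n_cols = len(pixels[0]) if pixels else 0
    records = []
    for row0 in range(0, n_rows, tile_size):
        for col0 in range(0, n_cols, tile_size):
            # Slice out the block of pixels that belongs to this tile.
            tile = [row[col0:col0 + tile_size]
                    for row in pixels[row0:row0 + tile_size]]
            records.append({"row_offset": row0, "col_offset": col0, "tile": tile})
    return records


# A 4x6 raster whose pixel value encodes its position (row * 10 + col).
raster = [[r * 10 + c for c in range(6)] for r in range(4)]
tiles = tile_raster(raster, tile_size=2)
```

Each entry in tiles carries its own offsets, so a tile can be analyzed without access to the rest of the raster.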

Using the raster data source, you can now read from common raster file types like GeoTIFF, Cloud Optimized GeoTIFF, and PNG, and write results back to those same file types.

The new Raster Python class enables both converting NumPy arrays to GeoAnalytics rasters and exporting GeoAnalytics rasters to NumPy arrays, for integration with many data science and machine learning Python libraries.

This release also adds support for reading and writing image services hosted in ArcGIS Online with the image-service data source. This allows you to easily bring in hosted imagery data from your ArcGIS Online organization or the Living Atlas of the World, and then write results back for sharing and visualization. The image-service data source currently supports reading but not writing image services hosted in ArcGIS Enterprise.

For quick visualization in a PySpark notebook, you can plot rasters on basemaps using rt.plot, a lightweight raster plotting API that extends matplotlib. You can use rt.plot along with st.plot to view both rasters and vector geometries together in a single plot. For more information, see Visualize results with rt.plot().

Raster functions

GeoAnalytics Engine 2.0.0 includes 30 new raster-type functions, also known as RT functions. These are row-level functions that operate on rasters and can be called with Python, SQL, or Scala syntax.

Some of these raster functions allow you to access raster properties in order to learn more about the data prior to performing analysis. For example, you can use RT_Info on a raster column to access properties like cell size, extent, spatial reference, and more. You can also access these properties individually using functions like RT_CellSizeX, RT_CellSizeY, RT_NumBands, RT_NumColumns, RT_NumRows, RT_PixelType, RT_Extent, RT_SRText, and RT_SRID.

You can also learn more about a raster using functions that calculate summary statistics, which include:

  • RT_Statistics—Calculates summary statistics for all values in a raster.
  • RT_BandStatistics—Calculates summary statistics for values in a specified raster band.
  • RT_ZonalStatistics—Calculates summary statistics for band values that fall within a specified zone (polygon).
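
As a conceptual illustration of zonal statistics, the sketch below summarizes only the band values that fall inside a zone. It is plain Python rather than the RT_ZonalStatistics API: the zone is assumed to be already rasterized into a boolean mask (the real function takes a polygon), and the zonal_statistics helper and its output keys are hypothetical.

```python
from statistics import mean


def zonal_statistics(band, zone_mask):
    """Summarize the band values whose mask cell is True.

    Hypothetical helper; assumes band and zone_mask have the same shape.
    """
    values = [
        value
        for band_row, mask_row in zip(band, zone_mask)
        for value, inside in zip(band_row, mask_row)
        if inside
    ]
    if not values:
        return None  # The zone does not cover any pixel.
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


band = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
# True marks pixels that fall inside the zone polygon.
zone_mask = [
    [False, True, True],
    [False, True, True],
    [False, False, False],
]
stats = zonal_statistics(band, zone_mask)
```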

Many functions allow you to manipulate your raster data, for example:

  • RT_Apply—Applies a user-defined function to each pixel value in a raster band.
  • RT_BBoxClip—Clips a raster using a bounding box specified with xmin, ymin, xmax, and ymax.
  • RT_Calculator—Applies a map algebra expression to calculate pixel values using up to four rasters.
  • RT_ConvertPixelType—Updates the input raster to utilize the specified pixel type.
  • RT_Merge—Combines two or more rasters into a single raster.
  • RT_Resample—Changes the spatial resolution of a raster using nearest neighbor cell assignment or bilinear interpolation.
  • RT_SelectBands—Selects a subset of bands from a raster or reorders raster bands.
  • RT_SetExtent—Updates a raster's spatial extent using xmin, ymin, xmax, and ymax.
  • RT_Tiles—Re-tiles a raster and stores raster tiles of a specified size in each row of a raster column.
  • RT_Transform—Transforms a raster to the specified spatial reference.
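
The two resampling methods mentioned for RT_Resample differ in how they pick a value at a location that falls between pixel centers. The sketch below is a plain-Python illustration of that difference, not the library's implementation; it assumes pixel centers sit at integer (row, column) coordinates, and the sample_nearest and sample_bilinear helpers are hypothetical.

```python
import math


def sample_nearest(grid, r, c):
    """Return the value of the pixel whose center is closest to (r, c)."""
    rows, cols = len(grid), len(grid[0])
    ri = min(max(math.floor(r + 0.5), 0), rows - 1)
    ci = min(max(math.floor(c + 0.5), 0), cols - 1)
    return grid[ri][ci]


def sample_bilinear(grid, r, c):
    """Blend the four surrounding pixels, weighted by distance to (r, c)."""
    rows, cols = len(grid), len(grid[0])
    # Clamp the sample position to the area covered by pixel centers.
    r = min(max(r, 0.0), rows - 1.0)
    c = min(max(c, 0.0), cols - 1.0)
    r0, c0 = math.floor(r), math.floor(c)
    r1, c1 = min(r0 + 1, rows - 1), min(c0 + 1, cols - 1)
    fr, fc = r - r0, c - c0
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr


grid = [
    [0.0, 10.0],
    [20.0, 30.0],
]
nearest = sample_nearest(grid, 0.25, 0.75)    # picks the pixel at row 0, column 1
bilinear = sample_bilinear(grid, 0.25, 0.75)  # blends all four pixels
```

Nearest neighbor only ever returns values already present in the raster, which generally suits categorical data such as land cover classes, while bilinear interpolation produces new intermediate values and generally suits continuous data such as elevation.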

There are also several functions to help you import and export raster data. These functions complement the functionality of the raster and image service data sources, and include:

  • RT_AddBand—Adds a band to an existing raster.
  • RT_BandMask—Returns an array of mask values for a raster.
  • RT_BandValues—Returns an array of pixel values for a raster.
  • RT_CreateRaster—Creates a new raster using an array of pixel values.
  • RT_FromBinary—Creates a raster tile from the binary data in each row.
  • RT_Materialize—Forces loading of pixel values into memory, which can improve performance in certain scenarios.
  • RT_ToBinary—Converts a raster column to a binary column.

All of these functions are scalable and will distribute computation across the cores of your Spark cluster, allowing analysis of big raster data. They can be seamlessly chained with ST and TRK functions where applicable, as well as with other SQL functions included with Spark.

Raster tools

GeoAnalytics Engine 2.0.0 includes 4 new analysis tools that work specifically with raster data and help you integrate raster and vector datasets. Like other tools in GeoAnalytics Engine, these are aware of all columns in a DataFrame and use all rows to compute a result if required. The new raster-focused tools are:

  • Bins to raster—Converts a square bin column to a raster column, using data from other columns as pixel values.
  • Enrich point with raster—Joins pixel values from a raster to point geometries.
  • Geometry to raster—Rasterizes point, line, or polygon geometries, using data from other columns as pixel values.
  • Zonal statistics—Computes summary statistics for raster band values within zone polygons.
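
Joining pixel values to points, as Enrich point with raster does, comes down to mapping a coordinate to a row and column in the raster grid. The following plain-Python sketch shows that mapping under simplifying assumptions: a north-up raster described by its upper-left corner (xmin, ymax) and cell sizes, with the point already in the raster's spatial reference. The pixel_value_at helper is hypothetical and is not the tool's API.

```python
def pixel_value_at(pixels, xmin, ymax, cell_size_x, cell_size_y, x, y):
    """Return the pixel value under point (x, y), or None if outside the raster.

    Hypothetical helper; assumes a north-up raster whose first row is the top row.
    """
    col = int((x - xmin) // cell_size_x)
    row = int((ymax - y) // cell_size_y)  # Rows count downward from ymax.
    if 0 <= row < len(pixels) and 0 <= col < len(pixels[0]):
        return pixels[row][col]
    return None


# A 2x3 raster covering x from 100 to 130 and y from 30 to 50, with 10x10 cells.
pixels = [
    [1, 2, 3],
    [4, 5, 6],
]
points = [(105.0, 45.0), (125.0, 35.0), (95.0, 35.0)]
enriched = [
    (x, y, pixel_value_at(pixels, 100.0, 50.0, 10.0, 10.0, x, y))
    for x, y in points
]
```

The last point lies west of the raster extent, so it receives no pixel value.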

Other new features

Beginning with GeoAnalytics Engine 2.0.0, Summarize Within allows you to summarize point, line, and polygon geometries into H3 bins of a specified resolution. This choice has been added to the existing options of either summarizing into Esri square/hexagon bins or polygon geometries that you provide.

Also introduced at GeoAnalytics Engine 2.0.0 is the st.with_geodisplay() DataFrame extension which generates a GeoDisplay column using a geometry column in your DataFrame. A GeoDisplay column is a spatial index used for fast rendering of geometries with the ArcGIS Maps SDK and other GeoDisplay-compatible mapping tools.

As with every release, version 2.0.0 adds support for new Apache Spark versions and related cloud runtimes. This release includes new compatibility with Spark 4.1.x, Databricks 18.0 and 18.1, and AWS EMR 7.11 and 7.12.

Breaking API changes

Dropped environment support

Installing GeoAnalytics Engine in the following runtimes is not supported beginning with version 2.0.0. These runtimes were formerly supported at version 1.7.x:

  • Spark 3.2.x
  • Spark 3.3.x
  • Amazon EMR 6.6.x – 6.11.x
  • Databricks 12.2 LTS
  • Google Dataproc 2.1-x
  • Azure Synapse Runtime for Apache Spark 3.4

Please update the environment in which you use GeoAnalytics Engine to a supported version if needed when upgrading to GeoAnalytics Engine 2.0. See the install guide for a list of supported versions.

SQL functions

GeoAnalytics Engine 2.0.0 introduces several breaking changes to ST and TRK functions to improve accuracy and usability. These changes include:

Tools

GeoAnalytics Engine 2.0.0 also introduces breaking changes to some analysis tools, including:

  • The setDistanceMethod setter is now required for running certain tools if the input geometry has projected coordinates. Additionally, the term "geodesic" has been replaced by the term "geodetic" in setDistanceMethod and in spatial relationship choice lists. The term "geodetic" is a more accurate descriptor of the distance calculation. These changes apply to the following tools:

  • The Summarize Within tool has been changed in several ways:

    • The addRateField setter has been removed. Formerly, fields included in addRateField would not be proportioned prior to calculating statistics, whereas other fields would be proportioned by default prior to calculating statistics. Beginning with version 2.0, proportioning is controlled by the proportion parameter in the addStandardSummaryField and addWeightedSummaryField setters. Set proportion to True to proportion a field prior to calculating statistics. The default is False, meaning that no field is proportioned unless you request it. The results returned from SummarizeWithin in version 2.0 will differ from results in 1.x unless the parameters are updated.
    • The tool output has been changed to return a DataFrame if run using the run method, and a named tuple if run using the new runIncludeGroupBy method. Formerly, the tool would always return a named tuple if run using the run method. The return type of Summarize Within in version 2.0 will differ from that of 1.x if no updates are made.
  • The Calculate Motion Statistics tool has been corrected to ignore track observations with null geometry and/or time. The tool would formerly return null for some statistics in cases where one or more track observations had a null geometry or timestamp. Because these observations are now ignored, results in version 2.0 may differ from those returned in earlier versions of GeoAnalytics Engine. No mitigation is required.

  • The Reconstruct Tracks tool has been updated to include timestamps as m-values in the resulting linestrings by default. This allows the resulting linestrings to be used by TRK functions and other track-related functionality. Any existing m-values in the input points will be overwritten by the timestamps and not included in the result linestrings by default. To obtain legacy results, call the new preserveM setter before running the tool.
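
To illustrate why the new proportion default in Summarize Within can change results, the sketch below compares a sum computed with and without proportioning. It is a plain-Python illustration, not the tool's implementation: it assumes proportioning scales each polygon's value by the share of its area that overlaps the summary boundary, and the summarize_sum helper and the feature field names are hypothetical.

```python
def summarize_sum(features, proportion=False):
    """Sum feature values, optionally scaled by the overlapping share of area.

    Hypothetical helper; the proportion argument mirrors the idea of the
    proportion parameter described above, with the same default of False.
    """
    total = 0.0
    for feature in features:
        if proportion:
            weight = feature["overlap_area"] / feature["total_area"]
        else:
            weight = 1.0
        total += feature["value"] * weight
    return total


# Two polygons that intersect a summary boundary.
# The first overlaps by a quarter of its area, the second lies fully inside.
features = [
    {"value": 1000.0, "total_area": 40.0, "overlap_area": 10.0},
    {"value": 500.0, "total_area": 20.0, "overlap_area": 20.0},
]
unproportioned = summarize_sum(features)               # 1000 + 500
proportioned = summarize_sum(features, proportion=True)  # 250 + 500
```

Under these assumptions, a workflow that relied on the proportioned behavior in 1.x would need to set proportion to True explicitly to keep the lower total.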
