Zonal statistics | ArcGIS GeoAnalytics Engine

Zonal Statistics computes summary statistics for raster band values within zone polygons. Each row in the output DataFrame represents statistics for a zone and raster band.

Usage notes

Zonal Statistics requires an input DataFrame containing rasters and a zone DataFrame containing polygon geometries. The tool calculates statistics for pixels that fall inside each zone. A pixel is included in the statistic calculation if the center of the pixel is contained within a zone polygon.
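The pixel-center rule can be illustrated without Spark. The following is a minimal sketch in plain Python, not the tool's implementation: the helper names (`point_in_polygon`, `zonal_values`), the 3x3 grid, and the zone coordinates are all hypothetical. It shows that a pixel that only partially overlaps a zone is skipped when its center lies outside the polygon.

```python
from statistics import mean

def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def zonal_values(grid, origin_x, origin_y, cell_size, ring):
    """Collect values of pixels whose centers fall inside the polygon ring.

    (origin_x, origin_y) is the upper-left corner of the raster; rows run
    from north to south.
    """
    values = []
    for row, line in enumerate(grid):
        for col, value in enumerate(line):
            center_x = origin_x + (col + 0.5) * cell_size
            center_y = origin_y - (row + 0.5) * cell_size
            if point_in_polygon(center_x, center_y, ring):
                values.append(value)
    return values

# Hypothetical 3x3 raster with 1-unit cells, upper-left corner at (0, 3)
grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
# The zone overlaps part of the right column and bottom row,
# but does not reach the centers of those pixels.
zone = [(0.0, 3.0), (2.4, 3.0), (2.4, 0.6), (0.0, 0.6)]

selected = zonal_values(grid, origin_x=0.0, origin_y=3.0, cell_size=1.0, ring=zone)
print(selected, len(selected), mean(selected))
```

Only the four pixels whose centers lie inside the zone contribute to the count and mean, even though the zone touches all nine pixels.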
Use .setZones() to specify the DataFrame containing the zone geometries, and .setZoneIdColumns() to specify one or more columns in the zone DataFrame that identify zones. The specified columns will be included in the output DataFrame as the zone IDs.

When multiple polygon features share the same zone ID, raster pixels from all associated polygons are included in the statistics, and the statistics are returned per zone ID.

If the zone ID columns are not specified, the tool generates a unique zone ID for each polygon geometry in the zone DataFrame.
Use .includeZoneGeometry() to specify whether the zone geometries are included in the output DataFrame. When set to True, the output DataFrame includes the geometry column representing each zone. When not set or set to False, zone geometry is not included in the output DataFrame, which can improve performance if geometry is not needed.

When multiple polygon features share the same zone ID, the returned geometry is a multipart polygon composed of the original zone features. The result statistics, such as count, sum, and mean, reflect the combined contribution of all pixels from those polygons. The tool does not dissolve the zone geometries with the same zone ID.
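As an illustration of how statistics combine when polygons share a zone ID, the sketch below pools pixel values by zone ID before summarizing. The input list and the `combine_by_zone_id` helper are hypothetical, and this is not how the tool is implemented internally; it only shows the expected arithmetic.

```python
from collections import defaultdict

# Hypothetical input: one entry per polygon, holding the values of the
# pixels whose centers fall inside that polygon.
polygon_pixels = [
    ("A", [2.0, 4.0]),
    ("B", [10.0]),
    ("A", [6.0, 8.0, 10.0]),  # a second polygon with zone ID "A"
]

def combine_by_zone_id(polygon_pixels):
    """Pool pixel values per zone ID, then compute Count, Sum, and Mean."""
    pooled = defaultdict(list)
    for zone_id, values in polygon_pixels:
        pooled[zone_id].extend(values)
    return {
        zone_id: {
            "Count": len(values),
            "Sum": sum(values),
            "Mean": sum(values) / len(values),
        }
        for zone_id, values in pooled.items()
    }

stats = combine_by_zone_id(polygon_pixels)
print(stats)
```

Zone "A" is reported once, with a count of five pixels drawn from both of its polygons.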
If the raster and zone geometries have different spatial references, the tool will transform the raster to match the zone geometries. For better performance, it is recommended to have both the raster and zone geometries in the same coordinate system before running the tool.

Learn more about coordinate systems and transformations
You can specify one or more band IDs to summarize from the input raster with setBandIds(). Band IDs are 1-based. By default, all bands of the raster are used for statistical calculations.
The supported statistic types depend on the pixel type of the input raster and the statistics calculation type.

By default, the tool calculates arithmetic statistics, which are listed in the following table.

	Count	Minimum	Maximum	Range	Mean	Standard deviation	Sum	Median	Percentile	Variety	Majority	Majority count	Majority percentage	Minority	Minority count	Minority percentage
Integer pixel type
Float pixel type

Legend: Full support, Partial support, No support

You can optionally use .setCircularWrap(low, high) to enable circular statistic calculation. Circular statistics can be used for directional or cyclic variables (for example, aspect or wind direction), where values wrap around at the range boundary.

The circular statistic types depend on the pixel type. If pixel values are uniformly distributed across the circular range, the circular mean and standard deviation will be NULL.

	Count	Minimum	Maximum	Range	Mean	Standard deviation	Sum	Median	Percentile	Variety	Majority	Majority count	Majority percentage	Minority	Minority count	Minority percentage
Integer pixel type
Float pixel type

Legend: Full support, Partial support, No support
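To see why circular statistics matter for wrapped values, the sketch below computes a circular mean using the standard definition (the direction of the averaged unit vectors). This is an assumption for illustration: this page does not state the exact formula the tool uses, and the function name and the `tol` threshold are hypothetical. The `low` and `high` arguments mirror the wrap values passed to .setCircularWrap(low, high).

```python
import math

def circular_mean(values, low, high, tol=1e-12):
    """Circular mean of values on the cyclic range [low, high).

    Returns None when the mean resultant length is effectively zero (for
    example, values spread evenly around the circle), because no mean
    direction is defined in that case.
    """
    span = high - low
    # Map each value to an angle on the unit circle.
    angles = [(v - low) / span * 2.0 * math.pi for v in values]
    s = sum(math.sin(a) for a in angles) / len(angles)
    c = sum(math.cos(a) for a in angles) / len(angles)
    if math.hypot(s, c) < tol:
        return None
    mean_angle = math.atan2(s, c) % (2.0 * math.pi)
    # Map the mean angle back to the original range.
    return low + mean_angle / (2.0 * math.pi) * span

# Two wind directions on either side of north (degrees)
directions = [350.0, 20.0]
print(sum(directions) / len(directions))    # arithmetic mean points south
print(circular_mean(directions, 0, 360))    # circular mean points near north

# Evenly spread directions have no defined mean direction
print(circular_mean([0.0, 90.0, 180.0, 270.0], 0, 360))
```

The arithmetic mean of 350 and 20 is 185, which points in the opposite direction from both inputs, while the circular mean stays close to north.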

For majority and minority calculations, when there is a tie, the output will be any of the tied values.
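The relationship between variety, majority, and minority can be sketched with a frequency count. This example is illustrative only: the `majority_minority` helper is hypothetical, and its tie-breaking (first value encountered) is a property of this sketch, not a documented behavior of the tool, which may return any of the tied values.

```python
from collections import Counter

def majority_minority(values):
    """Summarize integer pixel values by frequency."""
    counts = Counter(values)
    ordered = counts.most_common()  # sorted by descending frequency
    majority, majority_count = ordered[0]
    minority, minority_count = ordered[-1]
    total = len(values)
    return {
        "Variety": len(counts),
        "Majority": majority,
        "MajorityCount": majority_count,
        "MajorityPercent": 100.0 * majority_count / total,
        "Minority": minority,
        "MinorityCount": minority_count,
        "MinorityPercent": 100.0 * minority_count / total,
    }

summary = majority_minority([3, 3, 3, 1, 1, 2])
print(summary)

# With a tie, any of the tied values is a valid majority.
tied = majority_minority([1, 1, 2, 2])
print(tied["Majority"])
```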
Percentile statistics are optional and can be enabled using includePercentiles() or setPercentileValue(). If neither is set, percentiles are not calculated. If includePercentiles(True) is called without a custom percentile value, the tool calculates the median and the 90th percentile of the pixel values.

The percentile value can be specified using setPercentileValue() with any value from 0 to 100. For example, if you specify a percentile value of 25, the tool returns the 25th percentile value of pixel values that fall within each zone.
Percentile values are computed using an approximate quantile implementation based on the Greenwald-Khanna algorithm, with additional optimizations for performance. As a result, percentile statistics produced by the Zonal Statistics tool may differ slightly from results computed by RT_ZonalStatistics, which uses an exact quantile calculation. This approach is designed to scale efficiently to large datasets while still delivering near-accurate percentile estimates.
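For reference, the sketch below shows what an exact percentile looks like for a small set of pixel values, using the nearest-rank definition. Nearest-rank is chosen here only for simplicity and is an assumption: this page does not state which percentile definition or interpolation the tool uses, and the tool's approximate results may differ slightly from any exact method. The function name and sample values are hypothetical.

```python
import math

def nearest_rank_percentile(values, p):
    """Exact percentile by the nearest-rank method, with p between 0 and 100."""
    if not values:
        return None
    ordered = sorted(values)
    # Rank is 1-based; p = 0 maps to the smallest value.
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]

pixels = [12, 7, 3, 9, 15, 4, 10, 8]

p25 = nearest_rank_percentile(pixels, 25)
p50 = nearest_rank_percentile(pixels, 50)
p90 = nearest_rank_percentile(pixels, 90)
print(p25, p50, p90)
```

An exact method like this must sort or hold all values for a zone, which is the cost an approximate quantile sketch avoids on large rasters.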
Pixels with NoData values are excluded from all statistics calculations.

Results

Zonal Statistics returns a DataFrame with one row for each unique combination of zone ID and band ID. The output DataFrame includes the zone ID columns specified by .setZoneIdColumns(), or a generated zone ID if none are specified. Statistics are returned in a wide format, where each statistic is represented as a separate column.

Field	Description
`ZoneID`	The zone identifier. If one or more zone ID columns are specified, the output includes those columns. Otherwise, a `ZoneID` column is generated for each zone geometry.
`BandID`	The raster band ID (1-based).
`Count`	The number of raster pixels included in the zone.
`Min`	The minimum pixel value among all pixels in the zone.
`Max`	The maximum pixel value among all pixels in the zone.
`Range`	The difference between the maximum and minimum pixel value in the zone.
`Mean`	The mean pixel value among all pixels in the zone.
`Stdev`	The standard deviation of all pixels in the zone.
`Sum`	The total value of all pixels in the zone.
`Variety`	The number of unique pixel values among all pixels in the zone. It will be `NULL` if the pixel type is float.
`Majority`	The pixel value that occurs most often among all pixels in the zone. It will be `NULL` if the pixel type is float.
`MajorityCount`	The frequency of all pixels that contain the majority value in the zone. It will be `NULL` if the pixel type is float.
`MajorityPercent`	The percentage of pixels that contain the majority value in the zone. It will be `NULL` if the pixel type is float.
`Minority`	The value that occurs least often among all pixels in the zone. It will be `NULL` if the pixel type is float.
`MinorityCount`	The frequency of all pixels that contain the minority value in the zone. It will be `NULL` if the pixel type is float.
`MinorityPercent`	The percentage of pixels that contain the minority value in the zone. It will be `NULL` if the pixel type is float.

If percentile statistics are enabled, the output DataFrame will also include the following fields:

Field	Description
`Median`	The median pixel value among all pixels in the zone.
`Percentile`	The percentile value specified by `.setPercentileValue()`. If not specified, the 90th percentile is calculated by default.

If .includeZoneGeometry() is called, the output DataFrame includes the following field:

Field	Description
`zone_geometry`	The geometry of the zone.

Performance notes

Improve the performance of Zonal Statistics by doing one or more of the following:

Exclude percentile calculation if percentiles are not required for the analysis.
Exclude zone geometries from the output DataFrame.
Restrict the analysis to specific raster bands using .setBandIds().
Use unique zone IDs when possible. While duplicate zone IDs are supported, unique zone IDs avoid additional geometry grouping and multipart geometry handling, which can improve execution efficiency.
When the raster and zone geometries have different spatial references, the raster is transformed to match the zone geometries at runtime. Transforming the inputs to the same spatial reference before running the tool can reduce processing overhead.
Only analyze the records in your area of interest. You can pick the records of interest by using one of the following SQL functions:
- ST_Intersection: Clip to an area of interest represented by a polygon. This will modify your input records.
- ST_BboxIntersects: Select records that intersect an envelope.
- ST_EnvIntersects: Select records having an envelope that intersects the envelope of another geometry.
- ST_Intersects: Select records that intersect another dataset or area of interest represented by a polygon.

Similar capabilities

Syntax

For more details, go to the GeoAnalytics Engine API reference for zonal statistics.

Setter (Python)	Setter (Scala)	Description	Required
`setZones(dataframe)`	`setZones(zones)`	Sets the zone DataFrame containing the zone polygons and attributes.	Yes
`setRasterColumn(column)`	`setRasterColumn(column)`	Sets the raster column in the rasters DataFrame. The raster pixel values are used to calculate zonal statistics for each zone geometry.	Yes
`setZoneIdColumns(*columns)`	`setZoneIdColumns(columns)`	Sets one or more zone id columns from the zone DataFrame. If not specified, the tool generates the zone id for each of the zone geometries in the zone DataFrame.	No
`setBandIds(*band_ids)`	`setBandIds(bandIds)`	Sets the ids of one or more raster bands used to calculate zonal statistics. Band ids are 1-based. If not specified, the tool calculates zonal statistics for all bands in the input raster.	No
`setCircularWrap(low, high)`	`setCircularWrap(low, high)`	Sets the circular wrap values to enable circular statistic calculation.	No
`includePercentiles(value=True)`	`includePercentiles(value)`	Sets whether the output includes percentiles. When set to `True`, the output DataFrame includes percentile statistics columns, including median and the percentile.	No
`setPercentileValue(percentile_value)`	`setPercentileValue(percentileValue)`	Sets the percentile value to compute. The tool will include percentile results when either `includePercentiles()` is set to `True`, or this setter is called with the custom percentile value specified.	No
`includeZoneGeometry(value=True)`	`includeZoneGeometry(value)`	Sets whether the output includes the zone geometry column. When set to `True`, the output DataFrame includes a geometry column representing the zone geometries associated with each zone id. If not set or set to `False`, the output does not include the zone geometry column.	No
`run(dataframe)`	`run(rasters)`	Runs the Zonal Statistics tool using the input raster DataFrame.	Yes
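The optional setters can be combined in one chain. The following is a minimal sketch that uses only the Python setters listed in the table above; the wrapper function `build_percentile_tool` is hypothetical, and the column names ("STATE_NAME", "NAME", "raster") are assumptions about the input data. The tool instance is passed in as an argument so the sketch does not depend on a running GeoAnalytics Engine session.

```python
def build_percentile_tool(tool, zones_df):
    """Configure a Zonal Statistics tool for band 1 with a 25th percentile.

    `tool` is expected to be a new geoanalytics.tools.ZonalStatistics()
    instance. Column names are assumptions about the input DataFrames.
    """
    return (
        tool
        .setZones(zones_df)
        .setZoneIdColumns("STATE_NAME", "NAME")  # assumed zone ID columns
        .setRasterColumn("raster")               # assumed raster column name
        .setBandIds(1)                           # band IDs are 1-based
        .setPercentileValue(25)                  # also enables percentile output
        .includeZoneGeometry(False)              # skip geometry when not needed
    )

# Usage in a GeoAnalytics Engine session (not executed here):
#   from geoanalytics.tools import ZonalStatistics
#   result = build_percentile_tool(ZonalStatistics(), zones_df).run(raster_df)
```

Restricting the bands and leaving out the zone geometry follows the performance notes, while setting a percentile value adds the `Median` and `Percentile` columns to the output.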

Examples

Run Zonal Statistics

Python


# Imports
from geoanalytics.tools import ZonalStatistics
from geoanalytics.sql import functions as ST

# Path to the US Annual Average Wind Speed image service
raster_path = "https://tiledimageservices.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/US_Annual_Average_Wind_Speed/ImageServer"
raster_df = spark.read.format("image-service").load(raster_path)

# Path to the US County layer
zones_path = "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0"
zones_df = spark.read.format("feature-service").load(zones_path)\
             .withColumn("shape", ST.transform("shape", 102100))

# Use Zonal Statistics to summarize annual wind speed into county zones
result = ZonalStatistics() \
        .setZones(zones_df) \
        .setZoneIdColumns("STATE_NAME", "NAME") \
        .setRasterColumn("raster") \
        .includeZoneGeometry(True)\
        .run(raster_df)

# Show the first 5 rows of the result DataFrame
result.sort("STATE_NAME", "NAME").show(5)

Result
+----------+--------------+------+-----+------------------+------------------+------------------+------------------+-------------------+------------------+-------+--------+-------------+---------------+--------+-------------+---------------+--------------------+
|STATE_NAME|          NAME|BandID|Count|               Min|               Max|             Range|              Mean|              Stdev|               Sum|Variety|Majority|MajorityCount|MajorityPercent|Minority|MinorityCount|MinorityPercent|       zone_geometry|
+----------+--------------+------+-----+------------------+------------------+------------------+------------------+-------------------+------------------+-------+--------+-------------+---------------+--------+-------------+---------------+--------------------+
|   Alabama|Autauga County|     1|  552| 2.087801456451416|3.3896076679229736|1.3018062114715576| 2.590863266284918| 0.2431685697847177|1430.1565229892747|   NULL|    NULL|         NULL|           NULL|    NULL|         NULL|           NULL|{"rings":[[[-9664...|
|   Alabama|Baldwin County|     1| 1474|2.1721317768096924| 6.154635429382324| 3.982503652572632|2.9251511904083722| 0.5298230901258558| 4311.672854661941|   NULL|    NULL|         NULL|           NULL|    NULL|         NULL|           NULL|{"rings":[[[-9793...|
|   Alabama|Barbour County|     1|  806|2.0778417587280273|3.1731228828430176|1.0952811241149902|2.5632828003715336|0.23228035438458972| 2066.005937099456|   NULL|    NULL|         NULL|           NULL|    NULL|         NULL|           NULL|{"rings":[[[-9544...|
|   Alabama|   Bibb County|     1|  590|2.0551438331604004|3.0911967754364014| 1.036052942276001| 2.530248590647161|0.21468073344232885| 1492.846668481825|   NULL|    NULL|         NULL|           NULL|    NULL|         NULL|           NULL|{"rings":[[[-9731...|
|   Alabama| Blount County|     1|  628|1.9109033346176147| 5.404839038848877| 3.493935704231262| 2.751029093174417| 0.5631666218622156|1727.6462705135339|   NULL|    NULL|         NULL|           NULL|    NULL|         NULL|           NULL|{"rings":[[[-9681...|
+----------+--------------+------+-----+------------------+------------------+------------------+------------------+-------------------+------------------+-------+--------+-------------+---------------+--------+-------------+---------------+--------------------+
only showing top 5 rows

Plot results

Python

# Plot the county-level mean annual wind speed across the United States
result_plot = result.st.plot(cmap_values = "Mean",
                             legend=True,
                             legend_kwds={"orientation": "horizontal", "location": "bottom", "shrink": 0.7, "pad": 0.08},
                             figsize=(14,8),
                             basemap="light")

result_plot.set_title("County-level mean annual wind speed across the United States")

Version table

Release	Notes
2.0.0	Python tool introduced
2.0.0	Scala tool introduced