RT_ZonalStatistics takes a raster column and a polygon geometry column, and returns a struct column containing statistics of the raster's pixel values within each polygon zone. Each polygon in the geometry column defines a distinct zone, and the function calculates statistics for pixels that fall inside each zone. Statistics include count, minimum, maximum, range, mean, standard deviation, sum, median, the 90th percentile, variety, majority, majority count, majority percent, minority, minority count, and minority percent.
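The statistics listed above can be made concrete with a plain-Python sketch that summarizes the pixel values already selected for one zone. This is not GeoAnalytics Engine code, and `zonal_summary` is a hypothetical helper. Two of its choices are assumptions that happen to match the example output on this page: the population standard deviation (dividing by n) and a 90th percentile linearly interpolated between neighboring ranks. The tie-breaking rule for majority and minority is a guess.

```python
import statistics
from collections import Counter


def zonal_summary(values):
    """Summarize the pixel values inside one zone (hypothetical helper).

    Returns None for an empty zone.
    """
    n = len(values)
    if n == 0:
        return None
    counts = Counter(values)
    # Majority: most frequent value; minority: least frequent value.
    # Ties go to the smallest value here, which is an assumption.
    majority, majority_count = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    minority, minority_count = min(counts.items(), key=lambda kv: (kv[1], kv[0]))
    if n > 1:
        # Cut point 9 of 10 with "inclusive" interpolation is the 90th
        # percentile, linearly interpolated between neighboring ranks.
        p90 = statistics.quantiles(values, n=10, method="inclusive")[8]
    else:
        p90 = values[0]
    lo, hi = min(values), max(values)
    return {
        "count": n,
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),  # population form: divides by n
        "sum": sum(values),
        "median": statistics.median(values),
        "percentile": p90,
        "variety": len(counts),  # number of distinct values
        "majority": majority,
        "majorityCount": majority_count,
        "majorityPercent": 100.0 * majority_count / n,
        "minority": minority,
        "minorityCount": minority_count,
        "minorityPercent": 100.0 * minority_count / n,
    }


# Row-major values (10 * row + col) for 27 pixels that are consistent with
# the example output on this page; this selection is an inference.
example = (
    [10 * r + c for r in range(2, 5) for c in range(2, 8)]
    + [10 * r + c for r in range(5, 8) for c in range(5, 8)]
)
stats = zonal_summary(example)
print(stats["count"], stats["sum"], stats["mean"], stats["median"])  # 27 1215 45.0 43
```

With these 27 values the sketch also reproduces the example's stdev (about 17.0098) and percentile (about 70.2). The example itself reports NULL for variety, majority, and minority on its float32 raster, so the sketch's values for those fields do not correspond to that output.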
You can optionally specify the band index to analyze from the input raster. By default, the first band of the raster is used for statistical calculations.
The cell assignment parameter (`cell_assignment` in the Python example) determines how pixels are assigned to a zone. It has two options:

- Center: A pixel is included if its center point lies within the polygon zone. This is the default.
- Extent: A pixel is included if any part of it overlaps the polygon zone. Use this option to ensure that partially overlapping pixels are counted, especially when the raster resolution is coarse relative to the polygon size.
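The difference between the two options is easiest to see with an axis-aligned rectangular zone. The sketch below is an illustration only: the `pixels_in_rect_zone` helper and its georeferencing (cell size 1, row 0 at the top, y decreasing downward) are hypothetical, and real polygon zones need a general point-in-polygon and polygon-overlap test rather than this rectangle shortcut. It also treats a pixel that merely touches the zone boundary as outside, which may differ from the library's rule.

```python
def pixels_in_rect_zone(n_rows, n_cols, zone, mode="center"):
    """Return the (row, col) pixels assigned to a rectangular zone.

    Hypothetical georeferencing: cell size 1 and pixel (row, col) covers
    x in [col, col + 1] and y in [-(row + 1), -row].
    zone is (xmin, ymin, xmax, ymax).
    """
    xmin, ymin, xmax, ymax = zone
    selected = []
    for row in range(n_rows):
        for col in range(n_cols):
            if mode == "center":
                # Include the pixel only if its center lies strictly inside the zone.
                cx, cy = col + 0.5, -(row + 0.5)
                inside = xmin < cx < xmax and ymin < cy < ymax
            elif mode == "extent":
                # Include the pixel if it shares any area with the zone.
                inside = (col < xmax and col + 1 > xmin
                          and -(row + 1) < ymax and -row > ymin)
            else:
                raise ValueError(f"unknown mode: {mode!r}")
            if inside:
                selected.append((row, col))
    return selected


# A small zone around the corner shared by four pixels misses every pixel center.
zone = (0.6, -1.4, 1.4, -0.6)
print(pixels_in_rect_zone(4, 4, zone, "center"))  # []
print(pixels_in_rect_zone(4, 4, zone, "extent"))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

A zone that is small relative to the cell size, or that sits between pixel centers, selects nothing under Center but still picks up the overlapping pixels under Extent, which is the situation the Extent option is meant for.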
If both circular parameters are provided, the function computes circular statistics, so values on either side of the wrap boundary are treated as adjacent (for example, aspect values of 359 and 1 degrees). Circular statistics are appropriate for angular or cyclic data such as aspect, wind direction, or time of day. They include count, mean, standard deviation, variety, majority, majority count, majority percent, minority, minority count, and minority percent.
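The wrap-around behavior can be illustrated with the textbook circular (vector) mean: map each value to an angle on the unit circle, add the unit vectors, and map the direction of the sum back to the original range. This is a sketch of that standard technique, not GeoAnalytics Engine's implementation. The `low` and `high` arguments, and the assumption that the two circular parameters define a half-open range [low, high), are hypothetical; this page does not spell out the actual parameter names or semantics.

```python
import math


def circular_mean(values, low, high):
    """Circular (vector) mean of values on the cyclic range [low, high).

    Returns None when the mean direction is undefined (the vectors cancel).
    """
    span = high - low
    sin_sum = cos_sum = 0.0
    for v in values:
        # Map each value to an angle in radians on the unit circle.
        theta = 2.0 * math.pi * (v - low) / span
        sin_sum += math.sin(theta)
        cos_sum += math.cos(theta)
    if abs(sin_sum) < 1e-12 and abs(cos_sum) < 1e-12:
        return None
    # Map the mean direction back onto [low, high).
    offset = (span * math.atan2(sin_sum, cos_sum) / (2.0 * math.pi)) % span
    if offset >= span:  # guard against rounding up to exactly `span`
        offset = 0.0
    return low + offset


# Aspect in degrees: 300 and 40 straddle north.
print(sum([300, 40]) / 2)                          # 170.0, the naive arithmetic mean
print(round(circular_mean([300, 40], 0, 360), 6))  # 350.0
```

For 300 and 40 degrees the arithmetic mean points south (170), while the circular mean points just west of north (350), because the shorter arc between the two values crosses the wrap boundary.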
The optional parameters (band, cell assignment, and both circular parameters) also accept column input, so each row can use its own configuration.
If the band ID is out of range, the function returns null.
| Function | Syntax |
|---|---|
| Python | zonal |
| SQL | RT |
| Scala | zonal |
For more details, go to the GeoAnalytics Engine API reference for zonal_statistics.
Examples
```python
from geoanalytics.raster import functions as RT
from geoanalytics.sql import functions as ST

# One row: 100 pixel values (0-99) and an L-shaped polygon zone as closed-ring WKT.
data = [(list(range(100)), "POLYGON ((1.5 -1.5, 7.5 -1.5, 7.5 -7.5, 4.5 -7.5, 4.5 -4.5, 1.5 -4.5, 1.5 -1.5))")]

# Assumes an active SparkSession named `spark`.
df = spark.createDataFrame(data, ["pixels", "poly_wkt"]) \
    .withColumn("raster", RT.create_raster("pixels", 10, 10, "float32")) \
    .withColumn("polygon", ST.poly_from_text("poly_wkt"))

# Statistics of band 1, counting every pixel that overlaps the zone.
zonal_stats = df.select(
    RT.zonal_statistics("raster", zone_col="polygon", band_id=1, cell_assignment="extent").alias("zonal_stats"))
zonal_stats.select("zonal_stats.*").show()
```

```
+-----+----+----+-----+----+-----------------+------+------+-----------------+-------+--------+-------------+---------------+--------+-------------+---------------+
|count| min| max|range|mean|            stdev|   sum|median|       percentile|variety|majority|majorityCount|majorityPercent|minority|minorityCount|minorityPercent|
+-----+----+----+-----+----+-----------------+------+------+-----------------+-------+--------+-------------+---------------+--------+-------------+---------------+
|   27|22.0|77.0| 55.0|45.0|17.00980109623077|1215.0|  43.0|70.19999999999999|   NULL|    NULL|         NULL|           NULL|    NULL|         NULL|           NULL|
+-----+----+----+-----+----+-----------------+------+------+-----------------+-------+--------+-------------+---------------+--------+-------------+---------------+
```

Version table
| Release | Notes |
|---|---|
| 2.0.0 | Python, SQL, and Scala functions introduced |