
Zonal Statistics computes summary statistics for raster band values within zone polygons. Each row in the output DataFrame represents statistics for a zone and raster band.

Zonal Statistics workflow

Usage notes

  • Zonal Statistics requires an input DataFrame containing rasters and a zone DataFrame containing polygon geometries. The tool calculates statistics for pixels that fall inside each zone. A pixel is included in the statistic calculation if the center of the pixel is contained within a zone polygon.

  • Use .setZones() to specify the DataFrame containing the zone geometries, and .setZoneIdColumns() to specify one or more columns in the zone DataFrame that identify zones. The specified columns are included in the output DataFrame as the zone IDs.

    When multiple polygon features share the same zone ID, raster pixels from all associated polygons are included in the statistics, and the statistics are returned per zone ID.

    If the zone ID columns are not specified, the tool generates a unique zone ID for each polygon geometry in the zone DataFrame.

  • Use .includeZoneGeometry() to specify whether the zone geometries are included in the output DataFrame. When set to True, the output DataFrame includes the geometry column representing each zone. When not set or set to False, zone geometry is not included in the output DataFrame, which can improve performance if geometry is not needed.

    When multiple polygon features share the same zone ID, the returned geometry is a multipart polygon composed of the original zone features. The result statistics, such as count, sum, and mean, reflect the combined contribution of all pixels from those polygons. The tool does not dissolve the zone geometries with the same zone ID.

  • If the raster and zone geometries have different spatial references, the tool will transform the raster to match the zone geometries. For better performance, it is recommended to have both the raster and zone geometries in the same coordinate system before running the tool.

    Learn more about coordinate systems and transformations

  • You can specify one or more band IDs to summarize from the input raster with setBandIds(). Band IDs are 1-based. By default, all bands of the raster are used for statistical calculations.

  • The supported statistics type depends on the pixel type of the input raster and the statistics calculation type.

    By default, the tool calculates arithmetic statistics, which are listed in the following table.

    Statistic types: Count, Minimum, Maximum, Range, Mean, Standard deviation, Sum, Median, Percentile, Variety, Majority, Majority count, Majority percentage, Minority, Minority count, Minority percentage
    Pixel types: Integer pixel type, Float pixel type
    Support levels: Full support, Partial support, No support
    • You can optionally use .setCircularWrap(low, high) to enable circular statistic calculation. Circular statistics can be used for directional or cyclic variables (for example, aspect or wind direction), where values wrap around at the range boundary.

      The circular statistic types depend on the pixel type. If pixel values are uniformly distributed across the circular range, the circular mean and standard deviation will be NULL.

    Statistic types: Count, Minimum, Maximum, Range, Mean, Standard deviation, Sum, Median, Percentile, Variety, Majority, Majority count, Majority percentage, Minority, Minority count, Minority percentage
    Pixel types: Integer pixel type, Float pixel type
    Support levels: Full support, Partial support, No support
      • For majority and minority calculations, when there is a tie, the output will be any of the tied values.

      • Percentile statistics are optional and can be enabled using includePercentiles() and setPercentileValue(). If neither is set, percentiles will not be calculated. If includePercentiles(True) is called, the tool calculates the median and the 90th percentile using the pixel values.

        The percentile value can be specified using setPercentileValue() with any value from 0 to 100. For example, if you specify a percentile value of 25, the tool returns the 25th percentile value of pixel values that fall within each zone.

      • Percentile values are computed using an approximate quantile implementation based on the Greenwald-Khanna algorithm, with additional optimizations for performance. As a result, percentile statistics produced by the Zonal Statistics tool may differ slightly from results computed by RT_ZonalStatistics, which uses an exact quantile calculation. This approach is designed to scale efficiently to large datasets while still delivering close percentile estimates.

      • Pixels with NoData values are excluded from all statistics calculations.
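The circular statistics described above can be illustrated with a small standalone sketch. It is not the tool's implementation, which is not documented here; it uses the standard vector-averaging definition of a circular mean. The function name `circular_mean`, the `nodata` parameter, and the tolerance `tol` are hypothetical and exist only for this illustration.

```python
import math


def circular_mean(values, low=0.0, high=360.0, nodata=None, tol=1e-9):
    """Mean direction of values that wrap around on the range [low, high).

    Illustrative only: the names and the tolerance are assumptions,
    not part of the Zonal Statistics API.
    """
    span = high - low
    # Exclude NoData pixels before doing any arithmetic.
    valid = [v for v in values if v != nodata]
    if not valid:
        return None
    # Map each value to an angle on the unit circle and sum the unit vectors.
    angles = [2.0 * math.pi * (v - low) / span for v in valid]
    x = sum(math.cos(a) for a in angles)
    y = sum(math.sin(a) for a in angles)
    # A resultant of (near) zero length means the directions cancel out,
    # so no meaningful mean direction exists.
    if math.hypot(x, y) / len(valid) < tol:
        return None
    mean_angle = math.atan2(y, x) % (2.0 * math.pi)
    return low + mean_angle * span / (2.0 * math.pi)


# Wind directions on either side of north.
north_ish = circular_mean([350.0, 10.0])

# Evenly spread directions cancel out.
undefined = circular_mean([0.0, 90.0, 180.0, 270.0])
```

With a wrap range of 0 to 360, the directions 350 and 10 average to a value at the wrap boundary (0, equivalently 360), whereas an arithmetic mean would give 180. The evenly spread input returns `None`, which mirrors the NULL result described for uniformly distributed values.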

      Results

      Zonal Statistics returns a DataFrame with one row for each unique combination of zone ID and band ID. The output DataFrame includes the zone ID columns specified by .setZoneIdColumns(), or a generated zone ID if none are specified. Statistics are returned in a wide format, where each statistic is represented as a separate column.

      Field | Description
      ZoneID | The zone identifier. If one or more zone ID columns are specified, the output includes those columns. Otherwise, a ZoneID column is generated for each zone geometry.
      BandID | The raster band ID (1-based).
      Count | The number of raster pixels included in the zone.
      Min | The minimum pixel value among all pixels in the zone.
      Max | The maximum pixel value among all pixels in the zone.
      Range | The difference between the maximum and minimum pixel value in the zone.
      Mean | The mean pixel value among all pixels in the zone.
      Stdev | The standard deviation of all pixels in the zone.
      Sum | The total value of all pixels in the zone.
      Variety | The number of unique pixel values among all pixels in the zone. It will be NULL if the pixel type is float.
      Majority | The pixel value that occurs most often among all pixels in the zone. It will be NULL if the pixel type is float.
      MajorityCount | The number of pixels that contain the majority value in the zone. It will be NULL if the pixel type is float.
      MajorityPercent | The percentage of pixels that contain the majority value in the zone. It will be NULL if the pixel type is float.
      Minority | The pixel value that occurs least often among all pixels in the zone. It will be NULL if the pixel type is float.
      MinorityCount | The number of pixels that contain the minority value in the zone. It will be NULL if the pixel type is float.
      MinorityPercent | The percentage of pixels that contain the minority value in the zone. It will be NULL if the pixel type is float.
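The Count field depends on the pixel-center rule from the usage notes: a pixel contributes only if its center lies inside the zone polygon. The following standalone sketch illustrates that rule with a simple ray-casting test on a hypothetical 4 x 4 grid. The function names, the grid layout (origin at the lower-left corner), and the zone coordinates are assumptions for illustration, not the tool's implementation.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test. `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Check whether a horizontal ray from (x, y) crosses this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_pixels_in_zone(n_rows, n_cols, cell_size, ring):
    """Count pixels whose centers fall inside the ring."""
    count = 0
    for row in range(n_rows):
        for col in range(n_cols):
            # Center of the pixel at (row, col).
            cx = (col + 0.5) * cell_size
            cy = (row + 0.5) * cell_size
            if point_in_polygon(cx, cy, ring):
                count += 1
    return count


# Hypothetical rectangular zone that partially overlaps some pixels.
zone = [(0.0, 0.0), (2.6, 0.0), (2.6, 2.4), (0.0, 2.4)]
count = count_pixels_in_zone(4, 4, 1.0, zone)
```

Here the third column of pixels is counted because its centers (x = 2.5) lie inside the zone, while the third row is not counted because its centers (y = 2.5) lie outside, even though the zone overlaps part of those pixels.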

      If percentile statistics are enabled, the output DataFrame will also include the following fields:

      Field | Description
      Median | The median pixel value among all pixels in the zone.
      Percentile | The percentile value specified by .setPercentileValue(). If not specified, the 90th percentile is calculated by default.
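To make the meaning of the Percentile field concrete, the sketch below computes an exact percentile with the nearest-rank definition. This is only one common convention and is an assumption for illustration: the tool itself uses an approximate quantile method, and its exact convention is not stated here, so results may differ slightly. The function name and the sample values are hypothetical.

```python
import math


def nearest_rank_percentile(values, percentile):
    """Exact percentile using the nearest-rank definition (illustrative)."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    ordered = sorted(values)
    if not ordered:
        return None
    # Rank of the smallest value with at least `percentile` percent
    # of the data at or below it (1-based, at least 1).
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical pixel values that fall within one zone.
zone_pixels = [3, 7, 8, 12, 15, 18, 21, 24, 30, 41]
p25 = nearest_rank_percentile(zone_pixels, 25)
p90 = nearest_rank_percentile(zone_pixels, 90)
```

For these ten values, the 25th percentile under this definition is 8 and the 90th percentile is 30.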

      If .includeZoneGeometry() is called, the output DataFrame includes the following field:

      Field | Description
      zone_geometry | The geometry of the zone.
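When several polygons share a zone ID, the statistics describe the pooled pixels of all those polygons. The sketch below illustrates that pooling with plain Python on hypothetical integer pixel values. It is not the tool's implementation, and it assumes that MajorityPercent is expressed on a 0 to 100 scale.

```python
from collections import Counter, defaultdict

# Hypothetical integer pixel values already extracted per polygon,
# given as (zone ID, pixel values). Zone "A" has two polygons.
polygon_pixels = [
    ("A", [1, 1, 2]),
    ("B", [3, 3, 3, 4]),
    ("A", [2, 2, 5]),
]

# Pool the pixels of all polygons that share a zone ID.
pooled = defaultdict(list)
for zone_id, values in polygon_pixels:
    pooled[zone_id].extend(values)


def summarize(values):
    """Compute a few of the output statistics for one zone."""
    counts = Counter(values)
    # On a tie, most_common() returns one of the tied values.
    majority, majority_count = counts.most_common(1)[0]
    return {
        "Count": len(values),
        "Sum": sum(values),
        "Mean": sum(values) / len(values),
        "Variety": len(counts),
        "Majority": majority,
        "MajorityCount": majority_count,
        # Assumption: percentage on a 0-100 scale.
        "MajorityPercent": 100.0 * majority_count / len(values),
    }


stats = {zone_id: summarize(values) for zone_id, values in pooled.items()}
```

Zone "A" is summarized over six pixels from its two polygons, so its majority value (2) only emerges once the polygons are pooled; neither polygon alone has 2 occurring three times.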

      Performance notes

      Improve the performance of Zonal Statistics by doing one or more of the following:

      • Exclude percentile calculation if percentiles are not required for the analysis.
      • Exclude zone geometries from the output DataFrame.
      • Restrict the analysis to specific raster bands using .setBandIds().
      • Use unique zone IDs when possible. While duplicate zone IDs are supported, unique zone IDs avoid additional geometry grouping and multipart geometry handling, which can improve execution efficiency.
      • When the raster and zone geometries have different spatial references, the raster is transformed to match the zone geometries at runtime. Transforming the inputs to the same spatial reference before running the tool can reduce processing overhead.
      • Only analyze the records in your area of interest. You can pick the records of interest by using one of the following SQL functions:

        • ST_Intersection: Clip to an area of interest represented by a polygon. This will modify your input records.
        • ST_BboxIntersects: Select records that intersect an envelope.
        • ST_EnvIntersects: Select records having an envelope that intersects the envelope of another geometry.
        • ST_Intersects: Select records that intersect another dataset or area of interest represented by a polygon.

      Similar capabilities

      Syntax

      For more details, go to the GeoAnalytics Engine API reference for zonal statistics.

      Setter (Python) | Setter (Scala) | Description | Required
      setZones(dataframe) | setZones(zones) | Sets the zone DataFrame containing the zone polygons and attributes. | Yes
      setRasterColumn(column) | setRasterColumn(column) | Sets the raster column in the rasters DataFrame. The raster pixel values are used to calculate zonal statistics for each zone geometry. | Yes
      setZoneIdColumns(*columns) | setZoneIdColumns(columns) | Sets one or more zone id columns from the zone DataFrame. If not specified, the tool generates the zone id for each of the zone geometries in the zone DataFrame. | No
      setBandIds(*band_ids) | setBandIds(bandIds) | Sets the ids of one or more raster bands used to calculate zonal statistics. Band ids are 1-based. If not specified, the tool calculates zonal statistics for all bands in the input raster. | No
      setCircularWrap(low, high) | setCircularWrap(low, high) | Sets the circular wrap values to enable circular statistic calculation. | No
      includePercentiles(value=True) | includePercentiles(value) | Sets whether the output includes percentiles. When set to True, the output DataFrame includes percentile statistics columns, including median and the percentile. | No
      setPercentileValue(percentile_value) | setPercentileValue(percentileValue) | Sets the percentile value to compute. The tool will include percentile results when either includePercentiles() is set to True, or this setter is called with the custom percentile value specified. | No
      includeZoneGeometry(value=True) | includeZoneGeometry(value) | Sets whether the output includes the zone geometry column. When set to True, the output DataFrame includes a geometry column representing the zone geometries associated with each zone id. If not set or set to False, the output does not include the zone geometry column. | No
      run(dataframe) | run(rasters) | Runs the Zonal Statistics tool using the input raster DataFrame. | Yes
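The optional setters in the table can be combined conditionally. The sketch below wraps the Python setters listed above in a helper that takes an already constructed tool object, so the block itself does not import GeoAnalytics Engine. The helper name `configure_zonal_statistics` and its parameters are hypothetical. It assumes each setter returns the tool for chaining, as in the chained call style used in this topic's example.

```python
def configure_zonal_statistics(tool, zones_df, raster_column, zone_id_columns=None,
                               band_ids=None, percentile_value=None,
                               include_geometry=False):
    """Apply the documented setters to `tool`, a ZonalStatistics() instance.

    Hypothetical helper: only the setter names come from the syntax table.
    """
    # Required setters.
    tool = tool.setZones(zones_df).setRasterColumn(raster_column)
    # Optional: without zone ID columns, the tool generates zone IDs.
    if zone_id_columns:
        tool = tool.setZoneIdColumns(*zone_id_columns)
    # Optional: band IDs are 1-based; all bands are used when omitted.
    if band_ids:
        tool = tool.setBandIds(*band_ids)
    # Optional: setting a percentile value also enables percentile output.
    if percentile_value is not None:
        tool = tool.setPercentileValue(percentile_value)
    # Zone geometry is excluded unless requested.
    return tool.includeZoneGeometry(include_geometry)


# Usage sketch (requires GeoAnalytics Engine and an active Spark session):
# from geoanalytics.tools import ZonalStatistics
# tool = configure_zonal_statistics(ZonalStatistics(), zones_df, "raster",
#                                   zone_id_columns=["STATE_NAME", "NAME"],
#                                   band_ids=[1], percentile_value=25)
# result = tool.run(raster_df)
```

Leaving `percentile_value` as `None` and `include_geometry` as `False` keeps the configuration in line with the performance notes, since percentiles and zone geometries are then not computed or returned.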

      Examples

      Run Zonal Statistics

      Python
      # Imports
      from geoanalytics.tools import ZonalStatistics
      from geoanalytics.sql import functions as ST
      
      # Path to the US Annual Average Wind Speed image service
      raster_path = "https://tiledimageservices.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/US_Annual_Average_Wind_Speed/ImageServer"
      raster_df = spark.read.format("image-service").load(raster_path)
      
      # Path to the US County layer
      zones_path = "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0"
      zones_df = spark.read.format("feature-service").load(zones_path)\
                   .withColumn("shape", ST.transform("shape", 102100))
      
      # Use Zonal Statistics to summarize annual wind speed into county zones
      result = ZonalStatistics() \
              .setZones(zones_df) \
              .setZoneIdColumns("STATE_NAME", "NAME") \
              .setRasterColumn("raster") \
              .includeZoneGeometry(True)\
              .run(raster_df)
      
      # Show the first 5 rows of the result DataFrame
      result.sort("STATE_NAME", "NAME").show(5)
      Result
      +----------+--------------+------+-----+------------------+------------------+------------------+------------------+-------------------+------------------+-------+--------+-------------+---------------+--------+-------------+---------------+--------------------+
      |STATE_NAME|          NAME|BandID|Count|               Min|               Max|             Range|              Mean|              Stdev|               Sum|Variety|Majority|MajorityCount|MajorityPercent|Minority|MinorityCount|MinorityPercent|       zone_geometry|
      +----------+--------------+------+-----+------------------+------------------+------------------+------------------+-------------------+------------------+-------+--------+-------------+---------------+--------+-------------+---------------+--------------------+
      |   Alabama|Autauga County|     1|  552| 2.087801456451416|3.3896076679229736|1.3018062114715576| 2.590863266284918| 0.2431685697847177|1430.1565229892747|   NULL|    NULL|         NULL|           NULL|    NULL|         NULL|           NULL|{"rings":[[[-9664...|
      |   Alabama|Baldwin County|     1| 1474|2.1721317768096924| 6.154635429382324| 3.982503652572632|2.9251511904083722| 0.5298230901258558| 4311.672854661941|   NULL|    NULL|         NULL|           NULL|    NULL|         NULL|           NULL|{"rings":[[[-9793...|
      |   Alabama|Barbour County|     1|  806|2.0778417587280273|3.1731228828430176|1.0952811241149902|2.5632828003715336|0.23228035438458972| 2066.005937099456|   NULL|    NULL|         NULL|           NULL|    NULL|         NULL|           NULL|{"rings":[[[-9544...|
      |   Alabama|   Bibb County|     1|  590|2.0551438331604004|3.0911967754364014| 1.036052942276001| 2.530248590647161|0.21468073344232885| 1492.846668481825|   NULL|    NULL|         NULL|           NULL|    NULL|         NULL|           NULL|{"rings":[[[-9731...|
      |   Alabama| Blount County|     1|  628|1.9109033346176147| 5.404839038848877| 3.493935704231262| 2.751029093174417| 0.5631666218622156|1727.6462705135339|   NULL|    NULL|         NULL|           NULL|    NULL|         NULL|           NULL|{"rings":[[[-9681...|
      +----------+--------------+------+-----+------------------+------------------+------------------+------------------+-------------------+------------------+-------+--------+-------------+---------------+--------+-------------+---------------+--------------------+
      only showing top 5 rows

      Plot results

      Python
      # Plot the county-level mean annual wind speed across the United States
      result_plot = result.st.plot(cmap_values = "Mean",
                                   legend=True,
                                   legend_kwds={"orientation": "horizontal", "location": "bottom", "shrink": 0.7, "pad": 0.08},
                                   figsize=(14,8),
                                   basemap="light")
      
      result_plot.set_title("County-level mean annual wind speed across the United States")
      Plotting example for Zonal Statistics.

      Version table

      Release | Notes
      2.0.0 | Python tool introduced
      2.0.0 | Scala tool introduced
