RT_Statistics takes a raster column and returns an array column of statistics about the raster. Each element of the array is a struct with the fields min, max, mean, and stdev (standard deviation). These values are useful estimates of the raster's statistical properties, but they may not always match the exact statistics of the raster data.
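To make the four fields concrete, the sketch below computes them for the 100 pixel values (0 through 99) used in the example on this page, using only Python's standard `statistics` module and no GeoAnalytics calls. It assumes that stdev is the sample standard deviation (dividing by n - 1), because that reproduces the 29.011... value in the example output. The population value would be about 28.866. This is an arithmetic cross-check only, not the engine's actual implementation.

```python
import statistics

# Pixel values 0..99, matching the 10 x 10 raster in the example (assumption: one band)
pixels = [float(v) for v in range(100)]

# Compute the same four fields that RT_Statistics reports
summary = {
    "min": min(pixels),
    "max": max(pixels),
    "mean": statistics.mean(pixels),
    # statistics.stdev is the sample standard deviation (divides by n - 1);
    # statistics.pstdev would give the population value (about 28.866)
    "stdev": statistics.stdev(pixels),
}
print(summary)
```

Because only exact statistics are computed here, any difference from the RT_Statistics output for a real raster would come from the engine's estimation, which is why the values should be treated as estimates.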
| Function | Syntax |
|---|---|
| Python | statistics(raster) |
| SQL | RT_Statistics(raster) |
| Scala | statistics(raster) |
For more details, go to the GeoAnalytics Engine API reference for statistics.
Examples
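```python
from geoanalytics.raster import functions as RT
from pyspark.sql import functions as F

# Assumes an active SparkSession `spark` with GeoAnalytics Engine; builds a 10 x 10 float32 raster from pixels 0..99
data = [(list(range(100)), )]
df = spark.createDataFrame(data, ["pixels"]) \
    .withColumn("raster", RT.create_raster("pixels", 10, 10, "float32"))
# Compute the raster statistics as an array column
stats = df.select(RT.statistics("raster").alias("statistics"))
# Explode the array so each struct becomes a row, then expand its fields into columns
stats.withColumn("item", F.explode("statistics")).select("item.*").show()
```
Result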
```
+---+----+----+------------------+
|min| max|mean|             stdev|
+---+----+----+------------------+
|0.0|99.0|49.5|29.011491975882016|
+---+----+----+------------------+
```
Version table
| Release | Notes |
|---|---|
| 2.0.0 | Python, SQL, and Scala functions introduced |