Aggregate functions summarize and find relationships between geometries in grouped data. You can group the rows
of a DataFrame using one or more column values by calling groupBy on the DataFrame. For example, in this tutorial you will group wind turbines by the name of the project they belong to.
By grouping on a string column containing the project name, you can obtain a group of wind turbines for each project.
DataFrame.groupBy returns an instance of pyspark.sql.GroupedData,
which can be used to calculate the count, max, min, mean, and sum of each column in each group of rows. You can also use it to run
any of the aggregate functions in GeoAnalytics Engine.
In this tutorial you will learn how to use ST_Aggr_ConvexHull to
calculate the convex hulls of groups of geometries and summarize
each group.
Two aggregated groups of points (blue and orange) and the resulting convex hulls.
Create a DataFrame from a feature service of wind turbine point locations in the United States and print the schema.
Apply a filter to only obtain turbines located in the state of Iowa.
Group the wind turbines on the p_name field, which contains the name of the project each turbine belongs to. Then
use GroupedData.agg to call aggr_convex_hull. The result is a DataFrame with two columns: a polygon column containing the convex hull
around each group, and a string column containing the p_name of each group.
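To make the shape of that result concrete, here is a minimal plain-Python sketch of the same idea: collect points by project name, then compute one convex hull per group. It does not use Spark or GeoAnalytics Engine. The hull routine is Andrew's monotone chain algorithm, swapped in purely for illustration, and the helper names, project names, and coordinates are all hypothetical. It treats coordinates as planar and returns a list of hull vertices rather than a polygon geometry.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def hulls_by_group(rows):
    """rows: iterable of (group_name, (x, y)). Returns {group_name: hull vertices}."""
    groups = defaultdict(list)
    for name, point in rows:
        groups[name].append(point)
    return {name: convex_hull(pts) for name, pts in groups.items()}


# Hypothetical sample data: two projects with made-up planar coordinates.
rows = [
    ("Project A", (0, 0)), ("Project A", (4, 0)), ("Project A", (4, 3)),
    ("Project A", (0, 3)), ("Project A", (2, 1)),
    ("Project B", (10, 10)), ("Project B", (12, 10)), ("Project B", (11, 12)),
]
hulls = hulls_by_group(rows)
for name, hull in hulls.items():
    print(name, hull)
```

Each dictionary entry plays the role of one output row: the key corresponds to the p_name column and the vertex list to the hull column. The interior point (2, 1) in the first group does not appear in its hull.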
Group the data and create convex hulls with summary statistics
Because GroupedData.agg supports running multiple expressions at once, you can calculate summary statistics for each group in the same
function call that creates convex hulls. This can be useful for visualizing the differences between groups
or for enriching the result for further analysis.
Perform the same grouping as earlier, except this time calculate
the total capacity, minimum year built, maximum height, and average height for each group of wind turbines in
addition to the convex hull.
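As a rough illustration of what computing several aggregates per group means, the plain-Python sketch below groups records by p_name and derives the four statistics named above in a single pass over each group. It is a conceptual analogue only, not the Spark or GeoAnalytics Engine API, and it leaves out the convex hull. The field names capacity, year, and height, the function name, and all sample values are hypothetical stand-ins; only p_name comes from the dataset described in this tutorial.

```python
from collections import defaultdict


def summarize_by_project(rows):
    """Group rows by p_name and compute several aggregates for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["p_name"]].append(row)

    summary = {}
    for name, members in groups.items():
        heights = [m["height"] for m in members]
        summary[name] = {
            "total_capacity": sum(m["capacity"] for m in members),
            "min_year": min(m["year"] for m in members),
            "max_height": max(heights),
            "avg_height": sum(heights) / len(heights),
        }
    return summary


# Hypothetical records; the field names and values are made up for illustration.
turbines = [
    {"p_name": "Project A", "capacity": 1500, "year": 2008, "height": 120.0},
    {"p_name": "Project A", "capacity": 2000, "year": 2012, "height": 130.0},
    {"p_name": "Project B", "capacity": 2500, "year": 2019, "height": 150.0},
]
summary = summarize_by_project(turbines)
for name, stats in summary.items():
    print(name, stats)
```

Each inner dictionary corresponds to one output row, with one key per aggregate expression. In the grouped DataFrame, the convex hull would be one more column alongside these statistics.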