Summarize Data
The features module includes a set of data summarization tools that calculate total counts, lengths, areas, and basic descriptive statistics of features and their attributes within areas or near other features. You can access these tools through the summarize_data submodule.
# connect to GIS
from arcgis.gis import GIS
gis = GIS("portal url", "username", "password")
# search for earthquake data (point data)
eq_search = gis.content.search("world earthquakes locations", "feature layer", max_items=1)
eq_item = eq_search[0]
eq_item
# search for USA states - area / polygon data
states_search = gis.content.search("title:'USA States'",
"feature layer", max_items=1)
states_item = states_search[0]
states_item
Let's draw the layers on a map and observe how they are distributed.
map1 = gis.map("USA")
map1
map1.add_layer(states_item)
map1.add_layer(eq_item)
Aggregate earthquakes by state
As the map shows, many of the earthquakes fall along tectonic plate boundaries (the Ring of Fire), while a few are dispersed across other states. It would be interesting to aggregate the earthquakes by state and plot the result as a chart.
The aggregate_points tool in the summarize_data submodule is well suited to this kind of analysis. The example below shows how to run this tool using the ArcGIS API for Python.
To start with, let us access the layers in the states and earthquakes items and view their attribute information to understand how the data can be summarized.
eq_fl = eq_item.layers[0]
states_fl = states_item.layers[0]
We have accessed the layers in the items as FeatureLayer objects. We can query the fields property to understand what kind of attribute data is stored in the layers.
# query the fields in the eq_fl layer
for field in eq_fl.properties.fields:
    print(field['name'])
# similarly for the states data
for field in states_fl.properties.fields:
    print(field['name'], end="\t")
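Only numeric fields make sense as inputs to statistics such as a mean or a minimum. As a minimal sketch, assuming each field definition supports key access with `name` and `type` keys (as the loops above suggest for `name`) and that numeric fields carry the standard Esri field type names, a small helper could list the candidates. The helper name `numeric_field_names` and the sample field list are hypothetical and exist only for illustration.

```python
# Esri field types that hold numbers. The OID type is left out on
# purpose: it is an identifier, not a measurement.
NUMERIC_TYPES = {
    "esriFieldTypeSmallInteger",
    "esriFieldTypeInteger",
    "esriFieldTypeSingle",
    "esriFieldTypeDouble",
}


def numeric_field_names(fields):
    """Return the names of fields whose type is numeric, in original order."""
    return [f["name"] for f in fields if f["type"] in NUMERIC_TYPES]


# Hypothetical field definitions shaped like entries of
# layer.properties.fields; NOT taken from the real earthquake layer.
sample_fields = [
    {"name": "OBJECTID", "type": "esriFieldTypeOID"},
    {"name": "place", "type": "esriFieldTypeString"},
    {"name": "magnitude", "type": "esriFieldTypeDouble"},
    {"name": "depth", "type": "esriFieldTypeDouble"},
    {"name": "datetime", "type": "esriFieldTypeDate"},
]

print(numeric_field_names(sample_fields))  # ['magnitude', 'depth']
```

With a real layer, the same helper could presumably be called as numeric_field_names(eq_fl.properties.fields), provided the entries expose a type key in the same way they expose name.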
Let us aggregate the points by state, summarizing the magnitude field with mean as the summary type and the depth field with min.
from arcgis.features import summarize_data
sum_fields = ['magnitude Mean', 'depth Min']
eq_summary = summarize_data.aggregate_points(point_layer = eq_fl,
polygon_layer = states_fl,
keep_boundaries_with_no_points=False,
summary_fields=sum_fields)
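Each entry in summary_fields above is a single string that pairs a field name with a summary type, separated by a space. The sketch below builds such strings from a dictionary and rejects unknown summary types before anything is sent to the tool. The helper name `build_summary_fields` is hypothetical, and the list of allowed types is an assumption that should be checked against the tool's documentation.

```python
# Summary types ASSUMED to be accepted by the tool; consult the tool
# documentation for the authoritative list.
ALLOWED_STATS = ("Sum", "Mean", "Min", "Max", "Stddev")


def build_summary_fields(stats_by_field):
    """Turn {"field": ["Stat", ...]} into a list of "field Stat" strings."""
    summary_fields = []
    for field_name, stats in stats_by_field.items():
        for stat in stats:
            if stat not in ALLOWED_STATS:
                raise ValueError(f"Unsupported summary type: {stat!r}")
            summary_fields.append(f"{field_name} {stat}")
    return summary_fields


checked_fields = build_summary_fields({"magnitude": ["Mean"], "depth": ["Min"]})
print(checked_fields)  # ['magnitude Mean', 'depth Min']
```

Validating locally like this is a design choice: a misspelled summary type fails immediately with a clear message instead of after a round trip to the server.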
When we ran the tool above, we did not specify a value for the output_name parameter. Hence the analysis results were not stored as an item on the portal; instead, they were returned in the variable eq_summary.
eq_summary
# access the aggregation feature collection
eq_aggregate_fc = eq_summary['aggregated_layer']
# query this feature collection to get the data as a feature set
eq_aggregate_fset = eq_aggregate_fc.query()
FeatureSet objects can expose their attribute information as a pandas dataframe. This is convenient because you do not have to iterate through each feature to view its attributes.
Let us view the summary results as a pandas dataframe. Note that the aggregate_points tool appends the polygon layer's original set of fields to the analysis result to provide context.
aggregation_df = eq_aggregate_fset.sdf
aggregation_df.head()
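To make the summary columns concrete, here is a minimal pure-Python sketch of what a point aggregation computes conceptually: assign each point to the polygon that contains it, then report a count, a mean, and a minimum per polygon. This is not how the ArcGIS service is implemented. It assumes planar coordinates and simple polygon rings without holes, and the function names, the `keep_empty` flag (which mirrors the idea of keep_boundaries_with_no_points), and the toy data are all hypothetical.

```python
from statistics import mean


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray cast to the right of the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_points_sketch(points, polygons, keep_empty=False):
    """Summarize points per polygon: count, mean magnitude, minimum depth."""
    rows = []
    for name, ring in polygons.items():
        hits = [p for p in points if point_in_polygon(p["x"], p["y"], ring)]
        if not hits and not keep_empty:
            continue  # drop polygons that contain no points
        rows.append({
            "name": name,
            "Point_Count": len(hits),
            "MEAN_magnitude": mean(p["magnitude"] for p in hits) if hits else None,
            "MIN_depth": min(p["depth"] for p in hits) if hits else None,
        })
    return rows


# Toy data, invented for illustration only (not real earthquakes or states).
polygons = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
    "C": [(10, 10), (12, 10), (12, 12), (10, 12)],
}
points = [
    {"x": 1.0, "y": 1.0, "magnitude": 4.0, "depth": 10.0},
    {"x": 0.5, "y": 1.5, "magnitude": 5.0, "depth": 5.0},
    {"x": 3.0, "y": 1.0, "magnitude": 3.0, "depth": 20.0},
]

for row in aggregate_points_sketch(points, polygons):
    print(row)
```

With the default keep_empty=False, polygon C is omitted because it contains no points; passing keep_empty=True keeps it with a count of 0 and empty statistics.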
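Thus, in our data, only 23 of the 50 states have recorded earthquakes. Because keep_boundaries_with_no_points was set to False, states without any earthquakes are left out of the result. Let us plot a bar chart to see which states had the most earthquakes.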
%matplotlib inline
aggregation_df.plot('STATE_NAME','Point_Count', kind='bar')
Clearly, California tops the list with the most earthquakes. Let us view the mean magnitude and minimum depth for each state in the plots below:
aggregation_df.plot('STATE_NAME',['MEAN_magnitude', 'MIN_depth'],kind='bar', subplots=True)