Summarize Data

The arcgis.features module provides a set of data summarization tools that calculate total counts, lengths, areas, and basic descriptive statistics of features and their attributes within areas or near other features. You can access these tools through the summarize_data submodule.
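
As a quick way to see what is available, you can import the submodule and list its public functions. The cell below is a minimal sketch; the exact set of tools depends on the version of the ArcGIS API for Python you have installed.

In [ ]:
# import the summarize_data submodule and list the tools it exposes
from arcgis.features import summarize_data

tools = [name for name in dir(summarize_data) if not name.startswith('_')]
print(tools)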

Aggregate points

In this example, let us observe how to use the aggregate_points tool to summarize data from spot measurements by area. To learn more about this tool and the formula it uses, refer to the tool's documentation.

In [ ]:
# connect to GIS
from arcgis.gis import GIS
gis = GIS("portal url", "username", "password")
In [ ]:
#search for earthquakes data - point data
eq_search = gis.content.search("world earthquakes", "feature layer", max_items=1)
eq_item = eq_search[0]
eq_item
Out[ ]:
world earthquakes
Feature Layer Collection by arcgis_python_api
Last Modified: December 09, 2016
0 comments, 0 views
In [ ]:
# search for USA states - area / polygon data
states_search = gis.content.search("title:USA_states and owner:arcgis_python_api", 
                                   "feature layer", max_items=1)
states_item = states_search[0]
states_item
Out[ ]:
USA_states
USA states
Feature Layer Collection by arcgis_python_api
Last Modified: December 09, 2016
0 comments, 0 views

Let's draw the layers on a map and observe how they are distributed.

In [ ]:
map1 = gis.map("USA")
map1

earthquakes and states

In [ ]:
map1.add_layer(states_item)
In [ ]:
map1.add_layer(eq_item)

Aggregate earthquakes by state

As you can see, a large number of earthquakes fall along the boundaries of tectonic plates (the Ring of Fire). However, a few are dispersed across other states as well. It would be interesting to aggregate all the earthquakes by state and plot that as a chart.

The aggregate_points tool in the summarize_data submodule is well suited for such analysis. The example below shows how to run this tool using the ArcGIS API for Python.

To start with, let us access the layers in the states and earthquakes items and view their attribute information to understand how the data can be summarized.

In [ ]:
eq_fl = eq_item.layers[0]
states_fl = states_item.layers[0]

We have accessed the layers in the items as FeatureLayer objects. We can query the fields property to understand what kind of attribute data is stored in the layers.

In [ ]:
#query the fields in eq_fl layer
for field in eq_fl.properties.fields:
    print(field['name'])
objectid
datetime_
latitude
longitude
depth
magnitude
magtype
nbstations
gap
distance
rms
source
eventid
In [ ]:
# similarly for states data
for field in states_fl.properties.fields:
    print(field['name'], end="\t")
fid	state_name	state_fips	sub_region	state_abbr	pop2000	pop2007	pop00_sqmi	pop07_sqmi	white	black	ameri_es	asian	hawn_pi	other	mult_race	hispanic	males	females	age_under5	age_5_17	age_18_21	age_22_29	age_30_39	age_40_49	age_50_64	age_65_up	med_age	med_age_m	med_age_f	households	ave_hh_sz	hsehld_1_m	hsehld_1_f	marhh_chd	marhh_no_c	mhh_child	fhh_child	families	ave_fam_sz	hse_units	vacant	owner_occ	renter_occ	no_farms97	avg_size97	crop_acr97	avg_sale97	sqmi	shape_leng	

Let us aggregate the points by state, summarizing the magnitude field with its mean and the depth field with its minimum.

In [ ]:
from arcgis.features import summarize_data
sum_fields = ['magnitude Mean', 'depth Min']
eq_summary = summarize_data.aggregate_points(point_layer = eq_fl,
                                            polygon_layer = states_fl,
                                            keep_boundaries_with_no_points=False,
                                            summary_fields=sum_fields)
Submitted.
Executing...

When running the tool above, we did not specify a name for the output_name parameter. Hence, the analysis results were not stored on the portal as an item; instead, they were returned in memory and stored in the variable eq_summary.
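
If you do want the result stored on the portal, you can pass a title to output_name. The cell below is a sketch of this variation; the title Earthquakes_by_state is a hypothetical name chosen for illustration. When output_name is supplied, the result is stored as a hosted feature layer item on the portal instead of being returned as in-memory feature collections.

In [ ]:
# sketch: persist the aggregation result as a portal item
# 'Earthquakes_by_state' is a hypothetical title used for illustration
eq_summary_item = summarize_data.aggregate_points(point_layer=eq_fl,
                                                  polygon_layer=states_fl,
                                                  keep_boundaries_with_no_points=False,
                                                  summary_fields=sum_fields,
                                                  output_name='Earthquakes_by_state')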

In [ ]:
eq_summary
Out[ ]:
{'aggregated_layer': <FeatureCollection>, 'group_summary': <FeatureCollection>}
In [ ]:
# access the aggregated feature collection
eq_aggregate_fc = eq_summary['aggregated_layer']

# query this feature collection to get its data as a FeatureSet
eq_aggregate_fset = eq_aggregate_fc.query()

FeatureSet objects support visualizing their attribute information as a pandas DataFrame. This is a neat feature since you do not have to iterate through each feature yourself to view its attributes; a sketch of that manual approach is shown below for comparison.
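
The cell below is a minimal sketch of the manual alternative: it loops over the first few features in the feature set and reads two of the summary attributes directly from each feature's attributes dictionary.

In [ ]:
# sketch of the manual alternative: read attributes feature by feature
for feature in eq_aggregate_fset.features[:3]:
    print(feature.attributes['state_name'], feature.attributes['Point_Count'])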

Let us view the summary results as a pandas DataFrame table. Note that the aggregate_points tool appends the polygon layer's original set of fields to the analysis result in order to provide context.

In [ ]:
aggregation_df = eq_aggregate_fset.df
aggregation_df
Out[ ]:
AnalysisArea MEAN_magnitude MIN_depth Point_Count age_18_21 age_22_29 age_30_39 age_40_49 age_50_64 age_5_17 ... sqmi st_area_shape_ st_length_shape_ state_abbr state_fips state_name sub_region vacant white geometry.rings
0 67260.743755 5.133333 0.02 21 330755 632258 921428 945360 888329 1119537 ... 67290 3.798855e+11 4.578373e+06 WA 53 Washington Pacific 179677 4821823 [[[-13625730.0473, 6144404.9648], [-13632502.6...
1 147223.830668 5.237500 0.00 8 52410 84451 118755 148759 146809 175193 ... 147245 8.211744e+11 4.337068e+06 MT 30 Montana Mountain 53966 817229 [[[-12409387.5601, 5574754.266800001], [-12409...
2 97797.234190 5.230000 2.97 15 31070 48942 66252 82984 77968 97933 ... 97803 4.741171e+11 2.775149e+06 WY 56 Wyoming Mountain 30246 454670 [[[-11583195.4569, 5115880.6499999985], [-1158...
3 83350.802573 5.430769 0.80 13 84099 139858 179218 190227 185605 271387 ... 83344 4.235214e+11 3.858889e+06 ID 16 Idaho Mountain 58179 1177304 [[[-13027307.5952, 5415905.1391], [-13027029.2...
4 97082.455125 5.580000 6.53 5 191100 370634 492596 542138 540228 623521 ... 97074 4.853035e+11 3.152829e+06 OR 41 Oregon Pacific 118986 2961623 [[[-13518806.9335, 5160130.824299999], [-13612...
5 77327.471700 5.100000 41.00 2 105011 181463 241251 256521 244580 333194 ... 77330 3.577281e+11 2.818945e+06 NE 31 Nebraska West North Central 56484 1533261 [[[-11288619.3909, 4866088.134499997], [-11360...
6 48562.939593 5.300000 11.00 2 1030442 2039736 3018682 2849353 2899785 3450690 ... 48562 2.349835e+11 3.513052e+06 NY 36 New York Middle Atlantic 622447 12893689 [[[-8879202.7021, 5201108.239100002], [-884366...
7 45357.772978 5.200000 -5.00 2 675692 1151458 1779185 1905326 1928007 2194417 ... 45360 2.057164e+11 2.036410e+06 PA 42 Pennsylvania Middle Atlantic 472747 10484203 [[[-8624565.8676, 4825281.948399998], [-869360...
8 110678.139940 5.483228 0.00 254 100107 228327 323795 296265 319035 365982 ... 110670 4.807101e+11 3.025639e+06 NV 32 Nevada Mountain 76292 1501886 [[[-13263990.105700001, 4637763.929099999], [-...
9 84871.727436 5.328000 0.60 5 181770 314135 299285 280506 248553 509320 ... 84872 3.681590e+11 2.556139e+06 UT 49 Utah Mountain 67313 1992975 [[[-12695684.3592, 4598889.777599998], [-12695...
10 157762.181916 5.439736 0.00 303 1946127 3963444 5500264 5002390 4613936 6762848 ... 157776 6.479799e+11 5.237952e+06 CA 06 California Pacific 711679 20170059 [[[-13543710.3314, 4603367.824500002], [-13556...
11 41192.573849 5.000000 10.00 1 638561 1153565 1668083 1756376 1740459 2133409 ... 41194 1.836222e+11 2.055517e+06 OH 39 Ohio East North Central 337278 9645453 [[[-9269880.6674, 4665854.533399999], [-927180...
12 56298.224472 5.228571 4.60 7 706990 1395667 1916801 1860796 1793563 2368902 ... 56299 2.494717e+11 2.566522e+06 IL 17 Illinois East North Central 293836 9125471 [[[-9804084.7173, 4510580.3904], [-9805901.351...
13 113752.025742 5.200000 5.00 4 299335 588872 761246 708020 738373 984561 ... 113713 4.331534e+11 2.831601e+06 AZ 04 Arizona Mountain 287862 3873611 [[[-12748377.9572, 3898982.239], [-12752659.45...
14 104103.589617 5.320000 0.00 5 246393 515513 698324 705586 618577 803290 ... 104101 4.473323e+11 2.708163e+06 CO 08 Colorado Mountain 149799 3560005 [[[-11359536.8705, 4528901.220200002], [-11359...
15 40321.625732 5.100000 8.00 1 237234 445758 608905 614710 635551 728917 ... 40320 1.663573e+11 2.540552e+06 KY 21 Kentucky East South Central 160280 3640889 [[[-9630323.529, 4391137.012400001], [-9659251...
16 70001.180914 5.233333 3.10 3 215222 370889 481752 505196 529285 656007 ... 70003 2.748233e+11 3.061748e+06 OK 40 Oklahoma West South Central 172107 2628434 [[[-10512937.2526, 4154257.1931999996], [-1051...
17 39821.073291 5.700000 6.00 1 398296 778274 1150603 1116101 1104646 1276280 ... 39820 1.642689e+11 3.776967e+06 VA 51 Virginia South Atlantic 205019 5120110 [[[-8810276.7693, 4376040.050300002], [-881837...
18 69833.629817 5.250000 18.00 2 323670 574613 819678 839935 854244 1057794 ... 69833 2.948692e+11 2.892120e+06 MO 29 Missouri West North Central 247423 4748083 [[[-9919126.9943, 4432686.060800001], [-992186...
19 121759.923226 5.025000 0.00 4 107628 185335 259082 272631 273571 377946 ... 121757 4.650606e+11 2.883357e+06 NM 35 New Mexico Mountain 102608 1214253 [[[-12139334.289099999, 3821476.7324], [-12139...
20 42092.294446 5.600000 3.00 1 323219 629466 865399 861904 907463 1023641 ... 42092 1.662896e+11 2.477541e+06 TN 47 Tennessee East South Central 206538 4563310 [[[-9345784.1967, 4225961.4164], [-9352235.036...
21 264434.418889 5.450000 10.00 2 1288410 2501993 3259444 3049533 2793149 4262131 ... 264436 9.466702e+11 7.793795e+06 TX 48 Texas West South Central 764221 14799505 [[[-11799742.103500001, 3684016.136699997], [-...
22 52913.823552 5.150000 5.00 2 156692 281720 376511 379700 424389 498784 ... 52913 2.042944e+11 2.514666e+06 AR 05 Arkansas West South Central 130347 2138598 [[[-10515427.3911, 4055253.484700002], [-10514...

23 rows × 57 columns

Thus, of the 50 states in our data, only 23 have recorded earthquakes. Let us plot a bar chart to see which states had the most earthquakes.

In [ ]:
aggregation_df.plot('state_name','Point_Count', kind='bar')
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x65aff2cc18>

Clearly, California tops the list with the highest number of earthquakes. Let us view the average magnitude and minimum depth for each state in the plots below:

In [ ]:
aggregation_df.plot('state_name',['MEAN_magnitude', 'MIN_depth'],kind='bar', subplots=True)
Out[ ]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x00000065AED171D0>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x00000065AED8DB00>], dtype=object)
