Part 2 - Where to enrich? (what are study areas?)

Enriching Study Areas

GeoEnrichment uses the concept of a study area to define the location of the point or area that you want to enrich with additional information or create reports about. The accepted forms of study areas are:

  1. Street address locations
    • a. Single line input
    • b. Multiple field input
  2. Point, line and polygon geometries
  3. Buffered study areas
  4. Named statistical areas

Before we look at examples of study areas, let's review the concepts of data collections and analysis variables. Data collections are covered in detail in a later section.

Data collections and analysis variables

GeoEnrichment uses the concept of a data collection to define the data attributes (analysis variables) returned by the enrichment service. A data collection is a preassembled list of attributes that will be used to enrich the input features. Collection attributes can describe various types of information, such as demographic characteristics and geographic context of the locations or areas submitted as input features. We introduce data collections here and cover the details in a later guide.

The Country class can be used to discover the data collections, sub-geographies and available reports for a country. When working with a particular country, you will find it convenient to get a reference to it using the Country.get() method.

The data_collections property of a Country object returns a pandas DataFrame that lists every available data collection together with the analysis variables it contains.

Once we know the data collection we would like to use, we can look at all the unique analysisVariable values available in that data collection.

# Import Libraries
from arcgis.gis import GIS
from arcgis.geoenrichment import Country, enrich, BufferStudyArea
# Create a GIS Connection
gis = GIS(profile='your_online_profile')
# Get US as a country
usa = Country.get('US')
type(usa)
arcgis.geoenrichment.enrichment.Country
df = usa.data_collections

# print a few rows of the DataFrame
df.head()
dataCollectionID | analysisVariable | alias | fieldCategory | vintage
1yearincrements | 1yearincrements.AGE0_CY | 2022 Population Age <1 | 2022 Age: 1 Year Increments (Esri) | 2022
1yearincrements | 1yearincrements.AGE1_CY | 2022 Population Age 1 | 2022 Age: 1 Year Increments (Esri) | 2022
1yearincrements | 1yearincrements.AGE2_CY | 2022 Population Age 2 | 2022 Age: 1 Year Increments (Esri) | 2022
1yearincrements | 1yearincrements.AGE3_CY | 2022 Population Age 3 | 2022 Age: 1 Year Increments (Esri) | 2022
1yearincrements | 1yearincrements.AGE4_CY | 2022 Population Age 4 | 2022 Age: 1 Year Increments (Esri) | 2022
# call the shape property to get the total number of rows and columns
df.shape
(18946, 4)

Each data collection can have multiple analysis variables, as seen in the table above. Every analysis variable has a unique ID, found in the analysisVariable column. When calling the enrich() method, pass data collection IDs in the data_collections parameter and individual analysis variable IDs in the analysis_variables parameter.

You can filter the data_collections DataFrame and query a collection's analysis variables using pandas expressions.

# get all the unique data collections available for the current country
df.index.unique()
Index(['1yearincrements', '5yearincrements', 'Age', 'agebyracebysex',
       'AgeDependency', 'AtRisk', 'AutomobilesAutomotiveProducts',
       'BabyProductsToysGames', 'basicFactsForMobileApps', 'businesses',
       ...
       'travelMPI', 'unitsinstructure', 'urbanizationgroupsNEW', 'vacant',
       'vehiclesavailable', 'veterans', 'Wealth', 'women', 'yearbuilt',
       'yearmovedin'],
      dtype='object', name='dataCollectionID', length=115)

The snippet below shows how you can query the Age data collection and get all the unique analysisVariables under that collection.

df.loc['Age']['analysisVariable'].unique()
array(['Age.MALE0', 'Age.MALE5', 'Age.MALE10', 'Age.MALE15', 'Age.MALE20',
       'Age.MALE25', 'Age.MALE30', 'Age.MALE35', 'Age.MALE40',
       'Age.MALE45', 'Age.MALE50', 'Age.MALE55', 'Age.MALE60',
       'Age.MALE65', 'Age.MALE70', 'Age.MALE75', 'Age.MALE80',
       'Age.MALE85', 'Age.FEM0', 'Age.FEM5', 'Age.FEM10', 'Age.FEM15',
       'Age.FEM20', 'Age.FEM25', 'Age.FEM30', 'Age.FEM35', 'Age.FEM40',
       'Age.FEM45', 'Age.FEM50', 'Age.FEM55', 'Age.FEM60', 'Age.FEM65',
       'Age.FEM70', 'Age.FEM75', 'Age.FEM80', 'Age.FEM85'], dtype=object)
# View a sample of the `Age` data collection
df.loc['Age'].head()
dataCollectionID | analysisVariable | alias | fieldCategory | vintage
Age | Age.MALE0 | 2022 Males Age 0-4 | 2022 Age: 5 Year Increments (Esri) | 2022
Age | Age.MALE5 | 2022 Males Age 5-9 | 2022 Age: 5 Year Increments (Esri) | 2022
Age | Age.MALE10 | 2022 Males Age 10-14 | 2022 Age: 5 Year Increments (Esri) | 2022
Age | Age.MALE15 | 2022 Males Age 15-19 | 2022 Age: 5 Year Increments (Esri) | 2022
Age | Age.MALE20 | 2022 Males Age 20-24 | 2022 Age: 5 Year Increments (Esri) | 2022
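
The same kind of pandas filtering can also search the alias column by keyword. The snippet below is a minimal sketch under stated assumptions: it builds a small stand-in DataFrame with the same index name and columns as usa.data_collections (the rows are copied from the outputs above, not fetched from the service), and find_variables is a hypothetical helper name, not part of arcgis.

```python
import pandas as pd

# Stand-in for usa.data_collections: same index name and columns,
# with a few rows copied from the outputs shown above.
sample = pd.DataFrame(
    {
        "analysisVariable": ["1yearincrements.AGE0_CY", "Age.MALE0", "Age.MALE5"],
        "alias": ["2022 Population Age <1", "2022 Males Age 0-4", "2022 Males Age 5-9"],
        "fieldCategory": [
            "2022 Age: 1 Year Increments (Esri)",
            "2022 Age: 5 Year Increments (Esri)",
            "2022 Age: 5 Year Increments (Esri)",
        ],
        "vintage": ["2022", "2022", "2022"],
    },
    index=pd.Index(["1yearincrements", "Age", "Age"], name="dataCollectionID"),
)


def find_variables(df, keyword):
    """Return analysis variable IDs whose alias contains keyword (case-insensitive)."""
    mask = df["alias"].str.contains(keyword, case=False, regex=False)
    return df.loc[mask, "analysisVariable"].tolist()


male_vars = find_variables(sample, "males age")
print(male_vars)
```

With a live connection, the same helper could be applied to the real DataFrame, and the resulting IDs passed to enrich() through analysis_variables.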

Now, let's look at some examples of enriching each of the study areas.

Enriching street address

Street address locations can be passed as strings of input street addresses, points of interest or place names. A street address can be passed as a single line or as a multiple field input. If a point (e.g. a street address) is used as a study area, the service will create a 1 mile ring buffer around the point to collect and append enrichment data.

The example below uses a street address as a study area for enrichment using the Age data collection.

Single line address

# Enrich a single address passed as single-line input
single_address = enrich(study_areas=["380 New York St Redlands CA 92373"], 
                       data_collections=['Age'])
single_address
  | source_country | x | y | area_type | buffer_units | buffer_units_alias | buffer_radii | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | ... | fem45 | fem50 | fem55 | fem60 | fem65 | fem70 | fem75 | fem80 | fem85 | SHAPE
0 | USA | -117.19479 | 34.057265 | RingBuffer | esriMiles | Miles | 1.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | ... | 366.0 | 392.0 | 365.0 | 345.0 | 322.0 | 277.0 | 168.0 | 103.0 | 132.0 | {"rings": [[[-117.19479001927878, 34.071773611...

1 rows × 48 columns

Visualize results on a map

The returned spatial dataframe can be visualized on a map as shown below:

A buffer of 1 mile is created by default, as seen on this map, for any address.

# Plot on a map
address_map = gis.map('Redlands, CA',13)
address_map
single_address.spatial.plot(address_map)
True

Multiple addresses as single line input

# Enriching multiple addresses as single line input
enrich(study_areas=[{"address":{"text":"12 Concorde Place Toronto ON M3C 3R8","sourceCountry":"Canada"}},
                    {"address":{"text":"380 New York St Redlands CA 92373","sourceCountry":"USA"}}], 
       data_collections=['Age'])
  | source_country | x | y | area_type | buffer_units | buffer_units_alias | buffer_radii | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | ... | ecypfa4549 | ecypfa5054 | ecypfa5559 | ecypfa6064 | ecypfa6569 | ecypfa7074 | ecypfa7579 | ecypfa8084 | ecypfa85_p | SHAPE
0 | CAN | -79.328779 | 43.729724 | RingBuffer | esriMiles | Miles | 1.0 | BlockApportionment:CAN.DA;PointsLayer:CAN.Bloc... | -1.0 | -1.0 | ... | 1307.0 | 1256.0 | 1282.0 | 1165.0 | 1175.0 | 1035.0 | 857.0 | 597.0 | 978.0 | {"rings": [[[-79.32877900047636, 43.7442083312...
1 | CAN | -117.19479 | 34.057265 | RingBuffer | esriMiles | Miles | 1.0 | BlockApportionment:CAN.DA;PointsLayer:CAN.Bloc... | -1.0 | -1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {"rings": [[[-117.19479001927878, 34.071773611...

2 rows × 48 columns

Multiple field input

enrich(study_areas=[{"address":{"Address":"380 New York Street", 
                                "City":"Redlands", "Region":"CA", "Postal":92373}}], 
       data_collections=['Age'])
  | source_country | x | y | area_type | buffer_units | buffer_units_alias | buffer_radii | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | ... | fem45 | fem50 | fem55 | fem60 | fem65 | fem70 | fem75 | fem80 | fem85 | SHAPE
0 | USA | -117.19479 | 34.057265 | RingBuffer | esriMiles | Miles | 1.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | ... | 366.0 | 392.0 | 365.0 | 345.0 | 322.0 | 277.0 | 168.0 | 103.0 | 132.0 | {"rings": [[[-117.19479001927878, 34.071773611...

1 rows × 48 columns
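
The two address forms used above differ only in the keys nested under "address". As a minimal sketch, the hypothetical helpers below (single_line_address and multi_field_address are illustrative names, not part of arcgis) build both dictionary shapes using only the keys that appear in the examples above.

```python
def single_line_address(text, source_country=None):
    """Build a single-line address study area; sourceCountry is optional."""
    address = {"text": text}
    if source_country is not None:
        address["sourceCountry"] = source_country
    return {"address": address}


def multi_field_address(street, city, region, postal):
    """Build a multiple-field address study area."""
    return {
        "address": {
            "Address": street,
            "City": city,
            "Region": region,
            "Postal": postal,
        }
    }


single = single_line_address("380 New York St Redlands CA 92373", "USA")
multi = multi_field_address("380 New York Street", "Redlands", "CA", 92373)
print(single)
print(multi)
```

Either dictionary could then be placed in the list passed to the study_areas parameter of enrich().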

Enriching with specific analysis variables from the Age data collection: FEM45, FEM55, and FEM65.

enrich(study_areas=["380 New York St Redlands CA 92373"], 
       analysis_variables=["Age.FEM45","Age.FEM55","Age.FEM65"])
  | source_country | x | y | area_type | buffer_units | buffer_units_alias | buffer_radii | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | has_data | fem45 | fem55 | fem65 | SHAPE
0 | USA | -117.19479 | 34.057265 | RingBuffer | esriMiles | Miles | 1.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 366.0 | 365.0 | 322.0 | {"rings": [[[-117.19479001927878, 34.071773611...
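
In the outputs shown so far, a variable requested as Age.FEM45 comes back in a column named fem45. The sketch below captures that observed pattern so the requested columns can be selected afterwards. It is an assumption drawn only from the outputs in this guide, and variable_id and observed_column_name are hypothetical helper names.

```python
def variable_id(collection, variable):
    """Join a data collection ID and a variable name into 'Collection.VARIABLE'."""
    return f"{collection}.{variable}"


def observed_column_name(var_id):
    """Lower-case the part after the first dot, matching the outputs in this guide."""
    return var_id.split(".", 1)[1].lower()


analysis_variables = [variable_id("Age", name) for name in ("FEM45", "FEM55", "FEM65")]
columns = [observed_column_name(v) for v in analysis_variables]
print(analysis_variables)
print(columns)
```

The columns list could then be used to index the enriched DataFrame, for example enriched_df[columns], where enriched_df stands for the result of an enrich() call.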

Enriching point, line and polygon geometries

Point geometries can be passed as x and y coordinates to the study_areas parameter. When points are specified as study areas, the service will analyze map areas surrounding or associated with the input point locations. Unless otherwise specified, the service will analyze a one-mile ring around a point. This is also true for a line. Locations can also be given as polygon geometries.

Single Point described as map coordinates

from arcgis.geometry import Point
pt = Point({"x" : -117.1956, "y" : 34.0572, "spatialReference" : {"wkid" : 4326}})
enrich(study_areas=[pt], data_collections=['Age'])
  | source_country | area_type | buffer_units | buffer_units_alias | buffer_radii | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | has_data | male0 | ... | fem45 | fem50 | fem55 | fem60 | fem65 | fem70 | fem75 | fem80 | fem85 | SHAPE
0 | USA | RingBuffer | esriMiles | Miles | 1.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 462.0 | ... | 352.0 | 378.0 | 350.0 | 333.0 | 310.0 | 268.0 | 160.0 | 98.0 | 125.0 | {"rings": [[[-117.19559999999998, 34.071708616...

1 rows × 46 columns
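
As a rough sanity check on the default one-mile ring, we can estimate how far north of the input point the ring should reach. The sketch below uses a common series approximation for the length of one degree of latitude on the WGS84 ellipsoid. This is an illustration only and not necessarily how the service computes its buffers. The input latitude is the y value from the example above.

```python
import math

METERS_PER_MILE = 1609.344  # international mile


def meters_per_degree_latitude(lat_deg):
    """Approximate length of one degree of latitude at lat_deg (series expansion)."""
    phi = math.radians(lat_deg)
    return 111132.954 - 559.822 * math.cos(2 * phi) + 1.175 * math.cos(4 * phi)


point_y = 34.0572  # latitude of the input point
offset_deg = METERS_PER_MILE / meters_per_degree_latitude(point_y)
north_edge = point_y + offset_deg
print(round(offset_deg, 4), round(north_edge, 4))
```

The estimated northern edge lands close to the latitude of the first vertex in the SHAPE column above, which is consistent with a one-mile ring centered on the point.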

Multiple points with attributes described as map coordinates

pt1 = Point({"x" : -122.435, "y" : 37.785, "spatialReference" : {"wkid" : 4326}})
pt2 = Point({"x" : -122.433, "y" : 37.734, "spatialReference" : {"wkid" : 4326}})

enrich(study_areas=[pt1, pt2], data_collections=['Age'])
  | source_country | area_type | buffer_units | buffer_units_alias | buffer_radii | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | has_data | male0 | ... | fem45 | fem50 | fem55 | fem60 | fem65 | fem70 | fem75 | fem80 | fem85 | SHAPE
0 | USA | RingBuffer | esriMiles | Miles | 1.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 1670.0 | ... | 3013.0 | 2560.0 | 2535.0 | 2814.0 | 2677.0 | 2562.0 | 2096.0 | 1650.0 | 2327.0 | {"rings": [[[-122.43499999999999, 37.799499596...
1 | USA | RingBuffer | esriMiles | Miles | 1.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 1554.0 | ... | 2340.0 | 2273.0 | 2239.0 | 2134.0 | 2034.0 | 1726.0 | 1183.0 | 821.0 | 864.0 | {"rings": [[[-122.43299999999999, 37.748499722...

2 rows × 46 columns

Line feature described as geometry

from arcgis.geometry import Polyline
line = Polyline({"paths":[[[-13048580,4036370],[-13046151,4036366]]],
                 "spatialReference":{"wkid":102100}})
enriched_line_df = enrich(study_areas=[line], data_collections=['Age'])
enriched_line_df
  | source_country | area_type | buffer_units | buffer_units_alias | buffer_radii | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | has_data | male0 | ... | fem45 | fem50 | fem55 | fem60 | fem65 | fem70 | fem75 | fem80 | fem85 | SHAPE
0 | USA | RingBuffer | esriMiles | Miles | 1.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 646.0 | ... | 554.0 | 553.0 | 506.0 | 496.0 | 450.0 | 398.0 | 247.0 | 163.0 | 258.0 | {"rings": [[[-117.21736177272676, 34.070851408...

1 rows × 46 columns
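
The line above is defined in Web Mercator (wkid 102100), so its coordinates are in meters rather than degrees, while the returned SHAPE is in longitude and latitude. As a minimal sketch, the standard spherical Web Mercator inverse formulas below convert the line's vertices to degrees to confirm where it sits. web_mercator_to_lonlat is a hypothetical helper name, and no arcgis call is involved.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator


def web_mercator_to_lonlat(x, y):
    """Convert Web Mercator meters to (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(math.atan(math.sinh(y / EARTH_RADIUS_M)))
    return lon, lat


# Vertices copied from the Polyline definition above
path = [[-13048580, 4036370], [-13046151, 4036366]]
lonlat_path = [web_mercator_to_lonlat(x, y) for x, y in path]
for lon, lat in lonlat_path:
    print(round(lon, 4), round(lat, 4))
```

Both vertices fall in the Redlands area, and the first longitude agrees with the first vertex of the buffer polygon in the SHAPE column.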

Visualize results on a map

The returned spatial dataframe can be visualized on a map as shown below:

# Plot on a map
line_map = gis.map('Redlands, CA',13)
line_map

We can clearly see the line and the one-mile buffer around it on this map.

# Draw line
line_map.draw(line)

# Plot enriched area around line
enriched_line_df.spatial.plot(line_map)
True

Map area described as polygons

from arcgis.geometry import Polygon
poly = Polygon({"rings":[[[-117.185412,34.063170],[-122.81,37.81],
                        [-117.200570,34.057196],[-117.185412,34.063170]]],
                        "spatialReference":{"wkid":4326}})

enrich(study_areas=[poly], data_collections=['Age'])
  | source_country | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | has_data | male0 | male5 | male10 | male15 | male20 | ... | fem45 | fem50 | fem55 | fem60 | fem65 | fem70 | fem75 | fem80 | fem85 | SHAPE
0 | USA | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 5430.0 | 5408.0 | 5184.0 | 5105.0 | 5001.0 | ... | 3724.0 | 3770.0 | 3727.0 | 3517.0 | 2780.0 | 2157.0 | 1447.0 | 911.0 | 1013.0 | {"rings": [[[-117.18541199999999, 34.063170000...

1 rows × 42 columns
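
Polygon rings in Esri JSON are expected to be closed, meaning the first and last vertices are identical, as in the ring above. The sketch below shows one way to check and repair a ring before building a Polygon. is_closed_ring and close_ring are hypothetical helper names, and the check is purely structural (it does not test for self-intersection).

```python
def is_closed_ring(ring):
    """True when the ring has at least four vertices and ends where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def close_ring(ring):
    """Return a copy of the ring with the first vertex appended if it is open."""
    return list(ring) if ring[0] == ring[-1] else list(ring) + [ring[0]]


# Ring copied from the Polygon definition above
ring = [[-117.185412, 34.063170], [-122.81, 37.81],
        [-117.200570, 34.057196], [-117.185412, 34.063170]]
open_ring = ring[:-1]  # same ring with the closing vertex dropped

print(is_closed_ring(ring))
print(is_closed_ring(close_ring(open_ring)))
```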

Enriching Buffered study areas

BufferStudyArea instances are used to change the ring buffer size or create drive-time service areas around points specified using one of the above methods. BufferStudyArea allows you to buffer point and street address study areas. They can be created using the following parameters:

    * area: the point geometry or street address (string) study area to be buffered
    * radii: list of distances by which to buffer the study area, e.g. [1, 2, 3]
    * units: distance unit, e.g. Miles, Kilometers, Minutes (when using drive times/travel_mode)
    * overlap: boolean, uses overlapping rings/network service areas when True, or non-overlapping disks when False
    * travel_mode: None or string, one of the supported travel modes when using network service areas
    

BufferStudyArea also allows you to define drive time service areas around points as well as other advanced service areas such as walking and trucking.
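
To keep the parameters above consistent before constructing a BufferStudyArea, they can be collected and checked in one place. The sketch below is a hypothetical helper, not part of arcgis: the default values and the rule that 'Minutes' requires a travel_mode are assumptions based on the parameter descriptions above, not confirmed behavior of the library.

```python
def buffer_params(area, radii, units="Miles", overlap=True, travel_mode=None):
    """Collect keyword arguments for BufferStudyArea after basic sanity checks."""
    if not radii or any(r <= 0 for r in radii):
        raise ValueError("radii must be a non-empty list of positive numbers")
    # Assumption: time-based units only make sense together with a travel mode.
    if units == "Minutes" and travel_mode is None:
        raise ValueError("units='Minutes' needs a travel_mode such as 'Driving'")
    return {
        "area": area,
        "radii": list(radii),
        "units": units,
        "overlap": overlap,
        "travel_mode": travel_mode,
    }


params = buffer_params("380 New York St Redlands CA 92373", [5, 10],
                       units="Minutes", travel_mode="Driving")
print(params)
```

The resulting dictionary could then be unpacked as BufferStudyArea(**params).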

Buffering location using driving distance

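The example below creates buffers with radii of 1, 3, and 5 miles around a street address and enriches them using the 'Age' data collection. No travel_mode is set, so the output reports them as RingBuffer areas measured in straight-line distance rather than along the road network.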

buffered = BufferStudyArea(area='380 New York St Redlands CA 92373',
                           radii=[1,3,5], units='Miles', overlap=False)
drive_dist_df = enrich(study_areas=[buffered], data_collections=['Age'])
drive_dist_df
  | source_country | x | y | area_type | buffer_units | buffer_units_alias | buffer_radii | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | ... | fem45 | fem50 | fem55 | fem60 | fem65 | fem70 | fem75 | fem80 | fem85 | SHAPE
0 | USA | -117.19479 | 34.057265 | RingBuffer | Miles | Miles | 1.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | ... | 366.0 | 392.0 | 365.0 | 345.0 | 322.0 | 277.0 | 168.0 | 103.0 | 132.0 | {"rings": [[[-117.19479001927878, 34.071773611...
1 | USA | -117.19479 | 34.057265 | RingBuffer | Miles | Miles | 3.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | ... | 2336.0 | 2361.0 | 2423.0 | 2514.0 | 2218.0 | 1811.0 | 1283.0 | 899.0 | 1184.0 | {"rings": [[[-117.19479001927878, 34.100790740...
2 | USA | -117.19479 | 34.057265 | RingBuffer | Miles | Miles | 5.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | ... | 4717.0 | 4841.0 | 5002.0 | 5009.0 | 4336.0 | 3444.0 | 2473.0 | 1723.0 | 2258.0 | {"rings": [[[-117.19479001927878, 34.129807732...

3 rows × 48 columns

Visualize results on a map

The returned spatial dataframe can be visualized on a map as shown below:

# Plot on a map
buffer_map1 = gis.map('Redlands, CA')
buffer_map1.basemap = 'dark-gray-vector'
buffer_map1
drive_dist_df.spatial.plot(map_widget=buffer_map1,
               renderer_type='c',  # for class breaks renderer
               method='esriClassifyNaturalBreaks',  # classification algorithm
               class_count=4,  # choose the number of classes
               col='buffer_radii',  # numeric column to classify (named as in the output above)
               cmap='viridis',  # color map to pick colors from for each class
               alpha=0.7  # specify opacity
               )
True

Buffering location using drive times

The example below creates 5 and 10 minute drive times from a street address and enriches these using the 'Age' data collection.

buffered = BufferStudyArea(area='380 New York St Redlands CA 92373', 
                           radii=[5, 10], units='Minutes', 
                           travel_mode='Driving')
drive_time_df = enrich(study_areas=[buffered], data_collections=['Age'])
drive_time_df
  | source_country | x | y | area_type | buffer_units | buffer_units_alias | buffer_radii | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | ... | fem45 | fem50 | fem55 | fem60 | fem65 | fem70 | fem75 | fem80 | fem85 | SHAPE
0 | USA | -117.19479 | 34.057265 | NetworkServiceArea | Minutes | Drive Time Minutes | 5.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | ... | 457.0 | 479.0 | 461.0 | 434.0 | 404.0 | 342.0 | 215.0 | 137.0 | 175.0 | {"rings": [[[-117.19996384486102, 34.076732195...
1 | USA | -117.19479 | 34.057265 | NetworkServiceArea | Minutes | Drive Time Minutes | 10.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | ... | 2831.0 | 2860.0 | 2954.0 | 3025.0 | 2642.0 | 2155.0 | 1537.0 | 1111.0 | 1604.0 | {"rings": [[[-117.19300193285012, 34.134897844...

2 rows × 48 columns

Visualize results on a map

The returned spatial dataframe can be visualized on a map as shown below:

# Plot on a map
buffer_map2 = gis.map('Redlands, CA')
buffer_map2.basemap = 'dark-gray-vector'
buffer_map2
drive_time_df.spatial.plot(map_widget=buffer_map2,
                   renderer_type='c',  # for class breaks renderer
                   method='esriClassifyNaturalBreaks',  # classification algorithm
                   class_count=3,  # choose the number of classes
                   col='buffer_radii',  # numeric column to classify (named as in the output above)
                   cmap='viridis',  # color map to pick colors from for each class
                   alpha=0.7  # specify opacity
                   )
True

Enriching a named statistical area

In all previous examples of different study area types, locations were defined as either points or polygons. Study area locations can also be passed as one or many named statistical areas. This form of study area lets you define an area as a standard geographic statistical feature, such as a census or postal area, for example, to obtain enrichment information for a U.S. state, county, or ZIP Code or a Canadian province or postal code. We will explore Named statistical areas in detail in the next section.

Enriching a zip code

Enriching zip code 92373 in California using the 'Age' data collection:

usa = Country.get('US')
redlands = usa.subgeographies.states['California'].zip5['92373']
type(redlands)
arcgis.geoenrichment.enrichment.NamedArea
redlands
<NamedArea name:"United States" area_id="92373", level="US.ZIP5", country="United States">
redlands_df = enrich(study_areas=[redlands], data_collections=['Age'] )
redlands_df
  | std_geography_level | std_geography_name | std_geography_id | source_country | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | has_data | male0 | male5 | ... | fem45 | fem50 | fem55 | fem60 | fem65 | fem70 | fem75 | fem80 | fem85 | SHAPE
0 | US.ZIP5 | Redlands | 92373 | USA | Query:US.ZIP5 | 2.191 | 2.576 | 1 | 888.0 | 911.0 | ... | 1056.0 | 1135.0 | 1152.0 | 1209.0 | 1149.0 | 1014.0 | 730.0 | 518.0 | 810.0 | {"rings": [[[-117.16767396036383, 33.976847519...

1 rows × 45 columns

Visualize results on a map

The returned spatial dataframe can be visualized on a map as shown below:

zip_map = gis.map('Redlands, CA')
zip_map
redlands_df.spatial.plot(zip_map)
True

Enriching all counties in a state

Note: Looking up NamedAreas generally returns a dictionary. You can pass either the dictionary itself or a list of its values to the study_areas parameter.

Here is an example of each case:

ca_counties = usa.subgeographies.states['California'].counties
counties_df = enrich(study_areas=ca_counties, data_collections=['Age'])
counties_df.head()
  | std_geography_level | std_geography_name | std_geography_id | source_country | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | has_data | male0 | male5 | ... | fem45 | fem50 | fem55 | fem60 | fem65 | fem70 | fem75 | fem80 | fem85 | SHAPE
0 | US.Counties | Alameda County | 06001 | USA | Query:US.Counties | 2.191 | 2.576 | 1 | 49161.0 | 51493.0 | ... | 55521.0 | 54569.0 | 55358.0 | 53052.0 | 46019.0 | 37534.0 | 25991.0 | 16889.0 | 19975.0 | {"rings": [[[-122.27036220796694, 37.904395965...
1 | US.Counties | Alpine County | 06003 | USA | Query:US.Counties | 2.191 | 2.576 | 1 | 33.0 | 37.0 | ... | 31.0 | 42.0 | 50.0 | 44.0 | 68.0 | 30.0 | 23.0 | 15.0 | 8.0 | {"rings": [[[-119.90060896152161, 38.930765231...
2 | US.Counties | Amador County | 06005 | USA | Query:US.Counties | 2.191 | 2.576 | 1 | 685.0 | 787.0 | ... | 1040.0 | 1181.0 | 1553.0 | 1851.0 | 1739.0 | 1579.0 | 1040.0 | 697.0 | 690.0 | {"rings": [[[-120.07765196234347, 38.708892198...
3 | US.Counties | Butte County | 06007 | USA | Query:US.Counties | 2.191 | 2.576 | 1 | 5738.0 | 5744.0 | ... | 5335.0 | 5784.0 | 6706.0 | 7248.0 | 6928.0 | 5700.0 | 4003.0 | 2574.0 | 3220.0 | {"rings": [[[-121.40463436180775, 40.146646214...
4 | US.Counties | Calaveras County | 06009 | USA | Query:US.Counties | 2.191 | 2.576 | 1 | 870.0 | 942.0 | ... | 1205.0 | 1510.0 | 1968.0 | 2198.0 | 2257.0 | 1876.0 | 1172.0 | 657.0 | 617.0 | {"rings": [[[-120.07247293757524, 38.509161181...

5 rows × 45 columns
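
The std_geography_id values for counties, such as 06001, are five-digit FIPS codes: two digits for the state followed by three for the county. They need to stay strings, because converting them to integers drops the leading zero. The sketch below illustrates this with a hypothetical helper, split_county_fips, which is not part of arcgis.

```python
def split_county_fips(geoid):
    """Split a five-digit county FIPS code into (state, county) parts."""
    geoid = str(geoid)
    if len(geoid) != 5 or not geoid.isdigit():
        raise ValueError(f"expected a five-digit code, got {geoid!r}")
    return geoid[:2], geoid[2:]


state, county = split_county_fips("06001")  # Alameda County, from the output above
print(state, county)

# An integer round trip loses the leading zero; zfill can restore it.
damaged = str(int("06001"))
repaired = damaged.zfill(5)
print(damaged, repaired)
```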

counties_df2 = usa.enrich(study_areas=list(ca_counties.values()), data_collections=['transportation'])
counties_df2.head()
  | std_geography_level | std_geography_name | std_geography_id | source_country | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | has_data | x6001_x | x6001_a | ... | x6061fy_x | x6061fy_a | x6061fy_i | x6062fy_x | x6062fy_a | x6062fy_i | x6063fy_x | x6063fy_a | x6063fy_i | SHAPE
0 | US.Counties | Alameda County | 06001 | USA | Query:US.Counties | 2.191 | 2.576 | 1 | 8938340557.0 | 14920.26 | ... | 192310449.0 | 325.29 | 242.0 | 115477218.0 | 195.33 | 202.0 | 2607315.0 | 4.41 | 181.0 | {"rings": [[[-122.27036220796694, 37.904395965...
1 | US.Counties | Alpine County | 06003 | USA | Query:US.Counties | 2.191 | 2.576 | 1 | 5114164.0 | 9759.85 | ... | 41728.0 | 79.33 | 59.0 | 36960.0 | 70.27 | 73.0 | 775.0 | 1.47 | 60.0 | {"rings": [[[-119.90060896152161, 38.930765231...
2 | US.Counties | Amador County | 06005 | USA | Query:US.Counties | 2.191 | 2.576 | 1 | 151573467.0 | 9599.33 | ... | 1328685.0 | 82.75 | 62.0 | 1115814.0 | 69.49 | 72.0 | 22837.0 | 1.42 | 58.0 | {"rings": [[[-120.07765196234347, 38.708892198...
3 | US.Counties | Butte County | 06007 | USA | Query:US.Counties | 2.191 | 2.576 | 1 | 837128815.0 | 9924.94 | ... | 9600470.0 | 113.27 | 84.0 | 8498160.0 | 100.27 | 104.0 | 174002.0 | 2.05 | 84.0 | {"rings": [[[-121.40463436180775, 40.146646214...
4 | US.Counties | Calaveras County | 06009 | USA | Query:US.Counties | 2.191 | 2.576 | 1 | 182566082.0 | 9829.12 | ... | 1417301.0 | 76.45 | 57.0 | 1300752.0 | 70.16 | 72.0 | 26749.0 | 1.44 | 59.0 | {"rings": [[[-120.07247293757524, 38.509161181...

5 rows × 255 columns
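
Since named areas may arrive as a dictionary, a list, or a single object, it can be convenient to normalize them to a list before calling enrich(). The sketch below is one way to do that. as_study_area_list is a hypothetical helper, and the dictionary uses placeholder strings in place of real NamedArea objects.

```python
from collections.abc import Mapping


def as_study_area_list(areas):
    """Normalize a mapping, sequence, or single study area into a plain list."""
    if isinstance(areas, Mapping):
        return list(areas.values())
    if isinstance(areas, (list, tuple)):
        return list(areas)
    return [areas]


# Placeholder values standing in for NamedArea objects keyed by county name
fake_counties = {
    "Alameda County": "<NamedArea 06001>",
    "Alpine County": "<NamedArea 06003>",
}

print(as_study_area_list(fake_counties))
print(as_study_area_list("<NamedArea 92373>"))
```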

Visualize results on a map

county_map = gis.map('California')
county_map
counties_df.spatial.plot(map_widget=county_map,
               renderer_type='c',  # for class breaks renderer
               method='esriClassifyNaturalBreaks',  # classification algorithm
               class_count=5,  # choose the number of classes
               col='fem75',  # numeric column to classify (named as in the output above)
               cmap='viridis',  # color map to pick colors from for each class
               alpha=0.7  # specify opacity
               )
True
county_map.legend=True

Conclusion

In this part of the arcgis.geoenrichment module guide series, you were introduced to the concept of study areas and how GeoEnrichment uses a study area to define the location of the point, polyline, or area that you want to enrich. You have also seen in detail how different types of study areas can be enriched and visualized on a map.

In the subsequent pages, you will learn about:

  1. Exploring Named Statistical Areas (explains where to enrich continued)
  2. Data Collections and GeoEnrichment coverage (explains what datasets/variables to enrich with)
  3. Generating Reports
  4. Standard Geography Queries
