Part 6 - Standard Geography Queries

Input
# Import Libraries
from arcgis.gis import GIS
from arcgis.geoenrichment import Country, enrich, service_limits, standard_geography_query
Input
# Create a GIS Connection
gis = GIS(profile='your_online_profile')
Input
# Get Country
usa = Country.get('USA')

Standard Geography Queries

Earlier in this guide series you learned that a study area defines the location of the point or area that you want to enrich with additional information. This part introduces a new form of study area, the standard geography area, which defines an area by the ID of a standard geographic statistical feature such as a census or postal area. For example, you can use it to obtain enrichment information for a U.S. state, county, or ZIP Code, or for a Canadian province or postal code. The most common workflow for this service is to find the standard geography ID, such as a FIPS code, for a geographic name.

The standard_geography_query method lets you query for standard geography IDs and features at the supported geographic levels. You can then use those IDs to obtain facts about the location with the enrich method or to create reports with create_report.
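The study areas passed to enrich() in the examples below are plain dictionaries with sourceCountry, layer, and ids keys. As a minimal sketch assuming that dictionary shape, a hypothetical helper (std_geography_study_area is illustrative, not part of arcgis.geoenrichment) can assemble one and reject numeric IDs, since a code such as 06059 loses its leading zero when stored as an integer:

```python
def std_geography_study_area(source_country, layer, ids):
    """Assemble a standard geography study area dictionary.

    Hypothetical helper, not part of arcgis.geoenrichment: it only builds the
    {"sourceCountry", "layer", "ids"} dictionary shape used with enrich() in this guide.
    """
    ids = list(ids)
    if not ids:
        raise ValueError("at least one standard geography ID is required")
    for area_id in ids:
        # Keep IDs as strings: int("06059") would drop the significant leading zero.
        if not isinstance(area_id, str):
            raise TypeError(f"ID {area_id!r} should be a string such as '06059'")
    return {"sourceCountry": source_country, "layer": layer, "ids": ids}


# Same study area as the Orange County, California example in this guide.
or_ca = std_geography_study_area("US", "US.Counties", ["06059"])
print(or_ca)
```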

Using Standard Geography Query

Let's look at an example that finds the standard geography IDs for all Orange counties in the U.S. We will then use one of these IDs to enrich() the area with information from the Age data collection.

We will use US as the source country and specify US.Counties as the standard geography layer to query, since we are looking for Orange counties across the U.S. The text for the service to search for is orange.

Input
# Find FIPS for all Orange counties in US
orange = standard_geography_query(source_country='US', layers='US.Counties', geoquery='orange')
orange
Output
   DatasetID      DataLayerID  AreaID  AreaName       MajorSubdivisionName  MajorSubdivisionAbbr  MajorSubdivisionType  CountryAbbr  Score  ObjectId
0  USA_ESRI_2020  US.Counties  06059   Orange County  California            CA                    State                 US           100    1
1  USA_ESRI_2020  US.Counties  12095   Orange County  Florida               FL                    State                 US           100    2
2  USA_ESRI_2020  US.Counties  18117   Orange County  Indiana               IN                    State                 US           100    3
3  USA_ESRI_2020  US.Counties  36071   Orange County  New York              NY                    State                 US           100    4
4  USA_ESRI_2020  US.Counties  37135   Orange County  North Carolina        NC                    State                 US           100    5
5  USA_ESRI_2020  US.Counties  48361   Orange County  Texas                 TX                    State                 US           100    6
6  USA_ESRI_2020  US.Counties  50017   Orange County  Vermont               VT                    State                 US           100    7
7  USA_ESRI_2020  US.Counties  51137   Orange County  Virginia              VA                    State                 US           100    8

The resulting dataframe shows DatasetID and DataLayerID, which identify the dataset and layer that were queried. AreaID is the unique ID of each area in the results, and AreaName is Orange County in every row because we searched for Orange counties across the U.S. MajorSubdivisionName, MajorSubdivisionAbbr, and MajorSubdivisionType describe the major subdivision that contains each county, here a State, along with its name and abbreviation. Score indicates how closely each result matches the query text.
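For the US.Counties layer, each AreaID is a five-digit county FIPS code: the first two digits identify the state (06 is California) and the last three identify the county within it. This is also why AreaID values are strings rather than integers. A small sketch, where split_county_fips is a hypothetical helper, that splits a few of the IDs returned above:

```python
def split_county_fips(area_id):
    """Split a 5-digit U.S. county FIPS code into (state FIPS, county FIPS)."""
    if len(area_id) != 5 or not area_id.isdigit():
        raise ValueError(f"expected a 5-digit county FIPS string, got {area_id!r}")
    return area_id[:2], area_id[2:]


# AreaID / MajorSubdivisionAbbr pairs copied from the query results above.
for area_id, state in [("06059", "CA"), ("12095", "FL"), ("36071", "NY")]:
    state_fips, county_fips = split_county_fips(area_id)
    print(state, state_fips, county_fips)
```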

Enrich using results from Standard Geography Query

The standard_geography_query returned Orange counties in several states, with the state name in the MajorSubdivisionName field. Now, let's enrich() Orange County in California using its AreaID, 06059. The study area is passed to enrich() as a dictionary holding the source country, the layer, and a list of IDs.

Input
# Define a standard geography study area by source country, layer, and AreaID
or_ca = {"sourceCountry": "US", "layer": "US.Counties", "ids": ["06059"]}
Input
# Enrich Orange County, CA with variables from the Age data collection
orange_df = enrich(study_areas=[or_ca], data_collections=['Age'])
orange_df
Output
ID OBJECTID StdGeographyLevel StdGeographyName StdGeographyID sourceCountry aggregationMethod populationToPolygonSizeRating apportionmentConfidence HasData ... FEM45 FEM50 FEM55 FEM60 FEM65 FEM70 FEM75 FEM80 FEM85 SHAPE
0 0 1 US.Counties Orange County 06059 US Query:US.Counties 2.191 2.576 1 ... 106005 107845 107235 97121 81462 65917 47365 33050 40794 {"rings": [[[-117.9157650000062, 33.9469249994...

1 rows × 47 columns

Input
orange_df.columns
Output
Index(['ID', 'OBJECTID', 'StdGeographyLevel', 'StdGeographyName',
       'StdGeographyID', 'sourceCountry', 'aggregationMethod',
       'populationToPolygonSizeRating', 'apportionmentConfidence', 'HasData',
       'MALE0', 'MALE5', 'MALE10', 'MALE15', 'MALE20', 'MALE25', 'MALE30',
       'MALE35', 'MALE40', 'MALE45', 'MALE50', 'MALE55', 'MALE60', 'MALE65',
       'MALE70', 'MALE75', 'MALE80', 'MALE85', 'FEM0', 'FEM5', 'FEM10',
       'FEM15', 'FEM20', 'FEM25', 'FEM30', 'FEM35', 'FEM40', 'FEM45', 'FEM50',
       'FEM55', 'FEM60', 'FEM65', 'FEM70', 'FEM75', 'FEM80', 'FEM85', 'SHAPE'],
      dtype='object')

Enrichment with the Age data collection added one column per age band for each sex, from MALE0 to MALE85 and FEM0 to FEM85 in five-year steps. Enrichment also added columns describing the area, such as StdGeographyID, StdGeographyName, StdGeographyLevel, sourceCountry, and populationToPolygonSizeRating.
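The age-band columns can be combined with ordinary arithmetic. The sketch below uses a hypothetical helper, sum_age_bands, on a plain dictionary holding the FEM45 to FEM85 values from the Orange County output above to total the female population aged 45 and over. It assumes each band column is named with a sex prefix followed by the band's lower age, as in the column list above.

```python
import re


def sum_age_bands(row, prefix, min_age=0):
    """Sum age-band columns such as FEM45, FEM50, ... whose lower age is >= min_age."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    total = 0
    for column, value in row.items():
        match = pattern.match(column)
        if match and int(match.group(1)) >= min_age:
            total += value
    return total


# FEM45 to FEM85 for Orange County, California, copied from the enrich() output above.
orange_row = {
    "FEM45": 106005, "FEM50": 107845, "FEM55": 107235,
    "FEM60": 97121, "FEM65": 81462, "FEM70": 65917,
    "FEM75": 47365, "FEM80": 33050, "FEM85": 40794,
}
print(sum_age_bands(orange_row, "FEM", min_age=45))  # 686794
```

On the enriched dataframe itself, a pandas expression such as orange_df[[f"FEM{age}" for age in range(45, 90, 5)]].sum(axis=1) should give the same per-row total.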

Visualize on a Map

Let's visualize the enriched geography on a map.

Input
or_ca_map = gis.map('Los Angeles, CA')
or_ca_map
Input
orange_df.spatial.plot(or_ca_map)
Output
True

Customizing your Query

The geoquery parameter specifies the search criteria used to query the desired standard geography layers. A query is broken up into terms and operators, and multiple terms can be combined with Boolean operators to form more complex queries. Learn more about using geoquery to create more complex queries here.

Let's look at an example that groups search terms to find all Orange or Lake counties in the U.S. Search supports parentheses for grouping clauses into subqueries, which is useful when you want to control the Boolean logic of a query.
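Because grouped queries are just strings, they can be assembled programmatically. A minimal sketch, where build_geoquery is a hypothetical helper that reproduces the two query strings used in this section:

```python
def build_geoquery(names, subdivision=None):
    """Join names with OR inside parentheses, optionally ANDed with a subdivision term."""
    if not names:
        raise ValueError("at least one name is required")
    query = "(" + " OR ".join(names) + ")"
    if subdivision:
        # Restrict the grouped names to one major subdivision, e.g. a state abbreviation.
        query += f" AND {subdivision}"
    return query


print(build_geoquery(["Orange", "Lake"]))        # (Orange OR Lake)
print(build_geoquery(["Orange", "Lake"], "CA"))  # (Orange OR Lake) AND CA
```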

Input
or_lake = standard_geography_query(source_country='US', layers='US.Counties', geoquery='(Orange OR Lake)')
or_lake
Output
    DatasetID      DataLayerID  AreaID  AreaName                    MajorSubdivisionName  MajorSubdivisionAbbr  MajorSubdivisionType  CountryAbbr  Score  ObjectId
0   USA_ESRI_2020  US.Counties  06059   Orange County               California            CA                    State                 US           100    1
1   USA_ESRI_2020  US.Counties  12095   Orange County               Florida               FL                    State                 US           100    2
2   USA_ESRI_2020  US.Counties  18117   Orange County               Indiana               IN                    State                 US           100    3
3   USA_ESRI_2020  US.Counties  48361   Orange County               Texas                 TX                    State                 US           100    4
4   USA_ESRI_2020  US.Counties  50017   Orange County               Vermont               VT                    State                 US           100    5
5   USA_ESRI_2020  US.Counties  51137   Orange County               Virginia              VA                    State                 US           100    6
6   USA_ESRI_2020  US.Counties  36071   Orange County               New York              NY                    State                 US           99     7
7   USA_ESRI_2020  US.Counties  37135   Orange County               North Carolina        NC                    State                 US           99     8
8   USA_ESRI_2020  US.Counties  06033   Lake County                 California            CA                    State                 US           87     9
9   USA_ESRI_2020  US.Counties  08065   Lake County                 Colorado              CO                    State                 US           87     10
10  USA_ESRI_2020  US.Counties  12069   Lake County                 Florida               FL                    State                 US           87     11
11  USA_ESRI_2020  US.Counties  17097   Lake County                 Illinois              IL                    State                 US           87     12
12  USA_ESRI_2020  US.Counties  18089   Lake County                 Indiana               IN                    State                 US           87     13
13  USA_ESRI_2020  US.Counties  26085   Lake County                 Michigan              MI                    State                 US           87     14
14  USA_ESRI_2020  US.Counties  27075   Lake County                 Minnesota             MN                    State                 US           87     15
15  USA_ESRI_2020  US.Counties  30047   Lake County                 Montana               MT                    State                 US           87     16
16  USA_ESRI_2020  US.Counties  39085   Lake County                 Ohio                  OH                    State                 US           87     17
17  USA_ESRI_2020  US.Counties  41037   Lake County                 Oregon                OR                    State                 US           87     18
18  USA_ESRI_2020  US.Counties  47095   Lake County                 Tennessee             TN                    State                 US           87     19
19  USA_ESRI_2020  US.Counties  16007   Bear Lake County            Idaho                 ID                    State                 US           87     20
20  USA_ESRI_2020  US.Counties  27125   Red Lake County             Minnesota             MN                    State                 US           87     21
21  USA_ESRI_2020  US.Counties  46079   Lake County                 South Dakota          SD                    State                 US           87     22
22  USA_ESRI_2020  US.Counties  49035   Salt Lake County            Utah                  UT                    State                 US           87     23
23  USA_ESRI_2020  US.Counties  55047   Green Lake County           Wisconsin             WI                    State                 US           87     24
24  USA_ESRI_2020  US.Counties  02164   Lake and Peninsula Borough  Alaska                AK                    State                 US           86     25
25  USA_ESRI_2020  US.Counties  27077   Lake of the Woods County    Minnesota             MN                    State                 US           85     26

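There are multiple Orange and Lake counties in the U.S., and because the search also matches names that contain a query term, the results include counties such as Salt Lake County and Lake of the Woods County with lower scores. Let's narrow the results to Orange or Lake counties in California by adding AND CA to the query.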
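Narrowing the query on the service side is one option; another is to filter the returned rows locally on MajorSubdivisionAbbr and Score. A minimal sketch over a few rows copied from the output above, where filter_by_subdivision is a hypothetical helper:

```python
# (AreaID, AreaName, MajorSubdivisionAbbr, Score) for a few rows of the output above.
rows = [
    ("06059", "Orange County", "CA", 100),
    ("12095", "Orange County", "FL", 100),
    ("06033", "Lake County", "CA", 87),
    ("12069", "Lake County", "FL", 87),
    ("49035", "Salt Lake County", "UT", 87),
]


def filter_by_subdivision(rows, abbr, min_score=0):
    """Keep rows in the given major subdivision whose Score is at least min_score."""
    return [row for row in rows if row[2] == abbr and row[3] >= min_score]


print([row[0] for row in filter_by_subdivision(rows, "CA")])
print([row[1] for row in filter_by_subdivision(rows, "FL", min_score=90)])
```

With the dataframe, the equivalent pandas filter is or_lake[or_lake['MajorSubdivisionAbbr'] == 'CA']. Local filtering only sees the rows the service returned, and scores depend on the query text: Lake County, California scores 87 in the Orange-or-Lake query and 89 in the query that adds AND CA.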

Input
or_lake_ca = standard_geography_query(source_country='US', layers='US.Counties', geoquery='(Orange OR Lake) AND CA')
or_lake_ca
Output
   DatasetID      DataLayerID  AreaID  AreaName       MajorSubdivisionName  MajorSubdivisionAbbr  MajorSubdivisionType  CountryAbbr  Score  ObjectId
0  USA_ESRI_2020  US.Counties  06059   Orange County  California            CA                    State                 US           100    1
1  USA_ESRI_2020  US.Counties  06033   Lake County    California            CA                    State                 US           89     2

Enrich using results from Standard Geography Query

The standard_geography_query returned the Orange and Lake counties in California. Now, let's enrich() both counties using their AreaID values, which can be listed together in the ids of a single study area.
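Rather than typing the IDs by hand, they can be collected from the query results. A minimal sketch, where study_area_from_results is a hypothetical helper and each row is a dictionary standing in for a row of the standard_geography_query output:

```python
def study_area_from_results(rows, source_country="US", layer="US.Counties"):
    """Build one standard geography study area from query-result rows.

    Each row is a dict with an "AreaID" key; duplicate IDs are dropped
    while the original result order is kept.
    """
    ids = list(dict.fromkeys(row["AreaID"] for row in rows))
    return {"sourceCountry": source_country, "layer": layer, "ids": ids}


# Rows copied from the Orange-or-Lake-in-California results above.
or_lake_ca_rows = [
    {"AreaID": "06059", "AreaName": "Orange County"},
    {"AreaID": "06033", "AreaName": "Lake County"},
]
print(study_area_from_results(or_lake_ca_rows))
```

With the dataframe returned by the query, or_lake_ca['AreaID'].tolist() gives the same list of IDs.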

Input
# One study area listing both county AreaIDs
or_lk = {"sourceCountry": "US", "layer": "US.Counties", "ids": ["06059", "06033"]}
Input
# Enrich both counties with the Age data collection
or_lake_df = enrich(study_areas=[or_lk], data_collections=['Age'])
or_lake_df
Output
ID OBJECTID StdGeographyLevel StdGeographyName StdGeographyID sourceCountry aggregationMethod populationToPolygonSizeRating apportionmentConfidence HasData ... FEM45 FEM50 FEM55 FEM60 FEM65 FEM70 FEM75 FEM80 FEM85 SHAPE
0 0 1 US.Counties Orange County 06059 US Query:US.Counties 2.191 2.576 1 ... 106005 107845 107235 97121 81462 65917 47365 33050 40794 {"rings": [[[-117.9157650000062, 33.9469249994...
1 0 2 US.Counties Lake County 06033 US Query:US.Counties 2.191 2.576 1 ... 1918 2176 2680 2946 2742 2104 1382 834 926 {"rings": [[[-122.81409900076635, 39.581399999...

2 rows × 47 columns

Visualize on a Map

Let's visualize the enriched counties on a map.

Input
or_lake_map = gis.map('California, US',6)
or_lake_map
Input
or_lake_df.spatial.plot(or_lake_map)
Output
True

Data Apportionment

To aggregate data, the GeoEnrichment service uses a geographic retrieval methodology called data apportionment. This methodology determines how data is gathered and summarized, or aggregated, for input features.

For standard geographic units such as states, provinces, counties, or postal codes, the link between a designated area and its attribute data is a simple one-to-one relationship, so data retrieval simply gathers the data for those areas.

For non-standard geographic units such as ring buffers, drive-time service areas, and other non-standard polygons, the retrieval process is more complicated, because the input polygon may intersect several geographic areas whose data needs to be aggregated.

The GeoEnrichment service uses Weighted Centroid geographic retrieval to aggregate data for rings and other polygons. With this methodology, data points within an area of interest are weighted more heavily than points outside that area. When the service aggregates data, the results are statistically adjusted to more accurately reflect the actual statistics within the area of interest.

The GeoEnrichment service uses the most detailed geographies with the most recent census data, or authoritative estimates, available for commercial use in each country. For most countries, data are updated every two years; a few countries are updated annually because their data are readily available. Esri spreads these updates throughout the year on a quarterly basis, and the data for each country are the most recently available estimates.

How Apportionment Works

The GeoEnrichment service uses a data apportionment algorithm to redistribute demographic, business, economic, and landscape variables to input polygon features. The algorithm analyzes each polygon to be enriched relative to a point dataset and a detailed dataset of reporting unit polygons that contain attributes for the selected variables. Based on how each polygon being enriched overlays these datasets, the algorithm determines the appropriate amount of each variable to assign.

Imagine you want to get total population statistics for a study area represented by the center polygon in the figure below.

[Figure: a study area polygon overlapping four census polygons. Source: https://developers.arcgis.com/rest/geoenrichment/api-reference/data-apportionment.htm]

The other four polygons represent census geographies that contain total population values. In the United States, these can be block groups with enrichment data; in Canada, they can be dissemination areas. The study area partially overlaps all four block groups. Using area P3 as an example, its population weight is determined by summing the weights of the blocks within the polygon: if 90 percent of the population of P3's blocks lies within the study area and the total population of P3 is 100 people, then 90 people in P3 are counted as inside the study area.

So, for partially included block groups, the GeoEnrichment service uses data apportionment and the weighted centroid retrieval method to approximate the statistics for the portions of those block groups that lie inside the study area. It considers all block points within each block group touched by the study area but weights the block points inside the study area more heavily.
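The P3 arithmetic can be written out as a simplified sketch. It assumes, for illustration only, that each block is a single point with a population and an inside/outside flag, and that a block group contributes its total population times the share of its block population whose points fall inside the study area. This is the proportional idea from the P3 example, not Esri's actual implementation; Block, BlockGroup, and apportion_population are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    population: int
    inside: bool  # whether the block's point falls inside the study area


@dataclass
class BlockGroup:
    name: str
    total_population: int
    blocks: list = field(default_factory=list)


def apportion_population(block_groups):
    """Estimate study-area population from partially overlapping block groups.

    Each block group contributes its total population multiplied by the share
    of its block population whose points lie inside the study area.
    """
    estimate = 0.0
    for bg in block_groups:
        block_total = sum(b.population for b in bg.blocks)
        if block_total == 0:
            continue  # no block weights available for this group
        inside = sum(b.population for b in bg.blocks if b.inside)
        estimate += bg.total_population * inside / block_total
    return estimate


# Hypothetical blocks for P3: 90 of its 100 block residents lie inside the study area.
p3 = BlockGroup("P3", 100, [Block(50, True), Block(40, True), Block(10, False)])
print(apportion_population([p3]))  # 90.0
```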

You can learn more about Data Apportionment and how it works here.

Service Limits

The GeoEnrichment service imposes limits on users to guarantee accuracy and performance. These limits define, among others, the maximum size and number of study areas, the maximum number of business records in an output, and the maximum drive-time polygon size. Exceeding a limit either causes your request to fail or returns a warning that a limit was exceeded, along with results up to the point where the limit was reached.
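One way to stay within the per-request study-area limits is to split a long list of study areas into batches and enrich each batch separately. A minimal sketch, assuming the maximumStudyAreasNumber (100) and optimalBatchStudyAreasNumber (50) values from the service_limits() output in this guide; limits can differ between deployments, so read them from service_limits() in practice. batch_study_areas is a hypothetical helper:

```python
MAX_STUDY_AREAS = 100  # maximumStudyAreasNumber in this guide's service_limits() output
OPTIMAL_BATCH = 50     # optimalBatchStudyAreasNumber in the same output


def batch_study_areas(study_areas, batch_size=OPTIMAL_BATCH):
    """Split study areas into consecutive batches no larger than batch_size."""
    if not 0 < batch_size <= MAX_STUDY_AREAS:
        raise ValueError(f"batch_size must be between 1 and {MAX_STUDY_AREAS}")
    return [study_areas[i:i + batch_size] for i in range(0, len(study_areas), batch_size)]


# Hypothetical placeholders standing in for 120 study-area definitions.
areas = [f"area_{n}" for n in range(120)]
print([len(batch) for batch in batch_study_areas(areas)])  # [50, 50, 20]
```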

The service_limits() function in the arcgis.geoenrichment module lists these service limits.

Let's look at all the service limits.

Input
# Check service limits
service_limits()
Output
paramName paramDescription dataType value
0 MaximumRingSize Maximum size of rings for simple rings builders. esriMiles 1000
1 MaximumRingSizeTime Maximum size of rings (time units) for drive t... esriDriveTimeUnitsMinutes 300
2 defaultFeaturesLimitPerComparisonLevel Default maximum number of features to return p... numeric 5
3 maxRecordCount Maximum number of features to return. numeric 1000
4 maximumAttributeDescriptionLength Maximum length of attribute’s description string. numeric 1000
5 maximumDataCollections Maximum number of data collections to return o... numeric 20
6 maximumDetailedMethodStudyAreasSize Maximum size of rings for drive time/simple ri... esriMiles 300
7 maximumDriveDistance Maximum size of rings for drive time rings bui... esriMiles 300
8 maximumDriveTimeStudyAreasNumber Maximum number of drive time study areas in on... numeric 100
9 maximumNumberOfStudyAreasWithDetailedMethod Maximum number of study areas in one enrich re... numeric 3
10 maximumOutFieldsNumber Maximum number of ‘outFields’ set in intersect... numeric 256
11 maximumRingsNumber Maximum number of rings for study area locatio... numeric 10
12 maximumSelectBusinessesResponseRecords Maximum number of features returned by select ... numeric 5000
13 maximumStdGeographyIDsNumber Maximum number of standard geography IDs to re... numeric 1000
14 maximumStudyAreasNumber Maximum number of study areas in one enrich re... numeric 100
15 maximumStudyAreasNumberInfographicReportHTML Maximum number of study areas in one create in... numeric 100
16 maximumStudyAreasNumberInfographicReportPDF Maximum number of study areas in one create in... numeric 50
17 optimalBatchStudyAreasNumber Optimal number of study areas to request in ea... numeric 50

The paramName column names each limited parameter, such as a maximum ring size, number of study areas, or drive time. The paramDescription column describes the parameter, the dataType column gives its type or unit (for example, esriMiles or numeric), and the value column gives the limit itself.

The service_limits() function returns a pandas DataFrame that describes the service's limits for each input parameter. You can store this dataframe and use pandas operations to look up the limit for a specific parameter.

Input
service_df = service_limits()
Input
service_df.head()
Output
paramName paramDescription dataType value
0 MaximumRingSize Maximum size of rings for simple rings builders. esriMiles 1000
1 MaximumRingSizeTime Maximum size of rings (time units) for drive t... esriDriveTimeUnitsMinutes 300
2 defaultFeaturesLimitPerComparisonLevel Default maximum number of features to return p... numeric 5
3 maxRecordCount Maximum number of features to return. numeric 1000
4 maximumAttributeDescriptionLength Maximum length of attribute’s description string. numeric 1000
Input
service_df[service_df['paramName']=='MaximumRingSize']
Output
paramName paramDescription dataType value
0 MaximumRingSize Maximum size of rings for simple rings builders. esriMiles 1000

Conclusion

In this final part of the arcgis.geoenrichment module guide series, you have seen how the standard_geography_query method finds standard geography areas that can then be used for enrichment, and how its geoquery can be customized with more complex search criteria to target more specific results. You have also seen how data apportionment uses a geographic retrieval methodology to aggregate data, and how service_limits() lists the limits that apply to the service.

In this guide series, we have demonstrated most of the functionality of the arcgis.geoenrichment module across a variety of workflows. To look up the API reference for GeoEnrichment, see here.
