Part 1 - Introduction to GeoEnrichment

Introduction

The GeoEnrichment module provides a Python interface for accessing the demographic data provided through Business Analyst to enrich study areas, access standard geographies, and create reports. Accessing standard geographies enables retrieving and enriching standard jurisdictional areas, such as counties, postal (ZIP) codes, or US Census Block Groups in the United States.

Enriching Study Areas

The GeoEnrichment enrich capability adds location intelligence to the data by providing facts about a location or an area. Using GeoEnrichment, you can get information about the people and places in a specific area or within a certain distance or drive time from a location. It enables you to query and use information from a large collection of datasets including population, income, housing, consumer behavior, and the natural environment.

This enables you to answer questions about locations that you can't answer with maps alone. For example, what kind of people live here? What do people like to do in this area? What are their habits and lifestyles?

GeoEnrichment makes your analysis more powerful by adding demographic variables in a geographic context. Further, these variables can be accessed at multiple standard geographic resolutions based on the jurisdictional area.

Getting Started

GeoEnrichment Source

Using these capabilities requires either a properly configured Web GIS instance and a login with permissions to use them, or a local installation of ArcGIS Pro with Business Analyst and at least one country data pack installed. A properly configured Web GIS instance can be either ArcGIS Online or ArcGIS Enterprise. ArcGIS Enterprise can support GeoEnrichment module capabilities by configuring the GeoEnrichment utility service to connect to ArcGIS Online or to a fully configured ArcGIS Business Analyst Enterprise deployment.

Local GeoEnrichment Source

If utilizing a local GIS to perform enrichment, you need to have an environment configured with ArcGIS Pro with Business Analyst and at least one local data pack. To specify this local source, you must create an arcgis.gis.GIS object instance using the 'pro' keyword. This tells GeoEnrichment to use the locally installed source.

NOTE: At the 2.1.0 release, standard geography retrieval and reporting are not yet supported with a local source.

from arcgis.gis import GIS

gis = GIS("pro")

Web GIS GeoEnrichment Source

If using ArcGIS Online or ArcGIS Enterprise with the GeoEnrichment module, you need to create an arcgis.gis.GIS object instance connected to the properly configured Web GIS with a user who has permissions to perform enrichment and create reports.

NOTE: GeoEnrichment operations using ArcGIS Online consume credits. Credits are the currency used across ArcGIS and are consumed for specific transactions. See the ArcGIS Online credits documentation to learn more about credit consumption for GeoEnrichment.

from arcgis.gis import GIS

gis = GIS(profile="your_online_profile")
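Because GeoEnrichment on ArcGIS Online consumes credits, a rough estimate before running a job can help avoid surprises. The sketch below assumes a rate of 10 credits per 1,000 data variables, where a data variable is one attribute returned for one input feature. That rate is an assumption to be checked against Esri's current credit documentation, and the helper name `estimate_enrich_credits` is hypothetical.

```python
def estimate_enrich_credits(n_features: int, n_variables: int,
                            credits_per_1000: float = 10.0) -> float:
    """Estimate credits for one enrich request (hypothetical helper).

    ASSUMPTION: the default rate of 10 credits per 1,000 data variables
    must be confirmed against Esri's current credit documentation.
    A data variable is one attribute returned for one input feature.
    """
    if n_features < 0 or n_variables < 0:
        raise ValueError("counts must be non-negative")
    data_variables = n_features * n_variables
    return data_variables * credits_per_1000 / 1000


# Five candidate study areas enriched with six variables, as later in this guide
print(estimate_enrich_credits(5, 6))  # 30 data variables -> 0.3 at the assumed rate
```

Reports are priced differently from enrichment, so this estimate covers enrich calls only.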

Discovering Countries

Most of the data and jurisdictional areas are organized by country, so first we discover which countries are available. Setting the optional as_df parameter to True returns the result as a DataFrame.

from arcgis.geoenrichment import get_countries

country_df = get_countries(gis, as_df=True)

country_df
     iso2  iso3  name            alt_name                           datasets        default_dataset  continent
0    AL    ALB   Albania         ALBANIA                            [ALB_MBR_2021]  ALB_MBR_2021     Europe
1    DZ    DZA   Algeria         ALGERIA                            [DZA_MBR_2021]  DZA_MBR_2021     Africa
2    AD    AND   Andorra         ANDORRA                            [AND_MBR_2021]  AND_MBR_2021     Europe
3    AO    AGO   Angola          ANGOLA                             [AGO_MBR_2021]  AGO_MBR_2021     Africa
4    AI    AIA   Anguilla        ANGUILLA                           [AIA_MBR_2020]  AIA_MBR_2020     North America
...  ...   ...   ...             ...                                ...             ...              ...
172  VE    VEN   Venezuela       VENEZUELA, BOLIVARIAN REPUBLIC OF  [VEN_MBR_2021]  VEN_MBR_2021     South America
173  VN    VNM   Vietnam         VIET NAM                           [VNM_MBR_2022]  VNM_MBR_2022     Asia
174  VI    VIR   Virgin Islands  UNITED STATES VIRGIN ISLANDS       [VIR_MBR_2020]  VIR_MBR_2020     North America
175  ZM    ZMB   Zambia          ZAMBIA                             [ZMB_MBR_2021]  ZMB_MBR_2021     Africa
176  ZW    ZWE   Zimbabwe        ZIMBABWE                           [ZWE_MBR_2021]  ZWE_MBR_2021     Africa

177 rows × 7 columns
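The countries table is ordinary tabular data, so it can be narrowed before creating a Country object. The sketch below copies a few rows from the output above into plain tuples, keeps only the African countries, and looks up a default dataset by ISO3 code. With the real DataFrame, the equivalent pandas filter would be `country_df[country_df.continent == "Africa"]`; the helper name `countries_on` is hypothetical.

```python
from typing import Dict, List, NamedTuple


class CountryRow(NamedTuple):
    iso3: str
    name: str
    default_dataset: str
    continent: str


# A subset of the rows shown in the get_countries output above
rows = [
    CountryRow("ALB", "Albania", "ALB_MBR_2021", "Europe"),
    CountryRow("DZA", "Algeria", "DZA_MBR_2021", "Africa"),
    CountryRow("AND", "Andorra", "AND_MBR_2021", "Europe"),
    CountryRow("AGO", "Angola", "AGO_MBR_2021", "Africa"),
    CountryRow("AIA", "Anguilla", "AIA_MBR_2020", "North America"),
    CountryRow("VNM", "Vietnam", "VNM_MBR_2022", "Asia"),
    CountryRow("ZMB", "Zambia", "ZMB_MBR_2021", "Africa"),
    CountryRow("ZWE", "Zimbabwe", "ZWE_MBR_2021", "Africa"),
]


def countries_on(continent: str, table: List[CountryRow]) -> List[str]:
    """Return names of countries on the given continent (hypothetical helper)."""
    return [row.name for row in table if row.continent == continent]


# Default dataset per ISO3 code; the vintages (2020, 2021, 2022) differ by country
default_by_iso3: Dict[str, str] = {row.iso3: row.default_dataset for row in rows}

print(countries_on("Africa", rows))  # ['Algeria', 'Angola', 'Zambia', 'Zimbabwe']
print(default_by_iso3["VNM"])        # VNM_MBR_2022
```

The differing vintages are worth checking when comparing areas across countries, since the data years may not match.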

Next, an arcgis.geoenrichment.Country object instance can be created to use for subsequent analysis steps.

from arcgis.geoenrichment import Country

country = Country("usa", gis=gis)

country
<Country - United States (GIS @ https://geosaurus.maps.arcgis.com version:10.3)>

Enrich Example

To provide context, we can apply a quick example:
A large retailer is evaluating potential sites for a new location and wants to compare a few candidates against key criteria: competition, traffic, economic feasibility, and market potential in the areas surrounding each site. Using the GeoEnrichment module, the real estate site selection team can add demographic variables such as lifestyle, income, spending, and education to understand the potential customers in the study areas surrounding the candidate sites.

Discover Demographic Variables

First, we can discover the variables available with the enrich_variables property of the Country object.

ev = country.enrich_variables

ev
       name        alias                                              data_collection  enrich_name              enrich_field_name         description                                        vintage    units
0      AGE0_CY     2022 Population Age <1                             1yearincrements  1yearincrements.AGE0_CY  F1yearincrements_AGE0_CY  2022 Total Population Age <1 (Esri)                2022       count
1      AGE1_CY     2022 Population Age 1                              1yearincrements  1yearincrements.AGE1_CY  F1yearincrements_AGE1_CY  2022 Total Population Age 1 (Esri)                 2022       count
2      AGE2_CY     2022 Population Age 2                              1yearincrements  1yearincrements.AGE2_CY  F1yearincrements_AGE2_CY  2022 Total Population Age 2 (Esri)                 2022       count
3      AGE3_CY     2022 Population Age 3                              1yearincrements  1yearincrements.AGE3_CY  F1yearincrements_AGE3_CY  2022 Total Population Age 3 (Esri)                 2022       count
4      AGE4_CY     2022 Population Age 4                              1yearincrements  1yearincrements.AGE4_CY  F1yearincrements_AGE4_CY  2022 Total Population Age 4 (Esri)                 2022       count
...    ...         ...                                                ...              ...                      ...                       ...                                                ...        ...
18941  MOEMEDYRMV  2020 Median Year Householder Moved In MOE (ACS...  yearmovedin      yearmovedin.MOEMEDYRMV   yearmovedin_MOEMEDYRMV    2020 Median Year Householder Moved into Unit M...  2016-2020  count
18942  RELMEDYRMV  2020 Median Year Householder Moved In REL (ACS...  yearmovedin      yearmovedin.RELMEDYRMV   yearmovedin_RELMEDYRMV    2020 Median Year Householder Moved into Unit R...  2016-2020  count
18943  ACSOWNER    2020 Owner Households (ACS 5-Yr)                   yearmovedin      yearmovedin.ACSOWNER     yearmovedin_ACSOWNER      2020 Owner Households (ACS 5-Yr)                   2016-2020  count
18944  MOEOWNER    2020 Owner Households MOE (ACS 5-Yr)               yearmovedin      yearmovedin.MOEOWNER     yearmovedin_MOEOWNER      2020 Owner Households MOE (ACS 5-Yr)               2016-2020  count
18945  RELOWNER    2020 Owner Households REL (ACS 5-Yr)               yearmovedin      yearmovedin.RELOWNER     yearmovedin_RELOWNER      2020 Owner Households REL (ACS 5-Yr)               2016-2020  count

18946 rows × 8 columns

Finding Variables

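This list of demographic variables can be filtered using a few useful patterns. First, variables whose names end with CY are current year variables, so searching the lowercased names for "cy" narrows the list to current year variables. Note that this substring match is slightly looser than a strict "ends with" test.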

ev[ev.name.str.lower().str.contains("cy")].reset_index()
      index  name        alias                               data_collection  enrich_name              enrich_field_name         description                                        vintage  units
0     0      AGE0_CY     2022 Population Age <1              1yearincrements  1yearincrements.AGE0_CY  F1yearincrements_AGE0_CY  2022 Total Population Age <1 (Esri)                2022     count
1     1      AGE1_CY     2022 Population Age 1               1yearincrements  1yearincrements.AGE1_CY  F1yearincrements_AGE1_CY  2022 Total Population Age 1 (Esri)                 2022     count
2     2      AGE2_CY     2022 Population Age 2               1yearincrements  1yearincrements.AGE2_CY  F1yearincrements_AGE2_CY  2022 Total Population Age 2 (Esri)                 2022     count
3     3      AGE3_CY     2022 Population Age 3               1yearincrements  1yearincrements.AGE3_CY  F1yearincrements_AGE3_CY  2022 Total Population Age 3 (Esri)                 2022     count
4     4      AGE4_CY     2022 Population Age 4               1yearincrements  1yearincrements.AGE4_CY  F1yearincrements_AGE4_CY  2022 Total Population Age 4 (Esri)                 2022     count
...   ...    ...         ...                                 ...              ...                      ...                       ...                                                ...      ...
1595  18794  VAL1M_CY    2022 Home Value $1 Million-1499999  Wealth           Wealth.VAL1M_CY          Wealth_VAL1M_CY           2022 Home Value $1,000,000-$1,499,999 (Esri)       2022     count
1596  18795  MEDVAL_CY   2022 Median Home Value              Wealth           Wealth.MEDVAL_CY         Wealth_MEDVAL_CY          2022 Median Home Value (Esri)                      2022     currency
1597  18796  AVGVAL_CY   2022 Average Home Value             Wealth           Wealth.AVGVAL_CY         Wealth_AVGVAL_CY          2022 Average Home Value (Esri)                     2022     currency
1598  18797  VALBASE_CY  2022 Home Value Base                Wealth           Wealth.VALBASE_CY        Wealth_VALBASE_CY         2022 Owner Occupied Housing Units by Value Bas...  2022     count
1599  18827  WLTHINDXCY  2022 Wealth Index                   Wealth           Wealth.WLTHINDXCY        Wealth_WLTHINDXCY         2022 Wealth Index (Esri)                           2022     count

1600 rows × 9 columns
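The filter above matches "cy" anywhere in a name, while the rule is that current year variables end with CY. The sketch below contrasts the two on plain strings. Most names are copied from the outputs above; "POLICY_FY" is a hypothetical name included only to show how a substring match can produce a false positive.

```python
# Names copied from the enrich_variables outputs above, plus one HYPOTHETICAL
# name ("POLICY_FY") that contains "cy" but is not a current year variable.
names = ["AGE0_CY", "VAL1M_CY", "WLTHINDXCY", "MOEMEDYRMV", "ACSOWNER", "POLICY_FY"]

# Mirrors ev.name.str.lower().str.contains("cy"): matches "cy" anywhere
loose = [n for n in names if "cy" in n.lower()]

# Applies the "ends with CY" rule literally (WLTHINDXCY has no underscore)
strict = [n for n in names if n.upper().endswith("CY")]

print(loose)   # ['AGE0_CY', 'VAL1M_CY', 'WLTHINDXCY', 'POLICY_FY']
print(strict)  # ['AGE0_CY', 'VAL1M_CY', 'WLTHINDXCY']
```

On the DataFrame itself, the stricter filter would be `ev[ev.name.str.upper().str.endswith("CY")]`.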

Because we are working with a DataFrame, we can easily filter by keywords in the alias. Here, we search for a metric representing relative diversity and find the 2022 Diversity Index. The filter returns three rows because the same variable, DIVINDX_CY, belongs to three data collections.

Data collections are groupings of variables. These groupings can often speed up analysis by offering a ready-made selection of variables for getting started quickly.

ev[
    (ev.name.str.lower().str.contains("cy"))
    & (ev.alias.str.lower().str.contains("diversity"))
].reset_index(drop=True)
   name        alias                 data_collection        enrich_name                       enrich_field_name                 description                  vintage  units
0  DIVINDX_CY  2022 Diversity Index  KeyUSFacts             KeyUSFacts.DIVINDX_CY             KeyUSFacts_DIVINDX_CY             2022 Diversity Index (Esri)  2022     count
1  DIVINDX_CY  2022 Diversity Index  Policy                 Policy.DIVINDX_CY                 Policy_DIVINDX_CY                 2022 Diversity Index (Esri)  2022     count
2  DIVINDX_CY  2022 Diversity Index  raceandhispanicorigin  raceandhispanicorigin.DIVINDX_CY  raceandhispanicorigin_DIVINDX_CY  2022 Diversity Index (Esri)  2022     count
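Because one variable can belong to several data collections, a filter on names can return duplicates. The sketch below represents the three rows shown above as plain dictionaries and keeps one row per variable name, which is what `ev.drop_duplicates(subset="name")` would do on the DataFrame. It also checks the pattern visible in this output that `enrich_name` is the data collection and the variable name joined by a period; that pattern is read from the output, not from a documented rule.

```python
from typing import Dict, List

# Rows copied from the diversity filter output above (subset of columns)
rows: List[Dict[str, str]] = [
    {"name": "DIVINDX_CY", "data_collection": "KeyUSFacts",
     "enrich_name": "KeyUSFacts.DIVINDX_CY"},
    {"name": "DIVINDX_CY", "data_collection": "Policy",
     "enrich_name": "Policy.DIVINDX_CY"},
    {"name": "DIVINDX_CY", "data_collection": "raceandhispanicorigin",
     "enrich_name": "raceandhispanicorigin.DIVINDX_CY"},
]

# Pattern observed in this output: enrich_name is "<data_collection>.<name>"
assert all(r["enrich_name"] == r["data_collection"] + "." + r["name"] for r in rows)

# Keep the first row seen for each variable name, like drop_duplicates(subset="name")
unique: Dict[str, Dict[str, str]] = {}
for row in rows:
    unique.setdefault(row["name"], row)

print(sorted(unique))                        # ['DIVINDX_CY']
print([r["data_collection"] for r in rows])  # collections that contain it
```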

Next, we can select a few variables to use for analysis.

analysis_variables = [
    "TOTPOP_CY",  # Population: Total Population (Esri)
    "DIVINDX_CY",  # Diversity Index (Esri)
    "AVGHHSZ_CY",  # Average Household Size (Esri)
    "MEDAGE_CY",  # Age: Median Age (Esri)
    "MEDHINC_CY",  # Income: Median Household Income (Esri)
    "BACHDEG_CY",  # Education: Bachelor's Degree (Esri)
]

analysis_variables
['TOTPOP_CY',
 'DIVINDX_CY',
 'AVGHHSZ_CY',
 'MEDAGE_CY',
 'MEDHINC_CY',
 'BACHDEG_CY']
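Before running an enrichment that consumes credits, it can be worth confirming that every selected name exists in the variable catalog. A minimal sketch, assuming the available names have already been pulled from the `enrich_variables` DataFrame (for example with `set(ev.name)`); here a small hand-written set stands in for that catalog, and the helper `missing_variables` is hypothetical.

```python
from typing import Iterable, List, Set


def missing_variables(requested: Iterable[str], available: Set[str]) -> List[str]:
    """Return requested variable names not found in the catalog (case-insensitive)."""
    available_upper = {name.upper() for name in available}
    return [name for name in requested if name.upper() not in available_upper]


analysis_variables = [
    "TOTPOP_CY", "DIVINDX_CY", "AVGHHSZ_CY",
    "MEDAGE_CY", "MEDHINC_CY", "BACHDEG_CY",
]

# Stand-in for set(ev.name): a small subset of the catalog
catalog = {"TOTPOP_CY", "DIVINDX_CY", "AVGHHSZ_CY", "MEDAGE_CY",
           "MEDHINC_CY", "BACHDEG_CY", "AGE0_CY"}

print(missing_variables(analysis_variables, catalog))             # []
# A typo ("MEDHHINC_CY") is caught; lowercase names still match
print(missing_variables(["MEDHHINC_CY", "totpop_cy"], catalog))   # ['MEDHHINC_CY']
```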

Load Data

Next, we load the study areas surrounding each candidate location from a hosted feature layer. The enrich capability in Business Analyst requires polygons to apportion demographic data to the input geographies. These polygons, which delineate the area around each location used for apportioning the selected demographic data, are referred to as study areas. Although the study areas for this example are already created, it is also possible to specify study area parameters for the enrich tool instead. This is demonstrated in a later example.
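To make study areas and apportionment concrete, the sketch below builds a straight-line ring (a circle approximated by a polygon) around a site in projected coordinates, then sums the counts of points whose locations fall inside it using a ray-casting point-in-polygon test. This is a simplified illustration of the general idea, not the apportionment method Business Analyst uses; the site, the block points, their counts, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def ring_polygon(center: Point, radius: float, n_vertices: int = 64) -> List[Point]:
    """Approximate a circular study area with a polygon (projected units, e.g. meters)."""
    cx, cy = center
    return [
        (cx + radius * math.cos(2 * math.pi * i / n_vertices),
         cy + radius * math.sin(2 * math.pi * i / n_vertices))
        for i in range(n_vertices)
    ]


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a ray running right from pt crosses."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def apportion(points: List[Tuple[Point, float]], polygon: List[Point]) -> float:
    """Sum the values of points (e.g. block centroids with population) inside polygon."""
    return sum(value for pt, value in points if point_in_polygon(pt, polygon))


# HYPOTHETICAL site and block points (x, y in meters) with population counts
site = (0.0, 0.0)
blocks = [((100.0, 200.0), 350.0), ((-900.0, 300.0), 120.0), ((2500.0, 0.0), 800.0)]

study_area = ring_polygon(site, radius=1000.0)
print(apportion(blocks, study_area))  # 470.0: the block 2.5 km away is excluded
```

Real apportionment works with many more points and weights, but the core step is the same kind of containment test between input geometry and the underlying data.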

import pandas as pd
from arcgis.features import (
    GeoAccessor,  # adds "spatial" namespace to Pandas DataFrame object
)

itm_id = "379bdcc3f34b4407bef1135956edcf4b"
candidate_df = (
    gis.content.get(itm_id).layers[0].query(out_fields="loc_id", as_df=True)
)

candidate_df
   OBJECTID  loc_id      SHAPE
0  1         Facility 1  {"rings": [[[-118.309153568, 34.074037262], [-...
1  2         Facility 2  {"rings": [[[-118.309153568, 34.082122063], [-...
2  3         Facility 4  {"rings": [[[-118.376302328, 34.090880596], [-...
3  4         Facility 5  {"rings": [[[-118.376302328, 34.0911051740001]...
4  5         Facility 3  {"rings": [[[-118.153970313, 34.0778550840001]...

Enrich

Finally, we run the enrich method of the Country object to get data for the study areas using the variables selected above. If you do not know which country a study area is in, you can use the module-level arcgis.geoenrichment.enrich function instead.

enrich_df = country.enrich(candidate_df, enrich_variables=analysis_variables)

enrich_df
   objectid  loc_id      source_country  aggregation_method                                 population_to_polygon_size_rating  apportionment_confidence  has_data  medage_cy  totpop_cy  avghhsz_cy  bachdeg_cy  medhinc_cy  divindx_cy  SHAPE
0  1         Facility 1  USA             BlockApportionment:US.BlockGroups;PointsLayer:...  2.191                              2.576                     1         33.7       441440.0   2.61        61913.0     50088.0     88.4        {"rings": [[[-118.309153568, 34.07403726200000...
1  2         Facility 2  USA             BlockApportionment:US.BlockGroups;PointsLayer:...  2.191                              2.576                     1         34.3       454965.0   2.52        72400.0     52284.0     88.7        {"rings": [[[-118.309153568, 34.082122063], [-...
2  3         Facility 4  USA             BlockApportionment:US.BlockGroups;PointsLayer:...  2.191                              2.576                     1         38.4       224109.0   2.23        58747.0     93112.0     82.2        {"rings": [[[-118.376302328, 34.09088059599999...
3  4         Facility 5  USA             BlockApportionment:US.BlockGroups;PointsLayer:...  2.191                              2.576                     1         38.5       221385.0   2.20        58655.0     94416.0     81.5        {"rings": [[[-118.376302328, 34.0911051740001]...
4  5         Facility 3  USA             BlockApportionment:US.BlockGroups;PointsLayer:...  2.191                              2.576                     1         31.4       230872.0   3.51        16674.0     55399.0     71.2        {"rings": [[[-118.15397031299999, 34.077855084...

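The response includes metadata describing how the enrichment was performed, such as source_country, aggregation_method, and apportionment_confidence. If we are only interested in the demographic columns that were added, we can keep just those by matching the returned column names against the enrich variable names, lowercased because the returned columns are lowercase.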

# get just the enrich columns
enrich_cols = [c for c in enrich_df if c in ev.name.str.lower().values]

# combine the enrich columns with a few others we want to keep
keep_cols = ["loc_id"] + enrich_cols + ["SHAPE"]

# filter the enrich data frame to just these columns
enrich_df = enrich_df.loc[:, keep_cols].set_index("loc_id")

# re-enable spatial awareness
enrich_df.spatial.set_geometry("SHAPE")

enrich_df
            medage_cy  totpop_cy  avghhsz_cy  bachdeg_cy  medhinc_cy  divindx_cy  SHAPE
loc_id
Facility 1  33.7       441440.0   2.61        61913.0     50088.0     88.4        {"rings": [[[-118.309153568, 34.07403726200000...
Facility 2  34.3       454965.0   2.52        72400.0     52284.0     88.7        {"rings": [[[-118.309153568, 34.082122063], [-...
Facility 4  38.4       224109.0   2.23        58747.0     93112.0     82.2        {"rings": [[[-118.376302328, 34.09088059599999...
Facility 5  38.5       221385.0   2.20        58655.0     94416.0     81.5        {"rings": [[[-118.376302328, 34.0911051740001]...
Facility 3  31.4       230872.0   3.51        16674.0     55399.0     71.2        {"rings": [[[-118.15397031299999, 34.077855084...

Evaluate Results

An extremely effective starting point for analysis is simply visualizing the results. Here, we use matplotlib to visualize the differences between the locations based on the enriched data.

import warnings

import matplotlib.pyplot as plt

# suppress a deprecation warning raised inside matplotlib
warnings.filterwarnings("ignore")

fig, axs = plt.subplots(2, 3)
fig.set_figheight(10.0)
fig.set_figwidth(18.0)
fig.subplots_adjust(hspace=0.4)

plt.sca(axs[0, 0])
_ = enrich_df.medage_cy.plot(title="Median Age", kind="bar")

plt.sca(axs[0, 1])
_ = enrich_df.totpop_cy.plot(title="Total Population", kind="bar")

plt.sca(axs[0, 2])
_ = enrich_df.avghhsz_cy.plot(title="Average Household Size", kind="bar")

plt.sca(axs[1, 0])
_ = enrich_df.bachdeg_cy.plot(title="Bachelor's Degree", kind="bar")

plt.sca(axs[1, 1])
_ = enrich_df.medhinc_cy.plot(title="Median Household Income", kind="bar")

plt.sca(axs[1, 2])
_ = enrich_df.divindx_cy.plot(title="Diversity Index", kind="bar")
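[Figure: six bar charts comparing the five facilities by Median Age, Total Population, Average Household Size, Bachelor's Degree, Median Household Income, and Diversity Index]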

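Facilities 1 and 2 have roughly twice the population of the other sites, are the most diverse, and have the lowest median household incomes. Facility 3 is the youngest, with the largest households, the fewest bachelor's degree holders, and a lower median household income than Facilities 4 and 5. Facilities 4 and 5 are older, have the highest incomes, and have as many bachelor's degree holders as Facilities 1 and 2 despite having about half the population.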
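The bar charts show raw counts, so the Bachelor's Degree panel favors the larger study areas around Facilities 1 and 2. Dividing by total population gives a rough share that makes the education comparison easier to read. The values below are copied from the enrich output above; using total population (rather than, for example, the adult population) as the denominator is a simplifying assumption, so the ratios are only relative indicators.

```python
# totpop_cy and bachdeg_cy values copied from the enrich output above
enriched = {
    "Facility 1": {"totpop_cy": 441440.0, "bachdeg_cy": 61913.0},
    "Facility 2": {"totpop_cy": 454965.0, "bachdeg_cy": 72400.0},
    "Facility 3": {"totpop_cy": 230872.0, "bachdeg_cy": 16674.0},
    "Facility 4": {"totpop_cy": 224109.0, "bachdeg_cy": 58747.0},
    "Facility 5": {"totpop_cy": 221385.0, "bachdeg_cy": 58655.0},
}

# Rough share: bachelor's degree holders divided by TOTAL population.
# ASSUMPTION: total population is a crude denominator, so treat these as
# relative indicators rather than true educational attainment rates.
bach_share = {loc: row["bachdeg_cy"] / row["totpop_cy"] for loc, row in enriched.items()}

for loc in sorted(bach_share, key=bach_share.get, reverse=True):
    print(f"{loc}: {bach_share[loc]:.1%}")
```

With these values, Facilities 4 and 5 come out at roughly 26%, versus about 14% to 16% for Facilities 1 and 2 and about 7% for Facility 3.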

If interested in opening a discount department store, Facility 2 is the most attractive location, with Facility 1 a close second: both have large populations, and their diversity and lower incomes suggest that residents are likely to be drawn to lower prices.
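The discount store reasoning can be made explicit as a simple screen: keep sites whose median household income falls below a threshold, then rank the rest by total population. The $60,000 threshold and the function name `rank_sites` are hypothetical choices for illustration; the population and income values are copied from the enrich output above.

```python
from typing import Dict, List

# totpop_cy and medhinc_cy values copied from the enrich output above
sites: Dict[str, Dict[str, float]] = {
    "Facility 1": {"totpop_cy": 441440.0, "medhinc_cy": 50088.0},
    "Facility 2": {"totpop_cy": 454965.0, "medhinc_cy": 52284.0},
    "Facility 3": {"totpop_cy": 230872.0, "medhinc_cy": 55399.0},
    "Facility 4": {"totpop_cy": 224109.0, "medhinc_cy": 93112.0},
    "Facility 5": {"totpop_cy": 221385.0, "medhinc_cy": 94416.0},
}


def rank_sites(data: Dict[str, Dict[str, float]], max_income: float) -> List[str]:
    """Keep sites with median household income below max_income, largest population first."""
    eligible = [loc for loc, row in data.items() if row["medhinc_cy"] < max_income]
    return sorted(eligible, key=lambda loc: data[loc]["totpop_cy"], reverse=True)


# HYPOTHETICAL threshold of $60,000
print(rank_sites(sites, max_income=60000.0))  # ['Facility 2', 'Facility 1', 'Facility 3']
```

Other criteria from the text, such as the diversity index, could be added as further filters or as a weighted score.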

If interested in opening a quick-service restaurant, Facility 3 may be the best option to meet the needs of a young, busy, and price-conscious population.

Depending on the key characteristics of the business looking for a new location, the relevant demographic indicators will differ. Using GeoEnrichment with the ArcGIS API for Python provides quick access to demographic variables for informed decision making.

Conclusion

GeoEnrichment makes any location data intelligent by providing facts about the location. In this part of the GeoEnrichment guide series, you have seen a high-level example of how arcgis.geoenrichment.Country can be used to enrich a dataset with various socio-demographic features, along with an introduction to the different ways in which data can be enriched. In the subsequent pages, you will learn about:

  1. Enriching Study Areas (explains where to enrich)
  2. Exploring Named Statistical Areas (explains where to enrich continued)
  3. Enriching Data Collections and Spatially Enabled Dataframe (explains what datasets/variables to enrich with)
  4. Generating Reports
  5. Standard Geography Queries
