Introduction
The GeoEnrichment module provides a Python interface for accessing the demographic data provided through Business Analyst to enrich study areas, access standard geographies, and create reports. Accessing standard geographies enables retrieving and enriching standard jurisdictional areas such as counties, postal (ZIP) codes, or US Census Block Groups in the United States.
Enriching Study Areas
The GeoEnrichment enrich capability adds location intelligence to the data by providing facts about a location or an area. Using GeoEnrichment, you can get information about the people and places in a specific area or within a certain distance or drive time from a location. It enables you to query and use information from a large collection of datasets including population, income, housing, consumer behavior, and the natural environment.
This enables you to answer questions about locations that you can't answer with maps alone. For example, what kind of people live here? What do people like to do in this area? What are their habits and lifestyles?
GeoEnrichment makes your analysis more powerful by adding demographic variables in a geographic context. Further, these variables can be accessed at multiple standard geographic resolutions based on the jurisdictional area.
Getting Started
GeoEnrichment Source
Utilizing these capabilities requires either a properly configured Web GIS instance and a login with permissions to utilize these capabilities, or a local installation of ArcGIS Pro with Business Analyst and at least one country data pack installed. A properly configured Web GIS instance can be either ArcGIS Online or ArcGIS Enterprise. ArcGIS Enterprise can support GeoEnrichment module capabilities by configuring the GeoEnrichment utility service to connect to ArcGIS Online or a fully configured ArcGIS Business Analyst Enterprise deployment.
Local GeoEnrichment Source
If utilizing a local GIS to perform enrichment, you need an environment configured with ArcGIS Pro with Business Analyst and at least one local data pack. To specify this local source, create an arcgis.gis.GIS object instance using the 'pro' keyword. This tells GeoEnrichment to use the locally installed source.
NOTE: At the 2.1.0 release, standard geography retrieval and reporting are not yet supported with a local source.
from arcgis.gis import GIS
gis = GIS("pro")
Web GIS GeoEnrichment Source
If using ArcGIS Online or ArcGIS Enterprise with the GeoEnrichment module, you need to create an arcgis.gis.GIS object instance connected to the properly configured Web GIS with a user who has permissions to perform enrichment and create reports.
NOTE: GeoEnrichment operations using ArcGIS Online consume credits. Credits are the currency used across ArcGIS and are consumed for specific transactions. Learn more about credit consumption for GeoEnrichment here.
from arcgis.gis import GIS
gis = GIS(profile="your_online_profile")
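The choice between the two sources can be wrapped in a small helper. The following is a minimal sketch, not part of the ArcGIS API for Python: the names connection_args and connect are hypothetical, and the import of arcgis is deferred so the argument logic can be inspected without a GIS available.

```python
from typing import Any, Dict, Optional, Tuple


def connection_args(
    use_local: bool, profile: Optional[str] = None
) -> Tuple[tuple, Dict[str, Any]]:
    """Return the positional and keyword arguments to pass to GIS (hypothetical helper)."""
    if use_local:
        # local source: ArcGIS Pro with Business Analyst and a data pack
        return ("pro",), {}
    if profile is None:
        raise ValueError("a saved profile name is required for a Web GIS source")
    # Web GIS source: ArcGIS Online or ArcGIS Enterprise through a stored profile
    return (), {"profile": profile}


def connect(use_local: bool, profile: Optional[str] = None):
    """Create the GIS object instance for the selected GeoEnrichment source."""
    from arcgis.gis import GIS  # deferred import; requires the arcgis package

    args, kwargs = connection_args(use_local, profile)
    return GIS(*args, **kwargs)
```

With this sketch, connect(use_local=True) corresponds to GIS("pro"), and connect(use_local=False, profile="your_online_profile") corresponds to the Web GIS example above.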
Discovering Countries
Most of the data and jurisdictional areas are organized by country. First, we will discover which countries are available. We set the optional as_df parameter to True in order to have a DataFrame returned as a result.
from arcgis.geoenrichment import get_countries
country_df = get_countries(gis, as_df=True)
country_df
iso2 | iso3 | name | alt_name | datasets | default_dataset | continent | |
---|---|---|---|---|---|---|---|
0 | AL | ALB | Albania | ALBANIA | [ALB_MBR_2021] | ALB_MBR_2021 | Europe |
1 | DZ | DZA | Algeria | ALGERIA | [DZA_MBR_2021] | DZA_MBR_2021 | Africa |
2 | AD | AND | Andorra | ANDORRA | [AND_MBR_2021] | AND_MBR_2021 | Europe |
3 | AO | AGO | Angola | ANGOLA | [AGO_MBR_2021] | AGO_MBR_2021 | Africa |
4 | AI | AIA | Anguilla | ANGUILLA | [AIA_MBR_2020] | AIA_MBR_2020 | North America |
... | ... | ... | ... | ... | ... | ... | ... |
172 | VE | VEN | Venezuela | VENEZUELA, BOLIVARIAN REPUBLIC OF | [VEN_MBR_2021] | VEN_MBR_2021 | South America |
173 | VN | VNM | Vietnam | VIET NAM | [VNM_MBR_2022] | VNM_MBR_2022 | Asia |
174 | VI | VIR | Virgin Islands | UNITED STATES VIRGIN ISLANDS | [VIR_MBR_2020] | VIR_MBR_2020 | North America |
175 | ZM | ZMB | Zambia | ZAMBIA | [ZMB_MBR_2021] | ZMB_MBR_2021 | Africa |
176 | ZW | ZWE | Zimbabwe | ZIMBABWE | [ZWE_MBR_2021] | ZWE_MBR_2021 | Africa |
177 rows × 7 columns
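Because the result is a DataFrame, ordinary pandas filtering applies to it. The sketch below uses a small stand-in frame built from a few of the rows shown above (only some of the columns are included); with a live connection, the frame would come from get_countries instead.

```python
import pandas as pd

# a few rows copied from the output above; the real frame comes from
# get_countries(gis, as_df=True)
country_df = pd.DataFrame(
    {
        "iso2": ["AL", "DZ", "AO", "ZM"],
        "iso3": ["ALB", "DZA", "AGO", "ZMB"],
        "name": ["Albania", "Algeria", "Angola", "Zambia"],
        "default_dataset": ["ALB_MBR_2021", "DZA_MBR_2021", "AGO_MBR_2021", "ZMB_MBR_2021"],
        "continent": ["Europe", "Africa", "Africa", "Africa"],
    }
)

# keep only the countries on one continent
africa_df = country_df[country_df["continent"] == "Africa"].reset_index(drop=True)

# look up the three-letter code for a country by name
zambia_iso3 = country_df.loc[country_df["name"] == "Zambia", "iso3"].iloc[0]
```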
Next, an arcgis.geoenrichment.Country object instance can be created to use for subsequent analysis steps.
from arcgis.geoenrichment import Country
country = Country("usa", gis=gis)
country
<Country - United States (GIS @ https://geosaurus.maps.arcgis.com version:10.3)>
Enrich Example
To provide context, we can apply a quick example:
A large retailer is evaluating potential sites for a new location. This retailer is interested in using key criteria to evaluate a few candidates. These criteria include competition, traffic, economic feasibility, and market potential for the areas surrounding the potential sites. Utilizing the GeoEnrichment module, the real estate site selection team can include demographic variables such as lifestyle, income, spending, and education to understand potential customers in the study areas surrounding the candidate sites.
Discover Demographic Variables
First, we can discover the variables available with the enrich_variables property of the Country object.
ev = country.enrich_variables
ev
name | alias | data_collection | enrich_name | enrich_field_name | description | vintage | units | |
---|---|---|---|---|---|---|---|---|
0 | AGE0_CY | 2022 Population Age <1 | 1yearincrements | 1yearincrements.AGE0_CY | F1yearincrements_AGE0_CY | 2022 Total Population Age <1 (Esri) | 2022 | count |
1 | AGE1_CY | 2022 Population Age 1 | 1yearincrements | 1yearincrements.AGE1_CY | F1yearincrements_AGE1_CY | 2022 Total Population Age 1 (Esri) | 2022 | count |
2 | AGE2_CY | 2022 Population Age 2 | 1yearincrements | 1yearincrements.AGE2_CY | F1yearincrements_AGE2_CY | 2022 Total Population Age 2 (Esri) | 2022 | count |
3 | AGE3_CY | 2022 Population Age 3 | 1yearincrements | 1yearincrements.AGE3_CY | F1yearincrements_AGE3_CY | 2022 Total Population Age 3 (Esri) | 2022 | count |
4 | AGE4_CY | 2022 Population Age 4 | 1yearincrements | 1yearincrements.AGE4_CY | F1yearincrements_AGE4_CY | 2022 Total Population Age 4 (Esri) | 2022 | count |
... | ... | ... | ... | ... | ... | ... | ... | ... |
18941 | MOEMEDYRMV | 2020 Median Year Householder Moved In MOE (ACS... | yearmovedin | yearmovedin.MOEMEDYRMV | yearmovedin_MOEMEDYRMV | 2020 Median Year Householder Moved into Unit M... | 2016-2020 | count |
18942 | RELMEDYRMV | 2020 Median Year Householder Moved In REL (ACS... | yearmovedin | yearmovedin.RELMEDYRMV | yearmovedin_RELMEDYRMV | 2020 Median Year Householder Moved into Unit R... | 2016-2020 | count |
18943 | ACSOWNER | 2020 Owner Households (ACS 5-Yr) | yearmovedin | yearmovedin.ACSOWNER | yearmovedin_ACSOWNER | 2020 Owner Households (ACS 5-Yr) | 2016-2020 | count |
18944 | MOEOWNER | 2020 Owner Households MOE (ACS 5-Yr) | yearmovedin | yearmovedin.MOEOWNER | yearmovedin_MOEOWNER | 2020 Owner Households MOE (ACS 5-Yr) | 2016-2020 | count |
18945 | RELOWNER | 2020 Owner Households REL (ACS 5-Yr) | yearmovedin | yearmovedin.RELOWNER | yearmovedin_RELOWNER | 2020 Owner Households REL (ACS 5-Yr) | 2016-2020 | count |
18946 rows × 8 columns
Finding Variables
This list of enrich variables can be filtered using a few useful patterns. First, any variable whose name ends with CY is a current-year variable, so we can narrow the list to current-year variables by matching on cy in the lowercased name.
ev[ev.name.str.lower().str.contains("cy")].reset_index()
index | name | alias | data_collection | enrich_name | enrich_field_name | description | vintage | units | |
---|---|---|---|---|---|---|---|---|---|
0 | 0 | AGE0_CY | 2022 Population Age <1 | 1yearincrements | 1yearincrements.AGE0_CY | F1yearincrements_AGE0_CY | 2022 Total Population Age <1 (Esri) | 2022 | count |
1 | 1 | AGE1_CY | 2022 Population Age 1 | 1yearincrements | 1yearincrements.AGE1_CY | F1yearincrements_AGE1_CY | 2022 Total Population Age 1 (Esri) | 2022 | count |
2 | 2 | AGE2_CY | 2022 Population Age 2 | 1yearincrements | 1yearincrements.AGE2_CY | F1yearincrements_AGE2_CY | 2022 Total Population Age 2 (Esri) | 2022 | count |
3 | 3 | AGE3_CY | 2022 Population Age 3 | 1yearincrements | 1yearincrements.AGE3_CY | F1yearincrements_AGE3_CY | 2022 Total Population Age 3 (Esri) | 2022 | count |
4 | 4 | AGE4_CY | 2022 Population Age 4 | 1yearincrements | 1yearincrements.AGE4_CY | F1yearincrements_AGE4_CY | 2022 Total Population Age 4 (Esri) | 2022 | count |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1595 | 18794 | VAL1M_CY | 2022 Home Value $1 Million-1499999 | Wealth | Wealth.VAL1M_CY | Wealth_VAL1M_CY | 2022 Home Value $1,000,000-$1,499,999 (Esri) | 2022 | count |
1596 | 18795 | MEDVAL_CY | 2022 Median Home Value | Wealth | Wealth.MEDVAL_CY | Wealth_MEDVAL_CY | 2022 Median Home Value (Esri) | 2022 | currency |
1597 | 18796 | AVGVAL_CY | 2022 Average Home Value | Wealth | Wealth.AVGVAL_CY | Wealth_AVGVAL_CY | 2022 Average Home Value (Esri) | 2022 | currency |
1598 | 18797 | VALBASE_CY | 2022 Home Value Base | Wealth | Wealth.VALBASE_CY | Wealth_VALBASE_CY | 2022 Owner Occupied Housing Units by Value Bas... | 2022 | count |
1599 | 18827 | WLTHINDXCY | 2022 Wealth Index | Wealth | Wealth.WLTHINDXCY | Wealth_WLTHINDXCY | 2022 Wealth Index (Esri) | 2022 | count |
1600 rows × 9 columns
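Note that str.contains matches the pattern anywhere in the name, while the convention described above is about the suffix. The sketch below contrasts the two on a small stand-in Series; the first four names are copied from the output above, and CYCLE_FY is a hypothetical name added only to show the difference.

```python
import pandas as pd

# variable names copied from the output above, plus one hypothetical name
# ("CYCLE_FY") that contains "cy" without ending in it
names = pd.Series(
    ["AGE0_CY", "MEDVAL_CY", "WLTHINDXCY", "MOEOWNER", "CYCLE_FY"], name="name"
)

lowered = names.str.lower()

# matches "cy" anywhere in the name
contains_cy = names[lowered.str.contains("cy")].tolist()

# matches only names whose final characters are "cy"
ends_with_cy = names[lowered.str.endswith("cy")].tolist()
```

If a stricter filter is wanted on the real variables, the same endswith call could be applied to ev.name; whether it changes the row count for a given country is not something this sketch establishes.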
Because we are working with a DataFrame, we can easily filter by keywords in the alias. Here, we are searching for a metric representing relative diversity. We see there is a variable available, the 2022 Diversity Index. Our filtering returns three rows because the same variable is listed in three data collections.
Data Collections are groupings of variables. Frequently these groupings can speed up analysis by offering a selection of variables to use for quickly getting started.
ev[
    (ev.name.str.lower().str.contains("cy"))
    & (ev.alias.str.lower().str.contains("diversity"))
].reset_index(drop=True)
name | alias | data_collection | enrich_name | enrich_field_name | description | vintage | units | |
---|---|---|---|---|---|---|---|---|
0 | DIVINDX_CY | 2022 Diversity Index | KeyUSFacts | KeyUSFacts.DIVINDX_CY | KeyUSFacts_DIVINDX_CY | 2022 Diversity Index (Esri) | 2022 | count |
1 | DIVINDX_CY | 2022 Diversity Index | Policy | Policy.DIVINDX_CY | Policy_DIVINDX_CY | 2022 Diversity Index (Esri) | 2022 | count |
2 | DIVINDX_CY | 2022 Diversity Index | raceandhispanicorigin | raceandhispanicorigin.DIVINDX_CY | raceandhispanicorigin_DIVINDX_CY | 2022 Diversity Index (Esri) | 2022 | count |
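When the same variable appears in several data collections, it can help to summarize the collections per variable or to keep a single row per name. This is a minimal pandas sketch over a stand-in frame holding the three rows shown above (a subset of the columns); it is one possible way to tidy the result, not a step required by the enrich workflow.

```python
import pandas as pd

# the three rows returned by the diversity filter above
diversity_df = pd.DataFrame(
    {
        "name": ["DIVINDX_CY"] * 3,
        "data_collection": ["KeyUSFacts", "Policy", "raceandhispanicorigin"],
        "enrich_name": [
            "KeyUSFacts.DIVINDX_CY",
            "Policy.DIVINDX_CY",
            "raceandhispanicorigin.DIVINDX_CY",
        ],
    }
)

# list the data collections each variable belongs to
collections_by_name = diversity_df.groupby("name")["data_collection"].agg(list)

# keep only the first row for each variable name
unique_df = diversity_df.drop_duplicates(subset="name").reset_index(drop=True)
```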
Next, we can select a few variables to use for analysis.
analysis_variables = [
    "TOTPOP_CY",  # Population: Total Population (Esri)
    "DIVINDX_CY",  # Diversity Index (Esri)
    "AVGHHSZ_CY",  # Average Household Size (Esri)
    "MEDAGE_CY",  # Age: Median Age (Esri)
    "MEDHINC_CY",  # Income: Median Household Income (Esri)
    "BACHDEG_CY",  # Education: Bachelor's Degree (Esri)
]
analysis_variables
['TOTPOP_CY', 'DIVINDX_CY', 'AVGHHSZ_CY', 'MEDAGE_CY', 'MEDHINC_CY', 'BACHDEG_CY']
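Before enriching, it can be worth confirming that every selected name actually exists in the available variables. The sketch below is an assumption-labeled illustration: find_missing is a hypothetical helper, available_names stands in for the values of ev.name, and NOT_A_VAR is a made-up name used to show a miss.

```python
# names selected for analysis, as listed above
analysis_variables = [
    "TOTPOP_CY", "DIVINDX_CY", "AVGHHSZ_CY", "MEDAGE_CY", "MEDHINC_CY", "BACHDEG_CY",
]


def find_missing(selected, available):
    """Return the selected names absent from the available names, ignoring case."""
    available_upper = {name.upper() for name in available}
    return [name for name in selected if name.upper() not in available_upper]


# stand-in for ev.name; with a live connection this would be list(ev.name)
available_names = [
    "TOTPOP_CY", "DIVINDX_CY", "AVGHHSZ_CY", "MEDAGE_CY",
    "MEDHINC_CY", "BACHDEG_CY", "MEDVAL_CY",
]

# "NOT_A_VAR" is a hypothetical name that should be reported as missing
missing = find_missing(analysis_variables + ["NOT_A_VAR"], available_names)
```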
Load Data
We can load the study areas surrounding each location from a feature layer item in the Web GIS. The enrich capability in Business Analyst requires polygon areas to use for apportioning demographic data to the input geographies. These polygons, which delineate the area used for apportioning the selected demographic data to each location, are referred to as study areas. For this example the study areas have already been created, but it is also possible to specify study area parameters for the enrich tool. This is demonstrated in a later example.
import pandas as pd
from arcgis.features import (
    GeoAccessor,  # adds "spatial" namespace to Pandas DataFrame object
)
itm_id = "379bdcc3f34b4407bef1135956edcf4b"
candidate_df = (
    gis.content.get(itm_id).layers[0].query(out_fields="loc_id", as_df=True)
)
candidate_df
OBJECTID | loc_id | SHAPE | |
---|---|---|---|
0 | 1 | Facility 1 | {"rings": [[[-118.309153568, 34.074037262], [-... |
1 | 2 | Facility 2 | {"rings": [[[-118.309153568, 34.082122063], [-... |
2 | 3 | Facility 4 | {"rings": [[[-118.376302328, 34.090880596], [-... |
3 | 4 | Facility 5 | {"rings": [[[-118.376302328, 34.0911051740001]... |
4 | 5 | Facility 3 | {"rings": [[[-118.153970313, 34.0778550840001]... |
Enrich
Finally, we can run the enrich method of the Country class to get data about the study areas using the enrich variables selected above. If you are enriching a study area and do not know the country, you can instead use the enrich function found outside of the Country class.
enrich_df = country.enrich(candidate_df, enrich_variables=analysis_variables)
enrich_df
objectid | loc_id | source_country | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | has_data | medage_cy | totpop_cy | avghhsz_cy | bachdeg_cy | medhinc_cy | divindx_cy | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Facility 1 | USA | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 33.7 | 441440.0 | 2.61 | 61913.0 | 50088.0 | 88.4 | {"rings": [[[-118.309153568, 34.07403726200000... |
1 | 2 | Facility 2 | USA | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 34.3 | 454965.0 | 2.52 | 72400.0 | 52284.0 | 88.7 | {"rings": [[[-118.309153568, 34.082122063], [-... |
2 | 3 | Facility 4 | USA | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 38.4 | 224109.0 | 2.23 | 58747.0 | 93112.0 | 82.2 | {"rings": [[[-118.376302328, 34.09088059599999... |
3 | 4 | Facility 5 | USA | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 38.5 | 221385.0 | 2.20 | 58655.0 | 94416.0 | 81.5 | {"rings": [[[-118.376302328, 34.0911051740001]... |
4 | 5 | Facility 3 | USA | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 31.4 | 230872.0 | 3.51 | 16674.0 | 55399.0 | 71.2 | {"rings": [[[-118.15397031299999, 34.077855084... |
The response includes metadata related to how the enrichment was performed. However, if we are only interested in the actual demographic columns added, we can filter to just these using the available enrich variable names.
# get just the enrich columns
enrich_cols = [c for c in enrich_df if c in ev.name.str.lower().values]
# combine the enrich columns with a few others we want to keep
keep_cols = ["loc_id"] + enrich_cols + ["SHAPE"]
# filter the enrich data frame to just these columns
enrich_df = enrich_df.loc[:, keep_cols].set_index("loc_id")
# re-enable spatial awareness
enrich_df.spatial.set_geometry("SHAPE")
enrich_df
loc_id | medage_cy | totpop_cy | avghhsz_cy | bachdeg_cy | medhinc_cy | divindx_cy | SHAPE |
---|---|---|---|---|---|---|---|
Facility 1 | 33.7 | 441440.0 | 2.61 | 61913.0 | 50088.0 | 88.4 | {"rings": [[[-118.309153568, 34.07403726200000... |
Facility 2 | 34.3 | 454965.0 | 2.52 | 72400.0 | 52284.0 | 88.7 | {"rings": [[[-118.309153568, 34.082122063], [-... |
Facility 4 | 38.4 | 224109.0 | 2.23 | 58747.0 | 93112.0 | 82.2 | {"rings": [[[-118.376302328, 34.09088059599999... |
Facility 5 | 38.5 | 221385.0 | 2.20 | 58655.0 | 94416.0 | 81.5 | {"rings": [[[-118.376302328, 34.0911051740001]... |
Facility 3 | 31.4 | 230872.0 | 3.51 | 16674.0 | 55399.0 | 71.2 | {"rings": [[[-118.15397031299999, 34.077855084... |
Evaluate Results
An extremely effective starting point for analysis is simply visualizing the results. Here, we are using matplotlib to visualize the differences between the locations based on the enriched data.
import warnings
import matplotlib.pyplot as plt
# suppress a deprecation warning raised inside matplotlib
warnings.filterwarnings("ignore")
fig, axs = plt.subplots(2, 3)
fig.set_figheight(10.0)
fig.set_figwidth(18.0)
fig.subplots_adjust(hspace=0.4)
plt.sca(axs[0, 0])
_ = enrich_df.medage_cy.plot(title="Median Age", kind="bar")
plt.sca(axs[0, 1])
_ = enrich_df.totpop_cy.plot(title="Total Population", kind="bar")
plt.sca(axs[0, 2])
_ = enrich_df.avghhsz_cy.plot(title="Average Household Size", kind="bar")
plt.sca(axs[1, 0])
_ = enrich_df.bachdeg_cy.plot(title="Bachelor's Degree", kind="bar")
plt.sca(axs[1, 1])
_ = enrich_df.medhinc_cy.plot(title="Median Household Income", kind="bar")
plt.sca(axs[1, 2])
_ = enrich_df.divindx_cy.plot(title="Diversity Index", kind="bar")
Facilities 1 and 2 have the largest populations and are the most diverse, with the lowest incomes. Facility 3 is the youngest, with larger households, less education, and lower incomes than facilities 4 and 5. Facilities 4 and 5 are older, have the highest incomes, and are more educated relative to their population.
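The education comparison is easier to judge as a share of population than as a raw count, since the study areas differ widely in size. The sketch below recomputes that share from the values in the enrich output above. Dividing BACHDEG_CY by TOTPOP_CY is a rough proxy chosen here for illustration, not a measure defined by the source data.

```python
import pandas as pd

# values copied from the enrich output above
enrich_values = pd.DataFrame(
    {
        "totpop_cy": [441440.0, 454965.0, 224109.0, 221385.0, 230872.0],
        "bachdeg_cy": [61913.0, 72400.0, 58747.0, 58655.0, 16674.0],
    },
    index=pd.Index(
        ["Facility 1", "Facility 2", "Facility 4", "Facility 5", "Facility 3"],
        name="loc_id",
    ),
)

# bachelor's degree count as a share of total population (a rough proxy)
bach_share = (enrich_values["bachdeg_cy"] / enrich_values["totpop_cy"]).round(3)

# locations with the highest and lowest share
highest = bach_share.idxmax()
lowest = bach_share.idxmin()
```

On this measure, facilities 4 and 5 rank above facilities 1 and 2 even though their raw counts are lower, and facility 3 ranks last.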
If interested in opening a discount department store, facility 2 is the most attractive location, with facility 1 a close second. The large, diverse population and lower incomes suggest that shoppers in these areas are likely to favor lower prices.
If interested in opening a quick service restaurant, facility 3 may be the best option to meet the needs of a young, busy and price conscious population.
Obviously, the key demographic indicators will differ depending on the characteristics of the business looking for a new location. GeoEnrichment, paired with the ArcGIS API for Python, provides quick access to demographic variables for informed decision making.
Conclusion
GeoEnrichment makes any location data intelligent by providing facts about the location. In this part of the GeoEnrichment guide series, you have seen a high-level example of how an arcgis.geoenrichment.Country object can be used to enrich a dataset with various socio-demographic features, along with an introduction to the different ways in which data can be enriched. In the subsequent pages, you will learn about:
- Enriching Study Areas (explains where to enrich)
- Exploring Named Statistical Areas (explains where to enrich continued)
- Enriching Data Collections and Spatially Enabled Dataframe (explains what datasets/variables to enrich with)
- Generating Reports
- Standard Geography Queries