Part 1 - Introduction to GeoEnrichment

Introduction

GeoEnrichment adds location intelligence to the data by providing facts about a location or an area. Using GeoEnrichment, you can get information about the people and places in a specific area or within a certain distance or drive time from a location. It enables you to query and use information from a large collection of data sets including population, income, housing, consumer behavior, and the natural environment. GeoEnrichment enables you to answer questions about locations that you can't answer with maps alone. For example: What kind of people live here? What do people like to do in this area? What are their habits and lifestyles?

GeoEnrichment makes your analysis more powerful by adding global demographic, spending, lifestyle or business features at different geographical levels such as city, county, region, state and country. Demographic features (Population, Age, Education etc.) and Socio-economic features (Income, Education, Wealth etc.) can be easily added to your location data, making it more intelligent. Feature Engineering is one of the key aspects of any Data Science project as it involves adding new features to the data to increase the predictive power of a learning algorithm. With GeoEnrichment, you can quickly add more features to your location data, helping your algorithms make better predictions.

To understand how GeoEnrichment adds value, imagine that a retail giant is evaluating potential sites for new stores, where the evaluation criteria include competition, traffic, economic feasibility and the market potential of different geographic areas. With GeoEnrichment, they can dig into an average shopper's lifestyle, income, spending, education and other socio-demographic factors for different neighborhoods to understand their potential customers and make an educated decision when choosing new sites.

Getting Started

A user must be logged on to a GIS in order to use GeoEnrichment. GeoEnrichment functionality is available in the arcgis.geoenrichment module.

To enable GeoEnrichment, you need either an ArcGIS Online subscription or an ArcGIS Enterprise deployment configured with a GeoEnrichment utility service. GeoEnrichment operations consume credits. Credits are the currency used across ArcGIS and are consumed for specific transactions. For details, see the ArcGIS documentation on credit consumption for GeoEnrichment.

Object Model Diagram

The diagram below illustrates how the arcgis.geoenrichment module is organized.

[Figure: object model diagram of the arcgis.geoenrichment module]

Ways to enrich your data

You can enrich your data in two ways:

  1. The enrich() method from the arcgis.geoenrichment module.
  2. The enrich_layer() method from the arcgis.features module.

The enrich() method returns a Spatially Enabled DataFrame. This DataFrame can be saved as a new feature layer Item in your GIS and used for analysis or visualization on a map. If you would instead like to enrich an existing FeatureLayer, use the enrich_layer() method from the arcgis.features module. The result is a new layer containing the input features together with the enriched data.

Quick Example

Let's look at a simple example of GeoEnrichment in action. Suppose a company wants to open a healthcare facility somewhere in Los Angeles, CA. They have a sample dataset of existing healthcare providers with their address details for the target areas (represented by their zip codes). The company wants to understand the demographics of each zip code to make the right decision.

Let's import this data and make it richer with GeoEnrichment.

# Import Libraries
import pandas as pd
from arcgis.gis import GIS
from arcgis.geoenrichment import Country
# Create a GIS Connection
gis = GIS(profile='your_online_profile')
# Read the data
df = pd.read_csv('../data/health.csv')
df
   Number of Beds  Name        Address                      City         State  Zip Code
0  156             Facility 1  2468 SOUTH ST ANDREWS PLACE  LOS ANGELES  CA     90018
1  59              Facility 2  2300 W. WASHINGTON BLVD.     LOS ANGELES  CA     90018
2  25              Facility 3  4060 E. WHITTIER BLVD.       LOS ANGELES  CA     90023
3  49              Facility 4  6070 W. PICO BOULEVARD       LOS ANGELES  CA     90035
4  55              Facility 5  1480 S. LA CIENEGA BL        LOS ANGELES  CA     90035

This dataset shows 5 providers with their address details. The providers are located in Zip Codes 90018, 90023 and 90035.

Let's enrich this dataset with socio-demographic factors such as Total Population, Median Age, Median Household Income, Diversity Index, Education for each zip code to better understand these areas.

# Define Analysis variables
analysis_variables = [
    'TOTPOP_CY',  # Population: Total Population (Esri)
    'DIVINDX_CY', # Diversity Index (Esri)
    'AVGHHSZ_CY', # Average Household Size (Esri)
    'MEDAGE_CY',  # Age: Median Age (Esri)
    'MEDHINC_CY', # Income: Median Household Income (Esri)
    'BACHDEG_CY', # Education: Bachelor's Degree (Esri)
]
# Get enriched data for each zip code
from arcgis.geoenrichment import enrich

usa = Country.get('US')
zip1 = usa.subgeographies.states['California'].zip5['90018']
zip2 = usa.subgeographies.states['California'].zip5['90023']
zip3 = usa.subgeographies.states['California'].zip5['90035']

enrich_df = enrich(study_areas=[zip1, zip2, zip3], analysis_variables=analysis_variables)

enrich_df
   std_geography_level  std_geography_name  std_geography_id  source_country  aggregation_method  \
0  US.ZIP5              Los Angeles         90018             USA             Query:US.ZIP5
1  US.ZIP5              Los Angeles         90023             USA             Query:US.ZIP5
2  US.ZIP5              Los Angeles         90035             USA             Query:US.ZIP5

   population_to_polygon_size_rating  apportionment_confidence  has_data  medage_cy  totpop_cy  \
0  2.191                              2.576                     1         34.3       51468.0
1  2.191                              2.576                     1         29.5       46619.0
2  2.191                              2.576                     1         39.0       30946.0

   avghhsz_cy  bachdeg_cy  medhinc_cy  divindx_cy  SHAPE
0  2.95        6229.0      54812.0     85.4        {"rings": [[[-118.30017996000157, 34.017329969...
1  3.78        1695.0      51574.0     60.1        {"rings": [[[-118.20579012022486, 34.035229979...
2  2.36        8950.0      101008.0    61.6        {"rings": [[[-118.37620007956639, 34.059439979...
# Merge provider data with GeoEnrichment data
# Cast the zip codes to strings so they can be matched against std_geography_id
df['Zip Code'] = df['Zip Code'].astype(str)
merged = pd.merge(enrich_df, df, left_on='std_geography_id', right_on='Zip Code')
merged.iloc[:, -10:]
   bachdeg_cy  medhinc_cy  divindx_cy  SHAPE  \
0  6229.0      54812.0     85.4        {'rings': [[[-118.30017996000157, 34.017329969...
1  6229.0      54812.0     85.4        {'rings': [[[-118.30017996000157, 34.017329969...
2  1695.0      51574.0     60.1        {'rings': [[[-118.20579012022486, 34.035229979...
3  8950.0      101008.0    61.6        {'rings': [[[-118.37620007956639, 34.059439979...
4  8950.0      101008.0    61.6        {'rings': [[[-118.37620007956639, 34.059439979...

   Number of Beds  Name        Address                      City         State  Zip Code
0  156             Facility 1  2468 SOUTH ST ANDREWS PLACE  LOS ANGELES  CA     90018
1  59              Facility 2  2300 W. WASHINGTON BLVD.     LOS ANGELES  CA     90018
2  25              Facility 3  4060 E. WHITTIER BLVD.       LOS ANGELES  CA     90023
3  49              Facility 4  6070 W. PICO BOULEVARD       LOS ANGELES  CA     90035
4  55              Facility 5  1480 S. LA CIENEGA BL        LOS ANGELES  CA     90035
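
The merge above only works when both key columns hold the same type, which is why the zip codes are cast to strings first. As a minimal sketch of an alternative, the key column can be read as text from the start, and the join can be guarded so that duplicated or unmatched keys do not go unnoticed. The in-memory CSV and the small `enriched` frame below are hypothetical stand-ins for health.csv and enrich_df, with values copied from the tables above; no GeoEnrichment call is made.

```python
import io
import pandas as pd

# Hypothetical stand-in for health.csv (two rows copied from the table above).
csv_text = """Number of Beds,Name,Zip Code
156,Facility 1,90018
25,Facility 3,90023
"""

# Reading the key column as text keeps it comparable with std_geography_id
# and would also preserve leading zeros in other regions' zip codes.
providers = pd.read_csv(io.StringIO(csv_text), dtype={"Zip Code": str})

# Hypothetical stand-in for enrich_df, with values copied from the output above.
enriched = pd.DataFrame({
    "std_geography_id": ["90018", "90023", "90035"],
    "medhinc_cy": [54812.0, 51574.0, 101008.0],
})

# A left join keeps every provider row; validate raises an error if a zip
# code appears more than once on the enrichment side.
merged_sketch = providers.merge(
    enriched,
    how="left",
    left_on="Zip Code",
    right_on="std_geography_id",
    validate="many_to_one",
)
print(merged_sketch[["Name", "Zip Code", "medhinc_cy"]])
```

With a left join, a provider whose zip code has no enriched match would still appear, with missing values in the enrichment columns, which makes gaps easier to spot than with the default inner join.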

Visualize on a map

Let's visualize the 3 zip codes on a map.

map1 = gis.map('Los Angeles, CA',12)
map1
# Plot on map
merged.spatial.plot(map1)
True

With enriched data, we can now make the following observations about these zip codes:

  1. Zip Code 90018 has a low median income, the highest total population and relatively few people with a bachelor's degree.
  2. Zip Code 90023 has the lowest median income, a similarly high total population and very few people with a bachelor's degree.
  3. Zip Code 90035 contrasts with the other two: it has the highest median income, the lowest total population and the most people with a bachelor's degree.
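
Because the three zip codes differ in population, raw counts such as bachdeg_cy are easier to compare once they are normalized. The sketch below uses only the values shown in the enriched output above. Dividing by total population is an assumption made here for simplicity; educational attainment is usually reported against the population aged 25 and over, so treat the percentages as rough indicators. The helper name `bachelor_share` is hypothetical.

```python
# Values copied from the enrich_df output above.
zip_stats = {
    "90018": {"totpop_cy": 51468.0, "bachdeg_cy": 6229.0, "medhinc_cy": 54812.0},
    "90023": {"totpop_cy": 46619.0, "bachdeg_cy": 1695.0, "medhinc_cy": 51574.0},
    "90035": {"totpop_cy": 30946.0, "bachdeg_cy": 8950.0, "medhinc_cy": 101008.0},
}


def bachelor_share(stats):
    """Return bachelor's degree holders as a fraction of total population."""
    return stats["bachdeg_cy"] / stats["totpop_cy"]


# Percentage of residents with a bachelor's degree, rounded to one decimal.
shares = {z: round(100 * bachelor_share(s), 1) for z, s in zip_stats.items()}

# Zip codes ordered from lowest to highest median household income.
by_income = sorted(zip_stats, key=lambda z: zip_stats[z]["medhinc_cy"])

print(shares)     # {'90018': 12.1, '90023': 3.6, '90035': 28.9}
print(by_income)  # ['90023', '90018', '90035']
```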

Which zip code should the company pick? Using geoenriched data, this company can now make the right decision depending on the type of healthcare facility they want to open.

By enriching data with a few socio-demographic factors, we just prevented a company from throwing away millions of dollars... well sort of! Imagine all the great things you can now accomplish with GeoEnrichment and stay tuned because we are just getting started.

Conclusion

GeoEnrichment makes any location data intelligent by providing facts about the location. In this part of the GeoEnrichment guide series, you have seen a high-level example of how the arcgis.geoenrichment module can be used to enrich a dataset with various socio-demographic features, along with an introduction to the different ways in which data can be enriched. In the subsequent pages, you will learn about:

  1. Enriching Study Areas (explains where to enrich)
  2. Exploring Named Statistical Areas (explains where to enrich continued)
  3. Enriching Data Collections and Spatially Enabled DataFrame (explains what datasets/variables to enrich with)
  4. Generating Reports
  5. Standard Geography Queries
