
ArcGIS API for Python

Part 4 - What to enrich with? (What are Data Collections and Analysis Variables?)

Data Collections and GeoEnrichment coverage

As described earlier, a data collection is a preassembled list of attributes that will be used to enrich the input features. Collection attributes can describe various types of information, such as demographic characteristics and geographic context of the locations or areas submitted as input features.

Some data collections (such as default) can be used in all supported countries. Other data collections may only be available in one country or a group of countries. Data Browser can be used to examine the entire global listing of variables and the associated datasets for each country.

List Countries with GeoEnrichment Data

The get_countries() method queries the countries for which GeoEnrichment data is available. It returns a list of Country objects, each of which you can query further for its properties. This list can also be viewed here.

In [1]:
# Import Libraries
from arcgis.gis import GIS
from arcgis.geoenrichment import Country, enrich, get_countries
In [2]:
# Create a GIS Connection
gis = GIS(profile='your_online_profile')
In [3]:
countries = get_countries()
print("Number of countries for which GeoEnrichment data is available: " + str(len(countries)))

#print a few countries for a sample
countries[0:10]
Number of countries for which GeoEnrichment data is available: 136
Out[3]:
[<Country name:Albania>,
 <Country name:Algeria>,
 <Country name:Andorra>,
 <Country name:Angola>,
 <Country name:Argentina>,
 <Country name:Armenia>,
 <Country name:Aruba>,
 <Country name:Australia>,
 <Country name:Austria>,
 <Country name:Azerbaijan>]

Data Collections for U.S.

The data_collections property of a Country object lists its available data collections and analysis variables under each data collection as a Pandas dataframe.

To discover the data collections for a particular country, first get a reference to it using the Country.get() method, then fetch the data collections from its data_collections property. Once we know the data collection we would like to use, we can look at the analysisVariables available in that data collection.

In [4]:
# Get US as a country
usa = Country.get('US')
type(usa)
Out[4]:
arcgis.geoenrichment.enrichment.Country
In [5]:
usa_df = usa.data_collections

# print a few rows of the DataFrame
usa_df.head()
Out[5]:
dataCollectionID  analysisVariable         alias                   fieldCategory                       vintage
1yearincrements   1yearincrements.AGE0_CY  2020 Population Age <1  2020 Age: 1 Year Increments (Esri)  2020
1yearincrements   1yearincrements.AGE1_CY  2020 Population Age 1   2020 Age: 1 Year Increments (Esri)  2020
1yearincrements   1yearincrements.AGE2_CY  2020 Population Age 2   2020 Age: 1 Year Increments (Esri)  2020
1yearincrements   1yearincrements.AGE3_CY  2020 Population Age 3   2020 Age: 1 Year Increments (Esri)  2020
1yearincrements   1yearincrements.AGE4_CY  2020 Population Age 4   2020 Age: 1 Year Increments (Esri)  2020
In [6]:
usa_df.shape
Out[6]:
(17608, 4)

Unique Data Collections for U.S.

Each data collection and analysis variable has a unique ID. When calling the enrich() method (explained earlier in this guide) these analysis variables can be passed in the data_collections and analysis_variables parameters.

As an example, here we see a subset of the data collections for US showing 2 different data collections and multiple analysis variables for each collection.

In [7]:
usa_df.iloc[500:600,:]
Out[7]:
dataCollectionID  analysisVariable           alias                 fieldCategory                       vintage
1yearincrements   1yearincrements.FAGE75_FY  2025 Females Age 75   2025 Age: 1 Year Increments (Esri)  2025
1yearincrements   1yearincrements.FAGE76_FY  2025 Females Age 76   2025 Age: 1 Year Increments (Esri)  2025
1yearincrements   1yearincrements.FAGE77_FY  2025 Females Age 77   2025 Age: 1 Year Increments (Esri)  2025
1yearincrements   1yearincrements.FAGE78_FY  2025 Females Age 78   2025 Age: 1 Year Increments (Esri)  2025
1yearincrements   1yearincrements.FAGE79_FY  2025 Females Age 79   2025 Age: 1 Year Increments (Esri)  2025
...               ...                        ...                   ...                                 ...
5yearincrements   5yearincrements.MEDAGE_CY  2020 Median Age       2020 Age: 5 Year Increments (Esri)  2020
5yearincrements   5yearincrements.MALES_CY   2020 Male Population  2020 Age: 5 Year Increments (Esri)  2020
5yearincrements   5yearincrements.MALE0_CY   2020 Males Age 0-4    2020 Age: 5 Year Increments (Esri)  2020
5yearincrements   5yearincrements.MALE5_CY   2020 Males Age 5-9    2020 Age: 5 Year Increments (Esri)  2020
5yearincrements   5yearincrements.MALE10_CY  2020 Males Age 10-14  2020 Age: 5 Year Increments (Esri)  2020

100 rows × 4 columns

The table above shows 2 different data collections (1yearincrements and 5yearincrements). Since these are Age data collections, the analysisVariables for these collections are similar. vintage shows the year that the demographic data represents. For example, a vintage of 2020 means that the data represents the year 2020.

Let's get a list of unique data collections that are available for U.S.

In [8]:
usa_df.index.nunique()
Out[8]:
150

The United States has 150 unique data collections. Here are the first 10.

In [9]:
list(usa_df.index.unique())[:10]
Out[9]:
['1yearincrements',
 '5yearincrements',
 'ACS_Housing_Summary_rep',
 'ACS_Population_Summary_rep',
 'Age',
 'AgeDependency',
 'Age_50_Profile_rep',
 'Age_by_Sex_Profile_rep',
 'Age_by_Sex_by_Race_Profile_rep',
 'AtRisk']
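
Beyond counting the collections, it can help to see how many analysis variables each one holds. The following is a minimal sketch rather than part of the original notebook: sample_df is a hand-built stand-in that only mimics the index name and one column of usa.data_collections, so it runs without a GIS connection.

```python
import pandas as pd

# Hypothetical stand-in for usa.data_collections: same index name and
# column name as the real frame, but only five hand-picked rows.
sample_df = pd.DataFrame(
    {
        "analysisVariable": [
            "1yearincrements.AGE0_CY",
            "1yearincrements.AGE1_CY",
            "5yearincrements.MEDAGE_CY",
            "Age.MALE0",
            "Age.FEM0",
        ],
    },
    index=pd.Index(
        ["1yearincrements", "1yearincrements", "5yearincrements", "Age", "Age"],
        name="dataCollectionID",
    ),
)

# Count the rows (one per analysis variable) under each data collection ID.
variables_per_collection = sample_df.groupby(level="dataCollectionID").size()
print(variables_per_collection)
```

Assuming the real frame has the same layout, the same groupby call on usa_df should give the number of variables in each of the 150 collections.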

Looking at fieldCategory is a great way to understand what a data collection is about. A fieldCategory value typically combines the vintage year with a readable description of the data collection, followed by the data source in parentheses, for example '2020 Age: 1 Year Increments (Esri)'. However, to query a data collection, its unique ID (dataCollectionID) must be used.

Let's look at the fieldCategory column for a few data collections in US.

In [10]:
usa_df.fieldCategory.unique()[:10]
Out[10]:
array(['2020 Age: 1 Year Increments (Esri)',
       '2025 Age: 1 Year Increments (Esri)',
       '2010 Age: 1 Year Increments (U.S. Census)',
       '2020 Age: 5 Year Increments (Esri)',
       '2025 Age: 5 Year Increments (Esri)',
       '2010 Age: 5 Year Increments (U.S. Census)',
       '2014-2018 Key Demographic Indicators (ACS)',
       '2014-2018 Mortgage Status (ACS)', '2014-2018 Contract Rent (ACS)',
       '2014-2018 Year Householder Moved In (ACS)'], dtype=object)

Data Collections by Socio-demographic Factors

You can filter the data_collections to get collections for a specific factor using Pandas expressions. Let's look at data collections for different socio-demographic factors such as Age, Population and Income.

Data Collections for Age

In [11]:
Age_Collections = usa_df['fieldCategory'].str.contains('Age', na=False)
usa_df[Age_Collections].fieldCategory.unique()
Out[11]:
array(['2020 Age: 1 Year Increments (Esri)',
       '2025 Age: 1 Year Increments (Esri)',
       '2010 Age: 1 Year Increments (U.S. Census)',
       '2020 Age: 5 Year Increments (Esri)',
       '2025 Age: 5 Year Increments (Esri)',
       '2010 Age: 5 Year Increments (U.S. Census)',
       '2014-2018 Females by Age of Children and Employment Status (ACS)',
       '2014-2018 Households by Type and Size and Age (ACS)',
       '2020 Age Dependency (Esri)', '2025 Age Dependency (Esri)',
       '2020 Income by Age (Esri)', '2025 Income by Age (Esri)',
       '2020 Disposable Income by Age (Esri)', 'Age: 5 Year Increments',
       '2020 Net Worth by Age (Esri)',
       '2010 Households by Age of Householder (U.S. Census)',
       '2010 Age by Sex by Race (U.S. Census)',
       '2025 Age by Sex by Race (Esri)', '2020 Age by Sex by Race (Esri)',
       '2019 Age: 5 Year Increments (Esri)',
       '2010 Housing by Age of Householder (U.S. Census)'], dtype=object)

Data Collections for Population

In [12]:
Pop_Collections = usa_df['fieldCategory'].str.contains('Population', na=False)
usa_df[Pop_Collections].fieldCategory.unique()
Out[12]:
array(['2014-2018 Population 65+ by Relationship and Household Type (ACS)',
       '2010 Population (U.S. Census)',
       '2014-2018 Population by Language Spoken at Home (ACS)',
       '2020 Daytime Population (Esri)',
       '2020 Population by Generation (Esri)',
       '2025 Population by Generation (Esri)',
       '2020 Population Time Series (Esri)', '2020 Population (Esri)',
       '2010 Group Quarters Population (U.S. Census)',
       '2010 Population by Relationship and Household Type (U.S. Census)'],
      dtype=object)

Data Collections for Income

In [13]:
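# Boolean mask: True for rows whose fieldCategory mentions 'Income'
Income_Collections = usa_df['fieldCategory'].str.contains('Income', na=False)
# The mask keeps the index of every row, so this lists all 150 collection IDs
Income_Collections.index.unique()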
Out[13]:
Index(['1yearincrements', '5yearincrements', 'ACS_Housing_Summary_rep',
       'ACS_Population_Summary_rep', 'Age', 'AgeDependency',
       'Age_50_Profile_rep', 'Age_by_Sex_Profile_rep',
       'Age_by_Sex_by_Race_Profile_rep', 'AtRisk',
       ...
       'transportation', 'travelMPI', 'unitsinstructure',
       'urbanizationgroupsNEW', 'vacant', 'vehiclesavailable', 'veterans',
       'women', 'yearbuilt', 'yearmovedin'],
      dtype='object', name='dataCollectionID', length=150)

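As mentioned earlier, a data collection's unique ID (dataCollectionID) is the best way to query it further. The boolean mask above still carries the index of every row, so its index lists all 150 collections. Applying the mask to usa_df first returns only the dataCollectionID values of the Income data collections.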

In [14]:
usa_df[Income_Collections].index.unique()
Out[14]:
Index(['ACS_Population_Summary_rep', 'Age_50_Profile_rep', 'AtRisk',
       'Community_Profile_rep',
       'Demographic_and_Income_Comparison_Profile_rep',
       'Demographic_and_Income_Profile_rep', 'Disposable_Income_Profile_rep',
       'Graphic_Profile_rep', 'Health', 'Household_Income_Profile_rep',
       'KeyUSFacts', 'Market_Profile_rep', 'Policy',
       'Retail_MarketPlace_Profile_rep', 'Wealth', 'basicFactsForMobileApps',
       'disposableincome', 'foodstampsSNAP', 'householdincome', 'households',
       'incomebyage', 'population'],
      dtype='object', name='dataCollectionID')
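
The filter-then-read-the-index pattern used for Age, Population and Income can be wrapped in a small helper. This is a sketch under assumptions: collection_ids_matching is a hypothetical name, and sample_df is a hand-built stand-in whose pairings of ID and fieldCategory are illustrative rather than taken from the service.

```python
import pandas as pd

def collection_ids_matching(df, keyword):
    """Return the sorted data collection IDs whose fieldCategory mentions keyword."""
    # regex=False treats the keyword as plain text; na=False skips missing values.
    mask = df["fieldCategory"].str.contains(keyword, na=False, regex=False)
    return sorted(df[mask].index.unique())

# Hypothetical stand-in for usa.data_collections (illustrative pairings only).
sample_df = pd.DataFrame(
    {
        "fieldCategory": [
            "2020 Income by Age (Esri)",
            "2020 Disposable Income by Age (Esri)",
            "2020 Daytime Population (Esri)",
            None,
        ],
    },
    index=pd.Index(
        ["incomebyage", "disposableincome", "DaytimePopulation", "KeyUSFacts"],
        name="dataCollectionID",
    ),
)

income_ids = collection_ids_matching(sample_df, "Income")
print(income_ids)
```

The mask is applied to the frame before the index is read, which is what restricts the result to matching collections.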

Analysis variables for Data Collections

Once we know the data collection we would like to use, we can look at all the unique variables available in that data collection using its unique ID. Let's discover analysisVariables for some of the data collections.

Analysis variables for Age data collection

In [15]:
usa_df.loc['Age']['analysisVariable'].unique()
Out[15]:
array(['Age.MALE0', 'Age.MALE5', 'Age.MALE10', 'Age.MALE15', 'Age.MALE20',
       'Age.MALE25', 'Age.MALE30', 'Age.MALE35', 'Age.MALE40',
       'Age.MALE45', 'Age.MALE50', 'Age.MALE55', 'Age.MALE60',
       'Age.MALE65', 'Age.MALE70', 'Age.MALE75', 'Age.MALE80',
       'Age.MALE85', 'Age.FEM0', 'Age.FEM5', 'Age.FEM10', 'Age.FEM15',
       'Age.FEM20', 'Age.FEM25', 'Age.FEM30', 'Age.FEM35', 'Age.FEM40',
       'Age.FEM45', 'Age.FEM50', 'Age.FEM55', 'Age.FEM60', 'Age.FEM65',
       'Age.FEM70', 'Age.FEM75', 'Age.FEM80', 'Age.FEM85'], dtype=object)

Analysis variables are typically represented as dataCollectionID.<analysis variable name> as seen above.
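
As a small illustration of that naming pattern, the sketch below splits a qualified name into its two parts. split_analysis_variable is a hypothetical helper written for this guide, not a function of the arcgis package.

```python
def split_analysis_variable(qualified_name):
    """Split 'dataCollectionID.VARIABLE' into (collection_id, variable_name)."""
    collection_id, separator, variable_name = qualified_name.partition(".")
    if not separator or not collection_id or not variable_name:
        raise ValueError(f"expected 'collection.variable', got {qualified_name!r}")
    return collection_id, variable_name

# Names taken from the outputs in this guide.
for name in ["Age.MALE0", "DaytimePopulation.DPOP_CY",
             "Age_by_Sex_by_Race_Profile_rep.TOTPOP10"]:
    print(split_analysis_variable(name))
```

Splitting at the first dot only keeps collection IDs that contain underscores intact.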

Analysis variables for Age_by_Sex_by_Race_Profile_rep data collection

In [16]:
usa_df.loc['Age_by_Sex_by_Race_Profile_rep']['analysisVariable'].unique()
Out[16]:
array(['Age_by_Sex_by_Race_Profile_rep.MEDMAGE_CY',
       'Age_by_Sex_by_Race_Profile_rep.MEDFAGE_CY',
       'Age_by_Sex_by_Race_Profile_rep.TOTPOP_FY',
       'Age_by_Sex_by_Race_Profile_rep.TOTHH_FY',
       'Age_by_Sex_by_Race_Profile_rep.POPGRWCYFY',
       'Age_by_Sex_by_Race_Profile_rep.HHGRWCYFY',
       'Age_by_Sex_by_Race_Profile_rep.MEDAGE_FY',
       'Age_by_Sex_by_Race_Profile_rep.MEDMAGE_FY',
       'Age_by_Sex_by_Race_Profile_rep.MEDFAGE_FY',
       'Age_by_Sex_by_Race_Profile_rep.TOTPOP10',
       'Age_by_Sex_by_Race_Profile_rep.MEDAGE10',
       'Age_by_Sex_by_Race_Profile_rep.MEDMAGE10',
       'Age_by_Sex_by_Race_Profile_rep.MEDFAGE10',
       'Age_by_Sex_by_Race_Profile_rep.TOTHH10'], dtype=object)

Analysis variables for DaytimePopulation data collection

In [17]:
usa_df.loc['DaytimePopulation']['analysisVariable'].unique()
Out[17]:
array(['DaytimePopulation.DPOP_CY', 'DaytimePopulation.DPOPWRK_CY',
       'DaytimePopulation.DPOPRES_CY', 'DaytimePopulation.DPOPDENSCY'],
      dtype=object)

Data Collections for Another Country

Let's look at data collections for New Zealand. Data Browser can be used to examine the entire global listing of variables, and associated datasets for New Zealand.

To discover the data collections for a particular country, first get a reference to it using the Country.get() method, then fetch the data collections from its data_collections property. Once we know the data collection we would like to use, we can look at the analysisVariables available in that data collection.

In [18]:
# Get New Zealand as a country
nz = Country.get('New Zealand')
type(nz)
Out[18]:
arcgis.geoenrichment.enrichment.Country
In [19]:
nz_df = nz.data_collections

# print a few rows of the DataFrame
nz_df.head()
Out[19]:
dataCollectionID  analysisVariable            alias                            fieldCategory                 vintage
15YearIncrements  15YearIncrements.PAGE01_CY  2018 Total Population Age 0-14   2018 Population Totals (MBR)  2018
15YearIncrements  15YearIncrements.PAGE02_CY  2018 Total Population Age 15-29  2018 Population Totals (MBR)  2018
15YearIncrements  15YearIncrements.PAGE03_CY  2018 Total Population Age 30-44  2018 Population Totals (MBR)  2018
15YearIncrements  15YearIncrements.PAGE04_CY  2018 Total Population Age 45-59  2018 Population Totals (MBR)  2018
15YearIncrements  15YearIncrements.PAGE05_CY  2018 Total Population Age 60+    2018 Population Totals (MBR)  2018
In [20]:
nz_df.shape
Out[20]:
(191, 4)

Unique Data Collections for New Zealand

Let's get a list of unique data collections that are available for New Zealand.

In [21]:
nz_df.index.unique()
Out[21]:
Index(['15YearIncrements', 'EducationalAttainment', 'Gender',
       'HouseholdTotals', 'HouseholdsbyIncome', 'HouseholdsbyType', 'KeyFacts',
       'KeyGlobalFacts', 'MaritalStatus', 'PopulationTotals',
       'PurchasingPower', 'Spending'],
      dtype='object', name='dataCollectionID')

New Zealand has 12 unique data collections.

We can look at the fieldCategory column to understand each category better.

In [22]:
nz_df.fieldCategory.unique()
Out[22]:
array(['2018 Population Totals (MBR)',
       '2018 Male Population Totals (MBR)',
       '2018 Female Population Totals (MBR)',
       '2018 Educational Attainment (MBR)', '2018 Household Totals (MBR)',
       '2018 Households by Income (MBR)', '2018 Households by Type (MBR)',
       '2018 Marital Status (MBR)', '2018 Purchasing Power (MBR)',
       'Key Demographic Indicators', 'Age: 5 Year Increments',
       '2018 Food & Beverage Expenditures (MBR)',
       '2018 Alcoholic Beverage Expenditures (MBR)',
       '2018 Tobacco Expenditures (MBR)',
       '2018 Clothing Expenditures (MBR)',
       '2018 Footwear Expenditures (MBR)',
       '2018 Furniture & Furnishing Expenditures (MBR)',
       '2018 Household Textiles Expenditures (MBR)',
       '2018 Household Appliances Expenditures (MBR)',
       '2018 Household Utensils Expenditures (MBR)',
       '2018 House & Garden Expenditures (MBR)',
       '2018 Household Maintenance Expenditures (MBR)',
       '2018 Medical Products & Supplies Expenditures (MBR)',
       '2018 Consumer Electronics Expenditures (MBR)',
       '2018 Recreation & Culture Durable Expenditures (MBR)',
       '2018 Entertainment Expenditures (MBR)',
       '2018 Recreational & Cultural Service Expenditures (MBR)',
       '2018 Books & Stationery Expenditures (MBR)',
       '2018 Catering Services Expenditures (MBR)',
       '2018 Personal Care Expenditures (MBR)',
       '2018 Jewelry & Personal Effects Expenditures (MBR)'], dtype=object)

Looking at fieldCategory is a great way to clearly understand what the data collection is about. However, to query a data collection its unique ID (dataCollectionID) must be used.

Data Collections for Socio-demographic Factors

New Zealand has fewer data collections than the U.S. Let's look at the data collections for Key Facts, Education and Spending.

Data Collection for Key Facts

In [23]:
nz_df.loc['KeyGlobalFacts']
Out[23]:
dataCollectionID  analysisVariable           alias                   fieldCategory               vintage
KeyGlobalFacts    KeyGlobalFacts.TOTPOP      Total Population        Key Demographic Indicators  NaN
KeyGlobalFacts    KeyGlobalFacts.TOTHH       Total Households        Key Demographic Indicators  NaN
KeyGlobalFacts    KeyGlobalFacts.AVGHHSZ     Average Household Size  Key Demographic Indicators  NaN
KeyGlobalFacts    KeyGlobalFacts.TOTMALES    Male Population         Age: 5 Year Increments      NaN
KeyGlobalFacts    KeyGlobalFacts.TOTFEMALES  Female Population       Age: 5 Year Increments      NaN

Data Collection for Education

In [24]:
nz_df.loc['EducationalAttainment']
Out[24]:
dataCollectionID       analysisVariable                  alias                                     fieldCategory                      vintage
EducationalAttainment  EducationalAttainment.EDUC01A_CY  2018 Pop 15+/Edu: No Qualification        2018 Educational Attainment (MBR)  2018
EducationalAttainment  EducationalAttainment.EDUC02A_CY  2018 Pop 15+/Edu: Level 1                 2018 Educational Attainment (MBR)  2018
EducationalAttainment  EducationalAttainment.EDUC03A_CY  2018 Pop 15+/Edu: Level 2                 2018 Educational Attainment (MBR)  2018
EducationalAttainment  EducationalAttainment.EDUC04A_CY  2018 Pop 15+/Edu: Level 3                 2018 Educational Attainment (MBR)  2018
EducationalAttainment  EducationalAttainment.EDUC05A_CY  2018 Pop 15+/Edu: Level 4                 2018 Educational Attainment (MBR)  2018
EducationalAttainment  EducationalAttainment.EDUC06A_CY  2018 Pop 15+/Edu: Level 5 or 6            2018 Educational Attainment (MBR)  2018
EducationalAttainment  EducationalAttainment.EDUC07_CY   2018 Pop 15+/Edu: Bachelor or Level 7     2018 Educational Attainment (MBR)  2018
EducationalAttainment  EducationalAttainment.EDUC08_CY   2018 Pop 15+/Edu: Postgrad/Honors         2018 Educational Attainment (MBR)  2018
EducationalAttainment  EducationalAttainment.EDUC09_CY   2018 Pop 15+/Edu: Master's                2018 Educational Attainment (MBR)  2018
EducationalAttainment  EducationalAttainment.EDUC10_CY   2018 Pop 15+/Edu: Doctorate               2018 Educational Attainment (MBR)  2018
EducationalAttainment  EducationalAttainment.EDUC11_CY   2018 Pop 15+/Edu: Overseas Secondary      2018 Educational Attainment (MBR)  2018
EducationalAttainment  EducationalAttainment.EDUC12_CY   2018 Pop 15+/Edu: Not Elsewhere included  2018 Educational Attainment (MBR)  2018

Data Collection for Spending

In [25]:
nz_df.loc['Spending']
Out[25]:
dataCollectionID  analysisVariable     alias                              fieldCategory                                      vintage
Spending          Spending.CS01_CY     2018 Food & Beverage: Total        2018 Food & Beverage Expenditures (MBR)            2018
Spending          Spending.CS01PRM_CY  2018 Food & Beverage: Per Mill     2018 Food & Beverage Expenditures (MBR)            2018
Spending          Spending.CSPC01_CY   2018 Food & Beverage: Per Capita   2018 Food & Beverage Expenditures (MBR)            2018
Spending          Spending.CS01IDX_CY  2018 Food & Beverage: Index        2018 Food & Beverage Expenditures (MBR)            2018
Spending          Spending.CS02_CY     2018 Alcoholic Beverage: Total     2018 Alcoholic Beverage Expenditures (MBR)         2018
...               ...                  ...                                ...                                                ...
Spending          Spending.CS19IDX_CY  2018 Personal Care: Index          2018 Personal Care Expenditures (MBR)              2018
Spending          Spending.CS20_CY     2018 Personal Effects: Total       2018 Jewelry & Personal Effects Expenditures (...  2018
Spending          Spending.CS20PRM_CY  2018 Personal Effects: Per Mill    2018 Jewelry & Personal Effects Expenditures (...  2018
Spending          Spending.CSPC20_CY   2018 Personal Effects: Per Capita  2018 Jewelry & Personal Effects Expenditures (...  2018
Spending          Spending.CS20IDX_CY  2018 Personal Effects: Index       2018 Jewelry & Personal Effects Expenditures (...  2018

80 rows × 4 columns

Analysis variables for Data Collections

Once we know the data collection we would like to use, we can look at all the unique variables available in that data collection using its unique ID. Let's discover analysisVariables for some of the data collections we looked at earlier.

Analysis variables for KeyGlobalFacts data collection

In [26]:
nz_df.loc['KeyGlobalFacts']['analysisVariable'].unique()
Out[26]:
array(['KeyGlobalFacts.TOTPOP', 'KeyGlobalFacts.TOTHH',
       'KeyGlobalFacts.AVGHHSZ', 'KeyGlobalFacts.TOTMALES',
       'KeyGlobalFacts.TOTFEMALES'], dtype=object)

Analysis variables for EducationalAttainment data collection

In [27]:
nz_df.loc['EducationalAttainment']['analysisVariable'].unique()
Out[27]:
array(['EducationalAttainment.EDUC01A_CY',
       'EducationalAttainment.EDUC02A_CY',
       'EducationalAttainment.EDUC03A_CY',
       'EducationalAttainment.EDUC04A_CY',
       'EducationalAttainment.EDUC05A_CY',
       'EducationalAttainment.EDUC06A_CY',
       'EducationalAttainment.EDUC07_CY',
       'EducationalAttainment.EDUC08_CY',
       'EducationalAttainment.EDUC09_CY',
       'EducationalAttainment.EDUC10_CY',
       'EducationalAttainment.EDUC11_CY',
       'EducationalAttainment.EDUC12_CY'], dtype=object)

Analysis variables for Spending data collection

In [28]:
nz_df.loc['Spending']['analysisVariable'].unique()
Out[28]:
array(['Spending.CS01_CY', 'Spending.CS01PRM_CY', 'Spending.CSPC01_CY',
       'Spending.CS01IDX_CY', 'Spending.CS02_CY', 'Spending.CS02PRM_CY',
       'Spending.CSPC02_CY', 'Spending.CS02IDX_CY', 'Spending.CS03_CY',
       'Spending.CS03PRM_CY', 'Spending.CSPC03_CY', 'Spending.CS03IDX_CY',
       'Spending.CS04_CY', 'Spending.CS04PRM_CY', 'Spending.CSPC04_CY',
       'Spending.CS04IDX_CY', 'Spending.CS05_CY', 'Spending.CS05PRM_CY',
       'Spending.CSPC05_CY', 'Spending.CS05IDX_CY', 'Spending.CS06_CY',
       'Spending.CS06PRM_CY', 'Spending.CSPC06_CY', 'Spending.CS06IDX_CY',
       'Spending.CS07_CY', 'Spending.CS07PRM_CY', 'Spending.CSPC07_CY',
       'Spending.CS07IDX_CY', 'Spending.CS08_CY', 'Spending.CS08PRM_CY',
       'Spending.CSPC08_CY', 'Spending.CS08IDX_CY', 'Spending.CS09_CY',
       'Spending.CS09PRM_CY', 'Spending.CSPC09_CY', 'Spending.CS09IDX_CY',
       'Spending.CS10_CY', 'Spending.CS10PRM_CY', 'Spending.CSPC10_CY',
       'Spending.CS10IDX_CY', 'Spending.CS11_CY', 'Spending.CS11PRM_CY',
       'Spending.CSPC11_CY', 'Spending.CS11IDX_CY', 'Spending.CS12_CY',
       'Spending.CS12PRM_CY', 'Spending.CSPC12_CY', 'Spending.CS12IDX_CY',
       'Spending.CS13_CY', 'Spending.CS13PRM_CY', 'Spending.CSPC13_CY',
       'Spending.CS13IDX_CY', 'Spending.CS14_CY', 'Spending.CS14PRM_CY',
       'Spending.CSPC14_CY', 'Spending.CS14IDX_CY', 'Spending.CS15_CY',
       'Spending.CS15PRM_CY', 'Spending.CSPC15_CY', 'Spending.CS15IDX_CY',
       'Spending.CS16_CY', 'Spending.CS16PRM_CY', 'Spending.CSPC16_CY',
       'Spending.CS16IDX_CY', 'Spending.CS17_CY', 'Spending.CS17PRM_CY',
       'Spending.CSPC17_CY', 'Spending.CS17IDX_CY', 'Spending.CS18_CY',
       'Spending.CS18PRM_CY', 'Spending.CSPC18_CY', 'Spending.CS18IDX_CY',
       'Spending.CS19_CY', 'Spending.CS19PRM_CY', 'Spending.CSPC19_CY',
       'Spending.CS19IDX_CY', 'Spending.CS20_CY', 'Spending.CS20PRM_CY',
       'Spending.CSPC20_CY', 'Spending.CS20IDX_CY'], dtype=object)

Perform Enrichment using Data Collections and Analysis Variables

Data collections can be used to enrich various study areas. Both data_collections and analysis_variables can be passed to the enrich() method. Details about enriching study areas can be found in the Enriching Study Areas section.

Let's look at a few similar examples of GeoEnrichment here.

Enrich using Data Collections

Enrich with Age data collection

Here we see an address being enriched with data from the Age data collection.

In [29]:
# Enriching a single address given as single-line input
age_coll = enrich(study_areas=["380 New York St Redlands CA 92373"],
                  data_collections=['Age'])
In [30]:
age_coll
Out[30]:
ID OBJECTID sourceCountry X Y areaType bufferUnits bufferUnitsAlias bufferRadii aggregationMethod ... FEM45 FEM50 FEM55 FEM60 FEM65 FEM70 FEM75 FEM80 FEM85 SHAPE
0 0 1 US -117.194872 34.057237 RingBuffer esriMiles Miles 1 BlockApportionment:US.BlockGroups ... 376 398 374 340 310 262 153 98 129 {"rings": [[[-117.19487199429183, 34.071745616...

1 rows × 50 columns

In [31]:
age_coll.columns
Out[31]:
Index(['ID', 'OBJECTID', 'sourceCountry', 'X', 'Y', 'areaType', 'bufferUnits',
       'bufferUnitsAlias', 'bufferRadii', 'aggregationMethod',
       'populationToPolygonSizeRating', 'apportionmentConfidence', 'HasData',
       'MALE0', 'MALE5', 'MALE10', 'MALE15', 'MALE20', 'MALE25', 'MALE30',
       'MALE35', 'MALE40', 'MALE45', 'MALE50', 'MALE55', 'MALE60', 'MALE65',
       'MALE70', 'MALE75', 'MALE80', 'MALE85', 'FEM0', 'FEM5', 'FEM10',
       'FEM15', 'FEM20', 'FEM25', 'FEM30', 'FEM35', 'FEM40', 'FEM45', 'FEM50',
       'FEM55', 'FEM60', 'FEM65', 'FEM70', 'FEM75', 'FEM80', 'FEM85', 'SHAPE'],
      dtype='object')

When a data collection is specified without specific analysis variables, all variables under the data collection are used for enrichment as can be seen above.

Enrich with Health data collection

Here we see a zip code being enriched with data from the Health data collection.

In [32]:
redlands = usa.subgeographies.states['California'].zip5['92373']
In [33]:
redlands_df = enrich(study_areas=[redlands], data_collections=['Health'] )
In [34]:
redlands_df
Out[34]:
ID OBJECTID StdGeographyLevel StdGeographyName StdGeographyID sourceCountry aggregationMethod populationToPolygonSizeRating apportionmentConfidence HasData ... POP85_CY POP18UP_CY POP21UP_CY MEDAGE_CY HHU18C10 MEDHINC_CY S27_BUS S27_SALES S27_EMP SHAPE
0 0 1 US.ZIP5 Redlands 92373 US Query:US.ZIP5 2.191 2.576 1 ... 1081 27277 26168 41.1 3942 83577 193 298594 4020 {"rings": [[[-117.13603999963594, 34.032169999...

1 rows × 434 columns

In [35]:
redlands_df.columns
Out[35]:
Index(['ID', 'OBJECTID', 'StdGeographyLevel', 'StdGeographyName',
       'StdGeographyID', 'sourceCountry', 'aggregationMethod',
       'populationToPolygonSizeRating', 'apportionmentConfidence', 'HasData',
       ...
       'POP85_CY', 'POP18UP_CY', 'POP21UP_CY', 'MEDAGE_CY', 'HHU18C10',
       'MEDHINC_CY', 'S27_BUS', 'S27_SALES', 'S27_EMP', 'SHAPE'],
      dtype='object', length=434)

Enrich using Analysis Variables

Instead of a whole data collection, you can enrich with individual analysis variables. In this example, we will look at the analysis variables of the Age data collection and then use a few of them to enrich() a study area.

In [36]:
# Unique analysis variables for Age data collection
usa = Country.get('US')
usa.data_collections.loc['Age']['analysisVariable'].unique()
Out[36]:
array(['Age.MALE0', 'Age.MALE5', 'Age.MALE10', 'Age.MALE15', 'Age.MALE20',
       'Age.MALE25', 'Age.MALE30', 'Age.MALE35', 'Age.MALE40',
       'Age.MALE45', 'Age.MALE50', 'Age.MALE55', 'Age.MALE60',
       'Age.MALE65', 'Age.MALE70', 'Age.MALE75', 'Age.MALE80',
       'Age.MALE85', 'Age.FEM0', 'Age.FEM5', 'Age.FEM10', 'Age.FEM15',
       'Age.FEM20', 'Age.FEM25', 'Age.FEM30', 'Age.FEM35', 'Age.FEM40',
       'Age.FEM45', 'Age.FEM50', 'Age.FEM55', 'Age.FEM60', 'Age.FEM65',
       'Age.FEM70', 'Age.FEM75', 'Age.FEM80', 'Age.FEM85'], dtype=object)

Now, we will enrich our study area with the Age.FEM45, Age.FEM55 and Age.FEM65 variables.

In [37]:
enrich(study_areas=["380 New York St Redlands CA 92373"], 
       analysis_variables=["Age.FEM45","Age.FEM55","Age.FEM65"])
Out[37]:
ID OBJECTID sourceCountry X Y areaType bufferUnits bufferUnitsAlias bufferRadii aggregationMethod populationToPolygonSizeRating apportionmentConfidence HasData FEM45 FEM55 FEM65 SHAPE
0 0 1 US -117.194872 34.057237 RingBuffer esriMiles Miles 1 BlockApportionment:US.BlockGroups 2.191 2.576 1 376 374 310 {"rings": [[[-117.19487199429183, 34.071745616...
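
Note that the requested names carry the Age. prefix while the result columns above do not. The sketch below, which needs no GIS connection, separates the value columns from the bookkeeping columns that enrich() adds. value_columns is a hypothetical helper, and the column list is copied from the output above.

```python
# Column names copied from the enrich() output above.
result_columns = [
    "ID", "OBJECTID", "sourceCountry", "X", "Y", "areaType", "bufferUnits",
    "bufferUnitsAlias", "bufferRadii", "aggregationMethod",
    "populationToPolygonSizeRating", "apportionmentConfidence", "HasData",
    "FEM45", "FEM55", "FEM65", "SHAPE",
]
requested = ["Age.FEM45", "Age.FEM55", "Age.FEM65"]

def value_columns(requested_variables, columns):
    """Hypothetical helper: keep the columns that hold requested variables."""
    # Result columns use the part of the name after the first dot.
    short_names = {name.split(".", 1)[-1] for name in requested_variables}
    return [column for column in columns if column in short_names]

print(value_columns(requested, result_columns))
```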

Enriching Spatially Enabled Dataframes

One of the most common use cases for GeoEnrichment is enriching existing data in feature layers. As a user, you may need to analyze and enrich data that already exists in feature layers. A Spatially Enabled DataFrame (SeDF) brings the data from a layer into a dataframe, which can then be GeoEnriched.

Let's look at an example using an existing layer from a Covid-19 dataset. This feature layer includes the latest Covid-19 Cases, Recovered and Deaths data for the U.S. at the county level.

In [38]:
# Get the layer
gis = GIS(set_active=False)
covid_item = gis.content.get('628578697fb24d8ea4c32fa0c5ae1843')
print(covid_item)
covid_layer = covid_item.layers[0]
covid_layer
<Item title:"COVID-19 Cases US" type:Feature Layer Collection owner:CSSE_GISandData>
Out[38]:
<FeatureLayer url:"https://services1.arcgis.com/0MSEUqKaxRlEPj5g/arcgis/rest/services/ncov_cases_US/FeatureServer/0">

We can query the layer as a dataframe and then use the dataframe for enrichment.

In [39]:
covid_df = covid_layer.query(as_df=True)
covid_df.shape
Out[39]:
(3249, 19)
In [40]:
covid_df.head()
Out[40]:
OBJECTID Province_State Country_Region Last_Update Lat Long_ Confirmed Recovered Deaths Active Admin2 FIPS Combined_Key Incident_Rate People_Tested People_Hospitalized UID ISO3 SHAPE
0 1 South Carolina US 2020-08-07 20:35:25 34.223334 -82.461707 321 0 8 313.0 Abbeville 45001 Abbeville, South Carolina, US 1308.761773 None None 84045001 USA {"x": -82.46170657999994, "y": 34.223333780000...
1 2 Louisiana US 2020-08-07 20:35:25 30.295065 -92.414197 2570 0 79 2491.0 Acadia 22001 Acadia, Louisiana, US 4142.154888 None None 84022001 USA {"x": -92.41419697999999, "y": 30.295064900000...
2 3 Virginia US 2020-08-07 20:35:25 37.767072 -75.632346 1087 0 16 1071.0 Accomack 51001 Accomack, Virginia, US 3363.658869 None None 84051001 USA {"x": -75.63234614999999, "y": 37.767071610000...
3 4 Idaho US 2020-08-07 20:35:25 43.452658 -116.241552 8569 0 72 8497.0 Ada 16001 Ada, Idaho, US 1779.325439 None None 84016001 USA {"x": -116.24155159999998, "y": 43.45265750000...
4 5 Iowa US 2020-08-07 20:35:25 41.330756 -94.471059 28 0 0 28.0 Adair 19001 Adair, Iowa, US 391.498881 None None 84019001 USA {"x": -94.47105873999999, "y": 41.330756090000...

To showcase GeoEnrichment, we will create a subset of the original data and then enrich() the subset.

In [41]:
# Create subset
test_df = covid_df.iloc[:100].copy()
test_df.shape
Out[41]:
(100, 19)
In [42]:
# Check geometry
test_df.spatial.geometry_type
Out[42]:
['point']

A dataframe can be passed as the value of the study_areas parameter of the enrich() method. Here we are enriching our dataframe with specific variables from the Age data collection.

In [43]:
# Enrich dataframe
new_df = enrich(study_areas=test_df, 
       analysis_variables=["Age.FEM45","Age.FEM55","Age.FEM65"])
In [44]:
new_df.head()
Out[44]:
ID OBJECTID_0 sourceCountry Long_ Recovered Country_Region FIPS Last_Update Combined_Key ISO3 ... bufferUnitsAlias bufferRadii aggregationMethod populationToPolygonSizeRating apportionmentConfidence HasData FEM45 FEM55 FEM65 SHAPE
0 0 1 US -82.461707 0 US 45001 1596857725000 Abbeville, South Carolina, US USA ... Miles 1 BlockApportionment:US.BlockGroups 2.191 2.576 1 2 2 2 {"rings": [[[-82.46170657999994, 34.2378420028...
1 1 2 US -92.414197 0 US 22001 1596857725000 Acadia, Louisiana, US USA ... Miles 1 BlockApportionment:US.BlockGroups 2.191 2.576 1 3 3 2 {"rings": [[[-92.41419697999997, 30.3095821536...
2 2 3 US -75.632346 0 US 51001 1596857725000 Accomack, Virginia, US USA ... Miles 1 BlockApportionment:US.BlockGroups 2.191 2.576 1 14 16 14 {"rings": [[[-75.63234615, 37.781571251121655]...
3 3 4 US -116.241552 0 US 16001 1596857725000 Ada, Idaho, US USA ... Miles 1 BlockApportionment:US.BlockGroups 2.191 2.576 1 0 0 0 {"rings": [[[-116.24155159999998, 43.467142851...
4 4 5 US -94.471059 0 US 19001 1596857725000 Adair, Iowa, US USA ... Miles 1 BlockApportionment:US.BlockGroups 2.191 2.576 1 1 1 1 {"rings": [[[-94.47105873999998, 41.3452468224...

5 rows × 31 columns

In [45]:
new_df.drop(['OBJECTID_0', 'ID','Last_Update'], axis=1, inplace=True)
In [52]:
# Check shape
new_df.shape
Out[52]:
(91, 28)

Enrichment returned 91 records and 31 columns. Enrichment information is not available for some areas in our dataframe, so we have 91 records instead of 100. GeoEnrichment adds several columns alongside the analysis variables we requested, which brings the total to 31. After dropping the duplicate and unnecessary columns (OBJECTID_0, ID and Last_Update), 28 columns remain.
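
To see which study areas came back without enrichment data, the input and output frames can be compared on a shared key. This is a minimal sketch on hand-built miniature frames, and the choice of which row is missing is invented for illustration. It assumes FIPS identifies each row uniquely and survives enrichment, as the output columns above suggest.

```python
import pandas as pd

# Hypothetical miniature frames: four study areas go in, three come back.
input_df = pd.DataFrame(
    {
        "FIPS": ["45001", "22001", "51001", "16001"],
        "Confirmed": [321, 2570, 1087, 8569],
    }
)
enriched_df = pd.DataFrame(
    {
        "FIPS": ["45001", "22001", "16001"],
        "FEM65": [2, 2, 0],
    }
)

# Keep the input rows whose key does not appear in the enriched result.
missing = input_df[~input_df["FIPS"].isin(enriched_df["FIPS"])]
print(missing)
```

With the real data, the same comparison between test_df and new_df should list the nine counties that were dropped, provided the key assumption holds.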

Visualize on a Map

Let's visualize the enriched dataframe on a map. We will use the FEM65 column to classify our data for plotting on the map.

In [67]:
covid_map = gis.map('USA', 4)
covid_map
Out[67]:
In [61]:
# Plot on a map
covid_map.remove_layers()
new_df.spatial.plot(map_widget=covid_map,
                    renderer_type='c',  # for class breaks renderer,
                    method='esriClassifyNaturalBreaks',  # classification algorithm,
                    class_count=5,  # choose the number of classes,
                    col='FEM65',  # numeric column to classify,
                    cmap='viridis',  # color map to pick colors from for each class,
                    alpha=0.7)
Out[61]:
True

Conclusion

In this part of the arcgis.geoenrichment module guide series, you saw how the data_collections property of a Country object lists its available data collections and analysis variables. You explored different data collections and their analysis variables, then used them to enrich study areas. Finally, you saw how Spatially Enabled DataFrames can be enriched.

In the subsequent pages, you will learn about Generating Reports and Standard Geography Queries.

