Locating a new retirement community

Introduction

In the past, retirement communities in the United States were built in the suburbs, in the warmest parts of the country. These days, however, people approaching retirement are not willing to relocate. They want to stay connected to their friends and family, remain close to their existing doctors, and enjoy cultural and educational opportunities.

This sample demonstrates how ArcGIS API for Python can be used to identify promising locations for a new retirement community that satisfies these needs of senior citizens. It demonstrates tools such as create_drive_time_areas, enrich_layer, and find_similar_locations, as well as editing field definitions and layer data.

First, we will look for locations that have a large number of senior citizens but very few existing retirement communities. We will then rank these locations by comparing them to the most successful current retirement community.

The LocatingRetirementCommunity Feature Layer Collection includes two layers. The first layer, called Target Community, contains the current best performing retirement community near Knoxville, Tennessee. The second layer, called Candidates, contains the 898 ZIP Codes in the continental USA associated with statistically significant hot or cold spot areas for all of these criteria:

  • High demand for retirement housing opportunities
  • Low supply of retirement housing
  • Low housing unit vacancy rates
  • Large projected populations aged 55 and older

Hot spot analysis to identify these candidate ZIP Codes was done using ArcMap. This analysis is included in both the ArcMap and ArcGIS Pro workflows.

In the workflow below, we will use ArcGIS API for Python to create a 5-mile drive distance area around the best performing community and obtain tapestry and demographic data for that area. We will then obtain the same data for the candidate ZIP Codes. Finally, we will use the Find Similar Locations tool to identify the top four high demand, low vacancy, large projected 55+ population ZIP Codes that are most similar to the area surrounding the best performing community.

Workflow

Necessary Imports

import pandas as pd
from datetime import datetime as dt

from arcgis.gis import GIS
from arcgis.geoenrichment import *
from arcgis.features.use_proximity import create_drive_time_areas
from arcgis.features.enrich_data import enrich_layer
from arcgis.features import FeatureLayerCollection
from arcgis.features.find_locations import find_similar_locations

Get the data for your analysis

gis = GIS('home')

Search for the LocatingRetirementCommunity layer. You can specify the owner's name to get more specific results. To search for content from the Living Atlas, or content shared by other users on ArcGIS Online, set outside_org=True.

items = gis.content.search('title: LocatingRetirementCommunity owner:api_data_owner',
                           'Feature layer',
                           outside_org=True)

Display the list of results.

from IPython.display import display

for item in items:
    display(item)
LocatingRetirementCommunity
Data to use with the Locating a New Retirement Community case study. Feature Layer Collection by api_data_owner
Last Modified: June 17, 2019
0 comments, 3 views

Since the first item is a Feature Layer Collection, accessing the layers property will give us a list of FeatureLayer objects.

lyrs = items[0].layers
for lyr in lyrs:
    print(lyr.properties.name)
TargetCommunity
Candidates
target_community = lyrs[0]
candidates = lyrs[1]
m1 = gis.map('Knoxville')
m1
m1.add_layer(target_community)

Create a 5-mile drive distance polygon around the best performing community.

Proximity analysis

Proximity analysis tools help us answer one of the most common questions posed in spatial analysis: What is near what?

Proximity tools are available in the features.use_proximity module of ArcGIS API for Python. We can use the create_drive_time_areas tool to create a 5-mile drive distance buffer around the best performing community.

target_area = create_drive_time_areas(target_community,
                                      break_values=[5],
                                      break_units='Miles',
                                      overlap_policy='Overlap',
                                      travel_mode='Driving Distance',
                                      output_name='DriveTimeAreasOfTargetCommunities' + str(dt.now().microsecond))
target_area
DriveTimeAreasOfTargetCommunities585475
Feature Layer Collection by arcgis_python
Last Modified: June 18, 2019
0 comments, 0 views
target_area_map = gis.map('Knoxville')
target_area_map
target_area_map.add_layer(target_area)
target_area_map.add_layer(target_community)

Determine the top tapestry segments.

We will be looking for ZIP Codes that are similar to the area surrounding the best performing retirement community. We will take advantage of tapestry variables because they summarize many aspects of a population, such as age, income, home value, occupation, education, and consumer spending behaviors. To identify the top tapestries within the 5-mile drive distance area, we will obtain and compare all 68 tapestry segments. We will also obtain the tapestry base variable so we can calculate percentages.

# Countries available for GeoEnrichment.
countries = get_countries()
usa = Country.get('US')
type(usa)
df = usa.data_collections
# Analysis variables for the 68 household Tapestry segments.
th_var = list(df.loc['tapestryhouseholdsNEW']['analysisVariable'].unique())
# Additional demographic variables: population growth, total population, and unemployment rate.
other_var = ["KeyUSFacts.POPGRWCYFY", "AtRisk.TOTPOP_CY", "industry.UNEMPRT_CY"]
analysis_var = th_var + other_var
# Keep the segment identifiers, base, and individual segment counts; drop the THHG* group variables.
tot_var = [v for v in th_var if not v.split('.')[1].startswith('THHG')]
tot_var
['tapestryhouseholdsNEW.TSEGNUM',
 'tapestryhouseholdsNEW.TSEGCODE',
 'tapestryhouseholdsNEW.TSEGNAME',
 'tapestryhouseholdsNEW.THHBASE',
 'tapestryhouseholdsNEW.THH01',
 'tapestryhouseholdsNEW.THH02',
 'tapestryhouseholdsNEW.THH03',
 'tapestryhouseholdsNEW.THH04',
 'tapestryhouseholdsNEW.THH05',
 'tapestryhouseholdsNEW.THH06',
 'tapestryhouseholdsNEW.THH07',
 'tapestryhouseholdsNEW.THH08',
 'tapestryhouseholdsNEW.THH09',
 'tapestryhouseholdsNEW.THH10',
 'tapestryhouseholdsNEW.THH11',
 'tapestryhouseholdsNEW.THH12',
 'tapestryhouseholdsNEW.THH13',
 'tapestryhouseholdsNEW.THH14',
 'tapestryhouseholdsNEW.THH15',
 'tapestryhouseholdsNEW.THH16',
 'tapestryhouseholdsNEW.THH17',
 'tapestryhouseholdsNEW.THH18',
 'tapestryhouseholdsNEW.THH19',
 'tapestryhouseholdsNEW.THH20',
 'tapestryhouseholdsNEW.THH21',
 'tapestryhouseholdsNEW.THH22',
 'tapestryhouseholdsNEW.THH23',
 'tapestryhouseholdsNEW.THH24',
 'tapestryhouseholdsNEW.THH25',
 'tapestryhouseholdsNEW.THH26',
 'tapestryhouseholdsNEW.THH27',
 'tapestryhouseholdsNEW.THH28',
 'tapestryhouseholdsNEW.THH29',
 'tapestryhouseholdsNEW.THH30',
 'tapestryhouseholdsNEW.THH31',
 'tapestryhouseholdsNEW.THH32',
 'tapestryhouseholdsNEW.THH33',
 'tapestryhouseholdsNEW.THH34',
 'tapestryhouseholdsNEW.THH35',
 'tapestryhouseholdsNEW.THH36',
 'tapestryhouseholdsNEW.THH37',
 'tapestryhouseholdsNEW.THH38',
 'tapestryhouseholdsNEW.THH39',
 'tapestryhouseholdsNEW.THH40',
 'tapestryhouseholdsNEW.THH41',
 'tapestryhouseholdsNEW.THH42',
 'tapestryhouseholdsNEW.THH43',
 'tapestryhouseholdsNEW.THH44',
 'tapestryhouseholdsNEW.THH45',
 'tapestryhouseholdsNEW.THH46',
 'tapestryhouseholdsNEW.THH47',
 'tapestryhouseholdsNEW.THH48',
 'tapestryhouseholdsNEW.THH49',
 'tapestryhouseholdsNEW.THH50',
 'tapestryhouseholdsNEW.THH51',
 'tapestryhouseholdsNEW.THH52',
 'tapestryhouseholdsNEW.THH53',
 'tapestryhouseholdsNEW.THH54',
 'tapestryhouseholdsNEW.THH55',
 'tapestryhouseholdsNEW.THH56',
 'tapestryhouseholdsNEW.THH57',
 'tapestryhouseholdsNEW.THH58',
 'tapestryhouseholdsNEW.THH59',
 'tapestryhouseholdsNEW.THH60',
 'tapestryhouseholdsNEW.THH61',
 'tapestryhouseholdsNEW.THH62',
 'tapestryhouseholdsNEW.THH63',
 'tapestryhouseholdsNEW.THH64',
 'tapestryhouseholdsNEW.THH65',
 'tapestryhouseholdsNEW.THH66',
 'tapestryhouseholdsNEW.THH67',
 'tapestryhouseholdsNEW.THH68']
result = [ x.split('.')[1] for x in tot_var]
result
['TSEGNUM',
 'TSEGCODE',
 'TSEGNAME',
 'THHBASE',
 'THH01',
 'THH02',
 'THH03',
 'THH04',
 'THH05',
 'THH06',
 'THH07',
 'THH08',
 'THH09',
 'THH10',
 'THH11',
 'THH12',
 'THH13',
 'THH14',
 'THH15',
 'THH16',
 'THH17',
 'THH18',
 'THH19',
 'THH20',
 'THH21',
 'THH22',
 'THH23',
 'THH24',
 'THH25',
 'THH26',
 'THH27',
 'THH28',
 'THH29',
 'THH30',
 'THH31',
 'THH32',
 'THH33',
 'THH34',
 'THH35',
 'THH36',
 'THH37',
 'THH38',
 'THH39',
 'THH40',
 'THH41',
 'THH42',
 'THH43',
 'THH44',
 'THH45',
 'THH46',
 'THH47',
 'THH48',
 'THH49',
 'THH50',
 'THH51',
 'THH52',
 'THH53',
 'THH54',
 'THH55',
 'THH56',
 'THH57',
 'THH58',
 'THH59',
 'THH60',
 'THH61',
 'THH62',
 'THH63',
 'THH64',
 'THH65',
 'THH66',
 'THH67',
 'THH68']

The 68 distinct markets of Tapestry detail the diversity of the American population. There are 14 LifeMode groups and 6 Urbanization groups that summarize markets sharing similar traits; we will not use these groupings in this case study. See Tapestry Segmentation for a detailed overview.

Enriching study areas

The enrich_layer tool gives us demographic and landscape data for the people, places, and businesses in a specific area, or within a selected travel time or distance from a location.

To obtain the tapestry and demographic data for the area, we will use the enrich_layer tool from the arcgis.features.enrich_data module.

target_area_data = enrich_layer(target_area,
                                analysis_variables=analysis_var,
                                output_name="GetEnrichedHouseholdTapestry" + str(dt.now().microsecond))
target_area_data
GetEnrichedHouseholdTapestry489058
Feature Layer Collection by arcgis_python
Last Modified: June 18, 2019
0 comments, 0 views
data_lyr = target_area_data.layers[0]

We will convert the layer into a spatially enabled DataFrame to analyze the top four tapestry segments.

sdf = pd.DataFrame.spatial.from_layer(data_lyr)
sdf.head()
[Output truncated: a 1 row × 118 column spatially enabled DataFrame for the enriched drive distance area, including fields such as Name, ZIP, FAMGRW10_14, UNEMPRT_CY, and the tapestry segment counts.]

df = sdf[result]
# Transpose the numeric columns and sort the single row in descending order
# to find the largest tapestry household counts.
df.select_dtypes(include=['float64','int64']).T.sort_values(by=0, ascending=False).head().index
Index(['THHBASE', 'THH17', 'THH35', 'THH02', 'THH05'], dtype='object')

THHBASE contains the total household count; the four largest tapestry segments in the target community are THH17, THH35, THH02, and THH05. We will normalize them by the base value.

Convert the top four target area tapestry counts to percentages.

Rather than counts, we want to compare tapestry percentages between each candidate ZIP Code and the target area.
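As a quick local sanity check (not part of the original workflow), the same normalization can be computed directly on the spatially enabled DataFrame created above:

# Divide each of the top four segment counts by the tapestry base (THHBASE).
top_segments = ['THH17', 'THH35', 'THH02', 'THH05']
sdf[top_segments].div(sdf['THHBASE'], axis=0)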

Let's create four new fields to hold the tapestry percentages. We will create a FeatureLayerCollection instance so we can add the new fields to the layer.

A feature service serves a collection of feature layers and tables, with the associated relationships among the entities. It is represented by arcgis.features.FeatureLayerCollection in the ArcGIS API for Python.

Instances of FeatureLayerCollection can be constructed from a feature service item using the fromitem method, as shown below:

target_data = FeatureLayerCollection.fromitem(target_area_data)

The collection of layers and tables in a FeatureLayerCollection can be accessed using the layers and tables properties respectively:

data_lyr = target_data.layers[0]
data_lyr.manager.add_to_definition({"fields":[{"name":"THH17PERC",
                                                 "type":"esriFieldTypeDouble",
                                                 "alias":"In Style (5B) PERC",
                                                 "nullable":True,"editable":True,"length":256}]})
{'success': True}
data_lyr.manager.add_to_definition({"fields":[{"name":"THH35PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Bright Young Professionals (8C) PERC",
                                         "nullable":True,"editable":True,"length":256}]})
{'success': True}
data_lyr.manager.add_to_definition({"fields":[{"name":"THH02PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Professional Pride (1B) PERC",
                                         "nullable":True,"editable":True,"length":256}]})
{'success': True}
data_lyr.manager.add_to_definition({"fields":[{"name":"THH05PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Exurbanites",
                                         "nullable":True,"editable":True,"length":256}]})
{'success': True}
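Since the four field definitions above differ only in name and alias, the same additions could be written more compactly as a loop (a sketch equivalent to the calls above, not part of the original sample):

# Hypothetical compact form of the four add_to_definition calls above.
perc_fields = {"THH17PERC": "In Style (5B) PERC",
               "THH35PERC": "Bright Young Professionals (8C) PERC",
               "THH02PERC": "Professional Pride (1B) PERC",
               "THH05PERC": "Exurbanites"}
for name, alias in perc_fields.items():
    data_lyr.manager.add_to_definition({"fields": [{"name": name,
                                                    "type": "esriFieldTypeDouble",
                                                    "alias": alias,
                                                    "nullable": True,
                                                    "editable": True,
                                                    "length": 256}]})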

Refresh to update the fields in our layer.

target_data.manager.refresh()
{'success': True}
df = pd.DataFrame.spatial.from_layer(data_lyr)
df[['THH17PERC','THH35PERC','THH02PERC','THH05PERC']]
  THH17PERC THH35PERC THH02PERC THH05PERC
0      None      None      None      None

Next, we calculate the percentages and write them into the new fields. The calculate method applies a SQL expression to every feature matching the where clause; here 1=1 selects all features, since the layer contains a single drive distance polygon.

data_layer = target_data.layers[0]
data_layer.calculate('1=1', 
                     calc_expression=[{"field":"THH17PERC","sqlExpression":"THH17 / THHBASE"}])
{'success': True, 'updatedFeatureCount': 1}
data_layer.calculate('1=1', 
                     calc_expression=[{"field":"THH35PERC","sqlExpression":"THH35 / THHBASE"}])
{'success': True, 'updatedFeatureCount': 1}
data_layer.calculate('1=1', 
                     calc_expression=[{"field":"THH02PERC","sqlExpression":"THH02 / THHBASE"}])
{'success': True, 'updatedFeatureCount': 1}
data_layer.calculate('1=1', 
                     calc_expression=[{"field":"THH05PERC","sqlExpression":"THH05 / THHBASE"}])
{'success': True, 'updatedFeatureCount': 1}
sf = pd.DataFrame.spatial.from_layer(data_lyr)
sf[['THH17PERC', 'THH35PERC', 'THH02PERC', 'THH05PERC']]
   THH17PERC  THH35PERC  THH02PERC  THH05PERC
0   0.200269   0.130942   0.110235   0.095993
sf[['THH17PERC', 'THH35PERC', 'THH02PERC', 'THH05PERC']]*100
   THH17PERC  THH35PERC  THH02PERC  THH05PERC
0    20.0269    13.0942    11.0235     9.5993

Obtain the same data for the candidate ZIP Codes.

candidates_data = enrich_layer(candidates,
                               analysis_variables=analysis_var,
                               output_name="EnrichCandidatesWithHouseholdTapestry" + str(dt.now().microsecond))
cand_data = FeatureLayerCollection.fromitem(candidates_data)
cand_layer = cand_data.layers[0]
cand_layer.manager.add_to_definition({"fields":[{"name":"THH17PERC",
                                                 "type":"esriFieldTypeDouble",
                                                 "alias":"In Style (5B) PERC",
                                                 "nullable":True,"editable":True,"length":256}]})
{'success': True}
cand_layer.manager.add_to_definition({"fields":[{"name":"THH35PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Bright Young Professionals (8C) PERC",
                                         "nullable":True,"editable":True,"length":256}]})
{'success': True}
cand_layer.manager.add_to_definition({"fields":[{"name":"THH02PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Professional Pride (1B) PERC",
                                         "nullable":True,"editable":True,"length":256}]})
{'success': True}
cand_layer.manager.add_to_definition({"fields":[{"name":"THH05PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Exurbanites",
                                         "nullable":True,"editable":True,"length":256}]})
{'success': True}
rf = pd.DataFrame.spatial.from_layer(cand_layer)
rf.THHBASE.sort_values().head()
789    0
469    0
715    0
854    0
369    0
Name: THHBASE, dtype: int64

Notice that some of the base counts are zero. Calculating percentages with a zero base would cause a division by zero error. To avoid this, we will filter out these zero (or very small) population ZIP Codes and exclude them from further analysis by using a THHBASE > 0 where clause.
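For example, we can count how many candidate ZIP Codes would be excluded using the rf DataFrame created above (a quick check, not part of the original workflow):

# Candidate ZIP Codes with a zero tapestry base; these rows are skipped by
# the THHBASE > 0 where clause used in the calculate calls below.
zero_base = int((rf['THHBASE'] == 0).sum())
print(f"{zero_base} of {len(rf)} candidate ZIP Codes have THHBASE == 0")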

cand_layer.calculate('THHBASE > 0', 
                     calc_expression=[{"field":"THH17PERC","sqlExpression":"THH17 / THHBASE"}])
{'success': True, 'updatedFeatureCount': 886}
cand_layer.calculate('THHBASE > 0', 
                     calc_expression=[{"field":"THH35PERC","sqlExpression":"THH35 / THHBASE"}])
{'success': True, 'updatedFeatureCount': 886}
cand_layer.calculate('THHBASE > 0', 
                     calc_expression=[{"field":"THH02PERC","sqlExpression":"THH02 / THHBASE"}])
{'success': True, 'updatedFeatureCount': 886}
cand_layer.calculate('THHBASE > 0', 
                     calc_expression=[{"field":"THH05PERC","sqlExpression":"THH05 / THHBASE"}])
{'success': True, 'updatedFeatureCount': 886}
cf = pd.DataFrame.spatial.from_layer(cand_layer)
cf[['THH17PERC', 'THH35PERC', 'THH02PERC', 'THH05PERC']]
     THH17PERC  THH35PERC  THH02PERC  THH05PERC
0     0.000000   0.000000   0.000000   0.000000
1     0.000000   0.000000   0.000000   0.000000
2     0.000000   0.000000   0.000000   0.000000
3     0.000000   0.000000   0.000000   0.199850
4     0.000000   0.000000   0.000000   0.000000
5     0.000000   0.000000   0.000000   0.000000
6     0.000000   0.000000   0.000000   0.213139
7     0.000000   0.000000   0.000000   0.000000
8     0.000000   0.000000   0.000000   0.029358
9     0.000000   0.000000   0.069309   0.000000
10    0.000000   0.000000   0.000069   0.000000
11    0.000000   0.000000   0.000000   0.000000
12    0.000000   0.000000   0.000000   0.000000
13    0.000000   0.000000   0.000000   0.000000
14    0.000000   0.000000   0.000000   0.000000
15    0.000000   0.000000   0.211090   0.000000
16    0.000000   0.000000   0.000000   0.000000
17    0.000000   0.000000   0.000000   0.000000
18    0.000000   0.000000   0.392234   0.000000
19    0.000000   0.000000   0.000000   0.000000
20    0.000000   0.000000   0.000000   0.000000
21    0.000000   0.000000   0.000000   0.000000
22    0.000000   0.000000   0.000000   0.000000
23    0.000000   0.000000   0.000000   0.000000
24    0.000000   0.000000   0.000000   0.000000
25    0.000000   0.000000   0.102909   0.054188
26    0.000000   0.000000   0.039415   0.000000
27    0.000000   0.000000   0.000000   0.000000
28    0.000000   0.000000   0.000000   0.000000
29    0.000000   0.000000   0.000000   0.000000
..         ...        ...        ...        ...
868   0.000000   0.000000   0.039223   0.000000
869   0.000000   0.000000   0.000000   0.017223
870   0.000000   0.000000   0.000000   0.000000
871   0.042964   0.038995   0.000000   0.048521
872   0.000000   0.000000   0.014048   0.036941
873   0.000000   0.000000   0.000000   0.149811
874   0.000000   0.000000   0.000000   0.000000
875   0.000000   0.000000   0.000000   0.011015
876   0.000000   0.000000   0.000000   0.000000
877   0.000000   0.000000   0.000000   0.000000
878   0.029763   0.000000   0.000000   0.000000
879   0.000000   0.000000   0.023548   0.080144
880   0.000000   0.000000   0.000000   0.055476
881   0.000000   0.000000   0.000000   0.143125
882   0.000000   0.000000   0.000000   0.000000
883   0.000000   0.059280   0.000000   0.000000
884   0.000000   0.000000   0.000000   0.000000
885   0.000000   0.000000   0.000000   0.000000
886   0.000000   0.000000   0.000000   0.000000
887   0.000000   0.000000   0.000000   0.000000
888   0.000000   0.000000   0.000000   0.095757
889   0.000000   0.000000   0.000000   0.000000
890   0.000000   0.000000   0.380585   0.012765
891   0.000000   0.000000   0.000000   0.000000
892   0.000000   0.000000   0.065382   0.000000
893   0.000000   0.000000   0.069086   0.021716
894   0.000000   0.000000   0.339260   0.011854
895   0.000000   0.000000   0.172826   0.144199
896   0.041653   0.039816   0.105618   0.000000
897   0.000000   0.000000   0.052814   0.006123

898 rows × 4 columns

Rank the candidate ZIP Codes by their similarity to the target area.

top_4_most_similar_results = find_similar_locations(data_layer,
                                                    cand_layer,
                                                    analysis_fields=['THH17','THH35','THH02','THH05','POPDENS14','FAMGRW10_14','UNEMPRT_CY'],
                                                    output_name = "Top4SimilarLocations" + str(dt.now().microsecond),
                                                    number_of_results=4)
top_4_most_similar_results = top_4_most_similar_results['similar_result_layer']
top_4_most_similar_results
Top4SimilarLocations933786
Feature Layer Collection by arcgis_python
Last Modified: June 18, 2019
0 comments, 0 views
map1 = gis.map('Atlanta')
map1.add_layer(top_4_most_similar_results)
map2 = gis.map('Houston')
map2.add_layer(top_4_most_similar_results)
from ipywidgets import *

map1.layout=Layout(flex='1 1', padding='10px', height='420px')
map2.layout=Layout(flex='1 1', padding='10px', height='420px')

box = HBox([map1, map2])
box

One of the top ZIP Codes is located near Houston and three are located near Atlanta.
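To inspect which ZIP Codes were returned, we can read the result layer into a DataFrame. This is a sketch; the exact field names (for example, a similarity rank field) are produced by the find_similar_locations service, so check the columns before relying on them.

# Pull the similarity results into a table for inspection.
results_lyr = top_4_most_similar_results.layers[0]
results_df = pd.DataFrame.spatial.from_layer(results_lyr)
results_df.head()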

Conclusion

We have identified promising locations for a new retirement community. Using ArcGIS API for Python, we created a 5-mile drive distance area around the best performing community, enriched it and the candidate ZIP Codes with tapestry and demographic data, and used the Find Similar Locations tool to rank the candidates, finding four top ZIP Codes near Houston and Atlanta.

References

Summary of tools

Attribute Query
  Generic question: Which features have the characteristics I'm interested in?
  Examples: Which features have more than 50,000 people and median annual incomes larger than $50,000? Which hospitals have readmission rates larger than 10 percent?

Find Similar Locations
  Generic question: Which features are most like my target feature?
  Examples: Which stores are similar to my best performing store? Which crimes in the database are most like the current one I want to solve?

Identify drive time or drive distance areas
  Generic question: What are the dimensions, based on time or distance, of the area surrounding a location?
  Examples: What does a 5-mile or 5-minute walking, biking, or driving distance around the hospital look like? What can I visit within a 2 km walk from my hotel?

Enrich Layer
  Generic question: What are the tapestry and demographic characteristics of the people and places in this area?
  Examples: Which ZIP Codes have larger populations aged 55 and older?
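As a minimal illustration of the attribute query method listed above, we could filter the enriched candidate layer with a SQL where clause (a sketch; TOTPOP_CY is the total population field added by the enrichment, and the threshold is arbitrary):

# Attribute query sketch: candidate ZIP Codes with more than 50,000 people.
high_pop = cand_layer.query(where="TOTPOP_CY > 50000")
print(len(high_pop.features))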
