Locating a new retirement community

Introduction

In the past, retirement communities in the United States were built in the suburbs, in the warmest parts of the country. These days, however, people approaching retirement are often unwilling to relocate. They want to stay connected to their friends and family, remain close to their existing doctors, and enjoy cultural and educational opportunities.

This sample demonstrates how the ArcGIS API for Python can be used to identify promising locations for a new retirement community that satisfies these needs of senior citizens. It demonstrates the use of tools such as create_drive_time_areas, enrich_layer, and find_similar_locations, as well as editing field definitions and layer data.

First, we will look for locations that have a large number of senior citizens but very few existing retirement communities. We will then rank these locations by comparing them to the currently most successful retirement community.

The LocatingRetirementCommunity Feature Layer Collection includes two layers. The first layer, called Target Community, contains the current best performing retirement community near Knoxville, Tennessee. The second layer, called Candidates, contains the 898 ZIP Codes in the continental USA associated with statistically significant hot or cold spot areas for all of these criteria:

  • High demand for retirement housing opportunities
  • Low supply of retirement housing
  • Low housing unit vacancy rates
  • Large projected populations aged 55 and older

Hot spot analysis to identify these candidate ZIP Codes was done using ArcMap. This analysis is included in both the ArcMap and ArcGIS Pro workflows.

In the workflow below, we will use the ArcGIS API for Python to create a 5-mile drive distance area around the best performing community and obtain tapestry and demographic data for that area. We will then obtain the same data for the candidate ZIP Codes. Finally, we will use the Find Similar Locations tool to identify the four high demand, low vacancy, large projected age 55+ population ZIP Codes that are most similar to the area surrounding the best performing community.

Workflow

Necessary Imports

Input
import pandas as pd
from datetime import datetime as dt

from arcgis.gis import GIS
from arcgis.geoenrichment import *
from arcgis.features.use_proximity import create_drive_time_areas
from arcgis.features.enrich_data import enrich_layer
from arcgis.features import FeatureLayerCollection
from arcgis.features.find_locations import find_similar_locations

Get the data for your analysis

Input
gis = GIS('home')

Search for the LocatingRetirementCommunity layer. You can specify the owner's name to get more specific results. To search for content from the Living Atlas, or content shared by other users on ArcGIS Online, set outside_org=True.

Input
items = gis.content.search('title: LocatingRetirementCommunity owner:api_data_owner',
                           'Feature Layer',
                           outside_org=True)

Display the list of results.

Input
from IPython.display import display

for item in items:
    display(item)
LocatingRetirementCommunity
Data to use with the Locating a New Retirement Community case study.
Feature Layer Collection by api_data_owner
Last Modified: June 17, 2019
0 comments, 3 views

Since the first item is a Feature Layer Collection, accessing the layers property will give us a list of FeatureLayer objects.

Input
lyrs = items[0].layers
Input
for lyr in lyrs:
    print(lyr.properties.name)
TargetCommunity
Candidates
Input
target_community = lyrs[0]
candidates = lyrs[1]
Input
m1 = gis.map('Knoxville')
m1
Output
Input
m1.add_layer(target_community)

Create a 5-mile drive time polygon around the best performing community.

Proximity analysis

Proximity analysis tools help us answer one of the most common questions posed in spatial analysis: What is near what?

Proximity tools are available in the features.use_proximity module of ArcGIS API for Python. We can use the create_drive_time_areas tool to create a 5-mile drive distance buffer around the best performing community.

Input
target_area = create_drive_time_areas(target_community,
                                      break_values=[5],
                                      break_units='Miles',
                                      overlap_policy='Overlap',
                                      output_name='DriveTimeAreasOfTargetCommunities' + str(dt.now().microsecond))
Input
target_area
Output
DriveTimeAreasOfTargetCommunities585475
Feature Layer Collection by arcgis_python
Last Modified: June 18, 2019
0 comments, 0 views
Input
target_area_map = gis.map('Knoxville')
target_area_map
Output
Input
target_area_map.add_layer(target_area)
target_area_map.add_layer(target_community)

Determine the top tapestry segments.

We will be looking for ZIP Codes that are similar to the area surrounding the best performing retirement community. We will take advantage of tapestry variables because they summarize many aspects of a population, such as age, income, home value, occupation, education, and consumer spending behaviors. To identify the top tapestries within the 5-mile drive distance area, we will obtain and compare all 68 tapestry segments. We will also obtain the tapestry base variable so we can calculate percentages.

Input
countries = get_countries()
usa = Country.get('US')
type(usa)
df = usa.data_collections
th_var = list(df.loc['tapestryhouseholdsNEW']['analysisVariable'].unique())
other_var = ["KeyUSFacts.POPGRWCYFY","AtRisk.TOTPOP_CY","industry.UNEMPRT_CY"]
Input
analysis_var = th_var + other_var
Input
tot_var = [v for v in th_var if not v.split('.')[1].startswith('THHG')]
Input
tot_var
Output
['tapestryhouseholdsNEW.TSEGNUM',
 'tapestryhouseholdsNEW.TSEGCODE',
 'tapestryhouseholdsNEW.TSEGNAME',
 'tapestryhouseholdsNEW.THHBASE',
 'tapestryhouseholdsNEW.THH01',
 'tapestryhouseholdsNEW.THH02',
 'tapestryhouseholdsNEW.THH03',
 'tapestryhouseholdsNEW.THH04',
 'tapestryhouseholdsNEW.THH05',
 'tapestryhouseholdsNEW.THH06',
 'tapestryhouseholdsNEW.THH07',
 'tapestryhouseholdsNEW.THH08',
 'tapestryhouseholdsNEW.THH09',
 'tapestryhouseholdsNEW.THH10',
 'tapestryhouseholdsNEW.THH11',
 'tapestryhouseholdsNEW.THH12',
 'tapestryhouseholdsNEW.THH13',
 'tapestryhouseholdsNEW.THH14',
 'tapestryhouseholdsNEW.THH15',
 'tapestryhouseholdsNEW.THH16',
 'tapestryhouseholdsNEW.THH17',
 'tapestryhouseholdsNEW.THH18',
 'tapestryhouseholdsNEW.THH19',
 'tapestryhouseholdsNEW.THH20',
 'tapestryhouseholdsNEW.THH21',
 'tapestryhouseholdsNEW.THH22',
 'tapestryhouseholdsNEW.THH23',
 'tapestryhouseholdsNEW.THH24',
 'tapestryhouseholdsNEW.THH25',
 'tapestryhouseholdsNEW.THH26',
 'tapestryhouseholdsNEW.THH27',
 'tapestryhouseholdsNEW.THH28',
 'tapestryhouseholdsNEW.THH29',
 'tapestryhouseholdsNEW.THH30',
 'tapestryhouseholdsNEW.THH31',
 'tapestryhouseholdsNEW.THH32',
 'tapestryhouseholdsNEW.THH33',
 'tapestryhouseholdsNEW.THH34',
 'tapestryhouseholdsNEW.THH35',
 'tapestryhouseholdsNEW.THH36',
 'tapestryhouseholdsNEW.THH37',
 'tapestryhouseholdsNEW.THH38',
 'tapestryhouseholdsNEW.THH39',
 'tapestryhouseholdsNEW.THH40',
 'tapestryhouseholdsNEW.THH41',
 'tapestryhouseholdsNEW.THH42',
 'tapestryhouseholdsNEW.THH43',
 'tapestryhouseholdsNEW.THH44',
 'tapestryhouseholdsNEW.THH45',
 'tapestryhouseholdsNEW.THH46',
 'tapestryhouseholdsNEW.THH47',
 'tapestryhouseholdsNEW.THH48',
 'tapestryhouseholdsNEW.THH49',
 'tapestryhouseholdsNEW.THH50',
 'tapestryhouseholdsNEW.THH51',
 'tapestryhouseholdsNEW.THH52',
 'tapestryhouseholdsNEW.THH53',
 'tapestryhouseholdsNEW.THH54',
 'tapestryhouseholdsNEW.THH55',
 'tapestryhouseholdsNEW.THH56',
 'tapestryhouseholdsNEW.THH57',
 'tapestryhouseholdsNEW.THH58',
 'tapestryhouseholdsNEW.THH59',
 'tapestryhouseholdsNEW.THH60',
 'tapestryhouseholdsNEW.THH61',
 'tapestryhouseholdsNEW.THH62',
 'tapestryhouseholdsNEW.THH63',
 'tapestryhouseholdsNEW.THH64',
 'tapestryhouseholdsNEW.THH65',
 'tapestryhouseholdsNEW.THH66',
 'tapestryhouseholdsNEW.THH67',
 'tapestryhouseholdsNEW.THH68']
Input
result = [x.split('.')[1] for x in tot_var]
Input
result
Output
['TSEGNUM',
 'TSEGCODE',
 'TSEGNAME',
 'THHBASE',
 'THH01',
 'THH02',
 'THH03',
 'THH04',
 'THH05',
 'THH06',
 'THH07',
 'THH08',
 'THH09',
 'THH10',
 'THH11',
 'THH12',
 'THH13',
 'THH14',
 'THH15',
 'THH16',
 'THH17',
 'THH18',
 'THH19',
 'THH20',
 'THH21',
 'THH22',
 'THH23',
 'THH24',
 'THH25',
 'THH26',
 'THH27',
 'THH28',
 'THH29',
 'THH30',
 'THH31',
 'THH32',
 'THH33',
 'THH34',
 'THH35',
 'THH36',
 'THH37',
 'THH38',
 'THH39',
 'THH40',
 'THH41',
 'THH42',
 'THH43',
 'THH44',
 'THH45',
 'THH46',
 'THH47',
 'THH48',
 'THH49',
 'THH50',
 'THH51',
 'THH52',
 'THH53',
 'THH54',
 'THH55',
 'THH56',
 'THH57',
 'THH58',
 'THH59',
 'THH60',
 'THH61',
 'THH62',
 'THH63',
 'THH64',
 'THH65',
 'THH66',
 'THH67',
 'THH68']

The 68 distinct markets of Tapestry detail the diversity of the American population. There are 14 LifeMode groups and 6 Urbanization groups that summarize markets sharing similar traits. We will not use these groups in this case study. See Tapestry Segmentation for a detailed overview.

Enriching study areas

The enrich_layer tool gives us demographic and landscape data for the people, places, and businesses in a specific area, or within a selected travel time or distance from a location.

To obtain the tapestry and demographic data for the area, we will use the enrich_layer tool from the arcgis.features.enrich_data module.

Input
target_area_data = enrich_layer(target_area,
                                analysis_variables=analysis_var,
                                output_name="GetEnrichedHouseholdTapestry" + str(dt.now().microsecond))
Input
target_area_data
Output
GetEnrichedHouseholdTapestry489058
Feature Layer Collection by arcgis_python
Last Modified: June 18, 2019
0 comments, 0 views
Input
data_lyr = target_area_data.layers[0]

We will convert the layer into a spatially enabled DataFrame to analyze the top four tapestry segments.

Input
sdf = pd.DataFrame.spatial.from_layer(data_lyr)
sdf.head()
Output
AnalysisArea ENRICH_FID FAMGRW10_14 FacilityOID FromBreak HasData ID ID_1 Name OBJECTID ... UNEMPRT_CY_1 ZIP aggregationMethod apportionmentConfidence p14Seg1B p14Seg5A p14Seg5B p14Seg8C populationToPolygonSizeRating sourceCountry
0 31.432614 1 0.48 1 0 1 Best Performing Community 0 Location 1 : 0 - 5.0 1 ... 2.4 37922 BlockApportionment:US.BlockGroups 2.576 0.107575 0.107499 0.207002 0.104505 2.191 US

1 rows × 118 columns

Input
df = sdf[result]
df.select_dtypes(include=['float64','int64']).T.sort_values(by=0, ascending=False).head().index
Output
Index(['THHBASE', 'THH17', 'THH35', 'THH02', 'THH05'], dtype='object')

THHBASE contains the total household count, so the four highest tapestry segments in the target community are THH17, THH35, THH02, and THH05. We will normalize them by the base value.

Convert the top four target area tapestry counts to percentages.

Rather than counts, we want to compare tapestry percentages between each candidate ZIP Code and the target area.
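The count-to-percentage conversion can be sketched client-side in pandas as a quick sanity check (the values below are illustrative toy numbers, not data from the service):

```python
import pandas as pd

# Toy household counts mimicking the enriched fields (illustrative only)
df = pd.DataFrame({'THHBASE': [2000],
                   'THH17': [400], 'THH35': [262],
                   'THH02': [220], 'THH05': [192]})

# Divide each top segment count by the base to get its share of households
for seg in ['THH17', 'THH35', 'THH02', 'THH05']:
    df[seg + 'PERC'] = df[seg] / df['THHBASE']

print(df[['THH17PERC', 'THH35PERC', 'THH02PERC', 'THH05PERC']])
```

In the workflow itself we perform this division server-side on the hosted layer, so the percentages are stored with the feature service rather than in a local DataFrame.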

Let's create four new fields to hold the tapestry percentages. We can create a feature layer instance to add new fields to our layer.

A feature service serves a collection of feature layers and tables, with the associated relationships among the entities. It is represented by arcgis.features.FeatureLayerCollection in the ArcGIS API for Python.

Instances of FeatureLayerCollection can be constructed from a feature service item, as shown below:

Input
target_data = FeatureLayerCollection.fromitem(target_area_data)

The collection of layers and tables in a FeatureLayerCollection can be accessed using the layers and tables properties respectively:

Input
data_lyr = target_data.layers[0]
Input
data_lyr.manager.add_to_definition({"fields":[{"name":"THH17PERC",
                                                 "type":"esriFieldTypeDouble",
                                                 "alias":"In Style (5B) PERC",
                                                 "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
data_lyr.manager.add_to_definition({"fields":[{"name":"THH35PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Bright Young Professionals (8C) PERC",
                                         "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
data_lyr.manager.add_to_definition({"fields":[{"name":"THH02PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Professional Pride (1B) PERC",
                                         "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
data_lyr.manager.add_to_definition({"fields":[{"name":"THH05PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Exurbanites",
                                         "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}

Refresh to update the fields in our layer.

Input
target_data.manager.refresh()
Output
{'success': True}
Input
df = pd.DataFrame.spatial.from_layer(data_lyr)
Input
df[['THH17PERC','THH35PERC','THH02PERC','THH05PERC']]
Output
THH17PERC THH35PERC THH02PERC THH05PERC
0 None None None None

Next, we calculate the percentages and write them into the new fields.

Input
data_layer = target_data.layers[0]
Input
data_layer.calculate('1=1', 
                     calc_expression=[{"field":"THH17PERC","sqlExpression":"THH17 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 1}
Input
data_layer.calculate('1=1', 
                     calc_expression=[{"field":"THH35PERC","sqlExpression":"THH35 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 1}
Input
data_layer.calculate('1=1', 
                     calc_expression=[{"field":"THH02PERC","sqlExpression":"THH02 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 1}
Input
data_layer.calculate('1=1', 
                     calc_expression=[{"field":"THH05PERC","sqlExpression":"THH05 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 1}
Input
sf = pd.DataFrame.spatial.from_layer(data_lyr)
Input
sf[['THH17PERC', 'THH35PERC', 'THH02PERC', 'THH05PERC']]
Output
THH17PERC THH35PERC THH02PERC THH05PERC
0 0.200269 0.130942 0.110235 0.095993
Input
sf[['THH17PERC', 'THH35PERC', 'THH02PERC', 'THH05PERC']]*100
Output
THH17PERC THH35PERC THH02PERC THH05PERC
0 20.0269 13.0942 11.0235 9.5993

Obtain the same data for the candidate ZIP Codes.

Input
candidates_data = enrich_layer(candidates,
                               analysis_variables=analysis_var,
                               output_name="EnrichCandidatesWithHouseholdTapestry" + str(dt.now().microsecond))
Input
cand_data = FeatureLayerCollection.fromitem(candidates_data)
Input
cand_layer = cand_data.layers[0]
Input
cand_layer.manager.add_to_definition({"fields":[{"name":"THH17PERC",
                                                 "type":"esriFieldTypeDouble",
                                                 "alias":"In Style (5B) PERC",
                                                 "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
cand_layer.manager.add_to_definition({"fields":[{"name":"THH35PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Bright Young Professionals (8C) PERC",
                                         "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
cand_layer.manager.add_to_definition({"fields":[{"name":"THH02PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Professional Pride (1B) PERC",
                                         "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
cand_layer.manager.add_to_definition({"fields":[{"name":"THH05PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Exurbanites",
                                         "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
rf = pd.DataFrame.spatial.from_layer(cand_layer)
Input
rf.THHBASE.sort_values().head()
Output
789    0
469    0
715    0
854    0
369    0
Name: THHBASE, dtype: int64

Notice that some of the base counts are zero. Calculating percentages with a zero base would produce a division-by-zero error. To avoid this, we will restrict the calculation to ZIP Codes with a positive base count, excluding the zero-base ZIP Codes from further analysis.
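The same guard can be sketched client-side with pandas, where `Series.where` keeps the percentage at zero whenever the base is zero (toy values, for illustration only):

```python
import pandas as pd

# Illustrative base and segment counts; the first ZIP has a zero base
cand = pd.DataFrame({'THHBASE': [0, 1500, 800],
                     'THH17':   [0,  300,   0]})

# Only keep the ratio where the base is positive; zero-base rows become 0.0
cand['THH17PERC'] = (cand['THH17'] / cand['THHBASE']).where(cand['THHBASE'] > 0, 0.0)

print(cand['THH17PERC'].tolist())
```

In the workflow itself, the equivalent guard is the `'THHBASE > 0'` where-clause passed to the layer's calculate method, which is why only 886 of the 898 features are updated below.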

Input
cand_layer.calculate('THHBASE > 0', 
                     calc_expression=[{"field":"THH17PERC","sqlExpression":"THH17 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 886}
Input
cand_layer.calculate('THHBASE > 0', 
                     calc_expression=[{"field":"THH35PERC","sqlExpression":"THH35 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 886}
Input
cand_layer.calculate('THHBASE > 0', 
                     calc_expression=[{"field":"THH02PERC","sqlExpression":"THH02 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 886}
Input
cand_layer.calculate('THHBASE > 0', 
                     calc_expression=[{"field":"THH05PERC","sqlExpression":"THH05 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 886}
Input
cf = pd.DataFrame.spatial.from_layer(cand_layer)
Input
cf[['THH17PERC', 'THH35PERC', 'THH02PERC', 'THH05PERC']]
Output
THH17PERC THH35PERC THH02PERC THH05PERC
0 0.000000 0.000000 0.000000 0.000000
1 0.000000 0.000000 0.000000 0.000000
2 0.000000 0.000000 0.000000 0.000000
3 0.000000 0.000000 0.000000 0.199850
4 0.000000 0.000000 0.000000 0.000000
5 0.000000 0.000000 0.000000 0.000000
6 0.000000 0.000000 0.000000 0.213139
7 0.000000 0.000000 0.000000 0.000000
8 0.000000 0.000000 0.000000 0.029358
9 0.000000 0.000000 0.069309 0.000000
10 0.000000 0.000000 0.000069 0.000000
11 0.000000 0.000000 0.000000 0.000000
12 0.000000 0.000000 0.000000 0.000000
13 0.000000 0.000000 0.000000 0.000000
14 0.000000 0.000000 0.000000 0.000000
15 0.000000 0.000000 0.211090 0.000000
16 0.000000 0.000000 0.000000 0.000000
17 0.000000 0.000000 0.000000 0.000000
18 0.000000 0.000000 0.392234 0.000000
19 0.000000 0.000000 0.000000 0.000000
20 0.000000 0.000000 0.000000 0.000000
21 0.000000 0.000000 0.000000 0.000000
22 0.000000 0.000000 0.000000 0.000000
23 0.000000 0.000000 0.000000 0.000000
24 0.000000 0.000000 0.000000 0.000000
25 0.000000 0.000000 0.102909 0.054188
26 0.000000 0.000000 0.039415 0.000000
27 0.000000 0.000000 0.000000 0.000000
28 0.000000 0.000000 0.000000 0.000000
29 0.000000 0.000000 0.000000 0.000000
... ... ... ... ...
868 0.000000 0.000000 0.039223 0.000000
869 0.000000 0.000000 0.000000 0.017223
870 0.000000 0.000000 0.000000 0.000000
871 0.042964 0.038995 0.000000 0.048521
872 0.000000 0.000000 0.014048 0.036941
873 0.000000 0.000000 0.000000 0.149811
874 0.000000 0.000000 0.000000 0.000000
875 0.000000 0.000000 0.000000 0.011015
876 0.000000 0.000000 0.000000 0.000000
877 0.000000 0.000000 0.000000 0.000000
878 0.029763 0.000000 0.000000 0.000000
879 0.000000 0.000000 0.023548 0.080144
880 0.000000 0.000000 0.000000 0.055476
881 0.000000 0.000000 0.000000 0.143125
882 0.000000 0.000000 0.000000 0.000000
883 0.000000 0.059280 0.000000 0.000000
884 0.000000 0.000000 0.000000 0.000000
885 0.000000 0.000000 0.000000 0.000000
886 0.000000 0.000000 0.000000 0.000000
887 0.000000 0.000000 0.000000 0.000000
888 0.000000 0.000000 0.000000 0.095757
889 0.000000 0.000000 0.000000 0.000000
890 0.000000 0.000000 0.380585 0.012765
891 0.000000 0.000000 0.000000 0.000000
892 0.000000 0.000000 0.065382 0.000000
893 0.000000 0.000000 0.069086 0.021716
894 0.000000 0.000000 0.339260 0.011854
895 0.000000 0.000000 0.172826 0.144199
896 0.041653 0.039816 0.105618 0.000000
897 0.000000 0.000000 0.052814 0.006123

898 rows × 4 columns

Rank the candidate ZIP Codes by their similarity to the target area.

Input
top_4_most_similar_results = find_similar_locations(data_layer,
                                                    cand_layer,
                                                    analysis_fields=['THH17','THH35','THH02','THH05','POPDENS14','FAMGRW10_14','UNEMPRT_CY'],
                                                    output_name = "Top4SimilarLocations" + str(dt.now().microsecond),
                                                    number_of_results=4)
Input
top_4_most_similar_results = top_4_most_similar_results['similar_result_layer']
top_4_most_similar_results
Output
Top4SimilarLocations933786
Feature Layer Collection by arcgis_python
Last Modified: June 18, 2019
0 comments, 0 views
Input
map1 = gis.map('Atlanta')
map1.add_layer(top_4_most_similar_results)
Input
map2 = gis.map('Houston')
map2.add_layer(top_4_most_similar_results)
Input
from ipywidgets import *

map1.layout=Layout(flex='1 1', padding='10px', height='420px')
map2.layout=Layout(flex='1 1', padding='10px', height='420px')

box = HBox([map1, map2])
box
Output

One of the top ZIP Codes is located near Houston and three are located near Atlanta.

Conclusion

We have identified promising locations for a new retirement community project.

References

Summary of tools

  • Attribute Query — Which features have the characteristics I'm interested in? Examples: Which features have more than 50,000 people and median annual incomes larger than $50,000? Which hospitals have readmission rates larger than 10 percent?
  • Find Similar Locations — Which features are most like my target feature? Examples: Which stores are similar to my best performing store? Which crimes in the database are most like the current one I want to solve?
  • Identify drive time or drive distance areas — What are the dimensions, based on time or distance, of the area surrounding a location? Examples: What does a 5-mile or 5-minute walking, biking, or driving distance around the hospital look like? What can I visit within a 2 km walk from my hotel?
  • Enrich Layer — What are the tapestry and demographic characteristics of an area? Example: Which ZIP Codes have a higher population aged 55 and above?
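The attribute-query pattern from this summary can be sketched with pandas once a layer has been exported to a DataFrame; the column names and values here are illustrative, not from the sample data:

```python
import pandas as pd

# Illustrative attributes for a handful of features
features = pd.DataFrame({'NAME':   ['A', 'B', 'C'],
                         'TOTPOP': [75000, 42000, 91000],
                         'MEDINC': [61000, 58000, 48000]})

# "Which features have more than 50,000 people and
#  median annual incomes larger than $50,000?"
matches = features[(features['TOTPOP'] > 50000) & (features['MEDINC'] > 50000)]
print(matches['NAME'].tolist())
```

Against a live feature layer, the same question would typically be asked with a SQL where-clause in the layer's query method instead of a local filter.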
