Locating a new retirement community

Introduction

In the past, retirement communities in the United States were built in the suburbs, in the warmest parts of the country. These days, however, people approaching retirement are often unwilling to relocate. They want to stay connected to their friends and family, remain close to their existing doctors, and enjoy cultural and educational opportunities.

This sample demonstrates how the ArcGIS API for Python can be used to identify promising locations for a new retirement community that satisfies these needs of senior citizens. It demonstrates the use of tools such as create_drive_time_areas, enrich_layer, and find_similar_locations, as well as editing field definitions and layer data.

First, we will look for locations that have a large number of senior citizens but very few existing retirement communities. We will then rank these locations by comparing them to the currently most successful retirement community.

The LocatingRetirementCommunity Feature Layer Collection includes two layers. The first layer, called Target Community, contains the current best performing retirement community near Knoxville, Tennessee. The second layer, called Candidates, contains the 898 ZIP Codes in the continental USA associated with statistically significant hot or cold spot areas for all of these criteria:

  • High demand for retirement housing opportunities
  • Low supply of retirement housing
  • Low housing unit vacancy rates
  • Large projected populations aged 55 and older

Hot spot analysis to identify these candidate ZIP Codes was done using ArcMap. This analysis is included in both the ArcMap and ArcGIS Pro workflows.

In the workflow below, we will use the ArcGIS API for Python to create a 5-mile drive distance area around the best performing community and obtain tapestry and demographic data for that area. We will then obtain the same data for the candidate ZIP Codes. Finally, we will use the Find Similar Locations tool to identify the four high demand, low vacancy, large projected age 55+ population ZIP Codes that are most similar to the area surrounding the best performing community.

Workflow

Necessary Imports

Input
import pandas as pd
from datetime import datetime as dt

from arcgis.gis import GIS
from arcgis.geoenrichment import *
from arcgis.features.use_proximity import create_drive_time_areas
from arcgis.features.enrich_data import enrich_layer
from arcgis.features import FeatureLayerCollection
from arcgis.features.find_locations import find_similar_locations

Get the data for your analysis

Input
gis = GIS('home')

Search for the LocatingRetirementCommunity layer. You can specify the owner's name to get more specific results. To search for content from the Living Atlas, or content shared by other users on ArcGIS Online, set outside_org=True.

Input
items = gis.content.search('title: LocatingRetirementCommunity owner:api_data_owner',
                           'Feature Layer',
                           outside_org=True)

Display the list of results.

Input
from IPython.display import display

for item in items:
    display(item)
LocatingRetirementCommunity
Data to use with the Locating a New Retirement Community case study.
Feature Layer Collection by api_data_owner
Last Modified: June 17, 2019
0 comments, 3 views

Since the first item is a Feature Layer Collection, accessing the layers property will give us a list of FeatureLayer objects.

Input
lyrs = items[0].layers
Input
for lyr in lyrs:
    print(lyr.properties.name)
TargetCommunity
Candidates
Input
target_community = lyrs[0]
candidates = lyrs[1]
Input
m1 = gis.map('Knoxville')
m1
Output
Input
m1.add_layer(target_community)

Create a 5-mile drive time polygon around the best performing community.

Proximity analysis

Proximity analysis tools help us answer one of the most common questions posed in spatial analysis: What is near what?

Proximity tools are available in the features.use_proximity module of ArcGIS API for Python. We can use the create_drive_time_areas tool to create a 5-mile drive distance buffer around the best performing community.

Input
target_area = create_drive_time_areas(target_community,
                                      break_values=[5],
                                      break_units='Miles',
                                      overlap_policy='Overlap',
                                      output_name='DriveTimeAreasOfTargetCommunities' + str(dt.now().microsecond))
Input
target_area
Output
DriveTimeAreasOfTargetCommunities585475
Feature Layer Collection by arcgis_python
Last Modified: June 18, 2019
0 comments, 0 views
Input
target_area_map = gis.map('Knoxville')
target_area_map
Output
Input
target_area_map.add_layer(target_area)
target_area_map.add_layer(target_community)

Determine the top tapestry segments.

We will be looking for ZIP Codes that are similar to the area surrounding the best performing retirement community. We will take advantage of tapestry variables because they summarize many aspects of a population, such as age, income, home value, occupation, education, and consumer spending behaviors. To identify the top tapestries within the 5-mile drive distance area, we will obtain and compare all 68 tapestry segments. We will also obtain the tapestry base variable so we can calculate percentages.

Input
countries = get_countries()
usa = Country.get('US')
type(usa)
df = usa.data_collections
th_var = list(df.loc['tapestryhouseholdsNEW']['analysisVariable'].unique())
other_var = ["KeyUSFacts.POPGRWCYFY","AtRisk.TOTPOP_CY","industry.UNEMPRT_CY"]
Input
analysis_var = th_var + other_var
Input
tot_var = [v for v in th_var if not v.split('.')[1].startswith('THHG')]
Input
tot_var
Output
['tapestryhouseholdsNEW.TSEGNUM',
 'tapestryhouseholdsNEW.TSEGCODE',
 'tapestryhouseholdsNEW.TSEGNAME',
 'tapestryhouseholdsNEW.THHBASE',
 'tapestryhouseholdsNEW.THH01',
 'tapestryhouseholdsNEW.THH02',
 'tapestryhouseholdsNEW.THH03',
 'tapestryhouseholdsNEW.THH04',
 'tapestryhouseholdsNEW.THH05',
 'tapestryhouseholdsNEW.THH06',
 'tapestryhouseholdsNEW.THH07',
 'tapestryhouseholdsNEW.THH08',
 'tapestryhouseholdsNEW.THH09',
 'tapestryhouseholdsNEW.THH10',
 'tapestryhouseholdsNEW.THH11',
 'tapestryhouseholdsNEW.THH12',
 'tapestryhouseholdsNEW.THH13',
 'tapestryhouseholdsNEW.THH14',
 'tapestryhouseholdsNEW.THH15',
 'tapestryhouseholdsNEW.THH16',
 'tapestryhouseholdsNEW.THH17',
 'tapestryhouseholdsNEW.THH18',
 'tapestryhouseholdsNEW.THH19',
 'tapestryhouseholdsNEW.THH20',
 'tapestryhouseholdsNEW.THH21',
 'tapestryhouseholdsNEW.THH22',
 'tapestryhouseholdsNEW.THH23',
 'tapestryhouseholdsNEW.THH24',
 'tapestryhouseholdsNEW.THH25',
 'tapestryhouseholdsNEW.THH26',
 'tapestryhouseholdsNEW.THH27',
 'tapestryhouseholdsNEW.THH28',
 'tapestryhouseholdsNEW.THH29',
 'tapestryhouseholdsNEW.THH30',
 'tapestryhouseholdsNEW.THH31',
 'tapestryhouseholdsNEW.THH32',
 'tapestryhouseholdsNEW.THH33',
 'tapestryhouseholdsNEW.THH34',
 'tapestryhouseholdsNEW.THH35',
 'tapestryhouseholdsNEW.THH36',
 'tapestryhouseholdsNEW.THH37',
 'tapestryhouseholdsNEW.THH38',
 'tapestryhouseholdsNEW.THH39',
 'tapestryhouseholdsNEW.THH40',
 'tapestryhouseholdsNEW.THH41',
 'tapestryhouseholdsNEW.THH42',
 'tapestryhouseholdsNEW.THH43',
 'tapestryhouseholdsNEW.THH44',
 'tapestryhouseholdsNEW.THH45',
 'tapestryhouseholdsNEW.THH46',
 'tapestryhouseholdsNEW.THH47',
 'tapestryhouseholdsNEW.THH48',
 'tapestryhouseholdsNEW.THH49',
 'tapestryhouseholdsNEW.THH50',
 'tapestryhouseholdsNEW.THH51',
 'tapestryhouseholdsNEW.THH52',
 'tapestryhouseholdsNEW.THH53',
 'tapestryhouseholdsNEW.THH54',
 'tapestryhouseholdsNEW.THH55',
 'tapestryhouseholdsNEW.THH56',
 'tapestryhouseholdsNEW.THH57',
 'tapestryhouseholdsNEW.THH58',
 'tapestryhouseholdsNEW.THH59',
 'tapestryhouseholdsNEW.THH60',
 'tapestryhouseholdsNEW.THH61',
 'tapestryhouseholdsNEW.THH62',
 'tapestryhouseholdsNEW.THH63',
 'tapestryhouseholdsNEW.THH64',
 'tapestryhouseholdsNEW.THH65',
 'tapestryhouseholdsNEW.THH66',
 'tapestryhouseholdsNEW.THH67',
 'tapestryhouseholdsNEW.THH68']
Input
result = [x.split('.')[1] for x in tot_var]
Input
result
Output
['TSEGNUM',
 'TSEGCODE',
 'TSEGNAME',
 'THHBASE',
 'THH01',
 'THH02',
 'THH03',
 'THH04',
 'THH05',
 'THH06',
 'THH07',
 'THH08',
 'THH09',
 'THH10',
 'THH11',
 'THH12',
 'THH13',
 'THH14',
 'THH15',
 'THH16',
 'THH17',
 'THH18',
 'THH19',
 'THH20',
 'THH21',
 'THH22',
 'THH23',
 'THH24',
 'THH25',
 'THH26',
 'THH27',
 'THH28',
 'THH29',
 'THH30',
 'THH31',
 'THH32',
 'THH33',
 'THH34',
 'THH35',
 'THH36',
 'THH37',
 'THH38',
 'THH39',
 'THH40',
 'THH41',
 'THH42',
 'THH43',
 'THH44',
 'THH45',
 'THH46',
 'THH47',
 'THH48',
 'THH49',
 'THH50',
 'THH51',
 'THH52',
 'THH53',
 'THH54',
 'THH55',
 'THH56',
 'THH57',
 'THH58',
 'THH59',
 'THH60',
 'THH61',
 'THH62',
 'THH63',
 'THH64',
 'THH65',
 'THH66',
 'THH67',
 'THH68']

The 68 distinct markets of Tapestry detail the diversity of the American population. There are 14 LifeMode groups and 6 Urbanization groups that summarize markets sharing similar traits. We will not use these groups in this case study. See Tapestry Segmentation for a detailed overview.

Enriching study areas

The enrich_layer tool gives us demographic and landscape data for the people, places, and businesses in a specific area, or within a selected travel time or distance from a location.

To obtain the tapestry and demographic data for the area, we will use the enrich_layer tool from the arcgis.features.enrich_data module.

Input
target_area_data = enrich_layer(target_area,
                                analysis_variables=analysis_var,
                                output_name="GetEnrichedHouseholdTapestry" + str(dt.now().microsecond))
Input
target_area_data
Output
GetEnrichedHouseholdTapestry489058
Feature Layer Collection by arcgis_python
Last Modified: June 18, 2019
0 comments, 0 views
Input
data_lyr = target_area_data.layers[0]

We will convert the layer into a spatially enabled DataFrame to analyze the top four tapestry segments.

Input
sdf = pd.DataFrame.spatial.from_layer(data_lyr)
sdf.head()
Output
AnalysisArea ENRICH_FID FAMGRW10_14 FacilityOID FromBreak HasData ID ID_1 Name OBJECTID ... UNEMPRT_CY_1 ZIP aggregationMethod apportionmentConfidence p14Seg1B p14Seg5A p14Seg5B p14Seg8C populationToPolygonSizeRating sourceCountry
0 31.432614 1 0.48 1 0 1 Best Performing Community 0 Location 1 : 0 - 5.0 1 ... 2.4 37922 BlockApportionment:US.BlockGroups 2.576 0.107575 0.107499 0.207002 0.104505 2.191 US

1 rows × 118 columns

Input
df = sdf[result]
df.select_dtypes(include=['float64','int64']).T.sort_values(by=0, ascending=False).head().index
Output
Index(['THHBASE', 'THH17', 'THH35', 'THH02', 'THH05'], dtype='object')

THHBASE contains the total household count, so the four highest tapestry segments in the target community are THH17, THH35, THH02, and THH05. We will normalize them by the base value.

Convert the top four target area tapestry counts to percentages.

Rather than counts, we want to compare tapestry percentages between each candidate ZIP Code and the target area.
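The count-to-percentage conversion can be sketched client-side in pandas as a quick sanity check (the values below are illustrative toy numbers, not data from the service):

```python
import pandas as pd

# Toy household counts mimicking the enriched fields (illustrative only)
df = pd.DataFrame({'THHBASE': [2000],
                   'THH17': [400], 'THH35': [262],
                   'THH02': [220], 'THH05': [192]})

# Divide each top segment count by the base to get its share of households
for seg in ['THH17', 'THH35', 'THH02', 'THH05']:
    df[seg + 'PERC'] = df[seg] / df['THHBASE']

print(df[['THH17PERC', 'THH35PERC', 'THH02PERC', 'THH05PERC']])
```

In the workflow itself we perform this division server-side on the hosted layer, so the percentages are stored with the feature service rather than in a local DataFrame.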

Let's create four new fields to hold the tapestry percentages. We can create a feature layer instance to add new fields to our layer.

A feature service serves a collection of feature layers and tables, with the associated relationships among the entities. It is represented by arcgis.features.FeatureLayerCollection in the ArcGIS API for Python.

Instances of FeatureLayerCollection can be constructed from a feature service item, as shown below:

Input
target_data = FeatureLayerCollection.fromitem(target_area_data)

The collection of layers and tables in a FeatureLayerCollection can be accessed using the layers and tables properties respectively:

Input
data_lyr = target_data.layers[0]
Input
data_lyr.manager.add_to_definition({"fields":[{"name":"THH17PERC",
                                                 "type":"esriFieldTypeDouble",
                                                 "alias":"In Style (5B) PERC",
                                                 "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
data_lyr.manager.add_to_definition({"fields":[{"name":"THH35PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Bright Young Professionals (8C) PERC",
                                         "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
data_lyr.manager.add_to_definition({"fields":[{"name":"THH02PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Professional Pride (1B) PERC",
                                         "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
data_lyr.manager.add_to_definition({"fields":[{"name":"THH05PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Exurbanites",
                                         "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}

Refresh to update the fields in our layer.

Input
target_data.manager.refresh()
Output
{'success': True}
Input
df = pd.DataFrame.spatial.from_layer(data_lyr)
Input
df[['THH17PERC','THH35PERC','THH02PERC','THH05PERC']]
Output
THH17PERC THH35PERC THH02PERC THH05PERC
0 None None None None

Next, we calculate the percentages and write them into the new fields.

Input
data_layer = target_data.layers[0]
Input
data_layer.calculate('1=1', 
                     calc_expression=[{"field":"THH17PERC","sqlExpression":"THH17 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 1}
Input
data_layer.calculate('1=1', 
                     calc_expression=[{"field":"THH35PERC","sqlExpression":"THH35 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 1}
Input
data_layer.calculate('1=1', 
                     calc_expression=[{"field":"THH02PERC","sqlExpression":"THH02 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 1}
Input
data_layer.calculate('1=1', 
                     calc_expression=[{"field":"THH05PERC","sqlExpression":"THH05 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 1}
Input
sf = pd.DataFrame.spatial.from_layer(data_lyr)
Input
sf[['THH17PERC', 'THH35PERC', 'THH02PERC', 'THH05PERC']]
Output
THH17PERC THH35PERC THH02PERC THH05PERC
0 0.200269 0.130942 0.110235 0.095993
Input
sf[['THH17PERC', 'THH35PERC', 'THH02PERC', 'THH05PERC']]*100
Output
THH17PERC THH35PERC THH02PERC THH05PERC
0 20.0269 13.0942 11.0235 9.5993

Obtain the same data for the candidate ZIP Codes.

Input
candidates_data = enrich_layer(candidates,
                               analysis_variables=analysis_var,
                               output_name="EnrichCandidatesWithHouseholdTapestry" + str(dt.now().microsecond))
Input
cand_data = FeatureLayerCollection.fromitem(candidates_data)
Input
cand_layer = cand_data.layers[0]
Input
cand_layer.manager.add_to_definition({"fields":[{"name":"THH17PERC",
                                                 "type":"esriFieldTypeDouble",
                                                 "alias":"In Style (5B) PERC",
                                                 "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
cand_layer.manager.add_to_definition({"fields":[{"name":"THH35PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Bright Young Professionals (8C) PERC",
                                         "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
cand_layer.manager.add_to_definition({"fields":[{"name":"THH02PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Professional Pride (1B) PERC",
                                         "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
cand_layer.manager.add_to_definition({"fields":[{"name":"THH05PERC",
                                         "type":"esriFieldTypeDouble",
                                         "alias":"Exurbanites",
                                         "nullable":True,"editable":True,"length":256}]})
Output
{'success': True}
Input
rf = pd.DataFrame.spatial.from_layer(cand_layer)
Input
rf.THHBASE.sort_values().head()
Output
789    0
469    0
715    0
854    0
369    0
Name: THHBASE, dtype: int64

Notice that some of the base counts are zero. Calculating percentages with a zero base would produce a division-by-zero error. To avoid this, we will restrict the calculation to ZIP Codes with a positive base count, excluding the zero-base ZIP Codes from further analysis.
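The same guard can be sketched client-side with pandas, where `Series.where` keeps the percentage at zero whenever the base is zero (toy values, for illustration only):

```python
import pandas as pd

# Illustrative base and segment counts; the first ZIP has a zero base
cand = pd.DataFrame({'THHBASE': [0, 1500, 800],
                     'THH17':   [0,  300,   0]})

# Only keep the ratio where the base is positive; zero-base rows become 0.0
cand['THH17PERC'] = (cand['THH17'] / cand['THHBASE']).where(cand['THHBASE'] > 0, 0.0)

print(cand['THH17PERC'].tolist())
```

In the workflow itself, the equivalent guard is the `'THHBASE > 0'` where-clause passed to the layer's calculate method, which is why only 886 of the 898 features are updated below.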

Input
cand_layer.calculate('THHBASE > 0', 
                     calc_expression=[{"field":"THH17PERC","sqlExpression":"THH17 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 886}
Input
cand_layer.calculate('THHBASE > 0', 
                     calc_expression=[{"field":"THH35PERC","sqlExpression":"THH35 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 886}
Input
cand_layer.calculate('THHBASE > 0', 
                     calc_expression=[{"field":"THH02PERC","sqlExpression":"THH02 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 886}
Input
cand_layer.calculate('THHBASE > 0', 
                     calc_expression=[{"field":"THH05PERC","sqlExpression":"THH05 / THHBASE"}])
Output
{'success': True, 'updatedFeatureCount': 886}
Input
cf = pd.DataFrame.spatial.from_layer(cand_layer)
Input
cf[['THH17PERC', 'THH35PERC', 'THH02PERC', 'THH05PERC']]
Output
THH17PERC THH35PERC THH02PERC THH05PERC
0 0.000000 0.000000 0.000000 0.000000
1 0.000000 0.000000 0.000000 0.000000
2 0.000000 0.000000 0.000000 0.000000
3 0.000000 0.000000 0.000000 0.199850
4 0.000000 0.000000 0.000000 0.000000
5 0.000000 0.000000 0.000000 0.000000
6 0.000000 0.000000 0.000000 0.213139
7 0.000000 0.000000 0.000000 0.000000
8 0.000000 0.000000 0.000000 0.029358
9 0.000000 0.000000 0.069309 0.000000
10 0.000000 0.000000 0.000069 0.000000
11 0.000000 0.000000 0.000000 0.000000
12 0.000000 0.000000 0.000000 0.000000
13 0.000000 0.000000 0.000000 0.000000
14 0.000000 0.000000 0.000000 0.000000
15 0.000000 0.000000 0.211090 0.000000
16 0.000000 0.000000 0.000000 0.000000
17 0.000000 0.000000 0.000000 0.000000
18 0.000000 0.000000 0.392234 0.000000
19 0.000000 0.000000 0.000000 0.000000
20 0.000000 0.000000 0.000000 0.000000
21 0.000000 0.000000 0.000000 0.000000
22 0.000000 0.000000 0.000000 0.000000
23 0.000000 0.000000 0.000000 0.000000
24 0.000000 0.000000 0.000000 0.000000
25 0.000000 0.000000 0.102909 0.054188
26 0.000000 0.000000 0.039415 0.000000
27 0.000000 0.000000 0.000000 0.000000
28 0.000000 0.000000 0.000000 0.000000
29 0.000000 0.000000 0.000000 0.000000
... ... ... ... ...
868 0.000000 0.000000 0.039223 0.000000
869 0.000000 0.000000 0.000000 0.017223
870 0.000000 0.000000 0.000000 0.000000
871 0.042964 0.038995 0.000000 0.048521
872 0.000000 0.000000 0.014048 0.036941
873 0.000000 0.000000 0.000000 0.149811
874 0.000000 0.000000 0.000000 0.000000
875 0.000000 0.000000 0.000000 0.011015
876 0.000000 0.000000 0.000000 0.000000
877 0.000000 0.000000 0.000000 0.000000
878 0.029763 0.000000 0.000000 0.000000
879 0.000000 0.000000 0.023548 0.080144
880 0.000000 0.000000 0.000000 0.055476
881 0.000000 0.000000 0.000000 0.143125
882 0.000000 0.000000 0.000000 0.000000
883 0.000000 0.059280 0.000000 0.000000
884 0.000000 0.000000 0.000000 0.000000
885 0.000000 0.000000 0.000000 0.000000
886 0.000000 0.000000 0.000000 0.000000
887 0.000000 0.000000 0.000000 0.000000
888 0.000000 0.000000 0.000000 0.095757
889 0.000000 0.000000 0.000000 0.000000
890 0.000000 0.000000 0.380585 0.012765
891 0.000000 0.000000 0.000000 0.000000
892 0.000000 0.000000 0.065382 0.000000
893 0.000000 0.000000 0.069086 0.021716
894 0.000000 0.000000 0.339260 0.011854
895 0.000000 0.000000 0.172826 0.144199
896 0.041653 0.039816 0.105618 0.000000
897 0.000000 0.000000 0.052814 0.006123

898 rows × 4 columns

Rank the candidate ZIP Codes by their similarity to the target area.

Input
top_4_most_similar_results = find_similar_locations(data_layer,
                                                    cand_layer,
                                                    analysis_fields=['THH17','THH35','THH02','THH05','POPDENS14','FAMGRW10_14','UNEMPRT_CY'],
                                                    output_name = "Top4SimilarLocations" + str(dt.now().microsecond),
                                                    number_of_results=4)
Input
top_4_most_similar_results = top_4_most_similar_results['similar_result_layer']
top_4_most_similar_results
Output
Top4SimilarLocations933786
Feature Layer Collection by arcgis_python
Last Modified: June 18, 2019
0 comments, 0 views
Input
map1 = gis.map('Atlanta')
map1.add_layer(top_4_most_similar_results)
Input
map2 = gis.map('Houston')
map2.add_layer(top_4_most_similar_results)
Input
from ipywidgets import *

map1.layout=Layout(flex='1 1', padding='10px', height='420px')
map2.layout=Layout(flex='1 1', padding='10px', height='420px')

box = HBox([map1, map2])
box
Output

One of the top ZIP Codes is located near Houston and three are located near Atlanta.

Conclusion

We have identified promising locations for a new retirement community project.

References

Summary of tools

  • Attribute Query — Which features have the characteristics I'm interested in? Examples: Which features have more than 50,000 people and median annual incomes larger than $50,000? Which hospitals have readmission rates larger than 10 percent?
  • Find Similar Locations — Which features are most like my target feature? Examples: Which stores are similar to my best performing store? Which crimes in the database are most like the current one I want to solve?
  • Identify drive time or drive distance areas — What are the dimensions, based on time or distance, of the area surrounding a location? Examples: What does a 5-mile or 5-minute walking, biking, or driving distance around the hospital look like? What can I visit within a 2 km walk from my hotel?
  • Enrich Layer — What are the tapestry and demographic characteristics of an area? Example: Which ZIP Codes have a higher population aged 55 and above?
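The attribute-query pattern from this summary can be sketched with pandas once a layer has been exported to a DataFrame; the column names and values here are illustrative, not from the sample data:

```python
import pandas as pd

# Illustrative attributes for a handful of features
features = pd.DataFrame({'NAME':   ['A', 'B', 'C'],
                         'TOTPOP': [75000, 42000, 91000],
                         'MEDINC': [61000, 58000, 48000]})

# "Which features have more than 50,000 people and
#  median annual incomes larger than $50,000?"
matches = features[(features['TOTPOP'] > 50000) & (features['MEDINC'] > 50000)]
print(matches['NAME'].tolist())
```

Against a live feature layer, the same question would typically be asked with a SQL where-clause in the layer's query method instead of a local filter.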
