Finding a New Home

Buying or selling a house can be a very stressful event in one's life. The process can be frustrating because it is lengthy, uncertain, and requires a lot of research. In this workflow we will guide a couple, Mark and Lisa, who are interested in selling their home and relocating to a place close to both of their workplaces. In this case study, we will explore the current housing market, estimate average house prices in their area, and hunt for a new home. We will download the Zillow data for their current home for our analysis. You can use your own data and follow along with this workflow, which aims to help Mark and Lisa find their new home.

The notebook is divided into two parts. In the first part, we will calculate the following:

  • Percentage increase or decrease in house prices since Mark and Lisa bought their home.
  • A suggested selling price for their home.
  • Whether their ZIP Code is a buyer's market or a seller's market.
  • The average number of days it takes for homes to sell in their neighborhood.

In the second part of the notebook, we will explore the investment potential of homes close to their workplaces. Based on how much time Mark and Lisa are willing to spend commuting to work, we will create a drive-time buffer to narrow down the search area. Zillow also provides data on market health and projected home value appreciation. Visualizing ZIP Codes by their market health will help them focus only on areas with good market health. The result is a short list of areas to choose from when buying their new home.

Selling your home

Execute the following command to install the openpyxl library if it is not already installed. This package is used by pandas to read Excel (.xlsx) files.

!pip install openpyxl

Similarly, if matplotlib is not present, run the following command to install or upgrade it:

import sys 
!{sys.executable} -m pip install matplotlib

Determine an appropriate selling price

1) Download home sales time series data from Zillow at www.zillow.com/research/data.

Mark and Lisa have a 3-bedroom home, so we will select the ZHVI 3-Bedroom time-series ($) data set at the ZIP Code level.

2) Prepare the Excel data as follows:

a) Using Excel, open the .csv file.

Notice that the RegionName field stores ZIP Codes as numbers (if we sort the RegionName field, we will see that the Massachusetts ZIP Codes, for example, have lost their leading zeros; 01001 appears as 1001). Also notice that the median home value columns are named by year and month; the earliest data available is for April 1996 (1996-04).

b) Copy all the column headings and the one record with data for their ZIP Code to a new Excel sheet.

Apply a filter to the RegionName field. Mark and Lisa live in Crestline, California, so we will apply a filter for the 92325 ZIP Code.

c) Select (highlight) the fields starting with the month and year when they bought their home and continuing through the last month-and-year column in the Excel table. For example, since Mark and Lisa bought their home in December 2007, they highlight the two rows from column 2007-01 through column 2018-08.

d) Copy (press Ctrl+C) the selected data and paste it, along with the column headings, to a new Excel sheet using Paste Transposed (right-click in the first cell of the new sheet to see the paste options; select Paste Transposed). This gives two columns of data.

e) The first column has date values, but they include only the year and month. In column C, create a proper date field:

  • Right-click column C and format the cells using the Date category.
  • In the first cell of column C, enter the following formula: =DATEVALUE(CONCATENATE(A1, "-01"))
  • Drag the Autofill handle down to the last data cell in the column. (A scripted pandas alternative to these Excel steps is sketched below, after step h.)

f) Insert a top row and type the column headings: YYYYMM, value, and date.

g) Rename the Excel sheet (probably called Sheet2 at this point) to something like AveSellingPrice, and delete the other sheets (the first sheet contains a large amount of data that we won't use again in this workflow).

Mark and Lisa named their price Excel sheet CrestlineAveSellingPrice.

h) Save this new sheet as an Excel workbook.

Mark and Lisa named their Excel file Crestline3BdrmAveSellingPrice.xlsx.
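If you prefer to script this preparation rather than doing it in Excel, the sketch below does roughly the same thing with pandas. The CSV file name is a placeholder for whatever name your Zillow download uses; the column layout (RegionName plus YYYY-MM value columns) matches the data described above.

import pandas as pd

# A scripted alternative to the Excel steps above. The raw CSV file name
# below is hypothetical -- use whatever name your Zillow download has.
raw = pd.read_csv("Zip_Zhvi_3bedroom.csv", dtype={"RegionName": str})

# Keep the single record for Mark and Lisa's ZIP Code.
row = raw[raw["RegionName"] == "92325"]

# Monthly value columns are named YYYY-MM; keep the ones from the year
# they bought their home (2007) onward.
month_cols = [c for c in raw.columns if len(c) == 7 and c[:4].isdigit()]
month_cols = [c for c in month_cols if c >= "2007-01"]

# Transpose into two columns, then build a proper date (first of month),
# mirroring the DATEVALUE(CONCATENATE(A1, "-01")) formula used in Excel.
prices = row[month_cols].T.reset_index()
prices.columns = ["YYYYMM", "value"]
prices["date"] = pd.to_datetime(prices["YYYYMM"] + "-01")

prices.to_excel("Crestline3BdrmAveSellingPrice.xlsx",
                sheet_name="CrestlineAveSellingPrice", index=False)

Either route produces the same three-column table (YYYYMM, value, date) that is loaded for analysis in step 4.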

3) Connect to your ArcGIS Online organization.

from arcgis.gis import GIS
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime as dt
gis = GIS('home')

Use the boolean has_arcpy to flag whether arcpy is present in the local environment:

has_arcpy = False
try:
    import arcpy
    has_arcpy = True
    print("arcpy present")
except ImportError:
    print("arcpy not present")
arcpy not present

4) Load the Excel file for analysis.

data = gis.content.search('finding_a_new_home owner:api_data_owner type: csv collection')[0]
data
finding_a_new_home
CSV Collection by api_data_owner
Last Modified: March 17, 2021
0 comments, 46 views
filepath = data.download(file_name=data.name)
import os
import zipfile
from pathlib import Path
with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)
data_path = Path(os.path.join(os.path.splitext(filepath)[0]))
data_path
WindowsPath('C:/Users/pri10421/AppData/Local/Temp/finding_a_new_home')
datapath = [os.path.abspath(os.path.join(data_path, p)) for p in os.listdir(data_path)]
datapath
['C:\\Users\\pri10421\\AppData\\Local\\Temp\\finding_a_new_home\\BuyerSellerIndex.xlsx',
 'C:\\Users\\pri10421\\AppData\\Local\\Temp\\finding_a_new_home\\Crestline3BdrmAveSellingPrice.xlsx',
 'C:\\Users\\pri10421\\AppData\\Local\\Temp\\finding_a_new_home\\ImportantPlaces.xlsx',
 'C:\\Users\\pri10421\\AppData\\Local\\Temp\\finding_a_new_home\\MarketHealthIndex.xlsx']
file_name1 = datapath[1]
data1 = pd.read_excel(file_name1)
data1.head()
    YYYYMM   value        date
0  2007-01  291000  2007-01-01
1  2007-02  289000  2007-02-01
2  2007-03  287400  2007-03-01
3  2007-04  286100  2007-04-01
4  2007-05  284000  2007-05-01
data1.tail()
      YYYYMM   value        date
135  2018-04  252200  2018-04-01
136  2018-05  254000  2018-05-01
137  2018-06  254800  2018-06-01
138  2018-07  254900  2018-07-01
139  2018-08  254900  2018-08-01
data1.shape
(140, 3)
data1[['year','month','day']] = data1.date.apply(lambda x: pd.Series(
    x.strftime("%Y,%m,%d").split(","))) # split date into year, month, day
data1.head()
    YYYYMM   value        date  year month day
0  2007-01  291000  2007-01-01  2007    01  01
1  2007-02  289000  2007-02-01  2007    02  01
2  2007-03  287400  2007-03-01  2007    03  01
3  2007-04  286100  2007-04-01  2007    04  01
4  2007-05  284000  2007-05-01  2007    05  01
grpby_data1 = data1.groupby(['year'])[['value']].mean()  # average home value per year
type(grpby_data1)
pandas.core.frame.DataFrame

5) Create a graph showing how average home prices have changed since they bought their home.

grpby_data1.reset_index(inplace=True)
grpby_data1.head()
   year          value
0  2007  276616.666667
1  2008  221875.000000
2  2009  188391.666667
3  2010  176216.666667
4  2011  154766.666667
grpby_data1.value
0     276616.666667
1     221875.000000
2     188391.666667
3     176216.666667
4     154766.666667
5     145158.333333
6     163741.666667
7     183975.000000
8     197133.333333
9     229566.666667
10    235966.666667
11    252025.000000
Name: value, dtype: float64
grpby_data1.year
0     2007
1     2008
2     2009
3     2010
4     2011
5     2012
6     2013
7     2014
8     2015
9     2016
10    2017
11    2018
Name: year, dtype: object
plt.plot(grpby_data1.year, grpby_data1.value)
plt.title("average home prices (2007-2018)")
plt.xlabel("date")
plt.ylabel("average house price")
[Line plot: "average home prices (2007-2018)", with date on the x-axis and average house price on the y-axis]

6) Determine an appropriate selling price based on home sales trends as follows:

a) Determine the current average selling price and the average selling price when they bought their home. Divide the current average price by the beginning average price to see how much homes in their ZIP Code have appreciated or depreciated. When Mark and Lisa bought their home in December 2007, 3-bedroom homes were selling for an average of $276,617.

price_initial = grpby_data1.iloc[0]
price_initial
year              2007
value    276616.666667
Name: 0, dtype: object
price_current = grpby_data1.iloc[-1]
price_current
year         2018
value    252025.0
Name: 11, dtype: object
house_worth = price_current.value / price_initial.value
house_worth
0.9110983912755316

This indicates that homes in Crestline are only worth about 91 percent of what they were worth in 2007.

b) We can get a rough estimate of what their home is worth by summing what they paid for their home plus what they invested in it, and multiplying that sum by the ratio computed above. Mark and Lisa, for example, paid $291,000 in 2007 and invested $100,000 in solid improvements (new kitchen, major landscaping, hardwood flooring, and so on). The calculation below uses the 2007 average price ($276,617) as the baseline instead of the exact purchase price: multiplying ($276,617 + $100,000) by 0.91 gives a rough suggested selling price of about $343,135.

(price_initial.value + 100000)*house_worth
343134.83912755316

Get additional information about the local real estate market

If their home is part of a seller's market, they are more likely to get their asking price.

1) Download the Buyer-Seller Index data at the ZIP Code level from www.zillow.com/research/data. Under Home Listings and Sales, select Buyer-Seller Index as the data type and ZIP Code as the geography.

2) Open the .csv file using Excel. Zillow reports ZIP Codes as numbers, so we need to pad them with leading zeros for the Zillow data to link to the ArcGIS ZIP Code geometry.

Follow these steps:

a) Sort the RegionName column from smallest to largest so we will be able to see how the formula below works.

b) Name a new column in the Excel table zipstring.

c) In the first cell of the new column, enter the following formula to pad each RegionName value with leading zeros, keeping the rightmost five characters: =RIGHT(CONCAT("00000",B2),5). (A pandas equivalent is sketched after step d.)

d) Drag the Autofill handle down to the last data cell in the column.
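The same padding can also be done in pandas once the file is loaded; a minimal sketch using the BuyerSellerIndex.xlsx file that appears later in this notebook (adjust the file name to your own download):

import pandas as pd

# Pad each ZIP Code with leading zeros to five characters, mirroring the
# =RIGHT(CONCAT("00000",B2),5) Excel formula.
bsi = pd.read_excel("BuyerSellerIndex.xlsx")
bsi["zipstring"] = bsi["RegionName"].astype(str).str.zfill(5)
print(bsi[["RegionName", "zipstring"]].head())

Note that this workflow later converts both join fields to integers before merging, so the leading zeros here mainly matter for readability and for linking in other tools.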

3) Load the Excel file for analysis.

file_name2 = datapath[0]
data2 = pd.read_excel(file_name2)
data2.head()
  RegionType  RegionName State       CBSA Title  SizeRankCity  SizeRankMetro  PctPriceCut  DaysOnMarket  BuyerSellerIndex  BuyerSellerIndexMetro  zipstring
0        Zip        1001    MA  Springfield, MA          8273             83    22.916667          81.0          6.190476               9.185083       1001
1        Zip        1002    MA  Springfield, MA          6684             83    29.787234          94.0         10.000000               9.185083       1002
2        Zip        1007    MA  Springfield, MA          9088             83    16.091954          83.0          2.857143               9.185083       1007
3        Zip        1013    MA  Springfield, MA          7061             83    31.147541          76.5          8.333333               9.185083       1013
4        Zip        1020    MA  Springfield, MA          5172             83    25.000000          73.0          3.809524               9.185083       1020
data2.dtypes
RegionType                object
RegionName                 int64
State                     object
CBSA Title                object
SizeRankCity               int64
SizeRankMetro              int64
PctPriceCut              float64
DaysOnMarket             float64
BuyerSellerIndex         float64
BuyerSellerIndexMetro    float64
zipstring                  int64
dtype: object
data2.columns
Index(['RegionType', 'RegionName', 'State', 'CBSA Title', 'SizeRankCity',
       'SizeRankMetro', 'PctPriceCut', 'DaysOnMarket', 'BuyerSellerIndex',
       'BuyerSellerIndexMetro', 'zipstring'],
      dtype='object')

4) Select the BuyerSellerIndex, DaysOnMarket, and zipstring fields.

cols = ['BuyerSellerIndex', 'DaysOnMarket', 'zipstring']
selected_data2 = data2[cols]

5) Explore the data values as follows:

a) Sort on the DaysOnMarket field and notice the range. For the data Mark and Lisa downloaded, the range is 35 to 294.5 days.

days_on_mrkt_df = selected_data2.sort_values(by='DaysOnMarket', axis=0)
days_on_mrkt_df.DaysOnMarket.min()
35.0
days_on_mrkt_df.DaysOnMarket.max()
294.5
days_on_mrkt_df.head()
      BuyerSellerIndex  DaysOnMarket  zipstring
7420          1.120000          35.0      98107
7434          4.960000          35.0      98136
7097          0.526316          35.0      94022
7426          0.640000          35.0      98117
7399          0.080000          35.0      98043
days_on_mrkt_df.tail()
      BuyerSellerIndex  DaysOnMarket  zipstring
800           9.006849         252.0       8752
1177         10.000000         254.0      12428
1141          8.938356         261.5      11963
1189          9.160959         288.0      12545
753           6.000000         294.5       8403

b) Sort on the BuyerSellerIndex field and notice the range of values. ZIP Codes with index values near 0 are part of a strong seller's market; ZIP Codes with index values near 10 are part of a strong buyer's market.

buyer_seller_df = data2.sort_values(by='BuyerSellerIndex', axis=0)
buyer_seller_df.head()
     RegionType  RegionName State                                   CBSA Title  SizeRankCity  SizeRankMetro  PctPriceCut  DaysOnMarket  BuyerSellerIndex  BuyerSellerIndexMetro  zipstring
517         Zip        7063    NJ        New York-Newark-Jersey City, NY-NJ-PA         10423              1     4.687500          90.0          0.017123               9.585635       7063
4546        Zip       53168    WI           Chicago-Naperville-Elgin, IL-IN-WI         10505              3     9.756098          59.5          0.033223               9.488950      53168
485         Zip        7004    NJ        New York-Newark-Jersey City, NY-NJ-PA         10846              1     7.894737          92.0          0.034247               9.585635       7004
6572        Zip       90025    CA           Los Angeles-Long Beach-Anaheim, CA          2122              2     6.451613          46.0          0.036364               4.185083      90025
1609        Zip       19152    PA  Philadelphia-Camden-Wilmington, PA-NJ-DE-MD          4927              5     6.818182          67.0          0.041667               9.150552      19152
buyer_seller_df.tail()
     RegionType  RegionName State                           CBSA Title  SizeRankCity  SizeRankMetro  PctPriceCut  DaysOnMarket  BuyerSellerIndex  BuyerSellerIndexMetro  zipstring
6091        Zip       80016    CO           Denver-Aurora-Lakewood, CO          2658             21    27.226891          62.0              10.0               4.502762      80016
4598        Zip       53954    WI                          Madison, WI         10773             87    21.875000          75.5              10.0               3.522099      53954
262         Zip        3049    NH                Manchester-Nashua, NH         10693            131    28.787879          74.0              10.0               3.577348       3049
7321        Zip       97209    OR  Portland-Vancouver-Hillsboro, OR-WA          4742             23    26.258993          78.5              10.0               5.096685      97209
1224        Zip       13104    NY                         Syracuse, NY          8916             79    28.323699          89.0              10.0               9.033149      13104
buyer_seller_df.BuyerSellerIndex.min()
0.017123288
buyer_seller_df.BuyerSellerIndex.max()
10.0

c) Filter the data to show only their home's ZIP Code.

selected_data2[selected_data2['zipstring'] == 92325]
      BuyerSellerIndex  DaysOnMarket  zipstring
6888          9.622642          79.0      92325

Notice the average number of days on market and determine whether the home is part of a buyer's or a seller's market. Mark and Lisa learn that their home is in a buyer's market (index 9.6), and they can expect it to be on the market for approximately 79 days before it sells.
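As a quick readability helper (not part of the original workflow), the hedged sketch below turns the 0-10 index into a label; the cut-off values are an illustrative assumption, not Zillow's definition.

def market_label(index_value, sellers_below=3.5, buyers_above=6.5):
    """Label a BuyerSellerIndex value (0 = strong seller's market,
    10 = strong buyer's market). Thresholds are illustrative only."""
    if index_value <= sellers_below:
        return "seller's market"
    if index_value >= buyers_above:
        return "buyer's market"
    return "balanced market"

# Apply it to Mark and Lisa's ZIP Code (92325).
crestline = selected_data2[selected_data2['zipstring'] == 92325].iloc[0]
print(market_label(crestline.BuyerSellerIndex),
      "- roughly", int(crestline.DaysOnMarket), "days on market")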

selected_data2.rename(columns={"zipstring": "ZIP_CODE"}, inplace=True)
C:\Users\pri10421\AppData\Local\Temp\ipykernel_17192\3169181311.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  selected_data2.rename(columns={"zipstring": "ZIP_CODE"}, inplace=True)
selected_data2.shape
(7548, 3)
selected_data2.dtypes
BuyerSellerIndex    float64
DaysOnMarket        float64
ZIP_CODE              int64
dtype: object
selected_data2 = selected_data2.astype({"ZIP_CODE": int})
selected_data2.dtypes
BuyerSellerIndex    float64
DaysOnMarket        float64
ZIP_CODE              int32
dtype: object

6) Search for the United States ZIP Code Boundaries 2021 layer. We can specify the owner's name to get more specific results. To search for content from the Living Atlas, or content shared by other users on ArcGIS Online, set outside_org=True.

items = gis.content.search('United States ZIP Code Boundaries 2021 owner: esri_dm',
                           outside_org=True)

Display the list of results.

from IPython.display import display

for item in items:
    display(item)
USA ZIP Code Boundaries
U.S. ZIP Code Boundaries provides ZIP Code, postal district name, population, and area for the ZIP Code areas in the United States.Feature Layer Collection by esri_dm
Last Modified: October 14, 2022
0 comments, 215574910 views
United States ZIP Code Boundaries 2021
This layer shows the ZIP Code level boundaries of United States in 2021, designed to be used in Data Enrichment analysis.Feature Layer Collection by esri_dm
Last Modified: September 14, 2022
2 comments, 27023 views
United States County Boundaries 2018
This layer shows the County level boundaries of United States in 2018. The boundaries are optimized to improve Data Enrichment analysis performance.Feature Layer Collection by esri_dm
Last Modified: August 17, 2021
0 comments, 16998 views
United States State Boundaries 2018
This layer shows the State level boundaries of United States in 2018. The boundaries are optimized to improve Data Enrichment analysis performance.Feature Layer Collection by esri_dm
Last Modified: August 17, 2021
1 comments, 1978667 views
USA ZIP Code Areas
U.S. ZIP Code Areas provides ZIP Code, postal district name, population, and area for the ZIP Code areas in the United States.Layer Package by esri_dm
Last Modified: October 15, 2022
3 comments, 193024 views
USA ZIP Code Three-Digit Areas
USA ZIP Code Three-Digit Areas provides the three-digit ZIP Code areas in the United States.Layer Package by esri_dm
Last Modified: December 08, 2022
0 comments, 22135 views
United States Tract Boundaries 2021
This layer shows the Tract level boundaries of United States in 2021, designed to be used in Data Enrichment analysis.Feature Layer Collection by esri_dm
Last Modified: September 14, 2022
0 comments, 390 views
United States County Boundaries 2021
This layer shows the County level boundaries of United States in 2021, designed to be used in Data Enrichment analysis.Feature Layer Collection by esri_dm
Last Modified: September 14, 2022
0 comments, 7412 views
United States ZIP Code Boundaries 2018
This layer shows the ZIP Code level boundaries of United States in 2018. The boundaries are optimized to improve Data Enrichment analysis performance.Feature Layer Collection by esri_dm
Last Modified: August 17, 2021
3 comments, 214544 views
United States Boundaries 2021
United States Boundaries 2021 provides boundaries for several layers of administrative divisions.Feature Layer Collection by esri_dm
Last Modified: September 14, 2022
0 comments, 4103 views

Select the desired item from the list.

us_zip = items[1]
us_zip
United States ZIP Code Boundaries 2021
This layer shows the ZIP Code level boundaries of United States in 2021, designed to be used in Data Enrichment analysis.Feature Layer Collection by esri_dm
Last Modified: September 14, 2022
2 comments, 27023 views

Get the layer names from the item

for lyr in us_zip.layers:
    print(lyr.properties.name)
USA_Country
USA_State
USA_County
USA_ZipCode
USA_Tract
USA_BlockGroup

7) Merge the ZIP Code layer with the selected Buyer-Seller data so we can visualize the result on a map.

us_zip_lyr = us_zip.layers[3]
zip_df = pd.DataFrame.spatial.from_layer(us_zip_lyr)
zip_df.head()
   OBJECTID  POPULATION                       PO_NAME                                              SHAPE      SQMI STATE  Shape__Area  Shape__Length  ZIP_CODE
0         1        <NA>      N Dillingham Census Area  {"rings": [[[-160.431152, 58.689351], [-160.43...  16019.53    AK     6.657141      24.677454     00001
1         2        <NA>      Yukon Flats Nat Wildlife  {"rings": [[[-160.038452, 61.947605], [-160.03...  95862.85    AK    48.948815      131.77645     00002
2         3        <NA>          Alaska Peninsula NWR  {"rings": [[[-159.900745, 56.439047], [-159.90...   14572.9    AK     5.655405      41.564165     00003
3         4        <NA>     W Kenai Peninsula Borough  {"rings": [[[-154.748861, 59.259518], [-154.70...   6510.85    AK     2.728764      20.553203     00004
4         5        <NA>  N Lake and Peninsula Borough  {"rings": [[[-156.0002144, 60.9074352], [-155....   3760.07    AK     1.593722       9.571684     00005
zip_df.shape
(32201, 9)
zip_df.dtypes
OBJECTID            Int64
POPULATION          Int32
PO_NAME            string
SHAPE            geometry
SQMI              Float64
STATE              string
Shape__Area       Float64
Shape__Length     Float64
ZIP_CODE           string
dtype: object
zip_df = zip_df.astype({"ZIP_CODE": int})
zip_df.dtypes
OBJECTID            Int64
POPULATION          Int32
PO_NAME            string
SHAPE            geometry
SQMI              Float64
STATE              string
Shape__Area       Float64
Shape__Length     Float64
ZIP_CODE            int32
dtype: object
merged_df = pd.merge(zip_df, selected_data2, on='ZIP_CODE')
merged_df.shape
(7548, 11)
merged_df.spatial.set_geometry('SHAPE')
mergd_lyr = gis.content.import_data(merged_df,
                                    title='MergedLayer',
                                    tags='datascience')

When arcpy is present, import_data uploads the local SeDF (Spatially Enabled DataFrame) to your organization as a file geodatabase (FGDB) and publishes it as a hosted feature layer. When arcpy is not present, import_data uploads the SeDF as a shapefile and then publishes it as a hosted feature layer. This difference leads to column/property names that differ from those defined in the original SeDF (shapefile field names, for example, are truncated to 10 characters).

The has_arcpy flag is used to determine which naming convention the newly created feature layer conforms to when we add it to a map for display.
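One way to check which convention applies, rather than relying on the flag, is to inspect the published layer's schema directly; a minimal sketch, assuming mergd_lyr has finished publishing:

# List the field names that actually exist on the published feature layer,
# so we can see whether the original column names, snake_case names, or
# truncated shapefile names were used.
published_fields = [f['name'] for f in mergd_lyr.layers[0].properties.fields]
print(published_fields)

The cells below use the has_arcpy flag as a shortcut for the same decision.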

8) Create a map of the BuyerSellerIndex field using the following steps:

Visualize Results

m1 = gis.map('United States', 8)
m1
cur_field_name = "BuyerSellerIndex"
if has_arcpy:
    if cur_field_name not in [f['name'] for f in mergd_lyr.layers[0].properties.fields]:
        cur_field_name = "buyer_seller_index"
else:
    cur_field_name = "BuyerSelle"
m1.add_layer(mergd_lyr, {"renderer":"ClassedColorRenderer",
                         "field_name":cur_field_name,
                         "opacity":0.7
                  })

9) Create a map of the DaysOnMarket field as follows:

m2 = gis.map('Redlands, CA')
m2
cur_field_name = "DaysOnMarket"
if has_arcpy:
    if cur_field_name not in [f['name'] for f in mergd_lyr.layers[0].properties.fields]:
        cur_field_name = "days_on_market"
else:
    cur_field_name = "DaysOnMark"
m2.add_layer(mergd_lyr, {"renderer":"ClassedSizeRenderer",
                         "field_name":cur_field_name,
                         "opacity":0.7
              })

House Hunting

Find all ZIP Codes within a specified drive time of important places

1) Create an Excel table with columns for Place, Street, City, State, and Zip. Add addresses for the locations you want easy access to from your new home. Mark and Lisa's table below has their current job addresses. They named their Excel file ImportantPlaces.xlsx and the Excel sheet WorkLocations.
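If you'd rather skip the Excel file, the same table can be built directly in pandas; a small sketch using the two addresses from this workflow (the notebook itself loads the table from ImportantPlaces.xlsx in the next step):

import pandas as pd

# Build the ImportantPlaces table in code instead of Excel.
important_places = pd.DataFrame([
    {"Place": "Lisa's job", "Street": "380 New York Street",
     "City": "Redlands", "State": "CA", "Zip": 92373},
    {"Place": "Mark's job", "Street": "4511 E Guasti Road",
     "City": "Ontario", "State": "CA", "Zip": 91761},
])
important_places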

2) Load the Excel file for analysis.

file_name3 = datapath[2]
data3 = pd.read_excel(file_name3)
data3.head()
        Place               Street      City State    Zip
0  Lisa's job  380 New York Street  Redlands    CA  92373
1  Mark's job   4511 E Guasti Road   Ontario    CA  91761
data3['Address'] = data3['Street'] + ' ' + data3['City'] + ' ' + data3['State']
data3['Address']
0    380 New York Street Redlands CA
1      4511 E Guasti Road Ontario CA
Name: Address, dtype: object

3) Draw the addresses on the map.

m3 = gis.map('Redlands, CA', 10)
m3
from arcgis.geocoding import geocode
data3_addr1 = geocode(data3.Address[0])[0]
popup_lisa = { 
    "title" : "Lisa's job", 
    "content" : data3_addr1['address']
    }
m3.draw(data3_addr1['location'], popup_lisa,
        symbol = {"angle":0,"xoffset":0,"yoffset":0,
                  "type":"esriPMS", "url":"https://static.arcgis.com/images/Symbols/PeoplePlaces/School.png",
                  "contentType":"image/png","width":24,"height":24})
from arcgis.geocoding import geocode
data3_addr2 = geocode(data3.Address[1])[0]
popup_mark = { 
    "title" : "Mark's job", 
    "content" : data3_addr2['address']
    }
m3.draw(data3_addr2['location'], popup_mark,
        symbol = {"angle":0,"xoffset":0,"yoffset":0,
                  "type":"esriPMS", "url":"https://static.arcgis.com/images/Symbols/PeoplePlaces/School.png",
                  "contentType":"image/png","width":24,"height":24})

4) Create drive-time buffers using the maximum time they are willing to spend commuting from their new home to work.

from arcgis.geoenrichment import BufferStudyArea, enrich
marks = BufferStudyArea(area='4511 E Guasti Road, Ontario, CA 91761', 
                           radii=[45], units='Minutes', 
                           travel_mode='Driving')
lisas = BufferStudyArea(area='380 New York St Redlands CA 92373', 
                           radii=[45], units='Minutes', 
                           travel_mode='Driving')

drive_time_df = enrich(study_areas=[marks, lisas], data_collections=['Age'])
drive_time_lyr = gis.content.import_data(drive_time_df,
                                         title="DriveTimeLayer")

5) Dissolve the buffers to merge the overlapping polygons.

from arcgis.features.manage_data import dissolve_boundaries
dissolved_lyr = dissolve_boundaries(drive_time_lyr)
m_3 = gis.map('Redlands, CA', 9)
m_3
m_3.add_layer(dissolved_lyr)
m_3.draw(data3_addr1['location'], popup_lisa,
        symbol = {"angle":0,"xoffset":0,"yoffset":0,
                  "type":"esriPMS", "url":"https://static.arcgis.com/images/Symbols/PeoplePlaces/School.png",
                  "contentType":"image/png","width":24,"height":24})
m_3.draw(data3_addr2['location'], popup_mark,
        symbol = {"angle":0,"xoffset":0,"yoffset":0,
                  "type":"esriPMS", "url":"https://static.arcgis.com/images/Symbols/PeoplePlaces/School.png",
                  "contentType":"image/png","width":24,"height":24})

Map market health, home values, and projected appreciation

1) Download and prepare the Excel Market Health data as follows:

a) From www.zillow.com/research/data, download the ZIP Code level Market Health Index data.

b) Open the .csv file using Excel and add a zipstring column as a text field. Compute the values using =RIGHT(CONCAT("00000",B2),5), and drag the Autofill handle down to the last cell in the column to create text-formatted ZIP Code values with leading zeros.

c) Save the file as an Excel workbook. Close Excel.

Mark and Lisa named their file MarketHealthIndex.xlsx.

2) Load the Excel file for analysis.

file_name4 = datapath[3]
data4 = pd.read_excel(file_name4)
data4
RegionTypeRegionNameCityStateMetroCBSATitleSizeRankMarketHealthIndexSellForGainPrevForeclosed...ZHVIMoMYoYForecastYoYPctChangeStockOfREOsNegativeEquityDelinquencyDaysOnMarketUnnamed: 19zipstring
0Zip1001AgawamMASpringfield, MA, MASpringfield, MANaN1.62236575.000.0500...214000.00.2811625.2631580.047047NaN0.0690280.06806381.0NaN1001
1Zip1002AmherstMASpringfield, MA, MASpringfield, MANaN5.49134192.310.0000...331400.00.4851436.9722400.046192NaN0.0454150.03846287.5NaN1002
2Zip1007BelchertownMASpringfield, MA, MASpringfield, MANaN4.664384100.000.0000...277400.00.3254974.6792450.054387NaN0.0463030.04838794.5NaN1007
3Zip1008BlandfordMASpringfield, MA, MASpringfield, MANaN2.5412810.00NaN...224000.00.2237148.8435370.061817NaN0.0601090.09090973.0NaN1008
4Zip1010BrimfieldMASpringfield, MA, MASpringfield, MANaN3.103101100.00NaN...255700.00.4715132.6083470.060555NaN0.0667780.07500073.0NaN1010
..................................................................
14893Zip75033FriscoTXDallas-Fort Worth, TX, TXDallas-Fort Worth, TXNaN9.095852100.000.0000...395800.0-0.0757381.0467190.066809NaNNaNNaN74.0NaN75033
14894Zip84009South JordanUTSalt Lake City, UT, UTSalt Lake City, UTNaN6.135052NaN0.0000...409200.00.5405419.1200000.040208NaNNaNNaN48.0NaN84009
14895Zip97003BeavertonORPortland, OR, ORPortland, ORNaN6.806954100.000.0000...346900.00.1732605.3766710.037766NaNNaNNaN44.0NaN97003
14896Zip97703BendORBend, OR, ORBend, ORNaN6.566653100.00NaN...590700.00.2886256.489995-0.011788NaNNaNNaN77.0NaN97703
14897Zip29486SummervilleSCCharleston, SC, SCCharleston, SCNaN6.81500991.300.0196...192200.00.4179737.3143500.019964NaNNaNNaN83.5NaN29486

14898 rows × 21 columns

3) Select the City, MarketHealthIndex, ZHVI, ForecastYoYPctChange, and zipstring fields.

col = ['City', 'MarketHealthIndex', 'ZHVI', 'ForecastYoYPctChange', 'zipstring']
matket_health_index = data4[col]
matket_health_index.head()
          City  MarketHealthIndex      ZHVI  ForecastYoYPctChange  zipstring
0       Agawam           1.622365  214000.0              0.047047       1001
1      Amherst           5.491341  331400.0              0.046192       1002
2  Belchertown           4.664384  277400.0              0.054387       1007
3    Blandford           2.541281  224000.0              0.061817       1008
4    Brimfield           3.103101  255700.0              0.060555       1010
matket_health_index.dtypes
City                     object
MarketHealthIndex       float64
ZHVI                    float64
ForecastYoYPctChange    float64
zipstring                 int64
dtype: object
matket_health_index['MarketHealthIndex'].min()
0.000671231
matket_health_index['MarketHealthIndex'].max()
10.0
matket_health_index.rename(columns={"zipstring": "ZIP_CODE"},
                           inplace=True)
C:\Users\shi10484\AppData\Local\ESRI\conda\envs\dl_testing2\lib\site-packages\pandas\core\frame.py:4301: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
matket_health_index[matket_health_index['City']=='Crestline']
            City  MarketHealthIndex      ZHVI  ForecastYoYPctChange  ZIP_CODE
13351  Crestline           9.944288  205600.0              0.068084     92322
13353  Crestline           2.882937  228900.0              0.067296     92325

4) Locate their ZIP Code (here by filtering on the City field) and make a note of the values for MarketHealthIndex, ZHVI, and ForecastYoYPctChange. For Crestline's 92325 ZIP Code, the market health index is 2.9 on a scale that ranges from 0 to 10, the median home value for all homes (not just 3-bedroom homes) is $228,900, and homes are expected to appreciate about 6.7 percent over the next year.

zip_df.head()
   OBJECTID  POPULATION                   PO_NAME                                              SHAPE      SQMI STATE  Shape__Area  Shape__Length  ZIP_CODE
0         1         -99  N Dillingham Census Area  {"rings": [[[-160.186183929443, 58.82004642486...  16279.47    AK     6.765048      24.602921         1
1         2         -99  Yukon Flats Nat Wildlife  {"rings": [[[-159.971336364746, 64.42843627929...  95704.72    AK    48.867324     130.944574         2
2         3         -99      Alaska Peninsula NWR  {"rings": [[[-159.347519999648, 55.77196200034...  14491.70    AK     5.622721      41.443107         3
3         4         -99  W Kenai Peninsula Boroug  {"rings": [[[-153.309794000393, 58.85487400023...   6568.13    AK     2.751546      20.460970         4
4         5         -99  N Lake and Peninsula Bor  {"rings": [[[-153.436194999999, 60.90853799962...   3713.14    AK     1.573790       9.474710         5
zip_df = zip_df.astype({"ZIP_CODE": int})
health_df = pd.merge(zip_df, matket_health_index, on='ZIP_CODE')
health_df.head()
   OBJECTID  POPULATION      PO_NAME                                              SHAPE   SQMI STATE  Shape__Area  Shape__Length  ZIP_CODE         City  MarketHealthIndex      ZHVI  ForecastYoYPctChange
0       241       17332       Agawam  {'rings': [[[-72.6304370002673, 42.09945499964...  12.08    MA     0.003404       0.317209      1001       Agawam           1.622365  214000.0              0.047047
1       242       29871      Amherst  {'rings': [[[-72.4439000000849, 42.42276600025...  58.03    MA     0.016425       0.926367      1002      Amherst           5.491341  331400.0              0.046192
2       245       15242  Belchertown  {'rings': [[[-72.4083589996681, 42.35153699983...  55.85    MA     0.015786       0.698673      1007  Belchertown           4.664384  277400.0              0.054387
3       246        1749    Blandford  {'rings': [[[-72.9898204997162, 42.24787710014...  60.62    MA     0.017112       0.676175      1008    Blandford           2.541281  224000.0              0.061817
4       247        4398    Brimfield  {'rings': [[[-72.2559969998616, 42.18105099970...  37.28    MA     0.010515       0.587340      1010    Brimfield           3.103101  255700.0              0.060555
health_df.shape
(14890, 13)
health_df.spatial.set_geometry('SHAPE')
hlth_lyr = gis.content.import_data(health_df,
                                  title="MarketHealthLayer")
m4 = gis.map('United States', 5)
m4
cur_field_name = "MarketHealthIndex"
if cur_field_name not in [f['name'] for f in hlth_lyr.layers[0].properties.fields]:
    if has_arcpy:
        cur_field_name = "market_health_index"
    else:
        cur_field_name = "MarketHeal"
m4.add_layer(hlth_lyr, {"renderer":"ClassedColorRenderer",
                        "field_name":cur_field_name,
                        "classificationMethod":'quantile',
                        "opacity":0.7
                        })
market_hlth_lyr = hlth_lyr.layers[0]
from arcgis.features.find_locations import find_centroids
poly_to_point = find_centroids(market_hlth_lyr, output_name="HealthLyrPolygonToPoint" + str(dt.now().microsecond))
from  arcgis.features.manage_data import overlay_layers
zip_intersect = overlay_layers(drive_time_lyr, 
                               market_hlth_lyr, 
                               output_name="Market Health Data Within drive time Buffer" + str(dt.now().microsecond))
zip_intersect
Market Health Data Within drive time Buffer556026
Feature Layer Collection by arcgis_python
Last Modified: March 16, 2021
0 comments, 0 views

5) Notice how many ZIP Codes intersect the drive time buffer.

zip_hlth_intersect = zip_intersect.layers[0]
overlay_df = pd.DataFrame.spatial.from_layer(zip_hlth_intersect)
overlay_df.shape
(326, 66)
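To turn this overlay into a quick ranked short list, you could sort the intersecting ZIP Codes by market health. The lowercase field names below are taken from the overlay output shown later in this notebook and may differ in your run, so the sketch only uses the ones it finds:

# Rank candidate ZIP Codes inside the drive-time buffer by market health.
rank_cols = ['zip_code', 'city', 'market_health_index', 'zhvi',
             'forecast_yo_y_pct_change']
available = [c for c in rank_cols if c in overlay_df.columns]
if 'market_health_index' in available:
    display(overlay_df[available]
            .sort_values('market_health_index', ascending=False)
            .head(10))
else:
    print("Field names differ in this run; available columns:",
          list(overlay_df.columns))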
m5 = gis.map('Redlands, CA', 9)
m5
m5.add_layer(hlth_lyr, {"renderer":"ClassedColorRenderer",
                        "field_name":cur_field_name,
                        "classificationMethod":'quantile',
                        "opacity":0.7
                        })
m5.add_layer(drive_time_lyr)

The larger the index, the darker the color, and the healthier the housing market is. They want to buy their new home in an area with a healthy housing market (rather than a location where there are vacancies, homes that aren't selling, and numerous foreclosures).

This result has all the variables one should be interested in mapping, narrowed down to the ZIP Codes that are within an acceptable drive time to their work.

m6 = gis.map('Redlands, CA', 9)
m6
m6.add_layer(hlth_lyr, { "renderer":"ClassedColorRenderer",
                         "field_name":"ZHVI",
                         "classificationMethod":'quantile',
                         "opacity":0.7
              })
m6.add_layer(drive_time_lyr)

The dark ZIP Codes have the most expensive average home value estimates.

Similarly, map the ForecastYoYPctChange field:

m7 = gis.map('Redlands, CA', 9)
m7
cur_field_name2 = "ForecastYoYPctChange"
if cur_field_name2 not in [f['name'] for f in hlth_lyr.layers[0].properties.fields]:
    if has_arcpy:
        cur_field_name2 = "forecast_yo_y_pct_change"
    else:
        cur_field_name2 = "ForecastYo"
m7.add_layer(hlth_lyr, {"renderer":"ClassedColorRenderer",
                        "field_name":cur_field_name2,
                        "classificationMethod":'quantile',
                        "opacity":0.7
              })
m7.add_layer(drive_time_lyr)

The darkest ZIP Codes are expected to have the largest increase in home values over the next year.

Find the best cities to begin house hunting

1) Narrow the search to ZIP Codes with home values in their price range, as follows:

Mark and Lisa will only sell their home at a loss if they can purchase a home valued about the same as theirs with better investment potential. They will look for homes that cost about $340,000 in neighborhoods with median home values between $350,000 and $600,000. Being surrounded by homes that are valued higher than theirs is much better than being the most expensive home on the block; that's why Mark and Lisa extend their search to ZIP Codes with median home values as high as $600,000.

They also want to narrow their search to areas with healthy housing markets and expected home value appreciation. Mark and Lisa, for example, exclude ZIP Codes in less healthy housing markets by filtering for MarketHealthIndex > 8. They also focus on ZIP Codes with expected home appreciation comparable to or higher than Crestline's (about 6.7 percent) by filtering for ForecastYoYPctChange > 0.06. Set the filters on the MarketHealthIndex and ForecastYoYPctChange fields to reflect your own criteria.

query_str = '((ZHVI > 350000) AND (ZHVI < 600000) AND (' + cur_field_name + ' > 8) AND (' + cur_field_name2 + '> 0.06)) AND (1=1)'

zip_hlth_intersect_df = zip_hlth_intersect.query(where=query_str).sdf
zip_hlth_intersect_df
OBJECTIDFID_AC9111_A7E54F7Didsource_countryxyarea_typebuffer_unitsbuffer_units_aliasbuffer_radii...statezip_codecitymarket_health_indexzhviforecast_yo_y_pct_changeShape__Area_1Shape__Length_1AnalysisAreaSHAPE
0610US-117.55283034.064356NetworkServiceAreaMinutesDrive Time Minutes45...CA90606West Whittier-Los Nietos8.2722514855000.0723751.417656e+0722316.3230332.240247{"rings": [[[-13143194.2534, 4028915.1404], [-...
1910US-117.55283034.064356NetworkServiceAreaMinutesDrive Time Minutes45...CA90640Montebello8.1675395385000.0618073.105525e+0734167.8786351.995545{"rings": [[[-13143598.3654, 4032788.2902], [-...
21910US-117.55283034.064356NetworkServiceAreaMinutesDrive Time Minutes45...CA91702Azusa9.1381394576000.0863002.658000e+08142848.23483922.824592{"rings": [[[-13123459.7947, 4048654.942], [-1...
32010US-117.55283034.064356NetworkServiceAreaMinutesDrive Time Minutes45...CA91706Baldwin Park9.2153314603000.0881735.812360e+0745215.15646639.756145{"rings": [[[-13129357.7083, 4047200.0046], [-...
42910US-117.55283034.064356NetworkServiceAreaMinutesDrive Time Minutes45...CA91732El Monte9.2032495065000.0652991.813936e+0725827.62055812.414794{"rings": [[[-13135851.7534, 4040662.4451], [-...
53010US-117.55283034.064356NetworkServiceAreaMinutesDrive Time Minutes45...CA91733South El Monte9.6153855017000.0932552.645891e+0727366.70430617.640998{"rings": [[[-13139099.6109, 4037245.849], [-1...
617910US-117.55283034.064356NetworkServiceAreaMinutesDrive Time Minutes45...CA91752Eastvale8.2709095038000.0645085.733285e+0737863.19770539.309396{"rings": [[[-13082747.7254, 4033299.2511], [-...
718021US-117.19487234.057237NetworkServiceAreaMinutesDrive Time Minutes45...CA91752Eastvale8.2709095038000.0645085.733285e+0737863.19770539.309396{"rings": [[[-13082747.7254, 4033299.2511], [-...

8 rows × 66 columns

m9 = gis.map('United States')
m9
m9.add_layer(zip_hlth_intersect,
             {"definition_expression": query_str,
             "classificationMethod":'quantile'})
m9.zoom_to_layer(zip_hlth_intersect)

Notice that when the filter is applied, the map changes. ZIP Codes that don't meet the filter criteria are removed from the map and the colors change to reflect the recomputed Quantile classification.

Conclusion

The results show possible cities and ZIP Codes where Mark and Lisa can explore homes for sale. They can use real estate websites such as Zillow.com or Realtor.com to see whether they can find a home in their price range with the characteristics and qualities they're looking for.
