Finding a New Home¶
Buying or selling of a house can be a very stressful event in one's life. The process could be frustrating as it is lengthy, uncertain and needs a lot of examination. Through this workflow we will guide a couple (Mark and Lisa) who is interested in selling their home and relocating to a place nearest to both of their work places. In this case study, we will explore the current housing market, estimate average house prices in their area and hunt for a new one. We will download the Zillow data for their current home for our analysis. You can use your own data and follow along with this workflow which aims to help Mark and Lisa in finding their new home.
The notebook is divided into two parts. In the first part, we will calculate the following:
- Percentage of decrease/increase in house price since Mark and Lisa bought their home.
- Suggested selling price for their home.
- Whether their zip code is a buyer’s market or seller’s market.
- Average number of days it takes for homes to sell in their neighbourhood.
In the second part of the notebook, we will explore the investment potential of homes close to their work places. Based on how much a person is willing to spend commuting to work, we will create a drive-time buffer. This will narrow down the search areas. Zillow also provides data for market health and projected home value appreciation. Visualizing the zip codes by their market health will help them focus only on areas with good market health. Hence they will get a list of areas to choose from, for buying their new home.
Table of Contents
Selling your home¶
Determine an appropriate selling price¶
1) Download home sales time series data from Zillow at www.zillow.com/research/data.
Mark and Lisa have a 3-bedroom home, so we will select the ZHVI 3-Bedroom time-series ($) data set at the ZIP Code level.
2) Prepare the Excel data as follows:
a) Using Excel, open the .csv file.
Notice that the RegionName field has ZIP Codes as numbers (if we sort the RegionName field we will notice the ZIP Codes for Massachusetts, for example, don't have leading zeros; 01001 is 1001). Also, notice the median home value columns are named using the year and month. The first data available is for April 1996 (1996-04). b) Copy all the column headings and the one record with data for their ZIP Code to a new Excel sheet.
Apply a filter to the RegionName field. Mark and Lisa live in Crestline, California, so we will apply a filter for the 92325 ZIP Code.
c) Select (highlight) fields starting with the month and year when they bought their home and continuing to the last month and year column in the Excel table. So, for example, since Mark and Lisa bought their home in December 2007, they highlight the the two rows from column 2007-01 to column 2018-08.
d) Copy (press Ctrl+C) the selected data and paste it, along with the column headings, to a new Excel sheet using Paste Transposed (right-click in the first cell of the new sheet to see the paste options; select Paste Transposed). This gives two columns of data.
e) The first column has date values but only includes the year and month. In column C, create a proper date field.
- Right-click column C and format the cells to be category date.
- In the first cell of column C, enter the following formula: = DATEVALUE(CONCATENATE(A1, "-01"))
- Drag the Autofill handle down to the last data cell in the column.
f) Insert a top row and type the column headings:
YYYYMM, Value, and date.
g) Rename the Excel sheet (probably called Sheet2 at present) something like
AveSellingPrice and delete the other sheets (the first sheet contains a large amount of data that we won't be using further in the workflow). Mark and Lisa named their price Excel sheet CrestlineAveSellingPrice.
h) Save this new sheet as an Excel workbook.
Mark and Lisa named their Excel file Crestline3BdrmAveSellingPrice.xlsx.
3) Connect to your ArcGIS Online organization.
from arcgis.gis import GIS
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
gis = GIS("home')
4) Load the excel file for analysis.
file_name1 = r'/arcgis/home/data/finding_a_new_home/Crestline3BdrmAveSellingPrice.xlsx'
data1 = pd.read_excel(file_name1)
data1.head()
data1.tail()
data1.shape
data1[['year','month','day']] = data1.date.apply(lambda x: pd.Series(
x.strftime("%Y,%m,%d").split(","))) # split date into year, month, day
data1.head()
grpby_data1 = data1.groupby(['year']).mean()
type(grpby_data1)
5) We will Create a graph showing how average home prices have changed since they bought their home.
grpby_data1.reset_index(inplace=True)
grpby_data1.head()
grpby_data1.value
grpby_data1.year
plt.plot(grpby_data1.year, grpby_data1.value)
plt.title("average home prices (2007-2018)")
plt.xlabel("date")
plt.ylabel("average house price")
7) Determine an appropriate selling price based on home sales trends as follows:
a) Determine the current average selling price and the average selling price when they bought their home. Divide the current average price by the beginning average price to see how much homes in their ZIP Code have appreciated or depreciated. When Mark and Lisa bought their home in December of 2007, 3-bedroom homes were selling for \$276,617.
price_initial = grpby_data1.iloc[0]
price_initial
price_current = grpby_data1.iloc[-1]
price_current
house_worth = price_current.value / price_initial.value
house_worth
This indicates that homes in Crestline are only worth 91 percent of what they were at the end of 2007.
b) We can get a rough estimate of what their home is worth by summing what they paid for their home plus what they invested in it, and multiplying that sum by the ratio computed above. Mark and Lisa, for example, paid \$291,000 in 2007 and invested \\$100,000 in solid improvements (new kitchen, major landscaping, hardwood flooring, and so on). Multiplying (\$291,000 + \\$100,000) by 0.91 gives a rough suggested selling price of \$343,134.
(price_initial.value + 100000)*house_worth
Get additional information about the local real estate market¶
If their home is part of a seller's market, they are more likely to get their asking price.
1) Download the Buyer-Seller Index data at the ZIP Code level from www.zillow.com/research/data. In Home Listings and Sales select data type as seller-buyer index and geography as zip codes.
2) Open the .csv file using Excel. Zillow reports ZIP Codes as numbers. We will need to pad the ZIP Code numbers with leading zeros so the Zillow data will link to the ArcGIS ZIP Code geometry.
Follow these steps:
a) Sort the RegionName column from smallest to largest so we will be able to see how the formula below works.
b) Name a new column in the Excel table zipstring.
c) In the first cell of the new column, enter the formula to pad each RegionName value with leading zeros, keeping the rightmost five characters: =RIGHT(CONCAT("00000",B2),5)
d) Drag the Autofill handle down to the last data cell in the column.
3) Load the excel file for analysis
file_name2 = r'/arcgis/home/data/finding_a_new_home/BuyerSellerIndex.xlsx'
data2 = pd.read_excel(file_name2)
data2.head()
data2.dtypes
data2.columns
4) Select BuyerSellerIndex, DaysOnMarket, zipstring fields
cols = ['BuyerSellerIndex', 'DaysOnMarket', 'zipstring']
selected_data2 = data2[cols]
5) Explore the data values as follows:
a) Sort on the DaysOnMarket field and notice the range. For the data Mark and Lisa downloaded, the range is 35 to 294.5 days.
days_on_mrkt_df = selected_data2.sort_values(by='DaysOnMarket', axis=0)
days_on_mrkt_df.DaysOnMarket.min()
days_on_mrkt_df.DaysOnMarket.max()
days_on_mrkt_df.head()
days_on_mrkt_df.tail()
b) Sort on the BuyerSellerIndex field and notice the range of values. ZIP Codes with index values near 0 are part of a strong seller's market; ZIP Codes with index values near 10 are part of a strong buyer's market.
buyer_seller_df = data2.sort_values(by='BuyerSellerIndex', axis=0)
buyer_seller_df.head()
buyer_seller_df.tail()
buyer_seller_df.BuyerSellerIndex.min()
buyer_seller_df.BuyerSellerIndex.max()
c) Filter the data to only show their home's ZIP code
selected_data2[selected_data2['zipstring'] == 92325]
Notice the average number of days. Determine if the home is part of a buyer's or seller's market. Mark and Lisa learn that their home is part of a buyer's market (9.6), and they can expect their home to be on the market approximately 79 days before it sells.
selected_data2.rename(columns={"zipstring": "ZIP_CODE"}, inplace=True)
selected_data2.shape
selected_data2.dtypes
selected_data2 = selected_data2.astype({"ZIP_CODE": int})
selected_data2.dtypes
6) Search for the United States ZIP Code Boundaries 2017 layer. We can specify the owner's name to get more specific results. To search for content from the Living Atlas, or content shared by other users on ArcGIS Online, set outside_org=True
items = gis.content.search('United States ZIP Code Boundaries 2017 owner: esri_dm',
outside_org=True)
Display the list of results.
from IPython.display import display
for item in items:
display(item)
Select the desired item from the list.
us_zip = items[0]
us_zip
Get the layer names from the item
for lyr in us_zip.layers:
print(lyr.properties.name)
7) We want to merge the zip_code layer with data2 to visualize the result on the map.
us_zip_lyr = us_zip.layers[3]
zip_df = pd.DataFrame.spatial.from_layer(us_zip_lyr)
zip_df.head()
zip_df.shape
zip_df.dtypes
zip_df = zip_df.astype({"ZIP_CODE": int})
zip_df.dtypes
merged_df = pd.merge(zip_df, selected_data2, on='ZIP_CODE')
merged_df
merged_df.shape
merged_df.spatial.set_geometry('SHAPE')
mergd_lyr = gis.content.import_data(merged_df,
title='MergedLayer',
tags='datascience')
8) Create a map of the BuyerSellerIndex field using the following steps:
Visualize Results¶
m1 = gis.map('United States')
m1