Buying or selling a house can be one of the more stressful events in a person's life. The process is often frustrating: it is lengthy, uncertain, and requires a great deal of research. In this workflow we will guide a couple (Mark and Lisa) who are interested in selling their home and relocating somewhere closer to both of their workplaces. In this case study, we will explore the current housing market, estimate average house prices in their area, and hunt for a new home. We will download the Zillow data for their current home for our analysis. You can use your own data and follow along with this workflow, which aims to help Mark and Lisa find their new home.
The notebook is divided into two parts. In the first part, we will calculate the following:
- The percentage increase or decrease in house prices since Mark and Lisa bought their home.
- A suggested selling price for their home.
- Whether their ZIP Code is a buyer's or a seller's market.
- The average number of days it takes for homes to sell in their neighborhood.
In the second part of the notebook, we will explore the investment potential of homes close to their workplaces. Based on how much time a person is willing to spend commuting to work, we will create a drive-time buffer, which narrows down the search areas. Zillow also provides data on market health and projected home value appreciation, so visualizing ZIP Codes by their market health will help them focus only on areas where the market is healthy. The result is a short list of areas to choose from when buying their new home.
Selling your home
Execute the following command to install the openpyxl library if it is not already installed. pandas uses this package to read Excel files.
!pip install openpyxl
Similarly, if matplotlib is not present, run the following command to install or upgrade it:
import sys
!{sys.executable} -m pip install matplotlib
Then, connect to your ArcGIS Online organization, and import necessary libraries.
from arcgis.gis import GIS
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime as dt
gis = GIS(profile="your_online_profile")
Use the boolean has_arcpy to flag whether arcpy is present in the local environment.
has_arcpy = False
try:
    import arcpy
    has_arcpy = True
    print("arcpy present")
except ImportError:  # catch only a missing arcpy, not unrelated errors
    print("arcpy not present")
arcpy not present
Load the CSV collection from ArcGIS Online for analysis and download it as a zip file to a temporary folder. We will explain how these files were created in the following sections.
data = gis.content.search('finding_a_new_home owner:api_data_owner type: csv collection', outside_org=True)[0]
data
filepath = data.download(file_name=data.name)
import os
import zipfile
from pathlib import Path
with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)
data_path = Path(os.path.splitext(filepath)[0])
data_path
WindowsPath('C:/Users/shu12142/AppData/Local/Temp/1/finding_a_new_home')
datapath = [os.path.abspath(os.path.join(data_path, p)) for p in os.listdir(data_path)]
datapath
['C:\\Users\\shu12142\\AppData\\Local\\Temp\\1\\finding_a_new_home\\BuyerSellerIndex.xlsx', 'C:\\Users\\shu12142\\AppData\\Local\\Temp\\1\\finding_a_new_home\\Crestline3BdrmAveSellingPrice.xlsx', 'C:\\Users\\shu12142\\AppData\\Local\\Temp\\1\\finding_a_new_home\\ImportantPlaces.xlsx', 'C:\\Users\\shu12142\\AppData\\Local\\Temp\\1\\finding_a_new_home\\MarketHealthIndex.xlsx']
Determine an appropriate selling price
In this section, we will use Crestline3BdrmAveSellingPrice.xlsx for the selling price analysis. The following steps show how Crestline3BdrmAveSellingPrice.xlsx was prepared from the open data (a pandas sketch of the same preparation follows these steps):
- Download home sales time series data from Zillow at www.zillow.com/research/data.
Mark and Lisa have a 3-bedroom home, so we will select the **ZHVI 3-Bedroom time-series ($)** data set at the ZIP Code level.
- Prepare the Excel data:
a) Using Excel, open the .csv file.
Notice that the RegionName field has ZIP Codes as numbers (if we sort the RegionName field, we will notice that the ZIP Codes for Massachusetts, for example, don't have leading zeros; 01001 appears as 1001). Also, notice the median home value columns are named using the year and month. The first data available is for April 1996 (1996-04).
b) Copy all the column headings and the one record with data for their ZIP Code to a new Excel sheet.
Apply a filter to the RegionName field. Mark and Lisa live in Crestline, California, so we will apply a filter for the 92325 ZIP Code.

c) Select (highlight) the fields starting with the month and year when they bought their home and continuing to the last month and year column in the Excel table. For example, since Mark and Lisa bought their home in December 2007, they highlight the two rows from column 2007-01 to column 2018-08.
d) Copy (press Ctrl+C) the selected data and paste it, along with the column headings, to a new Excel sheet using Paste Transposed (right-click in the first cell of the new sheet to see the paste options; select Paste Transposed). This gives two columns of data.
e) The first column has date values but only includes the year and month. In column C, create a proper date field.
- Right-click column C and format the cells as Date.
- In the first cell of column C, enter the following formula: = DATEVALUE(CONCATENATE(A1, "-01"))
- Drag the Autofill handle down to the last data cell in the column.
f) Insert a top row and type the column headings: YYYYMM, Value, and date.

g) Rename the Excel sheet (probably called Sheet2 at present) something like AveSellingPrice, and delete the other sheets (the first sheet contains a large amount of data that we won't use further in this workflow).
Mark and Lisa named their price Excel sheet CrestlineAveSellingPrice.
h) Save this new sheet as an Excel workbook.
Mark and Lisa named their Excel file Crestline3BdrmAveSellingPrice.xlsx.
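If you would rather script this preparation than do it by hand in Excel, the sketch below shows one way to reproduce steps a) through h) with pandas. This is a minimal sketch, not the method used to produce the hosted file: the input file name is hypothetical, and it assumes the Zillow CSV layout described above (a numeric RegionName column and one median-value column per month, named YYYY-MM).

```python
import pandas as pd

# Hypothetical file name; use the CSV you downloaded from Zillow
raw = pd.read_csv("Zip_Zhvi_3bedroom.csv")

# Keep the one record for their ZIP Code (RegionName is numeric in the raw file)
row = raw[raw["RegionName"] == 92325]

# Keep the monthly value columns from the purchase date through the latest month;
# plain string comparison works because the columns are named YYYY-MM
months = [c for c in raw.columns if "2007-01" <= c <= "2018-08"]

# Transpose to two columns and build a proper date column (steps c-f)
ave_price = row[months].T.reset_index()
ave_price.columns = ["YYYYMM", "Value"]
ave_price["date"] = pd.to_datetime(ave_price["YYYYMM"] + "-01")

# Save as an Excel workbook (steps g-h)
ave_price.to_excel("Crestline3BdrmAveSellingPrice.xlsx",
                   sheet_name="AveSellingPrice", index=False)
```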
- Read the Crestline3BdrmAveSellingPrice Excel data from the local datapath and restructure it as a DataFrame.
file_name1 = datapath[1]
data1 = pd.read_excel(file_name1)
data1.head()
| | YYYYMM | value | date |
---|---|---|---|
0 | 2007-01 | 291000 | 2007-01-01 |
1 | 2007-02 | 289000 | 2007-02-01 |
2 | 2007-03 | 287400 | 2007-03-01 |
3 | 2007-04 | 286100 | 2007-04-01 |
4 | 2007-05 | 284000 | 2007-05-01 |
data1.tail()
| | YYYYMM | value | date |
---|---|---|---|
135 | 2018-04 | 252200 | 2018-04-01 |
136 | 2018-05 | 254000 | 2018-05-01 |
137 | 2018-06 | 254800 | 2018-06-01 |
138 | 2018-07 | 254900 | 2018-07-01 |
139 | 2018-08 | 254900 | 2018-08-01 |
data1.shape
(140, 3)
The following line of code adds three columns (year, month, day) by applying a lambda function to each row. The lambda function creates a pandas Series for each row to store the year, month, and day separately.
data1[['year','month','day']] = data1.date.apply(lambda x: pd.Series(
x.strftime("%Y,%m,%d").split(","))) # split date into year, month, day
data1.head()
| | YYYYMM | value | date | year | month | day |
---|---|---|---|---|---|---|
0 | 2007-01 | 291000 | 2007-01-01 | 2007 | 01 | 01 |
1 | 2007-02 | 289000 | 2007-02-01 | 2007 | 02 | 01 |
2 | 2007-03 | 287400 | 2007-03-01 | 2007 | 03 | 01 |
3 | 2007-04 | 286100 | 2007-04-01 | 2007 | 04 | 01 |
4 | 2007-05 | 284000 | 2007-05-01 | 2007 | 05 | 01 |
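As a side note, the same three columns can be derived without a row-wise apply; if you prefer a vectorized approach, pandas' .dt accessor on the datetime column does the same thing:

```python
# Vectorized equivalent of the apply/lambda above
data1["year"] = data1.date.dt.strftime("%Y")
data1["month"] = data1.date.dt.strftime("%m")
data1["day"] = data1.date.dt.strftime("%d")
```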
We will also use the unique() method to check the unique years, then create a new pandas DataFrame by calling the groupby method, which groups the data1 house value column by year and computes the mean. The new DataFrame illustrates the mean house value for each year.
data1.year.unique()
array(['2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018'], dtype=object)
grpby_data1 = data1.groupby(['year']).mean(numeric_only=True)
grpby_data1.head()
| | value |
---|---|
year | |
2007 | 276616.666667 |
2008 | 221875.000000 |
2009 | 188391.666667 |
2010 | 176216.666667 |
2011 | 154766.666667 |
- Create a graph using the matplotlib library to show how average home prices have changed since they bought their home.
grpby_data1.reset_index(inplace=True)
grpby_data1.head()
| | year | value |
---|---|---|
0 | 2007 | 276616.666667 |
1 | 2008 | 221875.000000 |
2 | 2009 | 188391.666667 |
3 | 2010 | 176216.666667 |
4 | 2011 | 154766.666667 |
grpby_data1.value
0 276616.666667 1 221875.000000 2 188391.666667 3 176216.666667 4 154766.666667 5 145158.333333 6 163741.666667 7 183975.000000 8 197133.333333 9 229566.666667 10 235966.666667 11 252025.000000 Name: value, dtype: float64
grpby_data1.year
0 2007 1 2008 2 2009 3 2010 4 2011 5 2012 6 2013 7 2014 8 2015 9 2016 10 2017 11 2018 Name: year, dtype: object
plt.plot(grpby_data1.year, grpby_data1.value)
plt.title("average home prices (2007-2018)")
plt.xlabel("date")
plt.ylabel("average house price")
Text(0, 0.5, 'average house price')

- Determine an appropriate selling price based on home sales trends as follows:
a) Determine the current average selling price and the average selling price when they bought their home. Divide the current average price by the beginning average price to see how much homes in their ZIP Code have appreciated or depreciated. When Mark and Lisa bought their home in December of 2007, 3-bedroom homes were selling for an average of $276,617.
price_initial = grpby_data1.iloc[0]
price_initial
year 2007 value 276616.666667 Name: 0, dtype: object
price_current = grpby_data1.iloc[-1]
price_current
year 2018 value 252025.0 Name: 11, dtype: object
house_worth = price_current.value / price_initial.value
house_worth
0.9110983912755316
This indicates that homes in Crestline are worth only about 91 percent of what they were in 2007.
b) We can get a rough estimate of what their home is worth by summing the base value of their home plus what they invested in it, and multiplying that sum by the ratio computed above. Mark and Lisa, for example, paid \$291,000 in 2007 and invested \$100,000 in solid improvements (new kitchen, major landscaping, hardwood flooring, and so on). Using the 2007 average price of \$276,617 as the base, multiplying (\$276,617 + \$100,000) by 0.911 gives a rough suggested selling price of \$343,134.
(price_initial.value + 100000) * house_worth
343134.83912755316
Get additional information about the local real estate market
If their home is part of a seller's market, they are more likely to get their asking price. In this section, the BuyerSellerIndex.xlsx data is used for the local real estate market analysis. The following steps show how BuyerSellerIndex.xlsx was prepared from the open data:
- Download the Buyer-Seller Index data at the ZIP Code level from www.zillow.com/research/data. Under Home Listings and Sales, select Buyer-Seller Index as the data type and ZIP Code as the geography.
- Open the .csv file using Excel. Zillow reports ZIP Codes as numbers, so we will need to pad them with leading zeros for the Zillow data to link to the ArcGIS ZIP Code geometry (a pandas equivalent appears after these steps). Follow these steps:
a) Sort the RegionName column from smallest to largest so we will be able to see how the formula below works.
b) Name a new column in the Excel table zipstring.
c) In the first cell of the new column, enter the formula to pad each RegionName value with leading zeros, keeping the rightmost five characters: =RIGHT(CONCAT("00000",B2),5)
d) Drag the Autofill handle down to the last data cell in the column.
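The same zero-padding can also be done in pandas rather than Excel; this one-liner mirrors the =RIGHT(CONCAT("00000",B2),5) formula, where df stands for the raw Buyer-Seller Index table:

```python
# Pad the numeric ZIP Codes to five characters with leading zeros
df["zipstring"] = df["RegionName"].astype(str).str.zfill(5)
```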


- Read the BuyerSellerIndex Excel data from the local datapath and restructure it as a DataFrame.
file_name2 = datapath[0]
data2 = pd.read_excel(file_name2)
data2.head()
| | RegionType | RegionName | State | CBSA Title | SizeRankCity | SizeRankMetro | PctPriceCut | DaysOnMarket | BuyerSellerIndex | BuyerSellerIndexMetro | zipstring |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Zip | 1001 | MA | Springfield, MA | 8273 | 83 | 22.916667 | 81.0 | 6.190476 | 9.185083 | 1001 |
1 | Zip | 1002 | MA | Springfield, MA | 6684 | 83 | 29.787234 | 94.0 | 10.000000 | 9.185083 | 1002 |
2 | Zip | 1007 | MA | Springfield, MA | 9088 | 83 | 16.091954 | 83.0 | 2.857143 | 9.185083 | 1007 |
3 | Zip | 1013 | MA | Springfield, MA | 7061 | 83 | 31.147541 | 76.5 | 8.333333 | 9.185083 | 1013 |
4 | Zip | 1020 | MA | Springfield, MA | 5172 | 83 | 25.000000 | 73.0 | 3.809524 | 9.185083 | 1020 |
data2.dtypes
RegionType object RegionName int64 State object CBSA Title object SizeRankCity int64 SizeRankMetro int64 PctPriceCut float64 DaysOnMarket float64 BuyerSellerIndex float64 BuyerSellerIndexMetro float64 zipstring int64 dtype: object
data2.columns
Index(['RegionType', 'RegionName', 'State', 'CBSA Title', 'SizeRankCity', 'SizeRankMetro', 'PctPriceCut', 'DaysOnMarket', 'BuyerSellerIndex', 'BuyerSellerIndexMetro', 'zipstring'], dtype='object')
- Select the BuyerSellerIndex, DaysOnMarket, and zipstring fields.
cols = ['BuyerSellerIndex', 'DaysOnMarket', 'zipstring']
selected_data2 = data2[cols]
- Explore the data values as follows:
a) Sort on the DaysOnMarket field and notice the range. For the data Mark and Lisa downloaded, the range is 35 to 294.5 days.
days_on_mrkt_df = selected_data2.sort_values(by='DaysOnMarket', axis=0)
days_on_mrkt_df.DaysOnMarket.min()
35.0
days_on_mrkt_df.DaysOnMarket.max()
294.5
days_on_mrkt_df.head()
| | BuyerSellerIndex | DaysOnMarket | zipstring |
---|---|---|---|
7420 | 1.120000 | 35.0 | 98107 |
7434 | 4.960000 | 35.0 | 98136 |
7097 | 0.526316 | 35.0 | 94022 |
7426 | 0.640000 | 35.0 | 98117 |
7399 | 0.080000 | 35.0 | 98043 |
days_on_mrkt_df.tail()
| | BuyerSellerIndex | DaysOnMarket | zipstring |
---|---|---|---|
800 | 9.006849 | 252.0 | 8752 |
1177 | 10.000000 | 254.0 | 12428 |
1141 | 8.938356 | 261.5 | 11963 |
1189 | 9.160959 | 288.0 | 12545 |
753 | 6.000000 | 294.5 | 8403 |
b) Sort on the BuyerSellerIndex field and notice the range of values. ZIP Codes with index values near 0 are part of a strong seller's market; ZIP Codes with index values near 10 are part of a strong buyer's market.
buyer_seller_df = data2.sort_values(by='BuyerSellerIndex', axis=0)
buyer_seller_df.head()
| | RegionType | RegionName | State | CBSA Title | SizeRankCity | SizeRankMetro | PctPriceCut | DaysOnMarket | BuyerSellerIndex | BuyerSellerIndexMetro | zipstring |
---|---|---|---|---|---|---|---|---|---|---|---|
517 | Zip | 7063 | NJ | New York-Newark-Jersey City, NY-NJ-PA | 10423 | 1 | 4.687500 | 90.0 | 0.017123 | 9.585635 | 7063 |
4546 | Zip | 53168 | WI | Chicago-Naperville-Elgin, IL-IN-WI | 10505 | 3 | 9.756098 | 59.5 | 0.033223 | 9.488950 | 53168 |
485 | Zip | 7004 | NJ | New York-Newark-Jersey City, NY-NJ-PA | 10846 | 1 | 7.894737 | 92.0 | 0.034247 | 9.585635 | 7004 |
6572 | Zip | 90025 | CA | Los Angeles-Long Beach-Anaheim, CA | 2122 | 2 | 6.451613 | 46.0 | 0.036364 | 4.185083 | 90025 |
1609 | Zip | 19152 | PA | Philadelphia-Camden-Wilmington, PA-NJ-DE-MD | 4927 | 5 | 6.818182 | 67.0 | 0.041667 | 9.150552 | 19152 |
buyer_seller_df.tail()
| | RegionType | RegionName | State | CBSA Title | SizeRankCity | SizeRankMetro | PctPriceCut | DaysOnMarket | BuyerSellerIndex | BuyerSellerIndexMetro | zipstring |
---|---|---|---|---|---|---|---|---|---|---|---|
6091 | Zip | 80016 | CO | Denver-Aurora-Lakewood, CO | 2658 | 21 | 27.226891 | 62.0 | 10.0 | 4.502762 | 80016 |
4598 | Zip | 53954 | WI | Madison, WI | 10773 | 87 | 21.875000 | 75.5 | 10.0 | 3.522099 | 53954 |
262 | Zip | 3049 | NH | Manchester-Nashua, NH | 10693 | 131 | 28.787879 | 74.0 | 10.0 | 3.577348 | 3049 |
7321 | Zip | 97209 | OR | Portland-Vancouver-Hillsboro, OR-WA | 4742 | 23 | 26.258993 | 78.5 | 10.0 | 5.096685 | 97209 |
1224 | Zip | 13104 | NY | Syracuse, NY | 8916 | 79 | 28.323699 | 89.0 | 10.0 | 9.033149 | 13104 |
buyer_seller_df.BuyerSellerIndex.min()
0.017123288
buyer_seller_df.BuyerSellerIndex.max()
10.0
c) Filter the data to show only their home's ZIP Code.
selected_data2[selected_data2['zipstring'] == 92325]
| | BuyerSellerIndex | DaysOnMarket | zipstring |
---|---|---|---|
6888 | 9.622642 | 79.0 | 92325 |
Notice the average number of days on market, and determine whether the home is part of a buyer's or a seller's market. Mark and Lisa learn that their home is part of a buyer's market (an index of 9.6), and they can expect their home to be on the market for approximately 79 days before it sells.
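For readability, this interpretation could be wrapped in a small helper function. The thresholds below are illustrative only, not Zillow's official cutoffs:

```python
def market_type(bsi):
    """Roughly label a Buyer-Seller Index value
    (0 = strong seller's market, 10 = strong buyer's market)."""
    if bsi < 3.5:   # illustrative threshold
        return "seller's market"
    if bsi > 6.5:   # illustrative threshold
        return "buyer's market"
    return "balanced market"

market_type(9.622642)  # "buyer's market" for ZIP Code 92325
```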
selected_data2.rename(columns={"zipstring": "ZIP_CODE"}, inplace=True)
C:\Users\shu12142\AppData\Local\Temp\1\ipykernel_26008\3169181311.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy selected_data2.rename(columns={"zipstring": "ZIP_CODE"}, inplace=True)
selected_data2.shape
(7548, 3)
selected_data2.dtypes
BuyerSellerIndex float64 DaysOnMarket float64 ZIP_CODE int64 dtype: object
selected_data2 = selected_data2.astype({"ZIP_CODE": int})
selected_data2.dtypes
BuyerSellerIndex float64 DaysOnMarket float64 ZIP_CODE int32 dtype: object
- Search for the United States ZIP Code Boundaries 2021 layer. We can specify the owner's name to get more specific results. To search for content from the Living Atlas, or content shared by other users on ArcGIS Online, set outside_org=True.
items = gis.content.search('United States ZIP Code Boundaries 2021 owner: esri_dm', outside_org=True)
Display the list of results.
from IPython.display import display
for item in items:
    display(item)
Select the desired item from the list.
us_zip = items[1]
us_zip
Get the layer names from the item:
for lyr in us_zip.layers:
    print(lyr.properties.name)
USA_Country USA_State USA_County USA_ZipCode USA_Tract USA_BlockGroup
- We want to merge the ZIP Code layer with selected_data2 to visualize the result on the map.
us_zip_lyr = us_zip.layers[3]
The from_layer() method converts a feature layer to a pandas DataFrame.
zip_df = pd.DataFrame.spatial.from_layer(us_zip_lyr)
zip_df.head()
| | OBJECTID | ZIP_CODE | PO_NAME | STATE | POPULATION | SQMI | Shape__Area | Shape__Length | SHAPE |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | N Dillingham Census Area | AK | <NA> | 16019.53 | 6.657141 | 24.677454 | {"rings": [[[-160.431152, 58.689351], [-160.43... |
1 | 2 | 2 | Yukon Flats Nat Wildlife | AK | <NA> | 95862.85 | 48.948815 | 131.77645 | {"rings": [[[-160.038452, 61.947605], [-160.03... |
2 | 3 | 3 | Alaska Peninsula NWR | AK | <NA> | 14572.9 | 5.655405 | 41.564165 | {"rings": [[[-159.900745, 56.439047], [-159.90... |
3 | 4 | 4 | W Kenai Peninsula Borough | AK | <NA> | 6510.85 | 2.728764 | 20.553203 | {"rings": [[[-154.748861, 59.259518], [-154.70... |
4 | 5 | 5 | N Lake and Peninsula Borough | AK | <NA> | 3760.07 | 1.593722 | 9.571684 | {"rings": [[[-156.0002144, 60.9074352], [-155.... |
zip_df.shape
(32201, 9)
zip_df.dtypes
OBJECTID Int64 ZIP_CODE string[python] PO_NAME string[python] STATE string[python] POPULATION Int32 SQMI Float64 Shape__Area Float64 Shape__Length Float64 SHAPE geometry dtype: object
zip_df = zip_df.astype({"ZIP_CODE": int})
zip_df.dtypes
OBJECTID Int64 ZIP_CODE int32 PO_NAME string[python] STATE string[python] POPULATION Int32 SQMI Float64 Shape__Area Float64 Shape__Length Float64 SHAPE geometry dtype: object
We use the merge() method from the pandas library to join the zip_df and selected_data2 DataFrames.
merged_df = pd.merge(zip_df, selected_data2, on='ZIP_CODE')
merged_df.shape
(7548, 11)
The import_data() method imports the DataFrame merged_df, along with its geometry, into ArcGIS Online.
merged_df.spatial.set_geometry('SHAPE')
mergd_lyr = gis.content.import_data(merged_df,
title='MergedLayer_2024',
tags='datascience, dlpk')
C:\Users\shu12142\AppData\Local\anaconda3\envs\geosaurus_dev_env\Lib\site-packages\urllib3\connectionpool.py:1099: InsecureRequestWarning: Unverified HTTPS request is being made to host 'geosaurus.maps.arcgis.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings warnings.warn(
When arcpy is present, import_data uploads the local SeDF (Spatially Enabled DataFrame) to your organization as a file geodatabase (FGDB) and publishes it as a hosted feature layer. When arcpy is not present, import_data instead uploads the local SeDF as a shapefile and then publishes it as a hosted feature layer. This difference results in column/property names that differ from those defined in the original SeDF; shapefile field names, for example, are limited to 10 characters, so longer names are truncated.
To get accurate field names from the imported layers, we will double-check the field names before drawing them on maps in the following sections.
Visualize Results
- Create a map of the BuyerSellerIndex field
m1 = gis.map('Redlands, CA')
m1

m1.zoom = 8
To get the accurate column names, let's check the layer properties and display all field names from the imported mergd_lyr item.
field_names = mergd_lyr.layers[0].properties['fields']
for field in field_names:
    print(field['name'])
FID objectid zip_code po_name state population sqmi shape_area shape_leng buyer_sell days_on_ma Shape__Area Shape__Length
We will use the smart mapping capability to render the merged layer with colors that vary based on the buyer_sell (BuyerSellerIndex) field. The code below first creates a SmartMappingManager by calling content.renderer(0).smart_mapping(), and then calls the class_breaks_renderer method to classify the ZIP Code boundary areas by color. Please refer to smart mapping for more details.
m1.content.add(mergd_lyr)
sm = m1.content.renderer(0).smart_mapping()
sm.class_breaks_renderer(
break_type = "color",
field = "buyer_sell",
)
- Create a map of the DaysOnMarket field
m2 = gis.map('Redlands, CA')
m2

m2.zoom = 8
Similar to m1, we will use smart mapping again to visualize the ZIP Code boundaries, this time classified by size based on the days_on_ma (DaysOnMarket) field.
m2.content.add(mergd_lyr)
sm = m2.content.renderer(0).smart_mapping()
sm.class_breaks_renderer(
break_type="size",
field = "days_on_ma",
)
House Hunting
Find all ZIP Codes within a specified drive time of important places
In this section, we will use the ImportantPlaces.xlsx data for the drive-time analysis. The following steps show how ImportantPlaces.xlsx was prepared:
- Create an Excel table with columns for Street, City, State, and Zip. Add addresses for the locations you want to access from your new home. Mark and Lisa's table below has their current job addresses. They named their Excel file ImportantPlaces.xlsx and the Excel sheet WorkLocations.
- Load the ImportantPlaces.xlsx Excel data from the local datapath as a DataFrame, and combine the Street, City, and State columns into an Address column.
file_name3 = datapath[2]
data3 = pd.read_excel(file_name3)
data3.head()
| | Place | Street | City | State | Zip |
---|---|---|---|---|---|
0 | Lisa's job | 380 New York Street | Redlands | CA | 92373 |
1 | Mark's job | 4511 E Guasti Road | Ontario | CA | 91761 |
data3['Address'] = data3['Street'] + ' ' + data3['City'] + ' ' + data3['State']
data3['Address']
0 380 New York Street Redlands CA 1 4511 E Guasti Road Ontario CA Name: Address, dtype: object
- Draw the addresses on the map
m3_1 = gis.map('Redlands, CA')
m3_1

m3_1.zoom = 9
To visualize popup address info and picture marker symbols for Lisa's and Mark's job locations, we will create PopupInfo and PictureMarkerSymbolEsriPMS objects and pass them as parameters when calling the content.draw() method.
from arcgis.geocoding import geocode
from arcgis.map.popups import PopupInfo
from arcgis.map.symbols import PictureMarkerSymbolEsriPMS
sr = m3_1.extent['spatialReference']['latestWkid']
data3_addr1 = geocode(data3.Address[0], out_sr=sr)[0]
popup = PopupInfo(title = "Lisa's job", description = data3_addr1['address'])
symbol = PictureMarkerSymbolEsriPMS(
angle=0,
xoffset=0,
yoffset=0,
content_type="image/png",
width=24,
height=25,
type="esriPMS",
url="https://static.arcgis.com/images/Symbols/PeoplePlaces/School.png",
)
m3_1.content.draw(data3_addr1['location'], popup = popup, symbol = symbol)
data3_addr2 = geocode(data3.Address[1], out_sr=sr)[0]
popup = PopupInfo(title = "Mark's job", description = data3_addr2['address'])
symbol = PictureMarkerSymbolEsriPMS(
angle=0,
xoffset=0,
yoffset=0,
content_type="image/png",
width=24,
height=25,
type="esriPMS",
url="https://static.arcgis.com/images/Symbols/PeoplePlaces/School.png",
)
m3_1.content.draw(data3_addr2['location'], popup = popup, symbol = symbol)
- We will create a drive-time buffer based on the maximum time they are willing to spend commuting from their new home to work.
from arcgis.geoenrichment import BufferStudyArea, enrich
Note: Organizations should review the data attributions and Master Agreement to make sure they are in compliance when geoenriching data and making it available to other systems.
marks = BufferStudyArea(area='4511 E Guasti Road, Ontario, CA 91761',
radii=[45], units='Minutes',
travel_mode='Driving')
lisas = BufferStudyArea(area='380 New York St Redlands CA 92373',
radii=[45], units='Minutes',
travel_mode='Driving')
drive_time_df = enrich(study_areas=[marks, lisas], data_collections=['Age'])
drive_time_lyr = gis.content.import_data(drive_time_df,
title="DriveTimeLayer_2024")
- Select the Dissolve buffer style to merge overlapping polygons.
from arcgis.features.manage_data import dissolve_boundaries
dissolved_lyr = dissolve_boundaries(drive_time_lyr)
m3_2 = gis.map('Redlands, CA')
m3_2

m3_2.zoom = 8
In this map, we are going to display Lisa's and Mark's job locations, as well as the dissolved 45-minute drive-time layer.
m3_2.content.add(dissolved_lyr)
symbol = PictureMarkerSymbolEsriPMS(
angle=0,
xoffset=0,
yoffset=0,
content_type="image/png",
width=24,
height=25,
type="esriPMS",
url="https://static.arcgis.com/images/Symbols/PeoplePlaces/School.png",
)
popup1 = PopupInfo(title = "Lisa's job", description = data3_addr1['address'])
popup2 = PopupInfo(title = "Mark's job", description = data3_addr2['address'])
m3_2.content.draw(data3_addr1['location'], popup = popup1, symbol = symbol)
m3_2.content.draw(data3_addr2['location'], popup = popup2, symbol = symbol)
Map market health, home values, and projected appreciation
In this section, we will use MarketHealthIndex.xlsx for the market health, home values, and projected appreciation analysis. The following steps show how MarketHealthIndex.xlsx was downloaded from open data and prepared:
- Download and prepare the Excel Market Health data as follows:
a) From www.zillow.com/research/data, download the ZIP Code level Market Health Index data.
b) Open the .csv file using Excel and add the zipstring column as a text field. Compute the values using =RIGHT(CONCAT("00000",B2),5), then drag the Autofill handle down to the last cell in the column to create text-formatted ZIP Code values with leading zeros (see the pandas note after these steps).
c) Save the file as an Excel workbook. Close Excel.
Mark and Lisa named their file MarketHealth.xlsx.
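Note that even though the zipstring column is formatted as text in Excel, pandas may read it back as integers (as the dtypes output further below shows for zipstring). If you want to preserve the leading zeros when loading, one option is to pass an explicit dtype; this is optional here, since the notebook converts ZIP Codes to integers before merging anyway:

```python
# Keep the ZIP Code column as text so leading zeros survive the round trip
data4 = pd.read_excel(datapath[3], dtype={"zipstring": str})
```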
- Load the MarketHealthIndex.xlsx Excel data from the local datapath and restructure it as a DataFrame.
file_name4 = datapath[3]
data4 = pd.read_excel(file_name4)
data4
| | RegionType | RegionName | City | State | Metro | CBSATitle | SizeRank | MarketHealthIndex | SellForGain | PrevForeclosed | ... | ZHVI | MoM | YoY | ForecastYoYPctChange | StockOfREOs | NegativeEquity | Delinquency | DaysOnMarket | Unnamed: 19 | zipstring |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Zip | 1001 | Agawam | MA | Springfield, MA, MA | Springfield, MA | NaN | 1.622365 | 75.00 | 0.0500 | ... | 214000.0 | 0.281162 | 5.263158 | 0.047047 | NaN | 0.069028 | 0.068063 | 81.0 | NaN | 1001 |
1 | Zip | 1002 | Amherst | MA | Springfield, MA, MA | Springfield, MA | NaN | 5.491341 | 92.31 | 0.0000 | ... | 331400.0 | 0.485143 | 6.972240 | 0.046192 | NaN | 0.045415 | 0.038462 | 87.5 | NaN | 1002 |
2 | Zip | 1007 | Belchertown | MA | Springfield, MA, MA | Springfield, MA | NaN | 4.664384 | 100.00 | 0.0000 | ... | 277400.0 | 0.325497 | 4.679245 | 0.054387 | NaN | 0.046303 | 0.048387 | 94.5 | NaN | 1007 |
3 | Zip | 1008 | Blandford | MA | Springfield, MA, MA | Springfield, MA | NaN | 2.541281 | 0.00 | NaN | ... | 224000.0 | 0.223714 | 8.843537 | 0.061817 | NaN | 0.060109 | 0.090909 | 73.0 | NaN | 1008 |
4 | Zip | 1010 | Brimfield | MA | Springfield, MA, MA | Springfield, MA | NaN | 3.103101 | 100.00 | NaN | ... | 255700.0 | 0.471513 | 2.608347 | 0.060555 | NaN | 0.066778 | 0.075000 | 73.0 | NaN | 1010 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
14893 | Zip | 75033 | Frisco | TX | Dallas-Fort Worth, TX, TX | Dallas-Fort Worth, TX | NaN | 9.095852 | 100.00 | 0.0000 | ... | 395800.0 | -0.075738 | 1.046719 | 0.066809 | NaN | NaN | NaN | 74.0 | NaN | 75033 |
14894 | Zip | 84009 | South Jordan | UT | Salt Lake City, UT, UT | Salt Lake City, UT | NaN | 6.135052 | NaN | 0.0000 | ... | 409200.0 | 0.540541 | 9.120000 | 0.040208 | NaN | NaN | NaN | 48.0 | NaN | 84009 |
14895 | Zip | 97003 | Beaverton | OR | Portland, OR, OR | Portland, OR | NaN | 6.806954 | 100.00 | 0.0000 | ... | 346900.0 | 0.173260 | 5.376671 | 0.037766 | NaN | NaN | NaN | 44.0 | NaN | 97003 |
14896 | Zip | 97703 | Bend | OR | Bend, OR, OR | Bend, OR | NaN | 6.566653 | 100.00 | NaN | ... | 590700.0 | 0.288625 | 6.489995 | -0.011788 | NaN | NaN | NaN | 77.0 | NaN | 97703 |
14897 | Zip | 29486 | Summerville | SC | Charleston, SC, SC | Charleston, SC | NaN | 6.815009 | 91.30 | 0.0196 | ... | 192200.0 | 0.417973 | 7.314350 | 0.019964 | NaN | NaN | NaN | 83.5 | NaN | 29486 |
14898 rows × 21 columns
- Select the City, MarketHealthIndex, ZHVI, ForecastYoYPctChange, and zipstring fields.
col = ['City', 'MarketHealthIndex', 'ZHVI', 'ForecastYoYPctChange', 'zipstring']
matket_health_index = data4[col]
matket_health_index.head()
| | City | MarketHealthIndex | ZHVI | ForecastYoYPctChange | zipstring |
---|---|---|---|---|---|
0 | Agawam | 1.622365 | 214000.0 | 0.047047 | 1001 |
1 | Amherst | 5.491341 | 331400.0 | 0.046192 | 1002 |
2 | Belchertown | 4.664384 | 277400.0 | 0.054387 | 1007 |
3 | Blandford | 2.541281 | 224000.0 | 0.061817 | 1008 |
4 | Brimfield | 3.103101 | 255700.0 | 0.060555 | 1010 |
matket_health_index.dtypes
City object MarketHealthIndex float64 ZHVI float64 ForecastYoYPctChange float64 zipstring int64 dtype: object
matket_health_index['MarketHealthIndex'].min()
0.000671231
matket_health_index['MarketHealthIndex'].max()
10.0
matket_health_index.rename(columns={"zipstring": "ZIP_CODE"},
inplace=True)
C:\Users\shu12142\AppData\Local\Temp\1\ipykernel_26008\1442553068.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy matket_health_index.rename(columns={"zipstring": "ZIP_CODE"},
matket_health_index[matket_health_index['City']=='Crestline']
| | City | MarketHealthIndex | ZHVI | ForecastYoYPctChange | ZIP_CODE |
---|---|---|---|---|---|
13351 | Crestline | 9.944288 | 205600.0 | 0.068084 | 92322 |
13353 | Crestline | 2.882937 | 228900.0 | 0.067296 | 92325 |
- Locate their ZIP Code in the table and make a note of the values for MarketHealthIndex, ZHVI, and ForecastYoYPctChange. For Crestline's 92325, for example, the market health index is low: about 2.9 on a scale that ranges from 0 to 10. The median home value for all homes (not just 3-bedroom homes) is $228,900, and homes are expected to appreciate about 6.7 percent.
We also want to merge the ZIP Code layer with the matket_health_index table to visualize the result on the map.
zip_df.head()
| | OBJECTID | ZIP_CODE | PO_NAME | STATE | POPULATION | SQMI | Shape__Area | Shape__Length | SHAPE |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | N Dillingham Census Area | AK | <NA> | 16019.53 | 6.657141 | 24.677454 | {"rings": [[[-160.431152, 58.689351], [-160.43... |
1 | 2 | 2 | Yukon Flats Nat Wildlife | AK | <NA> | 95862.85 | 48.948815 | 131.77645 | {"rings": [[[-160.038452, 61.947605], [-160.03... |
2 | 3 | 3 | Alaska Peninsula NWR | AK | <NA> | 14572.9 | 5.655405 | 41.564165 | {"rings": [[[-159.900745, 56.439047], [-159.90... |
3 | 4 | 4 | W Kenai Peninsula Borough | AK | <NA> | 6510.85 | 2.728764 | 20.553203 | {"rings": [[[-154.748861, 59.259518], [-154.70... |
4 | 5 | 5 | N Lake and Peninsula Borough | AK | <NA> | 3760.07 | 1.593722 | 9.571684 | {"rings": [[[-156.0002144, 60.9074352], [-155.... |
zip_df = zip_df.astype({"ZIP_CODE": int})
health_df = pd.merge(zip_df, matket_health_index, on='ZIP_CODE')
health_df.head()
| | OBJECTID | ZIP_CODE | PO_NAME | STATE | POPULATION | SQMI | Shape__Area | Shape__Length | SHAPE | City | MarketHealthIndex | ZHVI | ForecastYoYPctChange |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 245 | 1001 | Agawam | MA | 16979 | 12.08 | 0.003404 | 0.318991 | {"rings": [[[-72.66152, 42.052804], [-72.66099... | Agawam | 1.622365 | 214000.0 | 0.047047 |
1 | 246 | 1002 | Amherst | MA | 35703 | 58.03 | 0.016429 | 0.932599 | {"rings": [[[-72.546763, 42.399994], [-72.5467... | Amherst | 5.491341 | 331400.0 | 0.046192 |
2 | 249 | 1007 | Belchertown | MA | 15616 | 55.85 | 0.015786 | 0.70547 | {"rings": [[[-72.471439, 42.346695], [-72.4713... | Belchertown | 4.664384 | 277400.0 | 0.054387 |
3 | 250 | 1008 | Blandford | MA | 1618 | 60.52 | 0.017082 | 0.68638 | {"rings": [[[-73.06734, 42.236958], [-73.06329... | Blandford | 2.541281 | 224000.0 | 0.061817 |
4 | 252 | 1010 | Brimfield | MA | 3985 | 37.36 | 0.010537 | 0.601482 | {"rings": [[[-72.274433, 42.140342], [-72.2742... | Brimfield | 3.103101 | 255700.0 | 0.060555 |
health_df.shape
(14894, 13)
health_df.spatial.set_geometry('SHAPE')
hlth_lyr = gis.content.import_data(health_df,
title="MarketHealthLayer_2024")
- Create a map of the MarketHealthIndex field
m4 = gis.map('Redlands, CA')
m4

m4.zoom = 8
field_names = hlth_lyr.layers[0].properties['fields']
for field in field_names:
    print(field['name'])
FID objectid zip_code po_name state population sqmi shape_area shape_leng city market_hea zhvi forecast_y Shape__Area Shape__Length
Similarly, we will use the class_breaks_renderer method to map the ZIP Code areas and classify them based on the market_hea (MarketHealthIndex) field. In this case, we also pass quantile as the classification method, which generates class breaks such that each class contains the same number of data values.
m4.content.add(hlth_lyr)
sm = m4.content.renderer(0).smart_mapping()
sm.class_breaks_renderer(
break_type="color",
field = "market_hea",
classification_method = "quantile",
)
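To see what quantile classification implies for this field, a quick, purely illustrative check with pandas' qcut shows that each class holds roughly the same number of ZIP Codes (five classes chosen arbitrarily here; the renderer may use a different count):

```python
# Five quantile classes over the market health index; class sizes are near-equal
pd.qcut(health_df["MarketHealthIndex"], q=5).value_counts()
```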
- Notice how many ZIP Codes intersect the drive-time buffer.
We will use the overlay_layers method from the Python API's feature analysis functionality to create a feature layer of the intersecting ZIP Code boundaries.
market_hlth_lyr = hlth_lyr.layers[0]
from arcgis.features.manage_data import overlay_layers
zip_intersect = overlay_layers(drive_time_lyr,
market_hlth_lyr,
output_name="Market Health Data Within drive time Buffer" + str(dt.now().microsecond))
{"cost": 14.896}
zip_intersect
zip_hlth_intersect = zip_intersect.layers[0]
overlay_df = pd.DataFrame.spatial.from_layer(zip_hlth_intersect)
overlay_df.shape
(360, 65)
- Create a map that displays the overlap by adding both hlth_lyr (classified by the MarketHealthIndex field) and drive_time_lyr.
m5 = gis.map('Redlands, CA')
m5

m5.zoom = 8
m5.content.add(hlth_lyr)
sm5 = m5.content.renderer(0).smart_mapping()
sm5.class_breaks_renderer(
break_type="color",
field="market_hea",
classification_method="quantile",
)
m5.content.add(drive_time_lyr)
The larger the index, the darker the color, and the healthier the housing market is. They want to buy their new home in an area with a healthy housing market (rather than a location where there are vacancies, homes that aren't selling, and numerous foreclosures).
This result has all the variables one should be interested in mapping, narrowed down to the ZIP Codes that are within an acceptable drive time to their work.
- Create a map that displays the overlap by adding both hlth_lyr (classified by the ZHVI field) and drive_time_lyr.
m6 = gis.map('Redlands, CA')
m6

m6.zoom = 8
m6.content.add(hlth_lyr)
sm6 = m6.content.renderer(0).smart_mapping()
sm6.class_breaks_renderer(
break_type="color",
field="zhvi",
classification_method="quantile",
)
m6.content.add(drive_time_lyr)
The dark ZIP Codes have the most expensive average home value estimates.
- Create a map that displays the overlap by adding both hlth_lyr (classified by the ForecastYoYPctChange field) and drive_time_lyr.
m7 = gis.map('Redlands, CA')
m7

m7.zoom = 8
m7.content.add(hlth_lyr)
sm7 = m7.content.renderer(0).smart_mapping()
sm7.class_breaks_renderer(
break_type="color",
field="forecast_y",
classification_method="quantile",
)
m7.content.add(drive_time_lyr)
The darkest ZIP Codes are expected to have the largest increase in home values over the next year.
Find the best cities to begin house hunting
- Narrow the search to ZIP Codes with home values in their price range as follows:
Mark and Lisa will only sell their home at a loss if they can purchase a home valued the same as theirs with better investment potential. They will look for homes that cost about \$340,000 in neighborhoods with median home values between \$350,000 and \$600,000. Being surrounded by homes that are valued higher than theirs is much better than being the most expensive home on the block. That's why Mark and Lisa extended their search to ZIP Codes with median home values as high as \$600,000.
They also want to narrow their search to areas belonging to healthy housing markets with expected home value appreciation. Mark and Lisa, for example, exclude ZIP Codes in unhealthy housing markets by filtering for MarketHealthIndex > 8. They also focus on ZIP Codes with expected home appreciation comparable to Crestline's or higher by filtering for ForecastYoYPctChange > 0.060. Set the filters to reflect your own criteria for the MarketHealthIndex and ForecastYoYPctChange fields.
field_name = "market_hea"
field_name2 = "forecast_y"
query_str = '((ZHVI > 350000) AND (ZHVI < 600000) AND (' + field_name + ' > 8) AND (' + field_name2 + '> 0.06)) AND (1=1)'
zip_hlth_intersect_df = zip_hlth_intersect.query(where=query_str).sdf
zip_hlth_intersect_df
| | OBJECTID_1 | FID_DRIVETIMELAYER_20242_DRIVET | source_cou | x | y | area_type | buffer_uni | buffer_u_1 | buffer_rad | aggregatio | ... | sqmi | shape_leng | city | market_hea | zhvi | forecast_y | Shape__Area_1 | Shape__Length_1 | AnalysisArea | SHAPE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | 1 | USA | -117.552866 | 34.064359 | NetworkServiceArea | Minutes | Drive Time Minutes | 45.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | ... | 4.2 | 0.214615 | Los Angeles | 9.055578 | 408100.0 | 0.09147 | 15880559.84375 | 25687.278616 | 0.482092 | {"rings": [[[-13157517.9452, 4032773.4488], [-... |
1 | 4 | 1 | USA | -117.552866 | 34.064359 | NetworkServiceArea | Minutes | Drive Time Minutes | 45.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | ... | 3.01 | 0.147502 | Los Angeles | 9.930192 | 472000.0 | 0.07689 | 11375466.695312 | 17800.685279 | 1.779017 | {"rings": [[[-13157514.1907, 4037443.2303], [-... |
2 | 14 | 1 | USA | -117.552866 | 34.064359 | NetworkServiceArea | Minutes | Drive Time Minutes | 45.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | ... | 3.74 | 0.183214 | West Whittier-Los Nietos | 8.272251 | 485500.0 | 0.072375 | 14136917.117188 | 22634.260753 | 8.36099 | {"rings": [[[-13143194.2534, 4028915.1404], [-... |
3 | 18 | 1 | USA | -117.552866 | 34.064359 | NetworkServiceArea | Minutes | Drive Time Minutes | 45.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | ... | 8.23 | 0.28339 | Montebello | 8.167539 | 538500.0 | 0.061807 | 31089924.613281 | 34373.962355 | 13.017801 | {"rings": [[[-13143598.3654, 4032788.2902], [-... |
4 | 20 | 1 | USA | -117.552866 | 34.064359 | NetworkServiceArea | Minutes | Drive Time Minutes | 45.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | ... | 8.53 | 0.438871 | Santa Fe Springs | 9.151564 | 494500.0 | 0.063482 | 32164136.523438 | 53469.063308 | 2.26762 | {"rings": [[[-13143509.5925, 4023388.3935], [-... |
5 | 35 | 1 | USA | -117.552866 | 34.064359 | NetworkServiceArea | Minutes | Drive Time Minutes | 45.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | ... | 69.99 | 1.187015 | Azusa | 9.138139 | 457600.0 | 0.0863 | 265860978.933594 | 144989.856887 | 23.94345 | {"rings": [[[-13123457.5912, 4048674.4978], [-... |
6 | 36 | 1 | USA | -117.552866 | 34.064359 | NetworkServiceArea | Minutes | Drive Time Minutes | 45.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | ... | 15.37 | 0.381127 | Baldwin Park | 9.215331 | 460300.0 | 0.088173 | 58195009.007812 | 46203.383216 | 39.805003 | {"rings": [[[-13129355.916, 4047201.7934], [-1... |
7 | 45 | 1 | USA | -117.552866 | 34.064359 | NetworkServiceArea | Minutes | Drive Time Minutes | 45.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | ... | 4.8 | 0.210419 | El Monte | 9.203249 | 506500.0 | 0.065299 | 18149214.558594 | 25938.507566 | 12.421541 | {"rings": [[[-13135851.7534, 4040662.4451], [-... |
8 | 46 | 1 | USA | -117.552866 | 34.064359 | NetworkServiceArea | Minutes | Drive Time Minutes | 45.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | ... | 6.99 | 0.228549 | South El Monte | 9.615385 | 501700.0 | 0.093255 | 26450814.414062 | 27596.205783 | 18.114647 | {"rings": [[[-13139099.6109, 4037245.849], [-1... |
9 | 213 | 1 | USA | -117.552866 | 34.064359 | NetworkServiceArea | Minutes | Drive Time Minutes | 45.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | ... | 15.19 | 0.305009 | Eastvale | 8.270909 | 503800.0 | 0.064508 | 57390709.253906 | 38277.204055 | 39.349068 | {"rings": [[[-13082747.7254, 4033299.2511], [-... |
10 | 214 | 2 | USA | -117.19479 | 34.057265 | NetworkServiceArea | Minutes | Drive Time Minutes | 45.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | ... | 15.19 | 0.305009 | Eastvale | 8.270909 | 503800.0 | 0.064508 | 57390709.253906 | 38277.204055 | 39.349068 | {"rings": [[[-13082747.7254, 4033299.2511], [-... |
11 rows × 65 columns
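For reference, the same criteria can also be applied client-side to the overlay_df DataFrame created earlier; this is a sketch assuming the truncated field names shown above:

```python
# Client-side equivalent of the attribute query above
mask = (
    (overlay_df["zhvi"] > 350000) & (overlay_df["zhvi"] < 600000)
    & (overlay_df["market_hea"] > 8) & (overlay_df["forecast_y"] > 0.06)
)
overlay_df[mask]
```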
m9 = gis.map('United States')
m9

m9.content.add(zip_hlth_intersect_df)
m9.zoom_to_layer(zip_hlth_intersect_df)
Notice that when the filter is applied, the map changes. ZIP Codes that don't meet the filter criteria are removed from the map and the colors change to reflect the recomputed Quantile classification.
Conclusion
The results show possible cities and ZIP Codes where Mark and Lisa can explore homes for sale. They can use real estate websites such as Zillow.com or Realtor.com to see if they can find a home in their price range with the characteristics and qualities they're looking for.