
Finding a New Home

Buying or selling a house can be one of the most stressful events in a person's life. The process can be frustrating: it is lengthy, uncertain, and requires a lot of research. In this workflow, we will guide a couple (Mark and Lisa) who are interested in selling their home and relocating somewhere convenient to both of their workplaces. In this case study, we will explore the current housing market, estimate average house prices in their area, and hunt for a new home. We will download Zillow data for their current home for our analysis. You can use your own data and follow along with this workflow, which aims to help Mark and Lisa find their new home.

The notebook is divided into two parts. In the first part, we will calculate the following:

  • Percentage of decrease/increase in house price since Mark and Lisa bought their home.
  • Suggested selling price for their home.
  • Whether their zip code is a buyer market or seller market.
  • Average number of days it takes for homes to sell in their neighborhood.

In the second part of the notebook, we will explore the investment potential of homes close to their workplaces. Based on how much time a person is willing to spend commuting to work, we will create a drive-time buffer that narrows down the search area. Zillow also provides data on market health and projected home value appreciation, and visualizing ZIP Codes by their market health will help them focus only on areas with good market health. The result is a short list of areas to choose from when buying their new home.

Selling your home

Execute the following command to install the openpyxl library if it is not already installed. This package is used by pandas to read Excel files.

!pip install openpyxl

Similarly, if matplotlib is not present, run the following command to install or upgrade it:

import sys 
!{sys.executable} -m pip install matplotlib

Then, connect to your ArcGIS Online organization, and import necessary libraries.

from arcgis.gis import GIS
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime as dt
gis = GIS(profile="your_online_profile")

Use the boolean has_arcpy to flag whether arcpy is present in the local environment.

has_arcpy = False
try:
    import arcpy
    has_arcpy = True
    print("arcpy present")
except ImportError:
    print("arcpy not present")
arcpy not present

Load the CSV collection from ArcGIS Online for analysis, and download it as a zip file to a temporary folder. We will explain how these files were created in the following sections.

data = gis.content.search('finding_a_new_home owner:api_data_owner type: csv collection', outside_org=True)[0]
data
finding_a_new_home

CSV Collection by api_data_owner
Last Modified: March 17, 2021
0 comments, 147 views
filepath = data.download(file_name=data.name)
import os
import zipfile
from pathlib import Path
with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)
data_path = Path(os.path.join(os.path.splitext(filepath)[0]))
data_path
WindowsPath('C:/Users/shu12142/AppData/Local/Temp/1/finding_a_new_home')
datapath = [os.path.abspath(os.path.join(data_path, p)) for p in os.listdir(data_path)]
datapath
['C:\\Users\\shu12142\\AppData\\Local\\Temp\\1\\finding_a_new_home\\BuyerSellerIndex.xlsx',
 'C:\\Users\\shu12142\\AppData\\Local\\Temp\\1\\finding_a_new_home\\Crestline3BdrmAveSellingPrice.xlsx',
 'C:\\Users\\shu12142\\AppData\\Local\\Temp\\1\\finding_a_new_home\\ImportantPlaces.xlsx',
 'C:\\Users\\shu12142\\AppData\\Local\\Temp\\1\\finding_a_new_home\\MarketHealthIndex.xlsx']

Determine an appropriate selling price

In this section, we will use Crestline3BdrmAveSellingPrice.xlsx for the selling price analysis. The following steps describe how Crestline3BdrmAveSellingPrice.xlsx was prepared from open data:

  1. Download home sales time series data from Zillow at www.zillow.com/research/data.

Mark and Lisa have a 3-bedroom home, so we will select the **ZHVI 3-Bedroom time-series ($)** data set at the ZIP Code level.

  2. Prepare the Excel data:

a) Using Excel, open the .csv file.

Notice that the RegionName field has ZIP Codes as numbers (if we sort the RegionName field we will notice the ZIP Codes for Massachusetts, for example, don't have leading zeros; 01001 is 1001). Also, notice the median home value columns are named using the year and month. The first data available is for April 1996 (1996-04).

b) Copy all the column headings and the one record with data for their ZIP Code to a new Excel sheet.

Apply a filter to the RegionName field. Mark and Lisa live in Crestline, California, so we will apply a filter for the 92325 ZIP Code.

c) Select (highlight) the cells starting with the month and year when they bought their home and continuing to the last month and year column in the Excel table. For example, since Mark and Lisa bought their home in December 2007, they highlight the two rows from column 2007-01 to column 2018-08.

d) Copy (press Ctrl+C) the selected data and paste it, along with the column headings, to a new Excel sheet using Paste Transposed (right-click in the first cell of the new sheet to see the paste options; select Paste Transposed). This gives two columns of data.

e) The first column has date values but only includes the year and month. In column C, create a proper date field.

  • Right-click column C and format the cells with the Date category.
  • In the first cell of column C, enter the following formula: = DATEVALUE(CONCATENATE(A1, "-01"))
  • Drag the Autofill handle down to the last data cell in the column.
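If you prefer to script this step, the same year-month-to-date conversion can be done in pandas. This is a minimal sketch on illustrative data, assuming a column of strings like 2007-01:

```python
import pandas as pd

# Illustrative frame mirroring the two transposed Excel columns
df = pd.DataFrame({"YYYYMM": ["2007-01", "2007-02"], "Value": [291000, 289000]})

# Appending "-01" makes each year-month parse as the first day of that
# month, the same effect as Excel's =DATEVALUE(CONCATENATE(A1, "-01"))
df["date"] = pd.to_datetime(df["YYYYMM"] + "-01")
```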

f) Insert a top row and type the column headings:

YYYYMM, Value, and date.

g) Rename the Excel sheet (probably called Sheet2 at present) something like AveSellingPrice and delete the other sheets (the first sheet contains a large amount of data that we won't be using further in the workflow).

Mark and Lisa named their price Excel sheet CrestlineAveSellingPrice.

h) Save this new sheet as an Excel workbook.

Mark and Lisa named their Excel file Crestline3BdrmAveSellingPrice.xlsx.

  3. Read the Crestline3BdrmAveSellingPrice Excel data from the local datapath, and restructure it as a DataFrame.
file_name1 = datapath[1]
data1 = pd.read_excel(file_name1)
data1.head()
   YYYYMM   value       date
0  2007-01  291000 2007-01-01
1  2007-02  289000 2007-02-01
2  2007-03  287400 2007-03-01
3  2007-04  286100 2007-04-01
4  2007-05  284000 2007-05-01
data1.tail()
     YYYYMM   value       date
135  2018-04  252200 2018-04-01
136  2018-05  254000 2018-05-01
137  2018-06  254800 2018-06-01
138  2018-07  254900 2018-07-01
139  2018-08  254900 2018-08-01
data1.shape
(140, 3)

The following line of code adds three columns (year, month, day) by applying a lambda function to each row. The lambda function creates a pandas Series for each row that stores the year, month, and day separately.

data1[['year','month','day']] = data1.date.apply(lambda x: pd.Series(
    x.strftime("%Y,%m,%d").split(","))) # split date into year, month, day
data1.head()
   YYYYMM   value       date  year month day
0  2007-01  291000 2007-01-01  2007    01  01
1  2007-02  289000 2007-02-01  2007    02  01
2  2007-03  287400 2007-03-01  2007    03  01
3  2007-04  286100 2007-04-01  2007    04  01
4  2007-05  284000 2007-05-01  2007    05  01
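As a side note, the same three columns can be derived without a row-wise lambda by using the vectorized .dt accessor; a small sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2007-01-01", "2007-02-01"])})

# The .dt accessor formats whole columns at once; strftime keeps the
# zero-padded strings ("01") that the lambda approach produced
df["year"] = df["date"].dt.strftime("%Y")
df["month"] = df["date"].dt.strftime("%m")
df["day"] = df["date"].dt.strftime("%d")
```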

We will also use the unique() method to check the unique years, and create a new pandas DataFrame by calling the groupby method, which groups the data1 house value column by year and computes the mean. The new DataFrame shows the mean house value for each year.

data1.year.unique()
array(['2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014',
       '2015', '2016', '2017', '2018'], dtype=object)
grpby_data1 = data1.groupby(['year']).mean(numeric_only=True)
grpby_data1.head()
              value
year
2007  276616.666667
2008  221875.000000
2009  188391.666667
2010  176216.666667
2011  154766.666667
  4. Create a graph using the matplotlib library to show how average home prices have changed since they bought their home.
grpby_data1.reset_index(inplace=True)
grpby_data1.head()
   year          value
0  2007  276616.666667
1  2008  221875.000000
2  2009  188391.666667
3  2010  176216.666667
4  2011  154766.666667
grpby_data1.value
0     276616.666667
1     221875.000000
2     188391.666667
3     176216.666667
4     154766.666667
5     145158.333333
6     163741.666667
7     183975.000000
8     197133.333333
9     229566.666667
10    235966.666667
11    252025.000000
Name: value, dtype: float64
grpby_data1.year
0     2007
1     2008
2     2009
3     2010
4     2011
5     2012
6     2013
7     2014
8     2015
9     2016
10    2017
11    2018
Name: year, dtype: object
plt.plot(grpby_data1.year, grpby_data1.value)
plt.title("average home prices (2007-2018)")
plt.xlabel("date")
plt.ylabel("average house price")
Text(0, 0.5, 'average house price')
<Figure size 640x480 with 1 Axes>
  5. Determine an appropriate selling price based on home sales trends as follows:

a) Determine the current average selling price and the average selling price when they bought their home. Divide the current average price by the beginning average price to see how much homes in their ZIP Code have appreciated or depreciated. When Mark and Lisa bought their home in December of 2007, 3-bedroom homes were selling for $276,617.

price_initial = grpby_data1.iloc[0]
price_initial
year              2007
value    276616.666667
Name: 0, dtype: object
price_current = grpby_data1.iloc[-1]
price_current
year         2018
value    252025.0
Name: 11, dtype: object
house_worth = price_current.value / price_initial.value
house_worth
0.9110983912755316

This indicates that homes in Crestline are only worth 91 percent of what they were at the end of 2007.

b) We can get a rough estimate of what their home is worth by summing what they paid for their home plus what they invested in it, and multiplying that sum by the ratio computed above. Mark and Lisa, for example, invested \$100,000 in solid improvements (new kitchen, major landscaping, hardwood flooring, and so on). Using the 2007 average price of \$276,617 as the purchase basis, multiplying (\$276,617 + \$100,000) by 0.911 gives a rough suggested selling price of about \$343,135.

(price_initial.value + 100000) * house_worth
343134.83912755316
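The whole estimate can be wrapped in a small helper. A sketch; the function name and inputs are ours, not part of the workflow:

```python
def suggested_price(purchase_basis, improvements, appreciation_ratio):
    # Rough asking price: (basis + invested improvements) scaled by the
    # ZIP-level appreciation/depreciation ratio computed above
    return (purchase_basis + improvements) * appreciation_ratio

ratio = 252025.0 / 276616.666667          # ~0.911, from the yearly averages
estimate = suggested_price(276616.666667, 100000, ratio)
round(estimate, 2)  # ~343134.84
```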

Get additional information about the local real estate market

If their home is part of a seller's market, they are more likely to get their asking price. In this section, BuyerSellerIndex.xlsx is used for the local real estate market analysis. The following steps describe how BuyerSellerIndex.xlsx was prepared from open data:

  1. Download the Buyer-Seller Index data at the ZIP Code level from www.zillow.com/research/data. Under Home Listings and Sales, select Buyer-Seller Index as the data type and ZIP Code as the geography.
  2. Open the .csv file using Excel. Zillow reports ZIP Codes as numbers. We will need to pad the ZIP Code numbers with leading zeros so the Zillow data will link to the ArcGIS ZIP Code geometry.

Follow these steps:

a) Sort the RegionName column from smallest to largest so we will be able to see how the formula below works.

b) Name a new column in the Excel table zipstring.

c) In the first cell of the new column, enter the formula to pad each RegionName value with leading zeros, keeping the rightmost five characters: =RIGHT(CONCAT("00000",B2),5)

d) Drag the Autofill handle down to the last data cell in the column.
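The Excel formula above is equivalent to zero-padding in Python; a quick sketch (pad_zip is a name we made up for illustration):

```python
def pad_zip(region_name):
    # Same effect as =RIGHT(CONCAT("00000",B2),5): left-pad with zeros
    # so the value is always five characters
    return str(region_name).zfill(5)

pad_zip(1001)   # '01001'
pad_zip(92325)  # '92325'
```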

  3. Read the BuyerSellerIndex Excel data from the local datapath, and restructure it as a DataFrame.
file_name2 = datapath[0]
data2 = pd.read_excel(file_name2)
data2.head()
  RegionType  RegionName State       CBSA Title  SizeRankCity  SizeRankMetro  PctPriceCut  DaysOnMarket  BuyerSellerIndex  BuyerSellerIndexMetro  zipstring
0        Zip        1001    MA  Springfield, MA          8273             83    22.916667          81.0          6.190476               9.185083       1001
1        Zip        1002    MA  Springfield, MA          6684             83    29.787234          94.0         10.000000               9.185083       1002
2        Zip        1007    MA  Springfield, MA          9088             83    16.091954          83.0          2.857143               9.185083       1007
3        Zip        1013    MA  Springfield, MA          7061             83    31.147541          76.5          8.333333               9.185083       1013
4        Zip        1020    MA  Springfield, MA          5172             83    25.000000          73.0          3.809524               9.185083       1020
data2.dtypes
RegionType                object
RegionName                 int64
State                     object
CBSA Title                object
SizeRankCity               int64
SizeRankMetro              int64
PctPriceCut              float64
DaysOnMarket             float64
BuyerSellerIndex         float64
BuyerSellerIndexMetro    float64
zipstring                  int64
dtype: object
data2.columns
Index(['RegionType', 'RegionName', 'State', 'CBSA Title', 'SizeRankCity',
       'SizeRankMetro', 'PctPriceCut', 'DaysOnMarket', 'BuyerSellerIndex',
       'BuyerSellerIndexMetro', 'zipstring'],
      dtype='object')
  4. Select the BuyerSellerIndex, DaysOnMarket, and zipstring fields.
cols = ['BuyerSellerIndex', 'DaysOnMarket', 'zipstring']
selected_data2 = data2[cols]
  5. Explore the data values as follows:

a) Sort on the DaysOnMarket field and notice the range. For the data Mark and Lisa downloaded, the range is 35 to 294.5 days.

days_on_mrkt_df = selected_data2.sort_values(by='DaysOnMarket', axis=0)
days_on_mrkt_df.DaysOnMarket.min()
35.0
days_on_mrkt_df.DaysOnMarket.max()
294.5
days_on_mrkt_df.head()
      BuyerSellerIndex  DaysOnMarket  zipstring
7420          1.120000          35.0      98107
7434          4.960000          35.0      98136
7097          0.526316          35.0      94022
7426          0.640000          35.0      98117
7399          0.080000          35.0      98043
days_on_mrkt_df.tail()
      BuyerSellerIndex  DaysOnMarket  zipstring
800           9.006849         252.0       8752
1177         10.000000         254.0      12428
1141          8.938356         261.5      11963
1189          9.160959         288.0      12545
753           6.000000         294.5       8403

b) Sort on the BuyerSellerIndex field and notice the range of values. ZIP Codes with index values near 0 are part of a strong seller's market; ZIP Codes with index values near 10 are part of a strong buyer's market.

buyer_seller_df = data2.sort_values(by='BuyerSellerIndex', axis=0)
buyer_seller_df.head()
     RegionType  RegionName State                                   CBSA Title  SizeRankCity  SizeRankMetro  PctPriceCut  DaysOnMarket  BuyerSellerIndex  BuyerSellerIndexMetro  zipstring
517         Zip        7063    NJ        New York-Newark-Jersey City, NY-NJ-PA         10423              1     4.687500          90.0          0.017123               9.585635       7063
4546        Zip       53168    WI           Chicago-Naperville-Elgin, IL-IN-WI         10505              3     9.756098          59.5          0.033223               9.488950      53168
485         Zip        7004    NJ        New York-Newark-Jersey City, NY-NJ-PA         10846              1     7.894737          92.0          0.034247               9.585635       7004
6572        Zip       90025    CA           Los Angeles-Long Beach-Anaheim, CA          2122              2     6.451613          46.0          0.036364               4.185083      90025
1609        Zip       19152    PA  Philadelphia-Camden-Wilmington, PA-NJ-DE-MD          4927              5     6.818182          67.0          0.041667               9.150552      19152
buyer_seller_df.tail()
     RegionType  RegionName State                           CBSA Title  SizeRankCity  SizeRankMetro  PctPriceCut  DaysOnMarket  BuyerSellerIndex  BuyerSellerIndexMetro  zipstring
6091        Zip       80016    CO           Denver-Aurora-Lakewood, CO          2658             21    27.226891          62.0              10.0               4.502762      80016
4598        Zip       53954    WI                          Madison, WI         10773             87    21.875000          75.5              10.0               3.522099      53954
262         Zip        3049    NH                Manchester-Nashua, NH         10693            131    28.787879          74.0              10.0               3.577348       3049
7321        Zip       97209    OR  Portland-Vancouver-Hillsboro, OR-WA          4742             23    26.258993          78.5              10.0               5.096685      97209
1224        Zip       13104    NY                         Syracuse, NY          8916             79    28.323699          89.0              10.0               9.033149      13104
buyer_seller_df.BuyerSellerIndex.min()
0.017123288
buyer_seller_df.BuyerSellerIndex.max()
10.0

c) Filter the data to show only their home's ZIP Code.

selected_data2[selected_data2['zipstring'] == 92325]
      BuyerSellerIndex  DaysOnMarket  zipstring
6888          9.622642          79.0      92325

Notice the average number of days. Determine if the home is part of a buyer's or seller's market. Mark and Lisa learn that their home is part of a buyer's market (9.6), and they can expect their home to be on the market approximately 79 days before it sells.
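For quick reporting, the interpretation above can be expressed as a tiny helper. This is our own sketch; the midpoint cutoff of 5 is an assumption for illustration, not a Zillow definition:

```python
def market_type(buyer_seller_index, cutoff=5.0):
    # Zillow's 0-10 index: values near 0 favor sellers, near 10 favor buyers
    return "buyer's market" if buyer_seller_index >= cutoff else "seller's market"

market_type(9.622642)  # Mark and Lisa's ZIP (92325) -> "buyer's market"
```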

Rename the column using a non-inplace rename, which avoids pandas' SettingWithCopyWarning on the sliced DataFrame.

selected_data2 = selected_data2.rename(columns={"zipstring": "ZIP_CODE"})
selected_data2.shape
(7548, 3)
selected_data2.dtypes
BuyerSellerIndex    float64
DaysOnMarket        float64
ZIP_CODE              int64
dtype: object
selected_data2 = selected_data2.astype({"ZIP_CODE": int})
selected_data2.dtypes
BuyerSellerIndex    float64
DaysOnMarket        float64
ZIP_CODE              int32
dtype: object
  6. Search for the United States ZIP Code Boundaries 2021 layer. We can specify the owner's name to get more specific results. To search for content from the Living Atlas, or content shared by other users on ArcGIS Online, set outside_org=True.
items = gis.content.search('United States ZIP Code Boundaries 2021 owner: esri_dm', outside_org=True)

Display the list of results.

from IPython.display import display

for item in items:
    display(item)
USA ZIP Code Boundaries
U.S. ZIP Code Boundaries provides ZIP Code, postal district name, population, and area for the ZIP Code areas in the United States.
Feature Layer Collection by esri_dm
Last Modified: August 19, 2024
0 comments, 216842598 views
United States ZIP Code Boundaries 2021
This layer shows the ZIP Code level boundaries of United States in 2021, designed to be used in Data Enrichment analysis.
Feature Layer Collection by esri_dm
Last Modified: May 30, 2023
2 comments, 119769 views
USA ZIP Code Areas
U.S. ZIP Code Areas provides ZIP Code, postal district name, population, and area for the ZIP Code areas in the United States.
Layer Package by esri_dm
Last Modified: December 20, 2023
0 comments, 227691 views
USA ZIP Code Three-Digit Areas
USA ZIP Code Three-Digit Areas provides the three-digit ZIP Code areas in the United States.
Layer Package by esri_dm
Last Modified: December 20, 2023
0 comments, 24768 views
United States County Boundaries 2021
This layer shows the County level boundaries of United States in 2021, designed to be used in Data Enrichment analysis.
Feature Layer Collection by esri_dm
Last Modified: May 30, 2023
0 comments, 17478 views
United States Tract Boundaries 2021
This layer shows the Tract level boundaries of United States in 2021, designed to be used in Data Enrichment analysis.
Feature Layer Collection by esri_dm
Last Modified: May 30, 2023
0 comments, 863 views
United States Boundaries 2021
United States Boundaries 2021 provides boundaries for several layers of administrative divisions.
Feature Layer Collection by esri_dm
Last Modified: May 30, 2023
0 comments, 13105 views
United States State Boundaries 2021
This layer shows the State level boundaries of United States in 2021, designed to be used in Data Enrichment analysis.
Feature Layer Collection by esri_dm
Last Modified: May 30, 2023
0 comments, 96405 views
United States Block Group Boundaries 2021
This layer shows the Block Group boundaries of United States in 2021.
Feature Layer Collection by esri_dm
Last Modified: May 30, 2023
0 comments, 4326 views
United States Country Boundary 2021
This layer shows the Country boundary of United States in 2021, designed to be used in Data Enrichment analysis.
Feature Layer Collection by esri_dm
Last Modified: May 30, 2023
0 comments, 102403 views

Select the desired item from the list.

us_zip = items[1]
us_zip
United States ZIP Code Boundaries 2021
This layer shows the ZIP Code level boundaries of United States in 2021, designed to be used in Data Enrichment analysis.
Feature Layer Collection by esri_dm
Last Modified: May 30, 2023
2 comments, 119769 views

Get the layer names from the item

for lyr in us_zip.layers:
    print(lyr.properties.name)
USA_Country
USA_State
USA_County
USA_ZipCode
USA_Tract
USA_BlockGroup
  7. We want to merge the ZIP Code boundary layer with selected_data2 to visualize the result on the map.
us_zip_lyr = us_zip.layers[3]

The from_layer() method converts a feature layer to a spatially enabled pandas DataFrame.

zip_df = pd.DataFrame.spatial.from_layer(us_zip_lyr)
zip_df.head()
   OBJECTID  ZIP_CODE                       PO_NAME STATE  POPULATION      SQMI  Shape__Area  Shape__Length                                              SHAPE
0         1         1      N Dillingham Census Area    AK        <NA>  16019.53     6.657141      24.677454  {"rings": [[[-160.431152, 58.689351], [-160.43...
1         2         2      Yukon Flats Nat Wildlife    AK        <NA>  95862.85    48.948815      131.77645  {"rings": [[[-160.038452, 61.947605], [-160.03...
2         3         3          Alaska Peninsula NWR    AK        <NA>   14572.9     5.655405      41.564165  {"rings": [[[-159.900745, 56.439047], [-159.90...
3         4         4     W Kenai Peninsula Borough    AK        <NA>   6510.85     2.728764      20.553203  {"rings": [[[-154.748861, 59.259518], [-154.70...
4         5         5  N Lake and Peninsula Borough    AK        <NA>   3760.07      1.59372       9.571684  {"rings": [[[-156.0002144, 60.9074352], [-155....
zip_df.shape
(32201, 9)
zip_df.dtypes
OBJECTID                  Int64
ZIP_CODE         string[python]
PO_NAME          string[python]
STATE            string[python]
POPULATION                Int32
SQMI                    Float64
Shape__Area             Float64
Shape__Length           Float64
SHAPE                  geometry
dtype: object
zip_df = zip_df.astype({"ZIP_CODE": int})
zip_df.dtypes
OBJECTID                  Int64
ZIP_CODE                  int32
PO_NAME          string[python]
STATE            string[python]
POPULATION                Int32
SQMI                    Float64
Shape__Area             Float64
Shape__Length           Float64
SHAPE                  geometry
dtype: object

We use the merge() method from the pandas library to join the zip_df and selected_data2 DataFrames.

merged_df = pd.merge(zip_df, selected_data2, on='ZIP_CODE')
merged_df.shape
(7548, 11)
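The row count drops from 32,201 boundaries to 7,548 because pd.merge defaults to an inner join: only ZIP Codes present in both inputs survive. A toy illustration:

```python
import pandas as pd

zips = pd.DataFrame({"ZIP_CODE": [1001, 1002, 99999]})
index = pd.DataFrame({"ZIP_CODE": [1001, 1002], "BuyerSellerIndex": [6.19, 10.0]})

# Inner join (the default): the unmatched ZIP Code 99999 is dropped
merged = pd.merge(zips, index, on="ZIP_CODE")
len(merged)  # 2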

The import_data() method imports the DataFrame merged_df, with its geometry column set, into ArcGIS Online.

merged_df.spatial.set_geometry('SHAPE')
mergd_lyr = gis.content.import_data(merged_df,
                                    title='MergedLayer_2024',
                                    tags='datascience, dlpk')

When arcpy is present, import_data uploads the local SeDF (spatially enabled DataFrame) to your organization as a file geodatabase (FGDB) and publishes it as a hosted feature layer. When arcpy is not present, import_data uploads the local SeDF as a shapefile and then publishes it as a hosted feature layer. Because shapefile field names are limited to 10 characters, this minor difference results in column/property names that differ from those defined in the original SeDF.

To get the exact field names of the imported layers, we will double-check them before drawing the layers on maps in the following sections.
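Judging from the published field names shown later (buyer_sell, days_on_ma), the shapefile route appears to convert camelCase names to snake_case, lowercase them, and truncate to the 10-character limit. The helper below is purely our illustration of that observed mapping, not a documented API guarantee:

```python
import re

def shapefile_field(name):
    # Insert "_" at lower/upper boundaries, lowercase, then truncate to
    # the 10-character shapefile field-name limit
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()
    return snake[:10]

shapefile_field("BuyerSellerIndex")  # 'buyer_sell'
shapefield = shapefile_field("DaysOnMarket")  # 'days_on_ma'
```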

Visualize Results

  1. Create a map of the BuyerSellerIndex field
m1 = gis.map('Redlands, CA')
m1
m1.zoom = 8

To get the accurate column name, let's check the layer properties and display all field names from the imported mergd_lyr data.

field_names = mergd_lyr.layers[0].properties['fields']
for field in field_names:
    print(field['name'])
FID
objectid
zip_code
po_name
state
population
sqmi
shape_area
shape_leng
buyer_sell
days_on_ma
Shape__Area
Shape__Length

We will use the smart mapping capability to render the merged layer with colors that vary based on the buyer_sell (buyer_seller_index) field. The code below first creates a SmartMappingManager by calling content.renderer(0).smart_mapping(). Then, we call the class_breaks_renderer method to classify the ZIP Code boundary areas by color. Please refer to smart mapping for more details.

m1.content.add(mergd_lyr)
sm = m1.content.renderer(0).smart_mapping()
sm.class_breaks_renderer(
    break_type = "color",
    field = "buyer_sell",
)
  2. Create a map of the DaysOnMarket field
m2 = gis.map('Redlands, CA')
m2
m2.zoom = 8

Similar to m1, we will use smart mapping again to visualize ZIP Code boundaries classified by size based on the days_on_ma (days_on_market) field.

m2.content.add(mergd_lyr)
sm = m2.content.renderer(0).smart_mapping()
sm.class_breaks_renderer(
    break_type="size",
    field = "days_on_ma",
)

House Hunting

Find all ZIP Codes within a specified drive time of important places

In this section, we will use ImportantPlaces.xlsx to find ZIP Codes within a drive-time buffer of Mark and Lisa's workplaces. The following steps describe how ImportantPlaces.xlsx was prepared:

  1. Create an Excel table with columns for Street, City, State, and Zip. Add addresses for the locations you want to access from your new home. Mark and Lisa's table below has their current job addresses. They named their Excel file ImportantPlaces.xlsx and the Excel sheet WorkLocations.
  2. Load the ImportantPlaces.xlsx Excel data from the local datapath as a DataFrame, and combine the Street, City, and State columns into an Address column.
file_name3 = datapath[2]
data3 = pd.read_excel(file_name3)
data3.head()
        Place               Street      City State    Zip
0  Lisa's job  380 New York Street  Redlands    CA  92373
1  Mark's job   4511 E Guasti Road   Ontario    CA  91761
data3['Address'] = data3['Street'] + ' ' + data3['City'] + ' ' + data3['State']
data3['Address']
0    380 New York Street Redlands CA
1      4511 E Guasti Road Ontario CA
Name: Address, dtype: object
  3. Draw the addresses on the map
m3_1 = gis.map('Redlands, CA')
m3_1
m3_1.zoom = 9

To visualize popup address info and picture marker symbols for Lisa's and Mark's job locations, we will create PopupInfo and PictureMarkerSymbolEsriPMS objects and pass them as parameters when calling the content.draw() method.

from arcgis.geocoding import geocode
from arcgis.map.popups import PopupInfo
from arcgis.map.symbols import PictureMarkerSymbolEsriPMS

sr = m3_1.extent['spatialReference']['latestWkid']
data3_addr1 = geocode(data3.Address[0], out_sr=sr)[0]

popup = PopupInfo(title = "Lisa's job", description = data3_addr1['address'])

symbol = PictureMarkerSymbolEsriPMS(
    angle=0,
    xoffset=0,
    yoffset=0,
    content_type="image/png",
    width=24,
    height=25,
    type="esriPMS",
    url="https://static.arcgis.com/images/Symbols/PeoplePlaces/School.png",
)

m3_1.content.draw(data3_addr1['location'], popup = popup, symbol = symbol)
data3_addr2 = geocode(data3.Address[1], out_sr=sr)[0]

popup = PopupInfo(title = "Mark's job", description = data3_addr2['address'])

symbol = PictureMarkerSymbolEsriPMS(
    angle=0,
    xoffset=0,
    yoffset=0,
    content_type="image/png",
    width=24,
    height=25,
    type="esriPMS",
    url="https://static.arcgis.com/images/Symbols/PeoplePlaces/School.png",
)

m3_1.content.draw(data3_addr2['location'], popup = popup, symbol = symbol)
  4. Create a drive-time buffer using the maximum time they are willing to spend commuting from their new home to work.
from arcgis.geoenrichment import BufferStudyArea, enrich

Note: Organizations should review the data attributions and Master Agreement to make sure they are in compliance when geoenriching data and making it available to other systems.

marks = BufferStudyArea(area='4511 E Guasti Road, Ontario, CA 91761', 
                           radii=[45], units='Minutes', 
                           travel_mode='Driving')
lisas = BufferStudyArea(area='380 New York St Redlands CA 92373', 
                           radii=[45], units='Minutes', 
                           travel_mode='Driving')

drive_time_df = enrich(study_areas=[marks, lisas], data_collections=['Age'])
drive_time_lyr = gis.content.import_data(drive_time_df,
                                         title="DriveTimeLayer_2024")
  5. Select the Dissolve buffer style to merge overlapping polygons.
from arcgis.features.manage_data import dissolve_boundaries
dissolved_lyr = dissolve_boundaries(drive_time_lyr)
m3_2 = gis.map('Redlands, CA')
m3_2
m3_2.zoom = 8

In this map, we are going to display Lisa's and Mark's job locations, as well as the dissolved 45-minute drive-time layer.

m3_2.content.add(dissolved_lyr)
symbol = PictureMarkerSymbolEsriPMS(
    angle=0,
    xoffset=0,
    yoffset=0,
    content_type="image/png",
    width=24,
    height=25,
    type="esriPMS",
    url="https://static.arcgis.com/images/Symbols/PeoplePlaces/School.png",
)

m3_2.content.draw(data3_addr1['location'], popup = popup, symbol = symbol)
m3_2.content.draw(data3_addr2['location'], popup = popup, symbol = symbol)

Map market health, home values, and projected appreciation

In this section, we will use MarketHealthIndex.xlsx for the market health, home values, and projected appreciation analysis. The following steps describe how MarketHealthIndex.xlsx was downloaded from open data and prepared.

  1. Download and prepare the Excel Market Health data as follows:

a) From www.zillow.com/research/data, download ZIP Code level Market Health Index data.

b) Open the .csv file using Excel and add a zipstring column as a text field. Compute the values using =RIGHT(CONCAT("00000",B2),5). Drag the Autofill handle down to the last cell in the column to create text-formatted ZIP Code values with leading zeros.

c) Save the file as an Excel workbook. Close Excel.

Mark and Lisa named their file MarketHealthIndex.xlsx.

  2. Load the MarketHealthIndex.xlsx Excel data from the local datapath, and restructure it as a DataFrame.
file_name4 = datapath[3]
data4 = pd.read_excel(file_name4)
data4
      RegionType  RegionName          City State                      Metro              CBSATitle  SizeRank  MarketHealthIndex  SellForGain  PrevForeclosed  ...      ZHVI       MoM       YoY  ForecastYoYPctChange  StockOfREOs  NegativeEquity  Delinquency  DaysOnMarket  Unnamed: 19  zipstring
0            Zip        1001        Agawam    MA        Springfield, MA, MA        Springfield, MA       NaN           1.622365        75.00          0.0500  ...  214000.0  0.281162  5.263158              0.047047          NaN        0.069028     0.068063          81.0          NaN       1001
1            Zip        1002       Amherst    MA        Springfield, MA, MA        Springfield, MA       NaN           5.491341        92.31          0.0000  ...  331400.0  0.485143  6.972240              0.046192          NaN        0.045415     0.038462          87.5          NaN       1002
2            Zip        1007   Belchertown    MA        Springfield, MA, MA        Springfield, MA       NaN           4.664384       100.00          0.0000  ...  277400.0  0.325497  4.679245              0.054387          NaN        0.046303     0.048387          94.5          NaN       1007
3            Zip        1008     Blandford    MA        Springfield, MA, MA        Springfield, MA       NaN           2.541281         0.00             NaN  ...  224000.0  0.223714  8.843537              0.061817          NaN        0.060109     0.090909          73.0          NaN       1008
4            Zip        1010     Brimfield    MA        Springfield, MA, MA        Springfield, MA       NaN           3.103101       100.00             NaN  ...  255700.0  0.471513  2.608347              0.060555          NaN        0.066778     0.075000          73.0          NaN       1010
...          ...         ...           ...   ...                        ...                    ...       ...                ...          ...             ...  ...       ...       ...       ...                   ...          ...             ...          ...           ...          ...        ...
14893        Zip       75033        Frisco    TX  Dallas-Fort Worth, TX, TX  Dallas-Fort Worth, TX       NaN           9.095852       100.00          0.0000  ...  395800.0 -0.075738  1.046719              0.066809          NaN             NaN          NaN          74.0          NaN      75033
14894        Zip       84009  South Jordan    UT     Salt Lake City, UT, UT     Salt Lake City, UT       NaN           6.135052          NaN          0.0000  ...  409200.0  0.540541  9.120000              0.040208          NaN             NaN          NaN          48.0          NaN      84009
14895        Zip       97003     Beaverton    OR           Portland, OR, OR           Portland, OR       NaN           6.806954       100.00          0.0000  ...  346900.0  0.173260  5.376671              0.037766          NaN             NaN          NaN          44.0          NaN      97003
14896        Zip       97703          Bend    OR               Bend, OR, OR               Bend, OR       NaN           6.566653       100.00             NaN  ...  590700.0  0.288625  6.489995             -0.011788          NaN             NaN          NaN          77.0          NaN      97703
14897        Zip       29486   Summerville    SC         Charleston, SC, SC         Charleston, SC       NaN           6.815009        91.30          0.0196  ...  192200.0  0.417973  7.314350              0.019964          NaN             NaN          NaN          83.5          NaN      29486

14898 rows × 21 columns

  3. Select the City, MarketHealthIndex, ZHVI, ForecastYoYPctChange, and zipstring fields.
col = ['City', 'MarketHealthIndex', 'ZHVI', 'ForecastYoYPctChange', 'zipstring']
matket_health_index = data4[col]
matket_health_index.head()
          City  MarketHealthIndex      ZHVI  ForecastYoYPctChange  zipstring
0       Agawam           1.622365  214000.0              0.047047       1001
1      Amherst           5.491341  331400.0              0.046192       1002
2  Belchertown           4.664384  277400.0              0.054387       1007
3    Blandford           2.541281  224000.0              0.061817       1008
4    Brimfield           3.103101  255700.0              0.060555       1010
matket_health_index.dtypes
City                     object
MarketHealthIndex       float64
ZHVI                    float64
ForecastYoYPctChange    float64
zipstring                 int64
dtype: object
matket_health_index['MarketHealthIndex'].min()
0.000671231
matket_health_index['MarketHealthIndex'].max()
10.0
As before, use a non-inplace rename to avoid pandas' SettingWithCopyWarning on the sliced DataFrame.

matket_health_index = matket_health_index.rename(columns={"zipstring": "ZIP_CODE"})
matket_health_index[matket_health_index['City']=='Crestline']
            City  MarketHealthIndex      ZHVI  ForecastYoYPctChange  ZIP_CODE
13351  Crestline           9.944288  205600.0              0.068084     92322
13353  Crestline           2.882937  228900.0              0.067296     92325
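ForecastYoYPctChange is a fraction, so a one-year projected value can be sketched as ZHVI * (1 + forecast). The arithmetic below, using the 92325 row shown above, is our own illustration rather than part of the Zillow data:

```python
zhvi = 228900.0       # median home value, ZIP 92325
forecast = 0.067296   # forecasted year-over-year change, as a fraction

projected = zhvi * (1 + forecast)
round(projected, 2)  # ~244304.05
```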
  4. Locate their ZIP Code and make a note of the values for MarketHealthIndex, ZHVI, and ForecastYoYPctChange. For Mark and Lisa's ZIP Code (92325), the market health index is 2.9 on a scale that ranges from 0 to 10, the median home value for all homes (not just 3-bedroom homes) is \$228,900, and homes are expected to appreciate about 6.7 percent.

We also want to merge the zip_code layer with the market_health_index table so that we can visualize the result on a map.

zip_df.head()
   OBJECTID  ZIP_CODE  PO_NAME                       STATE  POPULATION  SQMI      Shape__Area  Shape__Length  SHAPE
0  1         1         N Dillingham Census Area      AK     <NA>        16019.53  6.657141     24.677454      {"rings": [[[-160.431152, 58.689351], [-160.43...
1  2         2         Yukon Flats Nat Wildlife      AK     <NA>        95862.85  48.948815    131.77645      {"rings": [[[-160.038452, 61.947605], [-160.03...
2  3         3         Alaska Peninsula NWR          AK     <NA>        14572.9   5.655405     41.564165      {"rings": [[[-159.900745, 56.439047], [-159.90...
3  4         4         W Kenai Peninsula Borough     AK     <NA>        6510.85   2.728764     20.553203      {"rings": [[[-154.748861, 59.259518], [-154.70...
4  5         5         N Lake and Peninsula Borough  AK     <NA>        3760.07   1.593722     9.571684       {"rings": [[[-156.0002144, 60.9074352], [-155....
zip_df = zip_df.astype({"ZIP_CODE": int})
health_df = pd.merge(zip_df, matket_health_index, on='ZIP_CODE')
health_df.head()
   OBJECTID  ZIP_CODE  PO_NAME      STATE  POPULATION  SQMI   Shape__Area  Shape__Length  SHAPE                                              City         MarketHealthIndex  ZHVI      ForecastYoYPctChange
0  245       1001      Agawam       MA     16979       12.08  0.003404     0.318991       {"rings": [[[-72.66152, 42.052804], [-72.66099...  Agawam       1.622365           214000.0  0.047047
1  246       1002      Amherst      MA     35703       58.03  0.016429     0.932599       {"rings": [[[-72.546763, 42.399994], [-72.5467...  Amherst      5.491341           331400.0  0.046192
2  249       1007      Belchertown  MA     15616       55.85  0.015786     0.70547        {"rings": [[[-72.471439, 42.346695], [-72.4713...  Belchertown  4.664384           277400.0  0.054387
3  250       1008      Blandford    MA     1618        60.52  0.017082     0.68638        {"rings": [[[-73.06734, 42.236958], [-73.06329...  Blandford    2.541281           224000.0  0.061817
4  252       1010      Brimfield    MA     3985        37.36  0.010537     0.601482       {"rings": [[[-72.274433, 42.140342], [-72.2742...  Brimfield    3.103101           255700.0  0.060555
health_df.shape
(14894, 13)
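The astype cast above matters: pd.merge matches key values with dtype awareness, so an integer ZIP_CODE in one frame will not join against a string ZIP_CODE in the other (recent pandas versions raise an error on int-vs-object merge keys). A minimal sketch with toy frames:

```python
import pandas as pd

# Toy stand-ins: ZIP codes stored as strings in one frame, ints in the other.
zips = pd.DataFrame({"ZIP_CODE": ["1001", "1002"], "PO_NAME": ["Agawam", "Amherst"]})
health = pd.DataFrame({"ZIP_CODE": [1001, 1002], "MarketHealthIndex": [1.62, 5.49]})

# Align key dtypes before joining; only then do the keys match.
zips = zips.astype({"ZIP_CODE": int})
merged = pd.merge(zips, health, on="ZIP_CODE")
print(merged.shape)  # (2, 3)
```

The real merge above works the same way, which is why health_df ends up with 14,894 matched rows rather than zero.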
health_df.spatial.set_geometry('SHAPE')
hlth_lyr = gis.content.import_data(health_df,
                                  title="MarketHealthLayer_2024")
  1. Create a map classified by the MarketHealthIndex field.
m4 = gis.map('Redlands, CA')
m4
m4.zoom = 8
field_names = hlth_lyr.layers[0].properties['fields']
for field in field_names:
    print(field['name'])
FID
objectid
zip_code
po_name
state
population
sqmi
shape_area
shape_leng
city
market_hea
zhvi
forecast_y
Shape__Area
Shape__Length

Similarly, we use the class_breaks_renderer method of the smart-mapping module to symbolize the ZIP Code areas, classifying them by the market_hea (MarketHealthIndex) field. We also pass quantile as the classification method, which generates class breaks so that each class contains approximately the same number of data values.

m4.content.add(hlth_lyr)
sm = m4.content.renderer(0).smart_mapping()
sm.class_breaks_renderer(
    break_type="color",
    field = "market_hea",
    classification_method = "quantile",
)
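For intuition, the break values that quantile classification produces can be sketched with plain numpy (this is an illustration, not the renderer's internal code): each class boundary is a percentile, so every class ends up holding roughly the same number of ZIP Codes.

```python
import numpy as np

# Toy market health values on the 0-10 scale used above.
values = np.array([0.5, 1.2, 2.8, 3.1, 4.7, 5.5, 6.4, 7.9, 8.3, 9.9])

# Four quantile classes -> break points at the 25th/50th/75th percentiles.
breaks = np.percentile(values, [25, 50, 75])

# Assign each value to a class by which break interval it falls into.
classes = np.searchsorted(breaks, values, side="right")

# Each class holds roughly len(values) / 4 members.
print(breaks, np.bincount(classes))
```

Equal-interval breaks, by contrast, would split the 0-10 range into four equal-width bins regardless of how many features land in each.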
  1. Notice how many ZIP Codes intersect the drive time buffer.

We will use the overlay_layers method from the Python API's arcgis.features.manage_data module to create a feature layer of the intersecting ZIP Code boundaries.

market_hlth_lyr = hlth_lyr.layers[0]
from arcgis.features.manage_data import overlay_layers
zip_intersect = overlay_layers(drive_time_lyr, 
                               market_hlth_lyr, 
                               output_name="Market Health Data Within drive time Buffer" + str(dt.now().microsecond))
{"cost": 14.896}
zip_intersect
Market_Health_Data_Within_drive_time_Buffer488424

Feature Layer Collection by arcgis_python
Last Modified: December 05, 2024
0 comments, 0 views
zip_hlth_intersect = zip_intersect.layers[0]
overlay_df = pd.DataFrame.spatial.from_layer(zip_hlth_intersect)
overlay_df.shape
(360, 65)
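Conceptually, overlay_layers keeps each ZIP Code polygon (or the part of it) that falls inside the drive-time buffer. A pure-Python sketch of the intersect test using axis-aligned bounding boxes — the real overlay uses full polygon geometry, and the coordinates below are hypothetical:

```python
def boxes_intersect(a, b):
    """a, b are (xmin, ymin, xmax, ymax) bounding boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical buffer extent and ZIP Code extents (web-Mercator-style numbers).
buffer_box = (-13160000, 4020000, -13080000, 4050000)
zip_boxes = {
    "92322": (-13150000, 4030000, -13140000, 4040000),  # near the buffer
    "01001": (-8090000, 5180000, -8080000, 5190000),    # far away
}

inside = [z for z, box in zip_boxes.items() if boxes_intersect(buffer_box, box)]
print(inside)  # ['92322']
```

The hosted tool does this for all 14,894 ZIP Code polygons at once and clips them to the buffer, which is why the result above has 360 rows.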
  1. Create a map that displays the overlap by adding both hlth_lyr (classified by MarketHealthIndex field) and drive_time_lyr.
m5 = gis.map('Redlands, CA')
m5
m5.zoom = 8
m5.content.add(hlth_lyr)
sm5 = m5.content.renderer(0).smart_mapping()
sm5.class_breaks_renderer(
    break_type="color",
    field="market_hea",
    classification_method="quantile",
)
m5.content.add(drive_time_lyr)

The larger the index, the darker the color, and the healthier the housing market is. They want to buy their new home in an area with a healthy housing market (rather than a location where there are vacancies, homes that aren't selling, and numerous foreclosures).

This result has all the variables one should be interested in mapping, narrowed down to the ZIP Codes that are within an acceptable drive time to their work.

  1. Create a map that displays the overlap by adding both hlth_lyr (classified by ZHVI field) and drive_time_lyr.
m6 = gis.map('Redlands, CA')
m6
m6.zoom = 8
m6.content.add(hlth_lyr)
sm6 = m6.content.renderer(0).smart_mapping()
sm6.class_breaks_renderer(
    break_type="color",
    field="zhvi",
    classification_method="quantile",
)
m6.content.add(drive_time_lyr)

The dark ZIP Codes have the most expensive average home value estimates.

  1. Create a map that displays the overlap by adding both hlth_lyr (classified by ForecastYoYPctChange field) and drive_time_lyr.
m7 = gis.map('Redlands, CA')
m7
m7.zoom = 8
m7.content.add(hlth_lyr)
sm7 = m7.content.renderer(0).smart_mapping()
sm7.class_breaks_renderer(
    break_type="color",
    field="forecast_y",
    classification_method="quantile",
)
m7.content.add(drive_time_lyr)

The darkest ZIP Codes are expected to have the largest increase in home values over the next year.

Find the best cities to begin house hunting

  1. Narrow the search to ZIP Codes with home values in their price range, as follows:

Mark and Lisa will only sell their home for a loss if they can purchase a home valued the same as theirs with better investment potential. They will look for homes that cost about \$340,000 in neighborhoods with median home values between \$350,000 and \$600,000. Being surrounded by homes that are valued higher than theirs is much better than being the most expensive home on the block. That's why Mark and Lisa extended their search to ZIP Codes with median home values as high as \$600,000.

They also want to narrow their search to areas with healthy housing markets and expected home value appreciation. Mark and Lisa, for example, exclude ZIP Codes in unhealthy housing markets by filtering for MarketHealthIndex > 8, and they focus on ZIP Codes with expected appreciation comparable to Crestline's by filtering for ForecastYoYPctChange > 0.06. Set the filters to reflect your own criteria for the MarketHealthIndex and ForecastYoYPctChange fields.

field_name = "market_hea"
field_name2 = "forecast_y"
query_str = '((ZHVI > 350000) AND (ZHVI < 600000) AND (' + field_name + ' > 8) AND (' + field_name2 + '> 0.06)) AND (1=1)'

zip_hlth_intersect_df = zip_hlth_intersect.query(where=query_str).sdf
zip_hlth_intersect_df
The result is too wide to display in full (65 columns, most of them drive-time buffer attributes); the key attribute columns are:

    city                      market_hea  zhvi      forecast_y
0   Los Angeles               9.055578    408100.0  0.091471
1   Los Angeles               9.930192    472000.0  0.076891
2   West Whittier-Los Nietos  8.272251    485500.0  0.072375
3   Montebello                8.167539    538500.0  0.061807
4   Santa Fe Springs          9.151564    494500.0  0.063482
5   Azusa                     9.138139    457600.0  0.086326
6   Baldwin Park              9.215331    460300.0  0.088173
7   El Monte                  9.203249    506500.0  0.065299
8   South El Monte            9.615385    501700.0  0.093255
9   Eastvale                  8.270909    503800.0  0.064508
10  Eastvale                  8.270909    503800.0  0.064508

11 rows × 65 columns
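The same criteria can also be applied client-side with pandas once the intersected table is a DataFrame — a sketch using toy rows and the lowercase field names (zhvi, market_hea, forecast_y) produced by the overlay:

```python
import pandas as pd

# Toy subset of the intersected layer's attribute table.
df = pd.DataFrame({
    "city": ["Los Angeles", "Montebello", "Agawam"],
    "zhvi": [408100.0, 538500.0, 214000.0],
    "market_hea": [9.06, 8.17, 1.62],
    "forecast_y": [0.0915, 0.0618, 0.0470],
})

# Boolean mask mirroring the where clause used in the layer query.
mask = (
    (df["zhvi"] > 350000) & (df["zhvi"] < 600000)
    & (df["market_hea"] > 8) & (df["forecast_y"] > 0.06)
)
shortlist = df[mask]
print(shortlist["city"].tolist())  # ['Los Angeles', 'Montebello']
```

Server-side queries scale better for large layers, but a pandas mask is convenient for iterating on the thresholds interactively.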

m9 = gis.map('United States')
m9
m9.content.add(zip_hlth_intersect_df)
m9.zoom_to_layer(zip_hlth_intersect_df)

Notice that when the filter is applied, the map changes: ZIP Codes that don't meet the filter criteria are removed from the map, and the colors change to reflect the recomputed quantile classification.

Conclusion

The results show possible cities and ZIP Codes where Mark and Lisa can explore homes for sale. They can use real estate websites such as Zillow.com or Realtor.com to see if they can find a home in their price range with the characteristics and qualities they're looking for.
