Data Visualization - Construction permits, part 1/2

Overview

One indicator of a region's growth is the number of permits issued for new construction. Exploring and analyzing permit activity can help regional planners ensure that development occurs in accordance with the area's long-term goals. One area that has recently experienced rapid growth is Montgomery County, Maryland, a suburban county near Washington, D.C. County planners want to observe spatial and temporal growth trends, find out why certain areas are growing faster than others, and communicate key information about the county's growth to the public.

In this notebook, you'll explore Montgomery County permit data. First, you'll add the permit data from ArcGIS Living Atlas of the World. You'll explore the data and become familiar with exactly what kind of information it contains. Then, you'll analyze the data to detect patterns and find out why growth is occurring. Once you've gathered your findings from your exploration and analysis, you'll share your work online.

Explore the data

To better understand trends in permit activity in Montgomery County, you'll add a dataset of permits issued since 2010. Before you begin your analysis, however, it's important to explore your data and understand what it shows and does not show. You'll familiarize yourself with the data's attributes, sort the data by type, and visualize spatial and temporal trends. In doing so, you'll gain context for your analysis and know exactly which questions you still need to ask to find out why, where, and when growth is occurring.

Connect to your ArcGIS Online organization.

from arcgis.gis import GIS
import pandas as pd

from arcgis.features import GeoAccessor, GeoSeriesAccessor

agol_gis = GIS()

Search for the Commercial Permits since 2010 layer. You can specify the owner's name to get more specific results. To search for content from the Living Atlas, or content shared by other users on ArcGIS Online, set outside_org=True.

data = agol_gis.content.search('title: Commercial Permits since 2010', 'Feature layer', 
                               outside_org=True)
data[0]

Commercial Permits Since 2010

Feature Layer Collection by rbohnert
Last Modified: February 27, 2020
0 comments, 206 views

Get the first item from the results.

permits = data[0]

Since the item is a Feature Layer Collection, accessing the layers property gives us a list of FeatureLayer objects. The permit layer is the first layer in this item. Visualize this layer on a map of Montgomery County, Maryland.

permit_layer = permits.layers[0]

permit_map = agol_gis.map('Montgomery County, Maryland', zoomlevel=9)
permit_map

You can add several kinds of layer objects to the map, such as FeatureLayer, FeatureCollection, ImageryLayer, and MapImageLayer, by calling the add_layer() method.

permit_map.add_layer(permit_layer)

Data Exploration

Now that you've added the permit data, you'll explore its contents. Geographic data doesn't only contain information about location; it can also include other attributes not seen on a map.

Convert the layer into a spatially-enabled dataframe to explore these attributes.

permit_layer

<FeatureLayer url:"https://services8.arcgis.com/gNrBvIUXaCagAxJ2/arcgis/rest/services/Commercial_Permits_Since_2010_q9lDty/FeatureServer/0">

sdf = pd.DataFrame.spatial.from_layer(permit_layer)

The tail() method returns the last five rows of the dataframe.

sdf.tail()

	SHAPE	added_date	address	applicatio	bldgareanu	building_a	city	dayofmonth_added_date	dayofmonth_final_date	dayofmonth_issue_date	...	second_final_date	state	status	use_code	work_type	year_added_date	year_final_date	year_issue_date	zip_code
7507	{"x": -77.19663, "y": 39.104148, "spatialRefer...	2014-01-02	9609 MEDICAL CENTER DR	COMMERCIAL BUILDING	2069.43	2069.43	ROCKVILLE	2	<NA>	27	...	<NA>	MD	Stop Work	BUSINESS BUILDING	ADD	2014	<NA>	2015	20850
7508	{"x": -77.027461, "y": 38.999474, "spatialRefe...	2014-01-31	1015 SPRING ST	COMMERCIAL BUILDING	707.91	707.91	SILVER SPRING	31	25	18	...	0	MD	Stop Work	BUSINESS BUILDING	ADD	2014	2014	2014	20910
7509	{"x": -77.201951, "y": 39.285193, "spatialRefe...	2014-02-11	26100 WOODFIELD RD	COMMERCIAL BUILDING	0.0	0.0	DAMASCUS	11	<NA>	13	...	<NA>	MD	Stop Work	BUSINESS BUILDING	CONSTRUCT	2014	<NA>	2014	20872
7510	{"x": -77.17405, "y": 38.999234, "spatialRefer...	2014-03-06	8500 RIVER RD	COMMERCIAL BUILDING	472.02	472.02	BETHESDA	6	<NA>	11	...	<NA>	MD	Stop Work	COMMERCIAL MISCELLANEOUS STRUC	CONSTRUCT	2014	<NA>	2014	20817
7511	{"x": -77.17405, "y": 38.999234, "spatialRefer...	2014-03-10	8500 RIVER RD	COMMERCIAL BUILDING	8461.55	8461.55	BETHESDA	10	<NA>	11	...	<NA>	MD	Stop Work	COMMERCIAL MISCELLANEOUS STRUC	CONSTRUCT	2014	<NA>	2014	20817

5 rows × 48 columns

The permit data contains a long list of attributes. Some attributes have self-explanatory names, while others have names that can be difficult to understand without context. You can list the attributes with the columns attribute of the dataframe.

sdf.columns

Index(['SHAPE', 'added_date', 'address', 'applicatio', 'bldgareanu',
       'building_a', 'city', 'dayofmonth_added_date', 'dayofmonth_final_date',
       'dayofmonth_issue_date', 'dayofweek_added_date', 'dayofweek_final_date',
       'dayofweek_issue_date', 'declared_v', 'declvalnu', 'descriptio', 'fid',
       'final_date', 'hour_added_date', 'hour_final_date', 'hour_issue_date',
       'issue_date', 'latitude', 'location', 'longitude', 'minute_added_date',
       'minute_final_date', 'minute_issue_date', 'month_added_date',
       'month_final_date', 'month_issue_date', 'objectid', 'permit_num',
       'quarter_added_date', 'quarter_final_date', 'quarter_issue_date',
       'rowid', 'second_added_date', 'second_final_date', 'second_issue_date',
       'state', 'status', 'use_code', 'work_type', 'year_added_date',
       'year_final_date', 'year_issue_date', 'zip_code'],
      dtype='object')

sdf.describe().T

	count	mean	min	25%	50%	75%	max	std
added_date	7512	2013-09-03 16:26:27.220447232	2010-01-04 00:00:00	2011-11-16 18:00:00	2013-08-29 00:00:00	2015-06-16 00:00:00	2017-04-17 00:00:00	NaN
bldgareanu	7512.0	8693.80027	0.0	120.168	1379.0	4186.5	1548205.0	41041.713725
building_a	7512.0	8693.80027	0.0	120.168	1379.0	4186.5	1548205.0	41041.713725
dayofmonth_added_date	7512.0	15.883253	1.0	9.0	16.0	23.0	31.0	8.567638
dayofmonth_final_date	3890.0	15.57455	1.0	8.0	16.0	23.0	31.0	8.892878
dayofmonth_issue_date	7054.0	15.95336	1.0	8.0	16.0	23.0	31.0	8.813776
dayofweek_added_date	7512.0	3.087593	0.0	2.0	3.0	4.0	6.0	1.403647
dayofweek_final_date	3890.0	3.028021	0.0	2.0	3.0	4.0	6.0	1.496026
dayofweek_issue_date	7054.0	3.107315	1.0	2.0	3.0	4.0	6.0	1.370517
declared_v	7512.0	551598.996733	0.0	20000.0	57465.0	164175.0	85000000.0	3562007.866605
declvalnu	7512.0	551598.996733	0.0	20000.0	57465.0	164175.0	85000000.0	3562007.866605
fid	7512.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
final_date	3890	2014-01-06 10:47:48.894601728	2010-02-26 00:00:00	2012-07-18 00:00:00	2014-03-13 12:00:00	2015-07-22 00:00:00	2017-04-17 00:00:00	NaN
hour_added_date	7512.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
hour_final_date	3890.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
hour_issue_date	7054.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
issue_date	7054	2013-10-17 19:48:05.511766528	2010-01-21 00:00:00	2012-02-28 00:00:00	2013-09-30 00:00:00	2015-07-09 00:00:00	2017-04-17 00:00:00	NaN
latitude	7512.0	39.078655	38.943754	38.998268	39.05274	39.155405	39.327524	0.087368
longitude	7512.0	-77.129365	-77.484224	-77.203195	-77.112745	-77.058127	-76.924048	0.096566
minute_added_date	7512.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
minute_final_date	3890.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
minute_issue_date	7054.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
month_added_date	7512.0	6.45434	1.0	4.0	6.0	9.0	12.0	3.288958
month_final_date	3890.0	6.739589	1.0	4.0	7.0	10.0	12.0	3.405606
month_issue_date	7054.0	6.627871	1.0	4.0	7.0	9.0	12.0	3.299935
objectid	7512.0	5346.088525	1.0	2611.5	5202.0	8041.25	11224.0	3176.491099
permit_num	7512.0	652461.075346	528631.0	584859.5	647250.5	716736.25	796193.0	76139.847677
quarter_added_date	7512.0	2.487487	1.0	2.0	2.0	3.0	4.0	1.080123
quarter_final_date	3890.0	2.574036	1.0	2.0	3.0	4.0	4.0	1.09994
quarter_issue_date	7054.0	2.539269	1.0	2.0	3.0	3.0	4.0	1.08367
rowid	7512.0	5346.088525	1.0	2611.5	5202.0	8041.25	11224.0	3176.491099
second_added_date	7512.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
second_final_date	3890.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
second_issue_date	7054.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
year_added_date	7512.0	2013.180511	2010.0	2011.0	2013.0	2015.0	2017.0	2.051371
year_final_date	3890.0	2013.498972	2010.0	2012.0	2014.0	2015.0	2017.0	1.972848
year_issue_date	7054.0	2013.286646	2010.0	2012.0	2013.0	2015.0	2017.0	1.999234
zip_code	7512.0	20852.257455	0.0	20832.0	20871.0	20901.0	21771.0	483.077952
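The summary above suggests that some columns carry no extra information: several time-part columns (for example hour_added_date) are 0 in every row, and bldgareanu and building_a report identical statistics. A minimal sketch of how such columns could be detected, using a toy dataframe with illustrative values rather than the real layer:

```python
import pandas as pd

# Toy stand-in for a few permit columns (values are illustrative only).
toy = pd.DataFrame({
    "bldgareanu": [2069.43, 707.91, 0.0],
    "building_a": [2069.43, 707.91, 0.0],   # same values as bldgareanu
    "hour_added_date": [0, 0, 0],           # constant column
    "declvalnu": [20000.0, 57465.0, 164175.0],
})

# Columns with a single distinct value carry no information.
constant_cols = [c for c in toy.columns if toy[c].nunique(dropna=False) == 1]

# Pairs of columns whose values (and dtype) match row for row.
duplicate_pairs = [
    (a, b)
    for i, a in enumerate(toy.columns)
    for b in toy.columns[i + 1:]
    if toy[a].equals(toy[b])
]

print(constant_cols)
print(duplicate_pairs)
```

Identical summary statistics are only a hint; a row-by-row comparison such as equals() is what would confirm that two columns are true duplicates.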

Query the types of attributes and explore the data.

sdf.dtypes

SHAPE                          geometry
added_date               datetime64[ns]
address                  string[python]
applicatio               string[python]
bldgareanu                      Float64
building_a                      Float64
city                     string[python]
dayofmonth_added_date             Int32
dayofmonth_final_date             Int32
dayofmonth_issue_date             Int32
dayofweek_added_date              Int32
dayofweek_final_date              Int32
dayofweek_issue_date              Int32
declared_v                      Float64
declvalnu                       Float64
descriptio               string[python]
fid                               Int32
final_date               datetime64[ns]
hour_added_date                   Int32
hour_final_date                   Int32
hour_issue_date                   Int32
issue_date               datetime64[ns]
latitude                        Float64
location                 string[python]
longitude                       Float64
minute_added_date                 Int32
minute_final_date                 Int32
minute_issue_date                 Int32
month_added_date                  Int32
month_final_date                  Int32
month_issue_date                  Int32
objectid                          Int64
permit_num                        Int32
quarter_added_date                Int32
quarter_final_date                Int32
quarter_issue_date                Int32
rowid                             Int32
second_added_date                 Int32
second_final_date                 Int32
second_issue_date                 Int32
state                    string[python]
status                   string[python]
use_code                 string[python]
work_type                string[python]
year_added_date                   Int32
year_final_date                   Int32
year_issue_date                   Int32
zip_code                          Int32
dtype: object
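The capitalized Int32 and Float64 types in this listing are pandas nullable types, which is why missing values appear as <NA> and why describe() reports a count of 3890 for the final_date fields against 7512 for the others. A small sketch of that behavior, with made-up values:

```python
import pandas as pd

# Illustrative values only: a permit that is not finaled has no final date.
day_final = pd.Series([13, None, 25], dtype="Int32", name="dayofmonth_final_date")

missing = int(day_final.isna().sum())   # rows displayed as <NA>
present = int(day_final.count())        # count() skips <NA>
mean_day = float(day_final.mean())      # statistics also skip <NA>

print(day_final)
print(missing, present, mean_day)
```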

sdf['work_type'].unique()

<StringArray>
[                 'CONSTRUCT',                      'ALTER',
   'COMMERCIAL CHANGE OF USE',    'RESTORE AND / OR REPAIR',
                        'ADD',           'BUILD FOUNDATION',
                    'INSTALL',                    'REPLACE',
 'CONSTRUCT SHEETING/SHORING',              'FINAL ONLY AP']
Length: 10, dtype: string

sdf['status'].unique()

<StringArray>
['Finaled', 'Issued', 'Open', 'Stop Work']
Length: 4, dtype: string

sdf['use_code'].unique()

<StringArray>
[         'MULTI-FAMILY DWELLING',              'BUSINESS BUILDING',
            'MERCANTILE BUILDING', 'COMMERCIAL MISCELLANEOUS STRUC']
Length: 4, dtype: string

Permits by Status

The groupby() method groups rows by the values in a column and then applies a calculation to each group, such as counting its rows with size(), as shown in the following code.

permits_by_status = sdf.groupby(sdf['status']).size()
permits_by_status

status
Finaled      3860
Issued       2997
Open          455
Stop Work     200
dtype: int64
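As a side note, value_counts() is a common shortcut for the same count. The sketch below uses a toy series (not the permit data) to show the difference in ordering: groupby().size() orders by label, while value_counts() orders by count.

```python
import pandas as pd

# Toy status values, for illustration only.
statuses = pd.Series(
    ["Issued", "Finaled", "Issued", "Open", "Finaled", "Issued"],
    name="status",
)

by_group = statuses.groupby(statuses).size()   # sorted by label
by_count = statuses.value_counts()             # sorted by count, descending

print(by_group)
print(by_count)
```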

There are only four permit statuses: Issued, Finaled, Open, and Stop Work. To visualize the number of permits for each status, you'll create a pie chart.

Because size() already returns a single count per status, you can plot the resulting series directly.

%matplotlib inline
import matplotlib.pyplot as plt

plt.axis('equal') 
permits_by_status.plot(kind='pie', legend=False, label='Permits by Status');

The pie chart above shows the four permit statuses, with the size of each status determined by the number of permits. The vast majority of permits are either Issued or Finaled. Finaled permits are issued permits that have also had the requisite inspections performed.
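To put numbers on "vast majority", the counts printed earlier can be converted to percentage shares. This sketch copies those four counts into a new series so that it runs on its own:

```python
import pandas as pd

# Counts copied from the groupby output shown earlier.
permits_by_status = pd.Series(
    {"Finaled": 3860, "Issued": 2997, "Open": 455, "Stop Work": 200},
    name="permits",
)

total = int(permits_by_status.sum())
share_pct = (permits_by_status / total * 100).round(1)

# Combined share of the two dominant statuses.
issued_or_finaled_pct = round(
    float(permits_by_status[["Issued", "Finaled"]].sum()) / total * 100, 1
)

print(share_pct)
print(issued_or_finaled_pct)
```

Issued and Finaled permits together account for roughly 91 percent of the 7,512 records.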

It's helpful to visualize the spatial distribution of permit attributes on a map. You'll change the map so that each permit's symbol represents its status.

permits_by_status_map = agol_gis.map('Montgomery County, Maryland')
permits_by_status_map

sdf.spatial.plot(kind='map', map_widget=permits_by_status_map,
        renderer_type='u', # specify the unique value renderer using its notation 'u'
        col='status')  # column to get unique values from

True

Permits by Type

permits_by_type = sdf.groupby(['use_code']).size()
permits_by_type

use_code
BUSINESS BUILDING                 3461
COMMERCIAL MISCELLANEOUS STRUC    1197
MERCANTILE BUILDING               1016
MULTI-FAMILY DWELLING             1838
dtype: int64

The series is not sorted by count. Use the sort_values() method to sort it from highest count to lowest. The most common use code, Business Building, has almost twice as many permits as the second highest, Multi-family Dwelling. The top four use codes together comprise the majority of all permits, so these use codes may be the most important to focus on in your analysis later.

permits_by_type.sort_values(ascending=False, inplace=True)
permits_by_type.head()

use_code
BUSINESS BUILDING                 3461
MULTI-FAMILY DWELLING             1838
COMMERCIAL MISCELLANEOUS STRUC    1197
MERCANTILE BUILDING               1016
dtype: int64

Clean up the data

Before you begin analysis of your data, you'll hide attribute fields you don't intend to use, rename fields with unclear names, and filter your dataset to only show permits with the four most common use codes. These changes won't permanently affect the original dataset, but they will make the data easier to work with and understand.

The declared_v, building_a, and applicatio attribute fields describe aspects of the data that aren't important for your analysis. You'll drop these fields.

sdf.drop(['declared_v', 'building_a', 'applicatio'], axis=1, inplace=True)

sdf.columns

Index(['SHAPE', 'added_date', 'address', 'bldgareanu', 'city',
       'dayofmonth_added_date', 'dayofmonth_final_date',
       'dayofmonth_issue_date', 'dayofweek_added_date', 'dayofweek_final_date',
       'dayofweek_issue_date', 'declvalnu', 'descriptio', 'fid', 'final_date',
       'hour_added_date', 'hour_final_date', 'hour_issue_date', 'issue_date',
       'latitude', 'location', 'longitude', 'minute_added_date',
       'minute_final_date', 'minute_issue_date', 'month_added_date',
       'month_final_date', 'month_issue_date', 'objectid', 'permit_num',
       'quarter_added_date', 'quarter_final_date', 'quarter_issue_date',
       'rowid', 'second_added_date', 'second_final_date', 'second_issue_date',
       'state', 'status', 'use_code', 'work_type', 'year_added_date',
       'year_final_date', 'year_issue_date', 'zip_code'],
      dtype='object')

The fields are no longer listed.

Next, you'll rename some of the attribute fields with shortened or unclear names so that their names are more descriptive.

sdf.rename(columns={"descriptio": "Description", "bldgareanu": "Building_Area", "declvalnu": "Declared_Value"}, inplace=True)

sdf.columns

Index(['SHAPE', 'added_date', 'address', 'Building_Area', 'city',
       'dayofmonth_added_date', 'dayofmonth_final_date',
       'dayofmonth_issue_date', 'dayofweek_added_date', 'dayofweek_final_date',
       'dayofweek_issue_date', 'Declared_Value', 'Description', 'fid',
       'final_date', 'hour_added_date', 'hour_final_date', 'hour_issue_date',
       'issue_date', 'latitude', 'location', 'longitude', 'minute_added_date',
       'minute_final_date', 'minute_issue_date', 'month_added_date',
       'month_final_date', 'month_issue_date', 'objectid', 'permit_num',
       'quarter_added_date', 'quarter_final_date', 'quarter_issue_date',
       'rowid', 'second_added_date', 'second_final_date', 'second_issue_date',
       'state', 'status', 'use_code', 'work_type', 'year_added_date',
       'year_final_date', 'year_issue_date', 'zip_code'],
      dtype='object')

There are other fields that you may want to either rename or remove, but for the purposes of this lesson, these are enough.
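The drop and rename steps above modify sdf in place. If you would rather keep the original dataframe untouched, one alternative is to chain the two calls without inplace=True, so that each returns a new dataframe. A minimal sketch on a one-row toy frame with illustrative values:

```python
import pandas as pd

# One-row toy frame using the original column names (values are illustrative).
raw = pd.DataFrame({
    "applicatio": ["COMMERCIAL BUILDING"],
    "bldgareanu": [2069.43],
    "building_a": [2069.43],
    "declared_v": [20000.0],
    "declvalnu": [20000.0],
    "descriptio": ["illustrative text"],
})

# Without inplace=True, drop() and rename() each return a new dataframe.
cleaned = (
    raw
    .drop(columns=["declared_v", "building_a", "applicatio"])
    .rename(columns={
        "descriptio": "Description",
        "bldgareanu": "Building_Area",
        "declvalnu": "Declared_Value",
    })
)

print(list(cleaned.columns))
print(list(raw.columns))   # the original frame still has all six columns
```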

Filter the permits

Next, you'll filter the permits to reduce the number of records in your analysis. As you saw previously, there are four types of permits that comprise over half the total number of permits. Focusing your analysis on just these four types will reduce the amount of data to analyze without ignoring the most important types of development. To remove the other use codes, you'll create a filter.

permits_by_type.head(4) # top 4 Use_Codes

use_code
BUSINESS BUILDING                 3461
MULTI-FAMILY DWELLING             1838
COMMERCIAL MISCELLANEOUS STRUC    1197
MERCANTILE BUILDING               1016
dtype: int64

filtered_permits = list(permits_by_type.head(4).index)
filtered_permits

['BUSINESS BUILDING',
 'MULTI-FAMILY DWELLING',
 'COMMERCIAL MISCELLANEOUS STRUC',
 'MERCANTILE BUILDING']

To visualize the top four use codes on a map, filter the dataframe so that use_code contains only those four values.

filtered_df = sdf.loc[sdf['use_code'].isin(filtered_permits)]

filtered_df.head()

	SHAPE	added_date	address	Building_Area	city	dayofmonth_added_date	dayofmonth_final_date	dayofmonth_issue_date	dayofweek_added_date	dayofweek_final_date	...	state	status	use_code	work_type	year_added_date	year_final_date	year_issue_date	zip_code
0	{"x": -77.278324, "y": 39.178067, "spatialRefe...	2010-01-07	13536 WATERFORD HILLS BLVD	1336.0	GERMANTOWN	7	13	8	4	2	...	MD	Finaled	MULTI-FAMILY DWELLING	CONSTRUCT	2010	2012	2011	20874
1	{"x": -77.27829, "y": 39.178056, "spatialRefer...	2010-01-07	13538 WATERFORD HILLS BLVD	1730.0	GERMANTOWN	7	15	8	4	4	...	MD	Finaled	MULTI-FAMILY DWELLING	CONSTRUCT	2010	2012	2011	20874
2	{"x": -77.278256, "y": 39.178045, "spatialRefe...	2010-01-07	13540 WATERFORD HILLS BLVD	1336.0	GERMANTOWN	7	15	8	4	4	...	MD	Finaled	MULTI-FAMILY DWELLING	CONSTRUCT	2010	2012	2011	20874
3	{"x": -77.278223, "y": 39.178035, "spatialRefe...	2010-01-07	13542 WATERFORD HILLS BLVD	1730.0	GERMANTOWN	7	15	8	4	4	...	MD	Finaled	MULTI-FAMILY DWELLING	CONSTRUCT	2010	2012	2011	20874
4	{"x": -77.278189, "y": 39.178024, "spatialRefe...	2010-01-07	13544 WATERFORD HILLS BLVD	1336.0	GERMANTOWN	7	15	8	4	4	...	MD	Finaled	MULTI-FAMILY DWELLING	CONSTRUCT	2010	2012	2011	20874

5 rows × 45 columns

sdf.shape, filtered_df.shape

((7512, 45), (7512, 45))

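The filter is applied, but it does not remove any records here: both dataframes have 7,512 rows, because this layer contains only these four use codes. On a dataset with additional use codes, the same filter would reduce the row count.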
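To see the isin() filter actually drop rows, here is a small sketch on a toy dataframe. The value 'HYPOTHETICAL OTHER USE' is invented for illustration and does not occur in the permit layer.

```python
import pandas as pd

toy = pd.DataFrame({
    "use_code": [
        "BUSINESS BUILDING",
        "MERCANTILE BUILDING",
        "HYPOTHETICAL OTHER USE",   # made-up category, not in the real data
        "BUSINESS BUILDING",
    ],
    "permit_num": [1, 2, 3, 4],
})

keep = ["BUSINESS BUILDING", "MERCANTILE BUILDING"]

mask = toy["use_code"].isin(keep)   # True where the use code is in the list
filtered = toy.loc[mask]
dropped = int((~mask).sum())        # number of rows the filter removes

print(toy.shape, filtered.shape, dropped)
```

Comparing the shapes before and after, as the notebook does, is a quick way to confirm whether a filter had any effect.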

Visualize filtered dataset

filtered_map = agol_gis.map('Montgomery County, Maryland')

filtered_map

filtered_df.spatial.plot(kind='map', map_widget=filtered_map,
        renderer_type='u', # specify the unique value renderer using its notation 'u'
        col='use_code')  # column to get unique values from

True

Visualize temporal and spatial trends

Your data shows permits, but what do these permits say about when and where growth is happening in the county? The data also contains temporal attribute fields, such as added_date, which indicates when a permit was first added to the system. The layer also includes derived fields that break each date down by year, quarter, month, day, and even hour.

Split added_date to get the year, month, and day_of_week.

sdf['datetime'] = pd.to_datetime(sdf['added_date'], unit='ms')
sdf['year'], sdf['month'], sdf['day_of_week'] = sdf.datetime.dt.year, sdf.datetime.dt.month, sdf.datetime.dt.dayofweek
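A quick sketch of what the dt accessors return, using two dates that appear in the table outputs above. Note that pandas numbers Monday as 0 in dt.dayofweek; the layer's own dayofweek_added_date field appears to follow a different numbering (it shows 4 for 2010-01-07, a Thursday), so the two should not be mixed.

```python
import pandas as pd

# Two added_date values taken from the table outputs above.
added = pd.to_datetime(pd.Series(["2010-01-07", "2014-01-31"]))

parts = pd.DataFrame({
    "year": added.dt.year,
    "month": added.dt.month,
    "day_of_week": added.dt.dayofweek,   # Monday=0 ... Sunday=6
    "day_name": added.dt.day_name(),     # readable label for charts
})

print(parts)
```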

Visualize permits by time of issue

You'll create count plots for the year, month, and day_of_week fields to visualize patterns in permit activity over time.

import seaborn as sns

sns.countplot(x="year", data=sdf);

The chart shows the number of permits issued each year since 2010. (The year 2017 has significantly fewer permits because the dataset only covers part of 2017.) You can compare the number of permits visually by the size of each bar. Although some fluctuation occurs from year to year, most years had similar permit activity.

Similarly, you can visualize the permits by month and by day_of_week.

sns.countplot(x="month", data=sdf);

This bar chart shows the number of permits by month. Based on the chart, the highest permit activity occurs in June and July.

sns.countplot(x="day_of_week", data=sdf);

Almost all permit activity occurs on weekdays. Government offices are closed on weekends, so few permits are issued then.

ddf = sdf.set_index('datetime')

ddf['num'] = 1
ddf['num'].resample('M').sum().plot();
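The resampling step above can be sketched on toy data. This version labels each bin by month start with the 'MS' alias, which is an assumption chosen because it behaves the same across pandas versions; the monthly totals are the same as with month-end labels.

```python
import pandas as pd

# Four made-up permit dates; one row per permit, each counted as 1.
dates = pd.to_datetime(["2011-05-03", "2011-06-10", "2011-06-21", "2011-08-01"])
toy = pd.DataFrame({"num": 1}, index=dates)

# Sum the counter within each calendar month; empty months become 0.
monthly = toy["num"].resample("MS").sum()
busiest = monthly.idxmax()

print(monthly)
print(busiest)
```

Months with no permits (July in this toy series) still appear with a count of 0, which keeps the line plot continuous.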

A huge spike in permit activity occurred in mid-2011. What caused this spike? Is it an increase in overall permit activity, or is it mostly an increase in a certain type of permit? You'll plot the number of permits by use_code to find which one caused the spike.

fig = plt.figure(figsize=(15,5))
ax = fig.add_subplot(1, 1, 1)

ax.plot(ddf['num'].resample('M').sum(), 'k', label='Total permits')
for use_code in filtered_permits:
    x = ddf[ddf.use_code == use_code]['num'].resample('M').sum()
    ax.plot(x, label=use_code)
ax.legend();

Based on the legend, permit activity spiked in 2011 due to a sharp increase in the number of multifamily dwelling permits issued. This likely means that there was large residential growth in 2011.
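Reading a spike off a legend can be checked numerically. One possible approach, sketched here on a toy dataframe with invented dates, is to tabulate permits per month and use code, then look up the largest contributor in the busiest month:

```python
import pandas as pd

# Toy records for illustration; the dates and counts are invented.
toy = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2011-05-03", "2011-06-10", "2011-06-21", "2011-06-28", "2011-07-05"]
    ),
    "use_code": [
        "BUSINESS BUILDING",
        "MULTI-FAMILY DWELLING",
        "MULTI-FAMILY DWELLING",
        "BUSINESS BUILDING",
        "BUSINESS BUILDING",
    ],
})

# Rows are months, columns are use codes, cells are permit counts.
month = toy["datetime"].dt.to_period("M")
table = toy.groupby([month, "use_code"]).size().unstack(fill_value=0)

peak_month = table.sum(axis=1).idxmax()      # month with the most permits
top_code = table.loc[peak_month].idxmax()    # largest use code in that month

print(table)
print(peak_month, top_code)
```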

You've investigated some temporal patterns in your data. Next, you'll look at spatial patterns. Are there certain areas in the county that have experienced a relatively high degree of permit activity? Was the 2011 spike in residential permits in a specific location? To find out, you'll plot the permits with a heat map renderer to show hot spots, or areas with concentrations of points.

hotspot_map = agol_gis.map('Germantown, Montgomery County, Maryland')
hotspot_map

sdf.spatial.plot(kind='map', map_widget=hotspot_map,
        renderer_type='h', 
        col='status')

True

The hot spots show up where there is a high concentration of permits. The highest concentration areas are in the northwest and southeast corners of the county, which correspond to the major population centers of Germantown and the suburban communities near Washington, D.C., respectively.

Next, you'll see if the 2011 permit spike corresponds to a specific area of the map. The code below filters the dataframe to only show permits from 2011 and highlights related data in the map. In this case, the heat map changes to show the hot spot in the northwest part of the county, near Germantown.

hotspot_2011_map = agol_gis.map('Germantown, Montgomery County, Maryland')
hotspot_2011_map

sdf[sdf.year==2011].spatial.plot(kind='map', map_widget=hotspot_2011_map,
        renderer_type='h',  # heat map renderer, as in the previous map
        col='status')

True
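A heat map is a visual cue; a simple tabular cross-check is to count the 2011 permits per city. The sketch below uses a toy dataframe with invented rows, so the result only demonstrates the technique, not the actual distribution in the county.

```python
import pandas as pd

# Invented rows for illustration; city names are taken from the permit data.
toy = pd.DataFrame({
    "year": [2010, 2011, 2011, 2011, 2012],
    "city": ["ROCKVILLE", "GERMANTOWN", "GERMANTOWN", "BETHESDA", "SILVER SPRING"],
})

# Keep only 2011 permits, then count them per city, largest first.
by_city_2011 = (
    toy.loc[toy["year"] == 2011]
    .groupby("city")
    .size()
    .sort_values(ascending=False)
)

print(by_city_2011)
```

Applied to sdf, the same pattern would use the year column created earlier and the layer's city field.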