Data Visualization - Construction permits, part 1/2

Overview

One indicator of a region's growth is the number of permits issued for new construction. Exploring and analyzing permit activity can help regional planners ensure that development occurs in accordance with the area's long-term goals. One area that has recently experienced rapid growth is Montgomery County, Maryland, a suburban county near Washington, D.C. County planners want to observe spatial and temporal growth trends, find out why certain areas are growing faster than others, and communicate key information about the county's growth to the public.

In this notebook, you'll explore Montgomery County permit data. First, you'll add the permit data from ArcGIS Living Atlas of the World. You'll explore the data and become familiar with exactly what kind of information it contains. Then, you'll analyze the data to detect patterns and find out why growth is occurring. Once you've gathered your findings from your exploration and analysis, you'll share your work online.

Explore the data

To better understand trends in permit activity in Montgomery County, you'll add a dataset of permits issued since 2010. Before you begin your analysis, however, it's important to explore your data and understand what it shows and does not show. You'll familiarize yourself with the data's attributes, sort the data by type, and visualize spatial and temporal trends. In doing so, you'll gain context for your analysis and know exactly which questions you still need to ask to find out why, where, and when growth is occurring.

Connect to your ArcGIS Online organization.

Input
from arcgis.gis import GIS
import pandas as pd

from arcgis.features import GeoAccessor, GeoSeriesAccessor
Input
agol_gis = GIS()

Search for the Commercial Permits since 2010 layer. You can specify the owner's name to get more specific results. To search for content from the Living Atlas, or content shared by other users on ArcGIS Online, set outside_org=True.

Input
data = agol_gis.content.search('title: Commercial Permits since 2010 owner: rpeake_LearnGIS', 'Feature layer', 
                               outside_org=True)
data[0]
Output
Commercial Permits since 2010
Commercial building permits issued in Montgomery County, Maryland, since 2010.
Feature Layer Collection by rpeake_LearnGIS
Last Modified: December 09, 2017
0 comments, 846 views

Get the first item from the results.

Input
permits = data[0]

Since the item is a Feature Layer Collection, accessing the layers property gives us a list of FeatureLayer objects. The permit layer is the first layer in this item. Visualize this layer on a map of Montgomery County, Maryland.

Input
permit_layer = permits.layers[0]
Input
permit_map = agol_gis.map('Montgomery County, Maryland', zoomlevel=9)
permit_map
Output

You can add a number of different layer objects, such as FeatureLayer, FeatureCollection, ImageryLayer, and MapImageLayer, to the map by calling the add_layer() method.

Input
permit_map.add_layer(permit_layer)

Data Exploration

Now that you've added the permit data, you'll explore its contents. Geographic data doesn't only contain information about location; it can also include other attributes not seen on a map.

Convert the layer into a spatially-enabled dataframe to explore these attributes.

Input
permit_layer
Output
<FeatureLayer url:"https://services2.arcgis.com/j80Jz20at6Bi0thr/arcgis/rest/services/Commercial_Permits_since_2010/FeatureServer/0">
Input
sdf = pd.DataFrame.spatial.from_layer(permit_layer)

The tail() method returns the last five rows of the dataframe.

Input
sdf.tail()
Output
Added_Date Address Applicatio BldgAreaNu Building_A City DeclValNu DeclValNu2 Declared_V Descriptio ... Pre_direct SHAPE State Status Street_Nam Street_Num Street_Suf Use_Code Work_Type ZIP_code
11219 2014-01-31 1015 SPRING ST COMMERCIAL BUILDING 707.91 707.91 SILVER SPRING 42000.0 42000.0 $42,000.00 Silver Spring Enterprise Zone\n\nAdd exterior ... ... {"x": -8574668.7047, "y": 4721607.997599997, "... MD Stop Work SPRING 1015 ST BUSINESS BUILDING ADD 20910
11220 2014-02-11 26100 WOODFIELD RD COMMERCIAL BUILDING 0.00 0 DAMASCUS 5875.0 5875.0 $5,875.00 PYLON SIGN ... {"x": -8594080.2264, "y": 4762636.6635000035, ... MD Stop Work WOODFIELD 26100 RD BUSINESS BUILDING CONSTRUCT 20872
11221 2014-02-20 10520 MONTROSE AVE COMMERCIAL BUILDING 728.00 728 BETHESDA 31000.0 31000.0 $31,000.00 Remodeling a one story building with walk-out ... ... {"x": -8582314.4798, "y": 4725770.635600001, "... MD Stop Work MONTROSE 10520 AVE ASSEMBLY BUILDING ADD 20814
11222 2014-03-06 8500 RIVER RD COMMERCIAL BUILDING 472.02 472.02 BETHESDA 1000.0 1000.0 $1,000.00 TOTAL OF 17 GROUPED TENTS FOR 2014 QUICKEN LOA... ... {"x": -8591206.0439, "y": 4721680.315399997, "... MD Stop Work RIVER 8500 RD COMMERCIAL MISCELLANEOUS STRUC CONSTRUCT 20817
11223 2014-03-10 8500 RIVER RD COMMERCIAL BUILDING 8461.55 8461.55 BETHESDA 1.0 1.0 $1.00 1 GRANDSTAND & MULTIPLE PLATFORMS FOR 2014 QUI... ... {"x": -8591206.0439, "y": 4721680.315399997, "... MD Stop Work RIVER 8500 RD COMMERCIAL MISCELLANEOUS STRUC CONSTRUCT 20817

5 rows × 26 columns

The permit data contains a long list of attributes. Some attributes have self-explanatory names, while others have names that can be difficult to understand without context. The list of attributes can be obtained from the columns attribute of the dataframe.

Input
sdf.columns
Output
Index(['Added_Date', 'Address', 'Applicatio', 'BldgAreaNu', 'Building_A',
       'City', 'DeclValNu', 'DeclValNu2', 'Declared_V', 'Descriptio', 'FID',
       'Final_Date', 'Issue_Date', 'Location', 'Permit_Num', 'Post_direc',
       'Pre_direct', 'SHAPE', 'State', 'Status', 'Street_Nam', 'Street_Num',
       'Street_Suf', 'Use_Code', 'Work_Type', 'ZIP_code'],
      dtype='object')
Input
sdf.describe().T
Output
              count           mean           std       min        25%       50%        75%           max
BldgAreaNu  11224.0    9241.778777  3.950484e+04       0.0     255.00    1537.0    4000.00  1.548205e+06
DeclValNu   11224.0  784736.035620  1.152229e+07       0.0   20000.00   74000.0  200000.00  1.129634e+09
DeclValNu2  11224.0  784736.035620  1.152229e+07       0.0   20000.00   74000.0  200000.00  1.129634e+09
FID         11224.0    5612.500000  3.240234e+03       1.0    2806.75    5612.5    8418.25  1.122400e+04
Permit_Num  11224.0  655806.112794  7.759285e+04  528631.0  587437.50  652073.5  722003.25  7.961930e+05
ZIP_code    11224.0   20848.988863  5.581882e+02       0.0   20832.00   20871.0   20901.00  2.177100e+04

Query the types of attributes and explore the data.

Input
sdf.dtypes
Output
Added_Date    datetime64[ns]
Address               object
Applicatio            object
BldgAreaNu           float64
Building_A            object
City                  object
DeclValNu            float64
DeclValNu2           float64
Declared_V            object
Descriptio            object
FID                    int64
Final_Date    datetime64[ns]
Issue_Date    datetime64[ns]
Location              object
Permit_Num             int64
Post_direc            object
Pre_direct            object
SHAPE               geometry
State                 object
Status                object
Street_Nam            object
Street_Num            object
Street_Suf            object
Use_Code              object
Work_Type             object
ZIP_code               int64
dtype: object
Input
sdf['Work_Type'].unique()
Output
array(['CONSTRUCT', 'ALTER', 'COMMERCIAL CHANGE OF USE',
       'RESTORE AND / OR REPAIR', 'ADD', 'BUILD FOUNDATION', 'INSTALL',
       'REPLACE', 'CONSTRUCT SHEETING/SHORING', 'FINAL ONLY AP',
       'REMOVE AND REPLACE', 'OCCUPY', 'DEMOLISH'], dtype=object)
Input
sdf['Status'].unique()
Output
array(['Finaled', 'Issued', 'Open', 'Stop Work'], dtype=object)
Input
sdf['Use_Code'].unique()
Output
array(['MULTI-FAMILY DWELLING', 'RESTAURANT', 'BUSINESS BUILDING',
       'MERCANTILE BUILDING', 'PLACE OF WORSHIP', 'ASSEMBLY BUILDING',
       'STORAGE BUILDING', 'GARAGE', 'INSTITUTIONAL BUILDING',
       'COMMERCIAL MISCELLANEOUS STRUC', 'INDUSTRIAL BUILDING',
       'EDUCATIONAL BUILDING', 'TOWER', 'SWIMMING POOL', 'FENCE', 'BANK',
       'SHED', 'MULTI-FAMILY SENIOR CITIZEN BL', 'RETAINING WALL',
       'TRAILER', 'HOSPITAL', 'BIOSCIENCE', 'TOWNHOUSE', 'HOTEL',
       'FACTORY', 'BOARDING HOUSE', 'SWIMMING POOL & FENCE',
       'UTILITY, MISCELLANEOUS', 'THEATER',
       'MULTIFAMILY DWELLING HIGH RISE', 'MULTIFAMILY DWELLING LOW RISE',
       'MISCELLANEOUS STRUCTURE', 'OWNERSHIP UNIT'], dtype=object)

Permits by Status

The groupby() method groups rows by the values in a column and then runs a calculation on each group, such as counting its rows with size(), as shown in the following code.

Input
permits_by_status = sdf.groupby(sdf['Status']).size()
permits_by_status
Output
Status
Finaled      5341
Issued       4696
Open          757
Stop Work     430
dtype: int64
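
As a small illustration of what groupby() followed by size() computes, the sketch below runs the same call on a toy dataframe. The rows are made up for illustration; only the column names Status and Use_Code come from the permit data. It also shows value_counts(), which gives the same counts ordered by frequency rather than by label.

```python
import pandas as pd

# Toy stand-in for the permit dataframe; the rows are made up for illustration.
toy = pd.DataFrame({
    "Status": ["Finaled", "Issued", "Finaled", "Open", "Issued", "Finaled"],
    "Use_Code": ["BANK", "HOTEL", "BANK", "SHED", "BANK", "HOTEL"],
})

# groupby(...).size() counts the rows in each group, ordered by group label.
by_status = toy.groupby("Status").size()

# value_counts() returns the same counts, ordered by frequency instead.
by_status_vc = toy["Status"].value_counts()

print(by_status)
print(by_status_vc)
```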

There are only four permit statuses: Issued, Finaled, Open, and Stop Work. To visualize the number of permits for each status, you'll create a pie chart.

Because permits_by_status holds only the count for each status, you can plot the series directly.

Input
%matplotlib inline
import matplotlib.pyplot as plt
Input
plt.axis('equal') 
permits_by_status.plot(kind='pie', legend=False, label='Permits by Status');

The pie chart above shows the four permit statuses, with the size of each status determined by the number of permits. The vast majority of permits are either Issued or Finaled. Finaled permits are issued permits that have also had the requisite inspections performed.
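
To put numbers on the pie chart, the share of each status can be computed from the counts printed by the groupby cell. This is a minimal sketch; the names status_counts, shares, and issued_or_finaled are introduced here for illustration and are not part of the notebook.

```python
# Counts copied from the permits_by_status output shown earlier.
status_counts = {"Finaled": 5341, "Issued": 4696, "Open": 757, "Stop Work": 430}

total = sum(status_counts.values())

# Share of all permits for each status, as a fraction of the total.
shares = {status: count / total for status, count in status_counts.items()}

for status, share in shares.items():
    print(f"{status:<10} {share:6.1%}")

# Combined share of the two dominant statuses.
issued_or_finaled = shares["Issued"] + shares["Finaled"]
print(f"Issued or Finaled: {issued_or_finaled:.1%}")
```

With these counts, Issued and Finaled together account for roughly 89 percent of the 11,224 permits.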

It's helpful to visualize the spatial distribution of permit attributes on a map. You'll change the map so that each permit's symbol represents its status.

Input
permits_by_status_map = agol_gis.map('Montgomery County, Maryland')
permits_by_status_map
Output

Input
sdf.spatial.plot(kind='map', map_widget=permits_by_status_map,
        renderer_type='u', # specify the unique value renderer using its notation 'u'
        col='Status')  # column to get unique values from
Output
True

Permits by Type

Input
permits_by_type = sdf.groupby(['Use_Code']).size()
permits_by_type
Output
Use_Code
ASSEMBLY BUILDING                  394
BANK                                87
BIOSCIENCE                          39
BOARDING HOUSE                       3
BUSINESS BUILDING                 3461
COMMERCIAL MISCELLANEOUS STRUC    1197
EDUCATIONAL BUILDING               658
FACTORY                              4
FENCE                               10
GARAGE                              56
HOSPITAL                           143
HOTEL                               44
INDUSTRIAL BUILDING                 53
INSTITUTIONAL BUILDING              30
MERCANTILE BUILDING               1016
MISCELLANEOUS STRUCTURE              9
MULTI-FAMILY DWELLING             1838
MULTI-FAMILY SENIOR CITIZEN BL      71
MULTIFAMILY DWELLING HIGH RISE      10
MULTIFAMILY DWELLING LOW RISE       31
OWNERSHIP UNIT                       1
PLACE OF WORSHIP                   167
RESTAURANT                         638
RETAINING WALL                     185
SHED                                28
STORAGE BUILDING                   208
SWIMMING POOL                       51
SWIMMING POOL & FENCE                5
THEATER                              1
TOWER                               14
TOWNHOUSE                          680
TRAILER                             91
UTILITY, MISCELLANEOUS               1
dtype: int64

The series is ordered alphabetically by use code rather than by count. Use the sort_values() method to sort it from highest count to lowest. The most common use code, BUSINESS BUILDING, has almost twice as many permits as the second most common, MULTI-FAMILY DWELLING. The top four use codes together make up the majority of all permits, so these use codes may be the most important to focus on in your analysis later.

Input
permits_by_type.sort_values(ascending=False, inplace=True)
permits_by_type.head()
Output
Use_Code
BUSINESS BUILDING                 3461
MULTI-FAMILY DWELLING             1838
COMMERCIAL MISCELLANEOUS STRUC    1197
MERCANTILE BUILDING               1016
TOWNHOUSE                          680
dtype: int64
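
The two claims about the sorted counts (the "almost twice" ratio and the top four forming a majority) can be checked with a little arithmetic. The sketch below hard-codes the counts from the output above and the total of 11,224 rows reported by describe(); the variable names are chosen here for illustration.

```python
import pandas as pd

# Counts for the five most common use codes, copied from the output above.
top_five = pd.Series({
    "BUSINESS BUILDING": 3461,
    "MULTI-FAMILY DWELLING": 1838,
    "COMMERCIAL MISCELLANEOUS STRUC": 1197,
    "MERCANTILE BUILDING": 1016,
    "TOWNHOUSE": 680,
})
total_permits = 11224  # row count reported by describe()

# Ratio of the most common use code to the runner-up.
ratio = top_five.iloc[0] / top_five.iloc[1]

# Cumulative share of all permits covered by the top N use codes.
cumulative_share = top_five.cumsum() / total_permits

print(f"ratio: {ratio:.2f}")
print(cumulative_share.round(3))
```

The fourth entry of cumulative_share is the share covered by the top four use codes, which is about two thirds of all permits.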

Clean up the data

Before you begin analysis of your data, you'll hide attribute fields you don't intend to use, rename fields with unclear names, and filter your dataset to only show permits with the four most common use codes. These changes won't permanently affect the original dataset, but they will make the data easier to work with and understand.

The 'Declared_V', 'Building_A', and 'Applicatio' attribute fields describe aspects of the data that aren't important for your analysis. You'll drop these fields.

Input
sdf.drop(['Declared_V', 'Building_A', 'Applicatio'], axis=1, inplace=True)
Input
sdf.columns
Output
Index(['Added_Date', 'Address', 'BldgAreaNu', 'City', 'DeclValNu',
       'DeclValNu2', 'Descriptio', 'FID', 'Final_Date', 'Issue_Date',
       'Location', 'Permit_Num', 'Post_direc', 'Pre_direct', 'SHAPE', 'State',
       'Status', 'Street_Nam', 'Street_Num', 'Street_Suf', 'Use_Code',
       'Work_Type', 'ZIP_code'],
      dtype='object')

The fields are no longer listed.

Next, you'll rename some of the attribute fields with shortened or unclear names so that their names are more descriptive.

Input
sdf.rename(columns={"Descriptio": "Description", "BldgAreaNu": "Building_Area", "DeclValNu": "Declared_Value"}, inplace=True)
Input
sdf.columns
Output
Index(['Added_Date', 'Address', 'Building_Area', 'City', 'Declared_Value',
       'DeclValNu2', 'Description', 'FID', 'Final_Date', 'Issue_Date',
       'Location', 'Permit_Num', 'Post_direc', 'Pre_direct', 'SHAPE', 'State',
       'Status', 'Street_Nam', 'Street_Num', 'Street_Suf', 'Use_Code',
       'Work_Type', 'ZIP_code'],
      dtype='object')

There are other fields that you may want to either rename or remove, but for the purposes of this lesson, these are enough.
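
The hosted layer is untouched by these steps because every change is made to the in-memory dataframe. If you also want to keep the original dataframe intact, one option is to omit inplace=True so that drop() and rename() return new dataframes. The sketch below shows this on a toy frame; the values are made up, and only the column names come from the permit data.

```python
import pandas as pd

# Toy frame reusing three column names from the permit data; values are made up.
raw = pd.DataFrame({
    "Descriptio": ["PYLON SIGN", "TENT"],
    "BldgAreaNu": [0.0, 472.02],
    "Declared_V": ["$5,875.00", "$1,000.00"],
})

# Without inplace=True, drop() and rename() return new frames,
# so `raw` keeps its original columns.
cleaned = (
    raw.drop(columns=["Declared_V"])
       .rename(columns={"Descriptio": "Description", "BldgAreaNu": "Building_Area"})
)

print(list(raw.columns))
print(list(cleaned.columns))
```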

Filter the permits

Next, you'll filter the permits to reduce the number of records in your analysis. As you saw previously, there are four types of permits that comprise over half the total number of permits. Focusing your analysis on just these four types will reduce the amount of data to analyze without ignoring the most important types of development. To remove the other use codes, you'll create a filter.

Input
permits_by_type.head(4) # top 4 Use_Codes
Output
Use_Code
BUSINESS BUILDING                 3461
MULTI-FAMILY DWELLING             1838
COMMERCIAL MISCELLANEOUS STRUC    1197
MERCANTILE BUILDING               1016
dtype: int64
Input
filtered_permits = list(permits_by_type.head(4).index)
filtered_permits
Output
['BUSINESS BUILDING',
 'MULTI-FAMILY DWELLING',
 'COMMERCIAL MISCELLANEOUS STRUC',
 'MERCANTILE BUILDING']

To visualize the top four use codes on a map, you can filter the dataframe so that Use_Code contains only the top four attribute values.

Input
filtered_df = sdf.loc[sdf['Use_Code'].isin(filtered_permits)]
Input
filtered_df.head()
Output
Added_Date Address Building_Area City Declared_Value DeclValNu2 Description FID Final_Date Issue_Date ... Pre_direct SHAPE State Status Street_Nam Street_Num Street_Suf Use_Code Work_Type ZIP_code
0 2010-01-07 13536 WATERFORD HILLS BLVD 1336.0 GERMANTOWN 103000.0 103000.0 MODEL: TULIP - Unit #D036 - BLDG #4 1 2012-03-13 2011-07-08 ... {"x": -8602565.104, "y": 4747203.126800001, "s... MD Finaled WATERFORD HILLS 13536 BLVD MULTI-FAMILY DWELLING CONSTRUCT 20874
1 2010-01-07 13538 WATERFORD HILLS BLVD 1730.0 GERMANTOWN 117000.0 117000.0 MODEL: ORCHID - Unit #D038 - BLDG #4 2 2012-03-15 2011-07-08 ... {"x": -8602567.5243, "y": 4747204.106700003, "... MD Finaled WATERFORD HILLS 13538 BLVD MULTI-FAMILY DWELLING CONSTRUCT 20874
2 2010-01-07 13540 WATERFORD HILLS BLVD 1336.0 GERMANTOWN 103000.0 103000.0 MODEL: TULIP - Unit #D040 - BLDG #4 3 2012-03-15 2011-07-08 ... {"x": -8602569.9445, "y": 4747205.0867, "spati... MD Finaled WATERFORD HILLS 13540 BLVD MULTI-FAMILY DWELLING CONSTRUCT 20874
3 2010-01-07 13542 WATERFORD HILLS BLVD 1730.0 GERMANTOWN 117000.0 117000.0 MODEL: ORCHID - Unit #D042 - BLDG #4 4 2012-03-15 2011-07-08 ... {"x": -8602572.3648, "y": 4747206.066600002, "... MD Finaled WATERFORD HILLS 13542 BLVD MULTI-FAMILY DWELLING CONSTRUCT 20874
4 2010-01-07 13544 WATERFORD HILLS BLVD 1336.0 GERMANTOWN 103000.0 103000.0 MODEL: TULIP - Unit #D044 - BLDG #4 5 2012-03-15 2011-07-08 ... {"x": -8602574.7851, "y": 4747207.046599999, "... MD Finaled WATERFORD HILLS 13544 BLVD MULTI-FAMILY DWELLING CONSTRUCT 20874

5 rows × 23 columns

Input
sdf.shape, filtered_df.shape
Output
((11224, 23), (7512, 23))

The dataset is filtered. Instead of more than 11,000 permits, the filtered dataframe has about 7,500.
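
The filter relies on isin(), which builds a boolean mask that is True wherever a value appears in the given list. The toy example below makes the mask visible; the rows and the Permit_Num values are made up for illustration. Adding .copy() is an optional habit, not something the notebook does, that keeps the subset independent of the source frame.

```python
import pandas as pd

toy = pd.DataFrame({
    "Permit_Num": [1, 2, 3, 4, 5],
    "Use_Code": ["BANK", "BUSINESS BUILDING", "SHED",
                 "MERCANTILE BUILDING", "BUSINESS BUILDING"],
})
keep = ["BUSINESS BUILDING", "MERCANTILE BUILDING"]

# isin() builds a boolean mask: True where Use_Code is one of the kept values.
mask = toy["Use_Code"].isin(keep)

# .copy() makes the subset independent of `toy`, so later column
# assignments on it do not raise a SettingWithCopyWarning.
subset = toy.loc[mask].copy()

print(mask.tolist())
print(len(toy), len(subset))
```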

Visualize filtered dataset

Input
filtered_map = agol_gis.map('Montgomery County, Maryland')
Input
filtered_map
Output

Input
filtered_df.spatial.plot(kind='map', map_widget=filtered_map,
        renderer_type='u', # specify the unique value renderer using its notation 'u'
        col='Use_Code')  # column to get unique values from
Output
True

Your data shows permits, but what do these permits say about when and where growth is happening in the county? The data also contains temporal attribute fields, such as Added_Date, which indicates when a permit was first added to the system. Its values can be broken down into components such as year, month, and day of the week.

Split Added_Date to get year, month, and day_of_week

Input
sdf['datetime'] = pd.to_datetime(sdf['Added_Date'], unit='ms')
sdf['year'], sdf['month'], sdf['day_of_week'] = sdf.datetime.dt.year, sdf.datetime.dt.month, sdf.datetime.dt.dayofweek
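
To see what the .dt accessors return, the sketch below applies them to three Added_Date values that appear in the tables earlier in this notebook. Note that dayofweek numbers the days from Monday=0 to Sunday=6; the day_name column is added here only to make the numbering easier to read.

```python
import pandas as pd

# Three Added_Date values that appear in the tables earlier in the notebook.
dates = pd.Series(pd.to_datetime(["2010-01-07", "2014-01-31", "2014-02-11"]))

parts = pd.DataFrame({
    "year": dates.dt.year,
    "month": dates.dt.month,
    "day_of_week": dates.dt.dayofweek,  # Monday=0 ... Sunday=6
    "day_name": dates.dt.day_name(),    # readable label for the same day
})

print(parts)
```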

Visualize permits by time of issue

You'll create charts for the year, month, and day_of_week columns to visualize patterns in permit activity over time.

Input
import seaborn as sns
Input
sns.countplot(x="year", data=sdf);

The chart shows the number of permits issued each year since 2010. (The year 2017 has significantly fewer permits because the dataset only covers part of 2017.) You can compare the number of permits visually by the size of each bar. Although some fluctuation occurs from year to year, most years had similar permit activity.

Similarly, you can visualize the permits by month and by day_of_week.

Input
sns.countplot(x="month", data=sdf);

This bar chart shows the number of permits by month. Based on the chart, the highest permit activity occurs in June and July.

Input
sns.countplot(x="day_of_week", data=sdf);

Almost all permit activity occurs on weekdays. Government offices are closed on weekends, so few permits are issued then.

Input
ddf = sdf.set_index('datetime')
Input
ddf['num'] = 1
ddf['num'].resample('M').sum().plot();
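
The monthly total plotted above is simply a count of rows per calendar month. Recent pandas releases deprecate the 'M' alias in resample() in favor of 'ME', so the sketch below shows an alternative that groups by monthly periods instead. The timestamps are hypothetical, and the reindex step is needed because groupby(), unlike resample(), leaves out months with no rows.

```python
import pandas as pd

# Hypothetical timestamps, used as the index in the same way as `ddf`.
idx = pd.to_datetime(["2011-05-03", "2011-06-10", "2011-06-21",
                      "2011-06-30", "2011-08-15"])
toy = pd.DataFrame({"num": 1}, index=idx)

# Group by calendar month and count rows; same result as summing a column of ones.
monthly = toy.groupby(toy.index.to_period("M")).size()

# groupby() omits empty months, so reindex over the full range and fill with zero.
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
monthly = monthly.reindex(full_range, fill_value=0)

print(monthly)
```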

A huge spike in permit activity occurred in mid-2011. What caused this spike? Is it an increase in overall permit activity, or is it mostly an increase in a certain type of permit? You'll plot the number of permits by Use_Code to find which one caused the spike.

Input
fig = plt.figure(figsize=(15,5))
ax = fig.add_subplot(1, 1, 1)

ax.plot(ddf['num'].resample('M').sum(), 'k', label='Total permits')
for use_code in filtered_permits:
    x = ddf[ddf.Use_Code == use_code]['num'].resample('M').sum()
    ax.plot(x, label=use_code)
ax.legend();

Based on the legend, permit activity spiked in 2011 due to a sharp increase in the number of multifamily dwelling permits issued. This likely means that there was large residential growth in 2011.
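
The same question can be answered without reading the chart by tabulating permits per month and use code. The sketch below uses pd.crosstab on hypothetical rows that imitate a cluster of multi-family dwelling permits; the data and the names counts, peak_month, and peak_use_code are assumptions for illustration.

```python
import pandas as pd

# Hypothetical permits: a June 2011 cluster of multi-family dwellings.
toy = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2011-05-03", "2011-06-10", "2011-06-21",
        "2011-06-30", "2011-06-30", "2011-07-12",
    ]),
    "Use_Code": [
        "BUSINESS BUILDING", "MULTI-FAMILY DWELLING", "MULTI-FAMILY DWELLING",
        "MULTI-FAMILY DWELLING", "BUSINESS BUILDING", "MERCANTILE BUILDING",
    ],
})

# Rows are months, columns are use codes, cells are permit counts.
counts = pd.crosstab(toy["datetime"].dt.to_period("M"), toy["Use_Code"])

# Month with the most permits overall, and the use code that dominates it.
peak_month = counts.sum(axis=1).idxmax()
peak_use_code = counts.loc[peak_month].idxmax()

print(counts)
print(peak_month, peak_use_code)
```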

You've investigated some temporal patterns in your data. Next, you'll look at spatial patterns. Are there certain areas in the county that have experienced a relatively high degree of permit activity? Was the 2011 spike in residential permits in a specific location? To find out, you'll plot the permits with a heat map renderer to show hot spots, or areas with concentrations of points.

Input
hotspot_map = agol_gis.map('Germantown, Montgomery County, Maryland')
hotspot_map
Output
Input
sdf.spatial.plot(kind='map', map_widget=hotspot_map,
        renderer_type='h', 
        col='Status') 
Output
True

The hot spots appear where permits are most concentrated. The highest concentrations are in the northwest and southeast parts of the county, which correspond to Germantown and the suburban communities near Washington, D.C., respectively.

Next, you'll see if the 2011 permit spike corresponds to a specific area of the map. The code below filters the dataframe to only the permits from 2011 and plots them on a new heat map. In this case, the heat map shows the hot spot in the northwest part of the county, near Germantown.

Input
hotspot_2011_map = agol_gis.map('Germantown, Montgomery County, Maryland')
hotspot_2011_map
Output
Input
sdf[sdf.year==2011].spatial.plot(kind='map', map_widget=hotspot_2011_map,
        renderer_type='h',
        col='Status')  # column to get unique values from
Output
True
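
A tabular cross-check can complement the heat map: filter to 2011 and count permits per city. The sketch below uses hypothetical rows, so its counts say nothing about the real dataset; only the City and year column names come from the permit dataframe.

```python
import pandas as pd

# Hypothetical rows using the City and year columns from the permit dataframe.
toy = pd.DataFrame({
    "City": ["GERMANTOWN", "GERMANTOWN", "BETHESDA",
             "SILVER SPRING", "GERMANTOWN", "BETHESDA"],
    "year": [2011, 2011, 2011, 2012, 2011, 2014],
})

# Keep only 2011 permits, then count them per city, largest first.
permits_2011_by_city = toy.loc[toy["year"] == 2011, "City"].value_counts()

print(permits_2011_by_city)
```

Running the same two steps on sdf would list the cities with the most 2011 permits, which you could compare against the hot spot on the map.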
