Safe Streets to Schools

Introduction

Accidents near elementary schools, such as this one has drawn the attention of city employees and civic-minded individuals to the topic of pedestrian and bicycle safety.

In this notebook, we want to suggest policy actions to civic authorities of Pasadena, California that will reduce the likelihood of future accidents. We will map accident data regarding pedestrians and cyclists struck by vehicles. Then, we will determine the number of accidents that occurred within each school zone and identify the five most dangerous zones. We will present this analysis using a notebook that provides narrative context and helps stakeholders understand our methodology.

The sample uses ArcGIS API for Python to help city officials in improving pedestrian and bicycle safety near schools in the city. We will first uncover patterns in accident data using spatial analysis tool such as calculate_density, find_hot_spots and HeatmapRenderer. HeatmapRenderer highlights the capabilities of map widget to show areas where large number of accidents occurred. To visualize different category of accidents on the map, create_symbol tool plays a vital role along with functionality of renderers and arcade expressions.

As the visualizations are not enough to make policy decisions, we'll further use create_drive_time_areas and summarize_within tools to determine the number of accidents that have occurred within half mile walk distance from each school.

Since the city does not have enough funds that encompasses all school zones, we will limit our analysis to the five most vulnerable areas. So, we apply filter to identify the five most dangerous zones in which a pedestrian or cyclist was injured or killed. City officials could then suggest measures such as adding more bike lanes in these areas or change street signs to reduce accidents near schools and increase safety.

Workflow

Necessary Imports

%matplotlib inline

import pandas as pd
from datetime import datetime as dt
import matplotlib.pyplot as plt
from IPython.display import display

from arcgis.gis import GIS
from arcgis.features.analyze_patterns import calculate_density, find_hot_spots
from arcgis.mapping.symbol import create_symbol
from arcgis.features.use_proximity import create_drive_time_areas
from arcgis.features.summarize_data import summarize_within

Connect to your ArcGIS Online organization

Connect to the GIS via an existing profile or creating a new connection by e.g. gis = GIS("https://www.arcgis.com", "arcgis_python", "P@ssword123")

gis = GIS('home')

Get the data for the analysis

Search for the Traffic Collisions layer. You can specify the owner's name as "api_data_owner" to get more specific results.

items = gis.content.search('title:Traffic Collisions, owner:api_data_owner', 'feature layer')

items[0]

Traffic Collisions
Traffic collisions in Pasadena, California, since 2008.

Feature Layer Collection by api_data_owner
Last Modified: April 26, 2019
0 comments, 24 views

item = items[0]

The item shows all accidents that occurred in Pasadena during the past decade.

We can print the names of the layers in this item and assign them to variables that will be used in our analysis

The code below prints name of layers.

for lyr in item.layers:
    print(lyr.properties.name)

collisions

Since the item is a Feature Layer Collection, accessing the layers property gives us a list of FeatureLayer objects. The collisions layer is the only layer in this item.

collisions = item.layers[0]

m1 = gis.map('Pasadena, California', 13)
m1

The map shows all accidents that occurred in Pasadena during the past decade.

m1.add_layer(collisions)

m1.legend = True

As the layer does not only contain information about location, but it also has other attributes that cannot be seen on map. So using query method, we can query the features on a feature layer. sdf property of featureSet class is a powerful tool to visualize all the features as Pandas DataFrame.

collisions.query().sdf.columns

Index(['FID', 'Accidno', 'Cause', 'CollisnTyp', 'Controls', 'CrossSt',
       'Damage1', 'Damage2', 'Date', 'Day', 'Direction', 'Direction1',
       'Direction2', 'Distance', 'HitAndRun', 'Injury', 'InvWith', 'Lighting',
       'Movement1', 'Movement2', 'NoInjured', 'NoKilled', 'OBJECTID',
       'PartyAge1', 'PartyAge2', 'PartySex1', 'PartySex2', 'PartyType1',
       'PartyType2', 'PedAction', 'PtyAtFault', 'PvtProp', 'RoadCond',
       'RoadSurf', 'SafetyEq1', 'SafetyEq2', 'Sobriety1', 'Sobriety2',
       'SpecInfo1', 'SpecInfo2', 'SpeedLim1', 'SpeedLim2', 'Street', 'Time',
       'VehMake1', 'VehMake2', 'VehModel1', 'VehModel2', 'VehType1',
       'VehType2', 'VehYear1', 'VehYear2', 'Violation', 'Weather', 'X', 'Y',
       'SHAPE'],
      dtype='object')

collisions.query().sdf.InvWith.unique()

array(['Other Motor Vehicle', 'Pedestrian', 'Fixed Object', 'Bicycle',
       'Parked Motor Vehicle', 'Other Object',
       'Motor Vehicle on Other Roadway', 'Non-Collision', 'Train',
       'Animal', 'Not Stated', 'Other'], dtype=object)

Filter the layer to show only accidents that involved a pedestrian or cyclist.

collisions.filter = "(InvWith = 'Bicycle') OR (InvWith = 'Pedestrian')"

Uncover patterns

We are able to visualize accidents on map. However, it does not reveal any patterns. There can possibly be areas where particularly large number of accidents are occurring.

Different ways to find patterns in data includes point clustering, heat maps, hot spot analysis and calculate density. These methods reveal where accidents are happening at abnormal rates.

First, we will apply calculate_density tool to get better insights on data.

collision_density = calculate_density(input_layer=collisions,
                                      output_name='density_of_incidents' + str(dt.now().microsecond))

collision_density

density_of_incidents660540

Feature Layer Collection by arcgis_python
Last Modified: September 13, 2019
0 comments, 0 views

m2 = gis.map('Pasadena, California', 13)
m2

Areas with large number of accidents show up as more densely colored as compared to other areas.

m2.add_layer(collision_density)

m2.legend = True

find_hot_spots method finds statistically significant spatial clusters of high values (hot spots) and low values (cold spots).

collision_hot_spots = find_hot_spots(collisions,
                                     output_name='collision_hexagon_hot_spots' + str(dt.now().microsecond),
                                     shape_type='hexagon')

collision_hot_spots

collision_hexagon_hot_spots539052

Feature Layer Collection by arcgis_python
Last Modified: September 13, 2019
0 comments, 0 views

m3 = gis.map('Pasadena, California', 13)
m3

The red hex bins show areas of spatially significant clustering, while white hex bins show areas with no significant clustering. There are no blue areas on the map, but if there were, they would represent areas with low statistically significant clustering. The map indicates that accidents happen across the city, but with a statistically significant clustering in the downtown area.

m3.add_layer(collision_hot_spots)

m3.legend = True

The HeatmapRenderer renders point data into a raster visualization that emphasizes areas of higher density or weighted values.

We will use HeatmapRenderer to display generalized point pattern locations.

m4 = gis.map('Pasadena, California', 13)
m4

Like the calculate_density, the HeatmapRenderer indicates a high density of accidents in central Pasadena. However, the heat map particularly emphasizes the area south of the Foothill Freeway, corresponding to downtown.

m4.add_layer({"type": "FeatureLayer",
              "url": collisions.url,   
              "renderer": "HeatmapRenderer",
              "opacity": 0.75})

m4.legend = True

We now have a better understanding of the data.

Symbolize by category

Next, we will change the symbols of accidents layer to show different categories of accidents. In particular, we want to distinguish fatal accidents from accidents with only injuries. We also want to distinguish pedestrian accidents from bicycle accidents. This information will add more detail to our findings and will help support policymaker decisions.

df = collisions.query().sdf

filtered_df = df[(df['InvWith'] == "Pedestrian") | (df['InvWith'] == "Bicycle")]

filtered_df.head()

	FID	Accidno	Cause	CollisnTyp	Controls	CrossSt	Date	Day	...	VehType1	VehType2	Violation	Weather	X	Y	SHAPE
1	2	10407211603116	Other Hazardous Movement	Vehicle - Pedestrian	Functioning	MARENGO AVE	6/29/2008	Sunday	...	Passenger Car		21451	Clear	6517494	1875480	{'x': -13151952.7988482, 'y': 4048399.24990493...
11	12	10409075901001	Auto R/W Violation	Broadside	Functioning	RAYMOND AVE	7/1/2008	Tuesday	...			21801	Clear	6516612	1876820	{'x': -13152278.024718883, 'y': 4048894.055900...
26	27	10411184701852	Unsafe Speed	Rear-End	No Controls Present / Factor	ROSE BOWL DR	7/3/2008	Thursday	...	Bicycle	Passenger Car	22350	Clear	6511556	1881787	{'x': -13154141.264326861, 'y': 4050727.197689...
29	30	10412131002035	Ped R/W Violation	Vehicle - Pedestrian	Functioning	ORANGE GROVE BLVD	7/4/2008	Friday	...			21950	Clear	6517482	1879853	{'x': -13151959.551540626, 'y': 4050015.650237...
58	59	10418170105652	Other Hazardous Movement	Sideswipe	No Controls Present / Factor	WESLEY AVE	7/10/2008	Thursday	...	Passenger Car	Bicycle	22517	Clear	6525309.99742721	1885626.50722028	{'x': -13149081.688380858, 'y': 4052153.917797...

5 rows Ã— 57 columns

To create these four symbol categories, we will use an Arcade expression. Arcade expressions use attribute information to determine symbology.

We will first assign three variables that represent the four categories we want to symbolize. However, these variables aren't exclusive. Accidents involving pedestrians or cyclists likely also have injuries or fatalities. There are four combinations of the type, fatal, and injured variables:

Accidents involving a pedestrian and a fatality
Accidents involving a pedestrian and an injury
Accidents involving a cyclist and a fatality
Accidents involving a cyclist and an injury

To account for these combinations, we will create a 'When' function. A When function indicates that when certain conditions are met, a specific symbology category will be used.

arcade_expression = """
var acc_type = $feature.InvWith;
var fatal = $feature.Nokilled;
var injured = $feature.NoInjured;
var result = When( 
    acc_type == 'Pedestrian' && fatal == '1'  , 'PedestrianFatality',
    acc_type == 'Pedestrian' && injured != '0', 'PedestrianInjury',
    acc_type == 'Bicycle'    && injured != '0', 'BicycleInjury',
    acc_type == 'Bicycle'    && fatal == '1',   'BicycleFatality',
    'null');
return result;
"""

arcade_expression

"\nvar acc_type = $feature.InvWith;\nvar fatal = $feature.Nokilled;\nvar injured = $feature.NoInjured;\nvar result = When( \n    acc_type == 'Pedestrian' && fatal == '1'  , 'PedestrianFatality',\n    acc_type == 'Pedestrian' && injured != '0', 'PedestrianInjury',\n    acc_type == 'Bicycle'    && injured != '0', 'BicycleInjury',\n    acc_type == 'Bicycle'    && fatal == '1',   'BicycleFatality',\n    'null');\nreturn result;\n"

The code below cycles trough each category and asigns it a color, size and symbol.

def get_symbol(color, size):
    return create_symbol(geometry_type='point',
                         symbol_type='simple',
                         symbol_style='o',
                         colors=color,
                         marker_size=size,
                         outline_style='s',
                         outline_color=[153,153,153,255], line_width=0.375)


def get_unique_values(color_list, values):
    return [ { "label": value, "symbol":get_symbol(color, size), "value": value } for color, size, value in zip(color_list, sizes, values) ]

color_list = [ [255, 0, 0, 255], [0, 255, 0, 255], [0, 0, 255, 255], [0 , 255, 255, 255] ] 
values = ['PedestrianFatality', 'PedestrianInjury', 'BicycleInjury', 'BicycleFatality']
sizes = [15, 3, 3, 15]
uv = get_unique_values(color_list, values)

m5 = gis.map('Pasadena, California', 13)
m5

The fatalities receive a larger symbol size to indicate the difference in severity between injuries and fatalities.

filtered_df.spatial.plot(map_widget=m5,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)

True

m5.legend = True

The points are competing with the the basemap colors. So let's change the basemap to dark-gray.

m6 = gis.map('Pasadena, California', 13)
m6

Now, the accidents stand out because the points are not competing with the basemap colors. There is also a clear distinction between bicycle and pedestrian accidents.

m6.basemap = 'dark-gray'

filtered_df.spatial.plot(map_widget=m6,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)

True

Add school areas

When it comes to policy decisions, hot spots and heat maps don't provide much context and make no explicit claims about the data. Because your subject is not only accidents but accidents near schools, we will add a layer of Pasadena Unified School District (PUSD) schools to our map. Then, we will find areas within a half-mile walking distance of each school.

schools = gis.content.search('PUSD schools, owner:Learn_ArcGIS', 'Feature Layer',outside_org=True)[0]

schools

PUSD Schools
Schools in the Pasadena Unified School District

Feature Layer Collection by Learn_ArcGIS
Last Modified: October 17, 2018
0 comments, 7,643 views

By default, the schools have red point symbols that can be difficult to distinguish from the accident points. We'll change the symbol so the schools stand out more. The square shape will be distinguishable from the circle shapes of the accidents. We'll also change the color to white so the symbols stand out against the dark basemap.

m7 = gis.map('Pasadena, California', 13)
m7

The layer of schools is added to the map. The schools now stand out and can be distinguished from the accidents.

m7_renderer = {"renderer": "autocast",
                "type": "simple"}
def get_symbol(color):
    return create_symbol(geometry_type='point',
                         symbol_type='simple',
                         symbol_style='s',
                         colors=color,
                         marker_size=13,
                         outline_style='s',
                         outline_color=[153,153,153,255],
                         line_width=0.375)

symbol = get_symbol([153,153,153,255])
m7_renderer['symbol'] = symbol

m7.basemap = 'dark-gray'

filtered_df.spatial.plot(map_widget=m7,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)

True

m7.add_layer({ "type": "FeatureLayer",
               "url": schools.layers[0].url,
               "transparency": 75,
               "renderer": m7_renderer
                })

m7.legend = True

Creating areas within a half-mile walking distance of schools

Next, we'll create the walk-time areas around each school. This tool uses road network data to create areas that can be reached within a specific driving or walking distance or time. Creating areas within a half-mile walking distance of schools will show places where there are likely large numbers of student pedestrians.

We'll later be able to calculate the number of collisions near each school.

walk_dist_from_schools = create_drive_time_areas(schools.layers[0],
                                                 break_values=[0.5],
                                                 break_units='Miles',
                                                 travel_mode='Walking Distance',
                                                 output_name='psud_schools_drivetime' + str(dt.now().microsecond))

walk_dist_from_schools

psud_schools_drivetime224021

Feature Layer Collection by arcgis_python
Last Modified: September 13, 2019
0 comments, 0 views

m8 = gis.map('Pasadena, California', 13)
m8

Some school areas have no pedestrian or cyclist accidents, while others have a significant amount. At the same time, the area with the highest density of accidents is not within any school area. Adding more information beyond a heat map has provided essential context for policymakers.

m8.basemap = 'dark-gray'

filtered_df.spatial.plot(map_widget=m8,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)

True

m8.add_layer({ "type": "FeatureLayer",
               "url": schools.layers[0].url,
               "transparency": 75,
               "renderer": m7_renderer
                })

m8.add_layer(walk_dist_from_schools)

m8.legend = True

We could stop your analysis here, and use this map as grounds to implement a policy that encompasses all school zones. Changing street signs and adding bicycle lanes in these areas may reduce accidents near schools.

However, sometimes a city does not have enough funds to enact new policy for every location. Many Pasadena school zones have few accidents, so policies in these areas may have little effect. Instead, policymakers want to focus their efforts on areas that need it most.

We'll calculate the number of accidents within each school zone and filter the layer to show only the five most dangerous zones. Then, policymakers can prioritize these zones over zones that are already relatively safe.

Find the most dangerous school areas

This tool summarizes the number of point features within a polygon feature.

dangerous_areas = summarize_within(walk_dist_from_schools.layers[0],
                                   collisions,
                                   output_name="accident_count_within_school_zone" + str(dt.now().microsecond))

dangerous_areas

accident_count_within_school_zone169811

Feature Layer Collection by arcgis_python
Last Modified: September 13, 2019
0 comments, 0 views

top_5_dangerous_areas = dangerous_areas.layers[0].query().sdf.sort_values(by='Point_Count', ascending=False)

top_5_dangerous_areas[:5]

	OBJECTID	Point_Count	Name	ToBreak	FacilityOID	Name_1	ADDRESS	CITY	SCHOOLTYPE	Breaks	AnalysisArea	SHAPE
10	11	110	McKinley School : 0 - 0.5	0.5	9	McKinley School	325 S OAK KNOLL AVE	PASADENA	ELEMENTARY/MIDDLE	None	1.492772	{'rings': [[[-13150773.7838, 4048681.8554], [-...
11	12	106	Rose City High Alternative : 0 - 0.5	0.5	20	Rose City High Alternative	351 S HUDSON AVE	PASADENA	HIGH	None	1.355013	{'rings': [[[-13150623.7846, 4048651.6474], [-...
14	15	104	Madison Elementary : 0 - 0.5	0.5	8	Madison Elementary	515 ASHTABULA ST	PASADENA	ELEMENTARY	None	1.273972	{'rings': [[[-13151398.781, 4051098.7538], [-1...
7	8	63	Washington Middle : 0 - 0.5	0.5	18	Washington Middle	1505 N MARENGO AVE	PASADENA	MIDDLE	None	1.292630	{'rings': [[[-13151998.7782, 4052730.4508], [-...
16	17	62	Longfellow Elementary : 0 - 0.5	0.5	7	Longfellow Elementary	1065 E WASHINGTON BLVD	PASADENA	ELEMENTARY	None	1.418004	{'rings': [[[-13149873.788, 4052398.0492], [-1...

The table is sorted so that school zones with more accidents are shown first. The first five school zones have accidents totaling 109, 106, 104, 63, and 62, respectively.

The sixth-highest school zone has 59 accidents, which is almost the same as the fifth highest. With this information, you might consider expanding policy to include this school zone as well. Alternatively, depending on the city's resources, we want to limit policy to focus only on the three most dangerous school zones, which have a much higher number of accidents than the fourth.

For this scenario, we'll continue to focus on the five most dangerous school zones. We'll filter the layer to show only these zones.

m9 = gis.map('Pasadena, California', 13)
m9

All five of the most dangerous school zones are relatively close to one another. Two of the most dangerous zones almost overlap entirely, which means the city can increase safety for both schools with many of the same policy decisions.

m9.basemap = 'dark-gray'

filtered_df.spatial.plot(map_widget=m9,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)

True

m9.add_layer({"type": "FeatureLayer", 
               "url": dangerous_areas.layers[0].url,
               "definition_expression" : "Point_Count>=61",
               "opacity": 0.7
              })

m9.add_layer({ "type": "FeatureLayer",
                 "url": schools.layers[0].url,
                 "transparency": 75,
                 "renderer": m7_renderer
                })

Conclusion

We've created a map that highlights five school zones that would benefit from policy intervention. The city officials can now identify the most dangerous school zones in order to reduce the likelihood of future accidents.