Safe Streets to Schools

Introduction

Accidents near elementary schools, such as this one has drawn the attention of city employees and civic-minded individuals to the topic of pedestrian and bicycle safety.

In this notebook, we want to suggest policy actions to civic authorities of Pasadena, California that will reduce the likelihood of future accidents. We will map accident data regarding pedestrians and cyclists struck by vehicles. Then, we will determine the number of accidents that occurred within each school zone and identify the five most dangerous zones. We will present this analysis using a notebook that provides narrative context and helps stakeholders understand our methodology.

The sample uses ArcGIS API for Python to help city officials in improving pedestrian and bicycle safety near schools in the city. We will first uncover patterns in accident data using spatial analysis tool such as calculate_density, find_hot_spots and HeatmapRenderer. HeatmapRenderer highlights the capabilities of map widget to show areas where large number of accidents occurred. To visualize different category of accidents on the map, create_symbol tool plays a vital role along with functionality of renderers and arcade expressions.

As the visualizations are not enough to make policy decisions, we'll further use create_drive_time_areas and summarize_within tools to determine the number of accidents that have occurred within half mile walk distance from each school.

Since the city does not have enough funds that encompasses all school zones, we will limit our analysis to the five most vulnerable areas. So, we apply filter to identify the five most dangerous zones in which a pedestrian or cyclist was injured or killed. City officials could then suggest measures such as adding more bike lanes in these areas or change street signs to reduce accidents near schools and increase safety.

Workflow

Necessary Imports

%matplotlib inline

import pandas as pd
from datetime import datetime as dt
import matplotlib.pyplot as plt
from IPython.display import display

from arcgis.gis import GIS
from arcgis.features.analyze_patterns import calculate_density, find_hot_spots
from arcgis.mapping.symbol import create_symbol
from arcgis.features.use_proximity import create_drive_time_areas
from arcgis.features.summarize_data import summarize_within

Connect to your ArcGIS Online organization

Connect to the GIS via an existing profile or creating a new connection by e.g. gis = GIS("https://www.arcgis.com", "arcgis_python", "P@ssword123")

gis = GIS('home')

Get the data for the analysis

Search for the Traffic Collisions layer. You can specify the owner's name as "api_data_owner" to get more specific results.

items = gis.content.search('title:Traffic Collisions, owner:api_data_owner', 'feature layer')
items[0]
Traffic Collisions
Traffic collisions in Pasadena, California, since 2008.Feature Layer Collection by api_data_owner
Last Modified: April 26, 2019
0 comments, 24 views
item = items[0]

The item shows all accidents that occurred in Pasadena during the past decade.

We can print the names of the layers in this item and assign them to variables that will be used in our analysis

The code below prints name of layers.

for lyr in item.layers:
    print(lyr.properties.name)
collisions

Since the item is a Feature Layer Collection, accessing the layers property gives us a list of FeatureLayer objects. The collisions layer is the only layer in this item.

collisions = item.layers[0]
m1 = gis.map('Pasadena, California', 13)
m1

The map shows all accidents that occurred in Pasadena during the past decade.

m1.add_layer(collisions)
m1.legend = True

As the layer does not only contain information about location, but it also has other attributes that cannot be seen on map. So using query method, we can query the features on a feature layer. sdf property of featureSet class is a powerful tool to visualize all the features as Pandas DataFrame.

collisions.query().sdf.columns
Index(['FID', 'Accidno', 'Cause', 'CollisnTyp', 'Controls', 'CrossSt',
       'Damage1', 'Damage2', 'Date', 'Day', 'Direction', 'Direction1',
       'Direction2', 'Distance', 'HitAndRun', 'Injury', 'InvWith', 'Lighting',
       'Movement1', 'Movement2', 'NoInjured', 'NoKilled', 'OBJECTID',
       'PartyAge1', 'PartyAge2', 'PartySex1', 'PartySex2', 'PartyType1',
       'PartyType2', 'PedAction', 'PtyAtFault', 'PvtProp', 'RoadCond',
       'RoadSurf', 'SafetyEq1', 'SafetyEq2', 'Sobriety1', 'Sobriety2',
       'SpecInfo1', 'SpecInfo2', 'SpeedLim1', 'SpeedLim2', 'Street', 'Time',
       'VehMake1', 'VehMake2', 'VehModel1', 'VehModel2', 'VehType1',
       'VehType2', 'VehYear1', 'VehYear2', 'Violation', 'Weather', 'X', 'Y',
       'SHAPE'],
      dtype='object')
collisions.query().sdf.InvWith.unique()
array(['Other Motor Vehicle', 'Pedestrian', 'Fixed Object', 'Bicycle',
       'Parked Motor Vehicle', 'Other Object',
       'Motor Vehicle on Other Roadway', 'Non-Collision', 'Train',
       'Animal', 'Not Stated', 'Other'], dtype=object)

Filter the layer to show only accidents that involved a pedestrian or cyclist.

collisions.filter = "(InvWith = 'Bicycle') OR (InvWith = 'Pedestrian')"

Uncover patterns

We are able to visualize accidents on map. However, it does not reveal any patterns. There can possibly be areas where particularly large number of accidents are occurring.

Different ways to find patterns in data includes point clustering, heat maps, hot spot analysis and calculate density. These methods reveal where accidents are happening at abnormal rates.

First, we will apply calculate_density tool to get better insights on data.

collision_density = calculate_density(input_layer=collisions,
                                      output_name='density_of_incidents' + str(dt.now().microsecond))
collision_density
density_of_incidents660540
Feature Layer Collection by arcgis_python
Last Modified: September 13, 2019
0 comments, 0 views
m2 = gis.map('Pasadena, California', 13)
m2

Areas with large number of accidents show up as more densely colored as compared to other areas.

m2.add_layer(collision_density)
m2.legend = True

find_hot_spots method finds statistically significant spatial clusters of high values (hot spots) and low values (cold spots).

collision_hot_spots = find_hot_spots(collisions,
                                     output_name='collision_hexagon_hot_spots' + str(dt.now().microsecond),
                                     shape_type='hexagon')
collision_hot_spots
collision_hexagon_hot_spots539052
Feature Layer Collection by arcgis_python
Last Modified: September 13, 2019
0 comments, 0 views
m3 = gis.map('Pasadena, California', 13)
m3

The red hex bins show areas of spatially significant clustering, while white hex bins show areas with no significant clustering. There are no blue areas on the map, but if there were, they would represent areas with low statistically significant clustering. The map indicates that accidents happen across the city, but with a statistically significant clustering in the downtown area.

m3.add_layer(collision_hot_spots)
m3.legend = True

The HeatmapRenderer renders point data into a raster visualization that emphasizes areas of higher density or weighted values.

We will use HeatmapRenderer to display generalized point pattern locations.

m4 = gis.map('Pasadena, California', 13)
m4

Like the calculate_density, the HeatmapRenderer indicates a high density of accidents in central Pasadena. However, the heat map particularly emphasizes the area south of the Foothill Freeway, corresponding to downtown.

m4.add_layer({"type": "FeatureLayer",
              "url": collisions.url,   
              "renderer": "HeatmapRenderer",
              "opacity": 0.75})
m4.legend = True

We now have a better understanding of the data.

Symbolize by category

Next, we will change the symbols of accidents layer to show different categories of accidents. In particular, we want to distinguish fatal accidents from accidents with only injuries. We also want to distinguish pedestrian accidents from bicycle accidents. This information will add more detail to our findings and will help support policymaker decisions.

df = collisions.query().sdf
filtered_df = df[(df['InvWith'] == "Pedestrian") | (df['InvWith'] == "Bicycle")]
filtered_df.head()
FIDAccidnoCauseCollisnTypControlsCrossStDamage1Damage2DateDay...VehModel2VehType1VehType2VehYear1VehYear2ViolationWeatherXYSHAPE
1210407211603116Other Hazardous MovementVehicle - PedestrianFunctioningMARENGO AVE6/29/2008Sunday...Passenger Car21451Clear65174941875480{'x': -13151952.7988482, 'y': 4048399.24990493...
111210409075901001Auto R/W ViolationBroadsideFunctioningRAYMOND AVE7/1/2008Tuesday...21801Clear65166121876820{'x': -13152278.024718883, 'y': 4048894.055900...
262710411184701852Unsafe SpeedRear-EndNo Controls Present / FactorROSE BOWL DR7/3/2008Thursday...BicyclePassenger Car22350Clear65115561881787{'x': -13154141.264326861, 'y': 4050727.197689...
293010412131002035Ped R/W ViolationVehicle - PedestrianFunctioningORANGE GROVE BLVD7/4/2008Friday...21950Clear65174821879853{'x': -13151959.551540626, 'y': 4050015.650237...
585910418170105652Other Hazardous MovementSideswipeNo Controls Present / FactorWESLEY AVE7/10/2008Thursday...Passenger CarBicycle22517Clear6525309.997427211885626.50722028{'x': -13149081.688380858, 'y': 4052153.917797...

5 rows × 57 columns

To create these four symbol categories, we will use an Arcade expression. Arcade expressions use attribute information to determine symbology.

We will first assign three variables that represent the four categories we want to symbolize. However, these variables aren't exclusive. Accidents involving pedestrians or cyclists likely also have injuries or fatalities. There are four combinations of the type, fatal, and injured variables:

  • Accidents involving a pedestrian and a fatality
  • Accidents involving a pedestrian and an injury
  • Accidents involving a cyclist and a fatality
  • Accidents involving a cyclist and an injury

To account for these combinations, we will create a 'When' function. A When function indicates that when certain conditions are met, a specific symbology category will be used.

arcade_expression = """
var acc_type = $feature.InvWith;
var fatal = $feature.Nokilled;
var injured = $feature.NoInjured;
var result = When( 
    acc_type == 'Pedestrian' && fatal == '1'  , 'PedestrianFatality',
    acc_type == 'Pedestrian' && injured != '0', 'PedestrianInjury',
    acc_type == 'Bicycle'    && injured != '0', 'BicycleInjury',
    acc_type == 'Bicycle'    && fatal == '1',   'BicycleFatality',
    'null');
return result;
"""
arcade_expression
"\nvar acc_type = $feature.InvWith;\nvar fatal = $feature.Nokilled;\nvar injured = $feature.NoInjured;\nvar result = When( \n    acc_type == 'Pedestrian' && fatal == '1'  , 'PedestrianFatality',\n    acc_type == 'Pedestrian' && injured != '0', 'PedestrianInjury',\n    acc_type == 'Bicycle'    && injured != '0', 'BicycleInjury',\n    acc_type == 'Bicycle'    && fatal == '1',   'BicycleFatality',\n    'null');\nreturn result;\n"

The code below cycles trough each category and asigns it a color, size and symbol.

def get_symbol(color, size):
    return create_symbol(geometry_type='point',
                         symbol_type='simple',
                         symbol_style='o',
                         colors=color,
                         marker_size=size,
                         outline_style='s',
                         outline_color=[153,153,153,255], line_width=0.375)


def get_unique_values(color_list, values):
    return [ { "label": value, "symbol":get_symbol(color, size), "value": value } for color, size, value in zip(color_list, sizes, values) ]
color_list = [ [255, 0, 0, 255], [0, 255, 0, 255], [0, 0, 255, 255], [0 , 255, 255, 255] ] 
values = ['PedestrianFatality', 'PedestrianInjury', 'BicycleInjury', 'BicycleFatality']
sizes = [15, 3, 3, 15]
uv = get_unique_values(color_list, values)
m5 = gis.map('Pasadena, California', 13)
m5

The fatalities receive a larger symbol size to indicate the difference in severity between injuries and fatalities.

filtered_df.spatial.plot(map_widget=m5,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)
True
m5.legend = True

The points are competing with the the basemap colors. So let's change the basemap to dark-gray.

m6 = gis.map('Pasadena, California', 13)
m6

Now, the accidents stand out because the points are not competing with the basemap colors. There is also a clear distinction between bicycle and pedestrian accidents.

m6.basemap = 'dark-gray'
filtered_df.spatial.plot(map_widget=m6,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)
True

Add school areas

When it comes to policy decisions, hot spots and heat maps don't provide much context and make no explicit claims about the data. Because your subject is not only accidents but accidents near schools, we will add a layer of Pasadena Unified School District (PUSD) schools to our map. Then, we will find areas within a half-mile walking distance of each school.

schools = gis.content.search('PUSD schools, owner:Learn_ArcGIS', 'Feature Layer',outside_org=True)[0]
schools
PUSD Schools
Schools in the Pasadena Unified School DistrictFeature Layer Collection by Learn_ArcGIS
Last Modified: October 17, 2018
0 comments, 7,643 views

By default, the schools have red point symbols that can be difficult to distinguish from the accident points. We'll change the symbol so the schools stand out more. The square shape will be distinguishable from the circle shapes of the accidents. We'll also change the color to white so the symbols stand out against the dark basemap.

m7 = gis.map('Pasadena, California', 13)
m7

The layer of schools is added to the map. The schools now stand out and can be distinguished from the accidents.

m7_renderer = {"renderer": "autocast",
                "type": "simple"}
def get_symbol(color):
    return create_symbol(geometry_type='point',
                         symbol_type='simple',
                         symbol_style='s',
                         colors=color,
                         marker_size=13,
                         outline_style='s',
                         outline_color=[153,153,153,255],
                         line_width=0.375)

symbol = get_symbol([153,153,153,255])
m7_renderer['symbol'] = symbol
m7.basemap = 'dark-gray'
filtered_df.spatial.plot(map_widget=m7,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)
True
m7.add_layer({ "type": "FeatureLayer",
               "url": schools.layers[0].url,
               "transparency": 75,
               "renderer": m7_renderer
                })
m7.legend = True

Creating areas within a half-mile walking distance of schools

Next, we'll create the walk-time areas around each school. This tool uses road network data to create areas that can be reached within a specific driving or walking distance or time. Creating areas within a half-mile walking distance of schools will show places where there are likely large numbers of student pedestrians.

We'll later be able to calculate the number of collisions near each school.

walk_dist_from_schools = create_drive_time_areas(schools.layers[0],
                                                 break_values=[0.5],
                                                 break_units='Miles',
                                                 travel_mode='Walking Distance',
                                                 output_name='psud_schools_drivetime' + str(dt.now().microsecond))
walk_dist_from_schools
psud_schools_drivetime224021
Feature Layer Collection by arcgis_python
Last Modified: September 13, 2019
0 comments, 0 views
m8 = gis.map('Pasadena, California', 13)
m8

Some school areas have no pedestrian or cyclist accidents, while others have a significant amount. At the same time, the area with the highest density of accidents is not within any school area. Adding more information beyond a heat map has provided essential context for policymakers.

m8.basemap = 'dark-gray'
filtered_df.spatial.plot(map_widget=m8,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)
True
m8.add_layer({ "type": "FeatureLayer",
               "url": schools.layers[0].url,
               "transparency": 75,
               "renderer": m7_renderer
                })
m8.add_layer(walk_dist_from_schools)
m8.legend = True

We could stop your analysis here, and use this map as grounds to implement a policy that encompasses all school zones. Changing street signs and adding bicycle lanes in these areas may reduce accidents near schools.

However, sometimes a city does not have enough funds to enact new policy for every location. Many Pasadena school zones have few accidents, so policies in these areas may have little effect. Instead, policymakers want to focus their efforts on areas that need it most.

We'll calculate the number of accidents within each school zone and filter the layer to show only the five most dangerous zones. Then, policymakers can prioritize these zones over zones that are already relatively safe.

Find the most dangerous school areas

This tool summarizes the number of point features within a polygon feature.

dangerous_areas = summarize_within(walk_dist_from_schools.layers[0],
                                   collisions,
                                   output_name="accident_count_within_school_zone" + str(dt.now().microsecond))
dangerous_areas
accident_count_within_school_zone169811
Feature Layer Collection by arcgis_python
Last Modified: September 13, 2019
0 comments, 0 views
top_5_dangerous_areas = dangerous_areas.layers[0].query().sdf.sort_values(by='Point_Count', ascending=False)
top_5_dangerous_areas[:5]
OBJECTIDPoint_CountNameFromBreakToBreakFacilityOIDName_1ADDRESSCITYSCHOOLTYPEBreaksAdditionalTimeAdditionalDistanceAnalysisAreaSHAPE
1011110McKinley School : 0 - 0.500.59McKinley School325 S OAK KNOLL AVEPASADENAELEMENTARY/MIDDLENone001.492772{'rings': [[[-13150773.7838, 4048681.8554], [-...
1112106Rose City High Alternative : 0 - 0.500.520Rose City High Alternative351 S HUDSON AVEPASADENAHIGHNone001.355013{'rings': [[[-13150623.7846, 4048651.6474], [-...
1415104Madison Elementary : 0 - 0.500.58Madison Elementary515 ASHTABULA STPASADENAELEMENTARYNone001.273972{'rings': [[[-13151398.781, 4051098.7538], [-1...
7863Washington Middle : 0 - 0.500.518Washington Middle1505 N MARENGO AVEPASADENAMIDDLENone001.292630{'rings': [[[-13151998.7782, 4052730.4508], [-...
161762Longfellow Elementary : 0 - 0.500.57Longfellow Elementary1065 E WASHINGTON BLVDPASADENAELEMENTARYNone001.418004{'rings': [[[-13149873.788, 4052398.0492], [-1...

The table is sorted so that school zones with more accidents are shown first. The first five school zones have accidents totaling 109, 106, 104, 63, and 62, respectively.

The sixth-highest school zone has 59 accidents, which is almost the same as the fifth highest. With this information, you might consider expanding policy to include this school zone as well. Alternatively, depending on the city's resources, we want to limit policy to focus only on the three most dangerous school zones, which have a much higher number of accidents than the fourth.

For this scenario, we'll continue to focus on the five most dangerous school zones. We'll filter the layer to show only these zones.

m9 = gis.map('Pasadena, California', 13)
m9

All five of the most dangerous school zones are relatively close to one another. Two of the most dangerous zones almost overlap entirely, which means the city can increase safety for both schools with many of the same policy decisions.

m9.basemap = 'dark-gray'
filtered_df.spatial.plot(map_widget=m9,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)
True
m9.add_layer({"type": "FeatureLayer", 
               "url": dangerous_areas.layers[0].url,
               "definition_expression" : "Point_Count>=61",
               "opacity": 0.7
              })
m9.add_layer({ "type": "FeatureLayer",
                 "url": schools.layers[0].url,
                 "transparency": 75,
                 "renderer": m7_renderer
                })

Conclusion

We've created a map that highlights five school zones that would benefit from policy intervention. The city officials can now identify the most dangerous school zones in order to reduce the likelihood of future accidents.

Your browser is no longer supported. Please upgrade your browser for the best experience. See our browser deprecation post for more details.