Safe Streets to Schools

Introduction

Accidents near elementary schools, such as this one has drawn the attention of city employees and civic-minded individuals to the topic of pedestrian and bicycle safety.

In this notebook, we want to suggest policy actions to civic authorities of Pasadena, California that will reduce the likelihood of future accidents. We will map accident data regarding pedestrians and cyclists struck by vehicles. Then, we will determine the number of accidents that occurred within each school zone and identify the five most dangerous zones. We will present this analysis using a notebook that provides narrative context and helps stakeholders understand our methodology.

The sample uses ArcGIS API for Python to help city officials in improving pedestrian and bicycle safety near schools in the city. We will first uncover patterns in accident data using spatial analysis tool such as calculate_density, find_hot_spots and HeatmapRenderer. HeatmapRenderer highlights the capabilities of map widget to show areas where large number of accidents occurred. To visualize different category of accidents on the map, create_symbol tool plays a vital role along with functionality of renderers and arcade expressions.

As the visualizations are not enough to make policy decisions, we'll further use create_drive_time_areas and summarize_within tools to determine the number of accidents that have occurred within half mile walk distance from each school.

Since the city does not have enough funds that encompasses all school zones, we will limit our analysis to the five most vulnerable areas. So, we apply filter to identify the five most dangerous zones in which a pedestrian or cyclist was injured or killed. City officials could then suggest measures such as adding more bike lanes in these areas or change street signs to reduce accidents near schools and increase safety.

Workflow

Necessary Imports

Input
%matplotlib inline

import pandas as pd
from datetime import datetime as dt
import matplotlib.pyplot as plt
from IPython.display import display

from arcgis.gis import GIS
from arcgis.features.analyze_patterns import calculate_density, find_hot_spots
from arcgis.mapping.symbol import create_symbol
from arcgis.features.use_proximity import create_drive_time_areas
from arcgis.features.summarize_data import summarize_within

Connect to your ArcGIS Online organization

Connect to the GIS via an existing profile or creating a new connection by e.g. gis = GIS("https://www.arcgis.com", "arcgis_python", "P@ssword123")

Input
gis = GIS('home')

Get the data for the analysis

Search for the Traffic Collisions layer. You can specify the owner's name as "api_data_owner" to get more specific results.

Input
items = gis.content.search('title:Traffic Collisions, owner:api_data_owner', 'feature layer')
Input
items[0]
Output
Traffic Collisions
Traffic collisions in Pasadena, California, since 2008.Feature Layer Collection by api_data_owner
Last Modified: April 26, 2019
0 comments, 24 views
Input
item = items[0]

The item shows all accidents that occurred in Pasadena during the past decade.

We can print the names of the layers in this item and assign them to variables that will be used in our analysis

The code below prints name of layers.

Input
for lyr in item.layers:
    print(lyr.properties.name)
collisions

Since the item is a Feature Layer Collection, accessing the layers property gives us a list of FeatureLayer objects. The collisions layer is the only layer in this item.

Input
collisions = item.layers[0]
Input
m1 = gis.map('Pasadena, California', 13)
m1
Output

The map shows all accidents that occurred in Pasadena during the past decade.

Input
m1.add_layer(collisions)
Input
m1.legend = True

As the layer does not only contain information about location, but it also has other attributes that cannot be seen on map. So using query method, we can query the features on a feature layer. sdf property of featureSet class is a powerful tool to visualize all the features as Pandas DataFrame.

Input
collisions.query().sdf.columns
Output
Index(['FID', 'Accidno', 'Cause', 'CollisnTyp', 'Controls', 'CrossSt',
       'Damage1', 'Damage2', 'Date', 'Day', 'Direction', 'Direction1',
       'Direction2', 'Distance', 'HitAndRun', 'Injury', 'InvWith', 'Lighting',
       'Movement1', 'Movement2', 'NoInjured', 'NoKilled', 'OBJECTID',
       'PartyAge1', 'PartyAge2', 'PartySex1', 'PartySex2', 'PartyType1',
       'PartyType2', 'PedAction', 'PtyAtFault', 'PvtProp', 'RoadCond',
       'RoadSurf', 'SafetyEq1', 'SafetyEq2', 'Sobriety1', 'Sobriety2',
       'SpecInfo1', 'SpecInfo2', 'SpeedLim1', 'SpeedLim2', 'Street', 'Time',
       'VehMake1', 'VehMake2', 'VehModel1', 'VehModel2', 'VehType1',
       'VehType2', 'VehYear1', 'VehYear2', 'Violation', 'Weather', 'X', 'Y',
       'SHAPE'],
      dtype='object')
Input
collisions.query().sdf.InvWith.unique()
Output
array(['Other Motor Vehicle', 'Pedestrian', 'Fixed Object', 'Bicycle',
       'Parked Motor Vehicle', 'Other Object',
       'Motor Vehicle on Other Roadway', 'Non-Collision', 'Train',
       'Animal', 'Not Stated', 'Other'], dtype=object)

Filter the layer to show only accidents that involved a pedestrian or cyclist.

Input
collisions.filter = "(InvWith = 'Bicycle') OR (InvWith = 'Pedestrian')"

Uncover patterns

We are able to visualize accidents on map. However, it does not reveal any patterns. There can possibly be areas where particularly large number of accidents are occurring.

Different ways to find patterns in data includes point clustering, heat maps, hot spot analysis and calculate density. These methods reveal where accidents are happening at abnormal rates.

First, we will apply calculate_density tool to get better insights on data.

Input
collision_density = calculate_density(input_layer=collisions,
                                      output_name='density_of_incidents' + str(dt.now().microsecond))
Input
collision_density
Output
density_of_incidents660540
Feature Layer Collection by arcgis_python
Last Modified: September 13, 2019
0 comments, 0 views
Input
m2 = gis.map('Pasadena, California', 13)
m2
Output

Areas with large number of accidents show up as more densely colored as compared to other areas.

Input
m2.add_layer(collision_density)
Input
m2.legend = True

find_hot_spots method finds statistically significant spatial clusters of high values (hot spots) and low values (cold spots).

Input
collision_hot_spots = find_hot_spots(collisions,
                                     output_name='collision_hexagon_hot_spots' + str(dt.now().microsecond),
                                     shape_type='hexagon')
Input
collision_hot_spots
Output
collision_hexagon_hot_spots539052
Feature Layer Collection by arcgis_python
Last Modified: September 13, 2019
0 comments, 0 views
Input
m3 = gis.map('Pasadena, California', 13)
m3
Output

The red hex bins show areas of spatially significant clustering, while white hex bins show areas with no significant clustering. There are no blue areas on the map, but if there were, they would represent areas with low statistically significant clustering. The map indicates that accidents happen across the city, but with a statistically significant clustering in the downtown area.

Input
m3.add_layer(collision_hot_spots)
Input
m3.legend = True

The HeatmapRenderer renders point data into a raster visualization that emphasizes areas of higher density or weighted values.

We will use HeatmapRenderer to display generalized point pattern locations.

Input
m4 = gis.map('Pasadena, California', 13)
m4
Output

Like the calculate_density, the HeatmapRenderer indicates a high density of accidents in central Pasadena. However, the heat map particularly emphasizes the area south of the Foothill Freeway, corresponding to downtown.

Input
m4.add_layer({"type": "FeatureLayer",
              "url": collisions.url,   
              "renderer": "HeatmapRenderer",
              "opacity": 0.75})
Input
m4.legend = True

We now have a better understanding of the data.

Symbolize by category

Next, we will change the symbols of accidents layer to show different categories of accidents. In particular, we want to distinguish fatal accidents from accidents with only injuries. We also want to distinguish pedestrian accidents from bicycle accidents. This information will add more detail to our findings and will help support policymaker decisions.

Input
df = collisions.query().sdf
Input
filtered_df = df[(df['InvWith'] == "Pedestrian") | (df['InvWith'] == "Bicycle")]
Input
filtered_df.head()
Output
FID Accidno Cause CollisnTyp Controls CrossSt Damage1 Damage2 Date Day ... VehModel2 VehType1 VehType2 VehYear1 VehYear2 Violation Weather X Y SHAPE
1 2 10407211603116 Other Hazardous Movement Vehicle - Pedestrian Functioning MARENGO AVE 6/29/2008 Sunday ... Passenger Car 21451 Clear 6517494 1875480 {'x': -13151952.7988482, 'y': 4048399.24990493...
11 12 10409075901001 Auto R/W Violation Broadside Functioning RAYMOND AVE 7/1/2008 Tuesday ... 21801 Clear 6516612 1876820 {'x': -13152278.024718883, 'y': 4048894.055900...
26 27 10411184701852 Unsafe Speed Rear-End No Controls Present / Factor ROSE BOWL DR 7/3/2008 Thursday ... Bicycle Passenger Car 22350 Clear 6511556 1881787 {'x': -13154141.264326861, 'y': 4050727.197689...
29 30 10412131002035 Ped R/W Violation Vehicle - Pedestrian Functioning ORANGE GROVE BLVD 7/4/2008 Friday ... 21950 Clear 6517482 1879853 {'x': -13151959.551540626, 'y': 4050015.650237...
58 59 10418170105652 Other Hazardous Movement Sideswipe No Controls Present / Factor WESLEY AVE 7/10/2008 Thursday ... Passenger Car Bicycle 22517 Clear 6525309.99742721 1885626.50722028 {'x': -13149081.688380858, 'y': 4052153.917797...

5 rows × 57 columns

To create these four symbol categories, we will use an Arcade expression. Arcade expressions use attribute information to determine symbology.

We will first assign three variables that represent the four categories we want to symbolize. However, these variables aren't exclusive. Accidents involving pedestrians or cyclists likely also have injuries or fatalities. There are four combinations of the type, fatal, and injured variables:

  • Accidents involving a pedestrian and a fatality
  • Accidents involving a pedestrian and an injury
  • Accidents involving a cyclist and a fatality
  • Accidents involving a cyclist and an injury

To account for these combinations, we will create a 'When' function. A When function indicates that when certain conditions are met, a specific symbology category will be used.

Input
arcade_expression = """
var acc_type = $feature.InvWith;
var fatal = $feature.Nokilled;
var injured = $feature.NoInjured;
var result = When( 
    acc_type == 'Pedestrian' && fatal == '1'  , 'PedestrianFatality',
    acc_type == 'Pedestrian' && injured != '0', 'PedestrianInjury',
    acc_type == 'Bicycle'    && injured != '0', 'BicycleInjury',
    acc_type == 'Bicycle'    && fatal == '1',   'BicycleFatality',
    'null');
return result;
"""
Input
arcade_expression
Output
"\nvar acc_type = $feature.InvWith;\nvar fatal = $feature.Nokilled;\nvar injured = $feature.NoInjured;\nvar result = When( \n    acc_type == 'Pedestrian' && fatal == '1'  , 'PedestrianFatality',\n    acc_type == 'Pedestrian' && injured != '0', 'PedestrianInjury',\n    acc_type == 'Bicycle'    && injured != '0', 'BicycleInjury',\n    acc_type == 'Bicycle'    && fatal == '1',   'BicycleFatality',\n    'null');\nreturn result;\n"

The code below cycles trough each category and asigns it a color, size and symbol.

Input
def get_symbol(color, size):
    return create_symbol(geometry_type='point',
                         symbol_type='simple',
                         symbol_style='o',
                         colors=color,
                         marker_size=size,
                         outline_style='s',
                         outline_color=[153,153,153,255], line_width=0.375)


def get_unique_values(color_list, values):
    return [ { "label": value, "symbol":get_symbol(color, size), "value": value } for color, size, value in zip(color_list, sizes, values) ]
Input
color_list = [ [255, 0, 0, 255], [0, 255, 0, 255], [0, 0, 255, 255], [0 , 255, 255, 255] ] 
values = ['PedestrianFatality', 'PedestrianInjury', 'BicycleInjury', 'BicycleFatality']
sizes = [15, 3, 3, 15]
uv = get_unique_values(color_list, values)
Input
m5 = gis.map('Pasadena, California', 13)
m5
Output

The fatalities receive a larger symbol size to indicate the difference in severity between injuries and fatalities.

Input
filtered_df.spatial.plot(map_widget=m5,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)
Output
True
Input
m5.legend = True

The points are competing with the the basemap colors. So let's change the basemap to dark-gray.

Input
m6 = gis.map('Pasadena, California', 13)
m6
Output

Now, the accidents stand out because the points are not competing with the basemap colors. There is also a clear distinction between bicycle and pedestrian accidents.

Input
m6.basemap = 'dark-gray'
Input
filtered_df.spatial.plot(map_widget=m6,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)
Output
True

Add school areas

When it comes to policy decisions, hot spots and heat maps don't provide much context and make no explicit claims about the data. Because your subject is not only accidents but accidents near schools, we will add a layer of Pasadena Unified School District (PUSD) schools to our map. Then, we will find areas within a half-mile walking distance of each school.

Input
schools = gis.content.search('PUSD schools, owner:Learn_ArcGIS', 'Feature Layer',outside_org=True)[0]
Input
schools
Output
PUSD Schools
Schools in the Pasadena Unified School DistrictFeature Layer Collection by Learn_ArcGIS
Last Modified: October 17, 2018
0 comments, 7,643 views

By default, the schools have red point symbols that can be difficult to distinguish from the accident points. We'll change the symbol so the schools stand out more. The square shape will be distinguishable from the circle shapes of the accidents. We'll also change the color to white so the symbols stand out against the dark basemap.

Input
m7 = gis.map('Pasadena, California', 13)
m7
Output

The layer of schools is added to the map. The schools now stand out and can be distinguished from the accidents.

Input
m7_renderer = {"renderer": "autocast",
                "type": "simple"}
def get_symbol(color):
    return create_symbol(geometry_type='point',
                         symbol_type='simple',
                         symbol_style='s',
                         colors=color,
                         marker_size=13,
                         outline_style='s',
                         outline_color=[153,153,153,255],
                         line_width=0.375)

symbol = get_symbol([153,153,153,255])
m7_renderer['symbol'] = symbol
Input
m7.basemap = 'dark-gray'
Input
filtered_df.spatial.plot(map_widget=m7,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)
Output
True
Input
m7.add_layer({ "type": "FeatureLayer",
               "url": schools.layers[0].url,
               "transparency": 75,
               "renderer": m7_renderer
                })
Input
m7.legend = True

Creating areas within a half-mile walking distance of schools

Next, we'll create the walk-time areas around each school. This tool uses road network data to create areas that can be reached within a specific driving or walking distance or time. Creating areas within a half-mile walking distance of schools will show places where there are likely large numbers of student pedestrians.

We'll later be able to calculate the number of collisions near each school.

Input
walk_dist_from_schools = create_drive_time_areas(schools.layers[0],
                                                 break_values=[0.5],
                                                 break_units='Miles',
                                                 travel_mode='Walking Distance',
                                                 output_name='psud_schools_drivetime' + str(dt.now().microsecond))
Input
walk_dist_from_schools
Output
psud_schools_drivetime224021
Feature Layer Collection by arcgis_python
Last Modified: September 13, 2019
0 comments, 0 views
Input
m8 = gis.map('Pasadena, California', 13)
m8
Output

Some school areas have no pedestrian or cyclist accidents, while others have a significant amount. At the same time, the area with the highest density of accidents is not within any school area. Adding more information beyond a heat map has provided essential context for policymakers.

Input
m8.basemap = 'dark-gray'
Input
filtered_df.spatial.plot(map_widget=m8,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)
Output
True
Input
m8.add_layer({ "type": "FeatureLayer",
               "url": schools.layers[0].url,
               "transparency": 75,
               "renderer": m7_renderer
                })
Input
m8.add_layer(walk_dist_from_schools)
Input
m8.legend = True

We could stop your analysis here, and use this map as grounds to implement a policy that encompasses all school zones. Changing street signs and adding bicycle lanes in these areas may reduce accidents near schools.

However, sometimes a city does not have enough funds to enact new policy for every location. Many Pasadena school zones have few accidents, so policies in these areas may have little effect. Instead, policymakers want to focus their efforts on areas that need it most.

We'll calculate the number of accidents within each school zone and filter the layer to show only the five most dangerous zones. Then, policymakers can prioritize these zones over zones that are already relatively safe.

Find the most dangerous school areas

This tool summarizes the number of point features within a polygon feature.

Input
dangerous_areas = summarize_within(walk_dist_from_schools.layers[0],
                                   collisions,
                                   output_name="accident_count_within_school_zone" + str(dt.now().microsecond))
Input
dangerous_areas
Output
accident_count_within_school_zone169811
Feature Layer Collection by arcgis_python
Last Modified: September 13, 2019
0 comments, 0 views
Input
top_5_dangerous_areas = dangerous_areas.layers[0].query().sdf.sort_values(by='Point_Count', ascending=False)
Input
top_5_dangerous_areas[:5]
Output
OBJECTID Point_Count Name FromBreak ToBreak FacilityOID Name_1 ADDRESS CITY SCHOOLTYPE Breaks AdditionalTime AdditionalDistance AnalysisArea SHAPE
10 11 110 McKinley School : 0 - 0.5 0 0.5 9 McKinley School 325 S OAK KNOLL AVE PASADENA ELEMENTARY/MIDDLE None 0 0 1.492772 {'rings': [[[-13150773.7838, 4048681.8554], [-...
11 12 106 Rose City High Alternative : 0 - 0.5 0 0.5 20 Rose City High Alternative 351 S HUDSON AVE PASADENA HIGH None 0 0 1.355013 {'rings': [[[-13150623.7846, 4048651.6474], [-...
14 15 104 Madison Elementary : 0 - 0.5 0 0.5 8 Madison Elementary 515 ASHTABULA ST PASADENA ELEMENTARY None 0 0 1.273972 {'rings': [[[-13151398.781, 4051098.7538], [-1...
7 8 63 Washington Middle : 0 - 0.5 0 0.5 18 Washington Middle 1505 N MARENGO AVE PASADENA MIDDLE None 0 0 1.292630 {'rings': [[[-13151998.7782, 4052730.4508], [-...
16 17 62 Longfellow Elementary : 0 - 0.5 0 0.5 7 Longfellow Elementary 1065 E WASHINGTON BLVD PASADENA ELEMENTARY None 0 0 1.418004 {'rings': [[[-13149873.788, 4052398.0492], [-1...

The table is sorted so that school zones with more accidents are shown first. The first five school zones have accidents totaling 109, 106, 104, 63, and 62, respectively.

The sixth-highest school zone has 59 accidents, which is almost the same as the fifth highest. With this information, you might consider expanding policy to include this school zone as well. Alternatively, depending on the city's resources, we want to limit policy to focus only on the three most dangerous school zones, which have a much higher number of accidents than the fourth.

For this scenario, we'll continue to focus on the five most dangerous school zones. We'll filter the layer to show only these zones.

Input
m9 = gis.map('Pasadena, California', 13)
m9
Output

All five of the most dangerous school zones are relatively close to one another. Two of the most dangerous zones almost overlap entirely, which means the city can increase safety for both schools with many of the same policy decisions.

Input
m9.basemap = 'dark-gray'
Input
filtered_df.spatial.plot(map_widget=m9,
                         renderer_type='u-a',
                         unique_values=uv,
                         arcade_expression=arcade_expression)
Output
True
Input
m9.add_layer({"type": "FeatureLayer", 
               "url": dangerous_areas.layers[0].url,
               "definition_expression" : "Point_Count>=61",
               "opacity": 0.7
              })
Input
m9.add_layer({ "type": "FeatureLayer",
                 "url": schools.layers[0].url,
                 "transparency": 75,
                 "renderer": m7_renderer
                })

Conclusion

We've created a map that highlights five school zones that would benefit from policy intervention. The city officials can now identify the most dangerous school zones in order to reduce the likelihood of future accidents.

Your browser is no longer supported. Please upgrade your browser for the best experience. See our browser deprecation post for more details.