ArcGIS API for Python

Information extraction from Madison city crime incident reports using Deep Learning

Introduction

Crime analysis is an essential part of efficient law enforcement for any city. It involves:

  • Collecting data in a form that can be analyzed.
  • Identifying spatial/non-spatial patterns and trends in the data.
  • Informed decision making based on the analysis.

To start the analysis, the first requirement is analyzable data. A huge volume of data is present in the witness and police narratives of crime incidents. A few examples of such information are:

  • Place of crime
  • Nature of crime
  • Date and time of crime
  • Suspect
  • Witness

Extracting such information from incident reports requires tedious work. Crime analysts have to sift through piles of police reports to gather and organize this information.

With recent advances in Natural Language Processing and Deep Learning, it's possible to devise an automated workflow to extract information from such unstructured text documents. In this notebook, we will extract information from crime incident reports obtained from the Madison Police Department [1] using arcgis.learn.EntityRecognizer().

Prerequisites

  • Data preparation and model training workflows using arcgis.learn have a dependency on spaCy. This can be installed using conda as follows: conda install -c esri arcgis fastai pillow scikit-image
  • Labelled data: In order for Entity Recognizer to learn, it needs to see examples that have been labelled for all the custom categories that the model is expected to extract. Labelled data for this sample notebook is located at data/EntityRecognizer/labelled_crime_reports.json
  • To learn how to use Doccano[2] for labelling text, please see the guide on Labeling text using Doccano
  • Test documents to extract named entities are in a zipped file at data/EntityRecognizer/reports.zip
  • To learn more on how EntityRecognizer works, please see the guide on Named Entity Extraction Workflow with arcgis.learn.
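
As a rough illustration of what labelled data can look like, the sketch below parses two hand-written records in a Doccano-style JSON Lines layout (one JSON object per line, with a `text` string and a `labels` list of `[start, end, tag]` spans) and groups the labelled substrings by tag. The record layout, the sample sentences, and the helper name `spans_by_tag` are assumptions made for illustration; check the actual labelled file for its exact schema.

```python
import json

# Hypothetical records in an assumed Doccano-style JSON Lines layout.
# They are NOT copied from labelled_crime_reports.json.
sample_lines = [
    '{"text": "A clerk was robbed on Odana Road.", '
    '"labels": [[12, 18, "Crime"], [22, 32, "Address"]]}',
    '{"text": "A bike was stolen on State Street.", '
    '"labels": [[11, 17, "Crime"], [21, 33, "Address"]]}',
]


def spans_by_tag(lines):
    """Return {tag: [labelled substrings]} for JSON Lines records."""
    spans = {}
    for line in lines:
        record = json.loads(line)
        for start, end, tag in record["labels"]:
            # Character offsets are end-exclusive, like Python slices.
            spans.setdefault(tag, []).append(record["text"][start:end])
    return spans


labelled_spans = spans_by_tag(sample_lines)
for tag, texts in labelled_spans.items():
    print(tag, texts)
```

Printing the labelled substrings per tag is a quick way to spot offsets that are off by one before training on them.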

Imports

In [1]:
from arcgis.learn import prepare_data
from arcgis.learn import EntityRecognizer
import re
import os
import pandas as pd
from arcgis.gis import GIS
from arcgis.raster.functions import colormap
from arcgis.geocoding import batch_geocode
import zipfile,unicodedata
from itertools import repeat
In [2]:
gis = GIS("your_online_profile")

Data preparation

Data preparation involves splitting the data into training and validation sets, creating the necessary data structures for loading data into the model, and so on. The prepare_data() function can read the labelled training samples directly (here, a JSON file in the 'ner_json' format) and automate the entire process.

In [3]:
json_path = os.path.join("data", "EntityRecognizer", "labelled_crime_reports.json")
In [4]:
data = prepare_data(path= json_path, 
                    class_mapping={'address_tag':'Address'}, 
                    dataset_type='ner_json')
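
To make the splitting step concrete, here is a minimal sketch of a random train/validation split in plain Python. It is not how prepare_data() is implemented; the 80/20 ratio, the fixed seed, the placeholder report names, and the helper name `split_samples` are all assumptions chosen for illustration.

```python
import random


def split_samples(samples, val_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and split off a validation set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder sample identifiers (hypothetical).
reports = [f"report_{i}" for i in range(10)]
train, val = split_samples(reports)
print(len(train), len(val))
```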

The show_batch() method can be used to visualize the training samples, along with labels.

In [5]:
data.show_batch()
Out[5]:
Address Crime Crime_datetime Reported_date Reported_time Reporting_officer Weapon text
0 [6900 block of Odana Road, 6900 block of Odana... [strong-armed robbery, kicked and struck multi... [10:11 pm on 10/10/2018] [10/11/2018] [12:55 AM] [Sgt. Paul Jacobsen] Madison Police responded to a strong-armed rob...
1 [E. Washington Ave.] [battered, punches were thrown, disturbances, ... [early Sunday morning] [04/09/2018] [10:31 AM] [PIO Joel Despain] [handgun] A mother and daughter from Cottage Grove were ...
2 [7-Eleven, 2703 W. Beltline Highway, south on ... [robbed at gunpoint] [before 11:00 p.m] [12/14/2017] [9:28 AM] [PIO Joel Despain] [weapon] A convenience store clerk was robbed at gunpoi...
3 [stolen] [01/04/2016] [12:12 PM] [PIO Joel Despain] [knives] The MPD arrested a 20-year-old man Saturday af...
4 [5700 block of Barton Rd] [garage burglaries, stolen, Bicycles have been... [Early yesterday morning] [06/28/2016 ] [10:48 AM] [PIO Joel Despain] Patrol officers and members of the MPD's Burgl...
5 [North side of Madison, Crestline Dr, Green Ri... [windows were shot out] [10/31/2016] [11:59] [Sgt. Paul Jacobsen] [pellet or soft air gun] Madison Police responded to three different ca...
6 [give him the money, given money] [10/06/2017] [2:00 AM] [Lt. Timothy Radke] Clerks were closing up the store at the end of...
7 [Telephone scam, spoofing, duped] [8:48 AM] [PIO Joel Despain] Telephone scam artists conned a 24-year-old UW...

Model training

  • First, we will create a model using the EntityRecognizer() constructor, passing it the data object.
  • Training the model is an iterative process. We can keep training the model with its fit() method for as long as the validation loss (or error rate) continues to go down with each training pass, also known as an epoch. A falling validation loss indicates that the model is learning the task.
In [6]:
ner = EntityRecognizer(data)
In [7]:
lr=ner.lr_find()
In [8]:
ner.fit(epochs=50,lr=lr)
epoch losses val_loss
0 76.25 9.0
1 18.15 14.0
2 16.9 11.0
3 14.44 10.0
4 16.61 20.0
5 18.18 29.0
6 27.81 23.0
7 21.5 30.0
8 23.08 14.0
9 22.26 22.0
10 21.18 13.0
11 22.09 8.0
12 19.91 27.0
13 39.95 6.0
14 12.83 20.0
15 23.21 12.0
16 12.98 9.0
17 12.46 4.0
18 12.16 7.0
19 27.05 5.0
20 12.66 12.0
21 18.22 5.0
22 20.58 3.0
23 12.69 4.0
24 16.92 2.0
25 16.79 6.0
26 19.31 2.0
27 9.44 1.0
28 10.38 1.0
29 9.15 1.0
30 9.26 1.0
31 10.53 1.0
32 8.24 0.0
33 10.57 0.0
34 9.18 0.0
35 8.35 0.0
36 8.27 0.0
37 7.53 0.0
38 7.94 0.0
39 7.59 0.0
40 7.2 0.0
41 9.52 0.0
42 7.83 0.0
43 7.47 0.0
44 8.15 0.0
45 7.23 0.0
46 6.73 0.0
47 7.51 0.0
48 6.46 0.0
49 7.1 0.0
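
One way to read a training log like this is to look for the epoch at which the validation loss first reached its lowest value. The sketch below does that for a short, made-up loss history; the numbers and the helper name `first_best_epoch` are illustrative only and are not taken from the run above.

```python
def first_best_epoch(val_losses):
    """Return (epoch, loss) for the first epoch with the lowest validation loss."""
    best_epoch = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_epoch, val_losses[best_epoch]


# Hypothetical validation losses, one per epoch.
history = [9.0, 14.0, 6.0, 4.0, 4.0, 5.0]
best = first_best_epoch(history)
print(best)
```

If the validation loss stops improving well before the last epoch, training for fewer epochs may be enough.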

Validate results

Now that we have the trained model, let's look at how it performs.

In [9]:
ner.show_results()
100.00% [8/8 00:00<00:00]
Out[9]:
TEXT Filename Address Crime Crime_datetime Reported_date Reported_time Reporting_officer Weapon
0 Madison Police hostage negotiators and SWAT te... Example_0 3600 block of Kipling Dr. barricaded person,weapons and violence At 9:52 PM 12/02/2016 6:36 PM Sgt. Jennifer Kane
1 The MPD is investigating a bank robbery over t... Example_1 Old National Bank, 302 N. Midvale Blvd. bank robbery noon hour,just before 12:30 p.m. 02/02/2018 1:20 PM PIO Joel Despain handgun
2 A 22-year-old woman suffered a serious head in... Example_2 Allied Dr. apartment serious head injury,pushed and punched Tuesday night 11:53 AM PIO Joel Despain
3 A male suspect entered the Mobile gas station ... Example_3 600 block of Cottage grove Road demanded money 02/19/2019 11:42 PM Lt. Reginald Patterson
4 A frightened Madison woman called the MPD Satu... Example_4 West Beltline Highway road-rage incident 05/13/2019 3:35 PM PIO Joel Despain gun,black handgun
5 A frightened Madison woman called the MPD Satu... Example_4 S. Whitney Way. road-rage incident 05/13/2019 3:35 PM PIO Joel Despain gun,black handgun
6 A frightened Madison woman called the MPD Satu... Example_5 West Beltline Highway road-rage incident 05/13/2019 3:35 PM PIO Joel Despain gun,black handgun
7 A frightened Madison woman called the MPD Satu... Example_5 S. Whitney Way. road-rage incident 05/13/2019 3:35 PM PIO Joel Despain gun,black handgun
8 Madison Police were called to a hit and run cr... Example_6 intersection of Maple Grove and McKee road hit and run crash,vehicle hitting a residence,... 06/05/2017 3:17 AM Sgt. Paul Jacobsen
9 Madison Police were called to a hit and run cr... Example_6 3100 blk of Silverton Trail hit and run crash,vehicle hitting a residence,... 06/05/2017 3:17 AM Sgt. Paul Jacobsen
10 A Madison woman was arrested last night after ... Example_7 S. Bedford St. damaging,swinging some type of object,smashing... night after 05/18/2016 12:13 PM PIO Joel Despain tire iron

Save and load trained models

Once you are satisfied with the model, you can save it using the save() method. This creates an Esri Model Definition (EMD) file that can be used for inferencing on new data. Saved models can be loaded back using the load() method, which takes the path to the EMD file as a required argument.

In [10]:
ner.save('crime_model')
Model has been saved to C:\Users\akh10431\Desktop\Repos\arcgis-python-api-18dec\samples\04_gis_analysts_data_scientists\data\EntityRecognizer\models\crime_model
In [11]:
model_path = os.path.join("data","EntityRecognizer","models", "crime_model","crime_model.emd")
model_path
Out[11]:
'data\\EntityRecognizer\\models\\crime_model\\crime_model.emd'

Model Inference

Now we can use the trained model to extract entities from new text documents using the extract_entities() method. This method expects either the path of the folder where the new text documents are located or a list of text documents.

In [ ]:
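def read_zip_file(filepath):
    """Read documents from a zip file and return them as a list of strings."""
    doc_list = []
    with zipfile.ZipFile(filepath) as zfile:
        for finfo in zfile.infolist():
            with zfile.open(finfo) as ifile:
                # "cp1252" is assumed here; the "ansi" codec alias is only
                # available on Windows.
                line_list = [line.decode("cp1252") for line in ifile.readlines()]
            # Normalize compatibility characters (e.g. ligatures, non-breaking spaces)
            line_list = [unicodedata.normalize('NFKD', line) for line in line_list]
            doc_list.append(line_list[0])  # keep the first line of each file
    return doc_list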
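
The NFKD normalization step in read_zip_file() replaces compatibility characters with plainer equivalents, which can help when reports contain typographic ligatures or non-breaking spaces. A small sketch with a made-up string (not taken from the reports):

```python
import unicodedata

# Hypothetical input containing the "fi" ligature (U+FB01)
# and a non-breaking space (U+00A0).
raw = "of\ufb01cer\u00a0responded"
clean = unicodedata.normalize("NFKD", raw)
print(clean)
```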
In [ ]:
reports_path = os.path.join("data", "EntityRecognizer","reports.zip")
In [ ]:
report_list = read_zip_file(reports_path) #list of new documents on which predictions are to be made.
In [ ]:
results = ner.extract_entities(report_list)  # extract_entities() also accepts the path of the documents folder as an argument.
In [18]:
results.head()
Out[18]:
TEXT Filename Address Crime Crime_datetime Reported_date Reported_time Reporting_officer Weapon
0 Officers were dispatched to a robbery of the ... Example_0 Associated Bank in the 1500 block of W Broadway demanded money 08/09/2018 6:17 PM Sgt. Jennifer Kane No weapon
1 The MPD was called to Pink at West Towne Mall ... Example_1 Pink at West Towne Mall thefts Tuesday night 08/18/2016 10:37 AM PIO Joel Despain
2 The MPD is seeking help locating a unique $1,... Example_2 Union St. home stolen,thief cut a bike lock that night 08/17/2016 11:09 AM PIO Joel Despain
3 A Radcliffe Drive resident said three men - a... Example_3 Radcliffe Drive targeted armed robbery early this morning 08/07/2018 11:17 AM PIO Joel Despain handguns
4 Madison Police officers were near the intersec... Example_5 intersection of Francis Street and State Street gunshot and observed a vehicle exiting the,sho... 08/10/2018 4:20 AM Lt. Daniel Nale

Publishing the results as a feature layer

The code below geocodes the extracted address and publishes the results as a feature layer.

In [19]:
# This function generates x,y coordinates based on the location extracted by the model.

def geocode_locations(processed_df, city, region, address_col):
    # Append city and region to each extracted address
    add_miner = processed_df[address_col].apply(lambda x: f'{x}, {city}, {region}')
    chunk_size = 200
    batch = list()
    # Geocode in chunks of 200 addresses; stepping by chunk_size avoids an empty final chunk
    for start in range(0, len(add_miner), chunk_size):
        batch.extend(batch_geocode(list(add_miner.iloc[start:start + chunk_size])))
    batch_geo_codes = []
    for item in batch:
        # Keep a match only if it scores above 90, is more specific than the
        # city itself, and lies in the requested city
        if (isinstance(item, dict)
                and item['score'] > 90
                and item['address'] != f'{city}, {region}'
                and item['attributes']['City'] == f'{city}'):
            batch_geo_codes.append(item['location'])
        else:
            batch_geo_codes.append('')
    processed_df['geo_codes'] = batch_geo_codes
    return processed_df
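
The two ideas in geocode_locations(), sending addresses in batches of 200 and keeping only confident matches, can be tried without calling the geocoding service. The sketch below uses plain Python; the helper names `chunked` and `accept_candidate` and the candidate dictionaries are hypothetical and only mimic the keys that the function reads (`score`, `address`, `attributes`, `location`).

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def accept_candidate(item, city, region, min_score=90):
    """Apply the same acceptance rule as the filter in geocode_locations()."""
    return (
        isinstance(item, dict)
        and item["score"] > min_score
        and item["address"] != f"{city}, {region}"  # reject city-level matches
        and item["attributes"]["City"] == city
    )


# 450 made-up addresses split into batches of 200.
addresses = [f"{n} Main St, Madison, WI" for n in range(450)]
batch_sizes = [len(batch) for batch in chunked(addresses, 200)]
print(batch_sizes)

# Hypothetical geocoder responses.
street_match = {
    "score": 98.5,
    "address": "100 Main St, Madison, WI",
    "attributes": {"City": "Madison"},
    "location": {"x": -89.4, "y": 43.07},
}
city_only_match = {
    "score": 100,
    "address": "Madison, WI",
    "attributes": {"City": "Madison"},
    "location": {"x": -89.4, "y": 43.07},
}
print(accept_candidate(street_match, "Madison", "WI"))
print(accept_candidate(city_only_match, "Madison", "WI"))
```

Rejecting results whose address is just the city and region filters out cases where the geocoder could not resolve the street and fell back to the city centroid.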
In [20]:
# This function converts the dataframe to a spatially enabled dataframe.

def prepare_sdf(processed_df):
    # Keep only the rows that were geocoded (drops rows with an empty location)
    has_location = processed_df['geo_codes'].apply(lambda g: isinstance(g, dict))
    sdf = processed_df[has_location].reset_index(drop=True)
    sdf['geo_codes_x'] = sdf['geo_codes'].apply(lambda g: g.get('x'))
    sdf['geo_codes_y'] = sdf['geo_codes'].apply(lambda g: g.get('y'))
    sdf['geo_x_y'] = sdf['geo_codes_x'].astype('str') + ',' + sdf['geo_codes_y'].astype('str')
    sdf = pd.DataFrame.spatial.from_df(sdf, address_column='geo_x_y')  # adding geometry to the dataframe
    sdf.drop(['geo_codes_x', 'geo_codes_y', 'geo_x_y', 'geo_codes'], axis=1, inplace=True)  # dropping redundant columns
    return sdf
In [21]:
# This function will publish the spatial dataframe as a feature layer.

def publish_to_feature(df, gis, layer_title:str, tags:str, city:str, 
                       region:str, address_col:str):
    processed_df = geocode_locations(df, city, region, address_col)
    sdf = prepare_sdf(processed_df)
    try:
        layer = sdf.spatial.to_featurelayer(layer_title, gis, tags)
    except Exception:
        # Retry once if the first publishing attempt raises
        layer = sdf.spatial.to_featurelayer(layer_title, gis, tags)

    return layer
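
The try/except in publish_to_feature() amounts to retrying the publish call once. A more general version of that pattern might look like the sketch below; `call_with_retry` and `flaky_publish` are hypothetical names, and the simulated failure stands in for whatever error the first publishing attempt might raise.

```python
import time


def call_with_retry(func, attempts=2, delay_seconds=0.0):
    """Call func(); on an exception, retry until `attempts` calls have been made."""
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as error:
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay_seconds)  # optional pause before the next try
    raise last_error


# Stand-in for a publishing call that fails once and then succeeds.
calls = {"count": 0}


def flaky_publish():
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("temporary failure")
    return "published"


result = call_with_retry(flaky_publish)
print(result, calls["count"])
```

Unlike a bare retry, this version re-raises the last error if every attempt fails, so a persistent problem is not hidden.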
In [22]:
# This will take a few minutes to run
madison_crime_layer = publish_to_feature(results, gis, layer_title='Madison_Crime', 
                                         tags='nlp,madison,crime', city='Madison', 
                                         region='WI', address_col='Address')
In [23]:
madison_crime_layer
Out[23]:
Madison_Crime
Feature Layer Collection by arcgis_python
Last Modified: February 24, 2020
0 comments, 0 views

Visualize crime incidents on a map

In [22]:
result_map = gis.map('Madison, Wisconsin')
result_map.basemap = 'topographic'
In [23]:
result_map
Out[23]: