Crime analysis and clustering using geoanalytics and pyspark.ml

Introduction

Many of the poorest neighborhoods in the City of Chicago experience high rates of violent crime. As crime increases, so does the volume of crime data, and with it the need to identify crime patterns in order to reduce its occurrence. Data mining with the tools available in the ArcGIS API for Python is an effective way to analyze and detect patterns in such data. In this sample, we demonstrate several GeoAnalytics tools, including find_hot_spots, aggregate_points, and calculate_density, to visualize geographic patterns.

The pyspark module available through the run_python_script tool provides a collection of distributed analysis tools for data management, clustering, regression, and more. The run_python_script task automatically imports the pyspark module so you can directly interact with it. By calling the pyspark.ml implementation of k-means from the run_python_script tool, we will cluster crime data into a predefined number of clusters. Such clusters are also useful in identifying crime patterns.

Further, based on the results of the analysis, the segmented crime map can be used to help efficiently dispatch officers throughout a city.

Necessary Imports

Input
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
from datetime import datetime as dt

import arcgis
import arcgis.geoanalytics
from arcgis.gis import GIS
from arcgis.geoanalytics.summarize_data import describe_dataset, aggregate_points
from arcgis.geoanalytics.analyze_patterns import calculate_density, find_hot_spots
from arcgis.geoanalytics.manage_data import clip_layer, run_python_script

Connect to your ArcGIS Enterprise Organization

Input
agol_gis = GIS('home')
gis = GIS('https://pythonapi.playground.esri.com/portal', 'arcgis_python', 'amazing_arcgis_123')

Ensure your GIS supports GeoAnalytics

Before executing a tool, we need to ensure that the ArcGIS Enterprise GIS is set up with a licensed GeoAnalytics server. To do so, call the is_supported() method after connecting to your Enterprise portal. See the Components of ArcGIS URLs documentation for details on the URLs to enter in the GIS parameters based on your particular Enterprise configuration.

Input
arcgis.geoanalytics.is_supported()
Output
True

Prepare the data

To register a file share or an HDFS, we need to format datasets as subfolders within a single parent folder and register the parent folder. This parent folder becomes a datastore, and each subfolder becomes a dataset.

Learn more about preparing your big data file share datasets here.
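
As a rough illustration of the layout described above, the sketch below builds a parent folder with one subfolder per dataset inside a temporary directory and lists the resulting dataset names. The folder and file names (chicago, crime, crime_2001_2020.csv) are hypothetical placeholders, not the names used on the server in this sample.

```python
import tempfile
from pathlib import Path


def build_file_share(parent: Path, datasets: dict) -> None:
    """Create one subfolder per dataset under the parent folder."""
    for dataset_name, file_names in datasets.items():
        dataset_dir = parent / dataset_name
        dataset_dir.mkdir(parents=True, exist_ok=True)
        for file_name in file_names:
            # Empty placeholder files stand in for the real data files.
            (dataset_dir / file_name).touch()


def list_datasets(parent: Path) -> list:
    """Treat each immediate subfolder of the parent as one dataset."""
    return sorted(p.name for p in parent.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    parent_folder = Path(tmp) / "chicago"  # hypothetical parent folder
    build_file_share(parent_folder, {"crime": ["crime_2001_2020.csv"]})
    dataset_names = list_datasets(parent_folder)
    print(dataset_names)
```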

Register a big data file share

The get_datastores() method of the geoanalytics module returns a DatastoreManager object that lets you search for and manage the big data file share items as Python API Datastore objects on your GeoAnalytics server.

Input
bigdata_datastore_manager = arcgis.geoanalytics.get_datastores()
bigdata_datastore_manager
Output
<DatastoreManager for https://pythonapi.playground.esri.com/ga/admin>

We will register the Chicago crime data as a big data file share using the add_bigdata() function on a DatastoreManager object.

When we register a directory, all subdirectories under the specified folder are also registered with the server. Always register the parent folder (for example, \\machinename\mydatashare) that contains one or more individual dataset folders as the big data file share item. To learn more, see register a big data file share.

Note: You cannot browse directories in ArcGIS Server Manager. You must provide the full path to the folder you want to register, for example, \\myserver\share\bigdata. Avoid using local paths, such as C:\bigdata, unless the same data folder is available on all nodes of the server site.
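
A small pre-flight check can catch a local path before registration. The sketch below uses only the Python standard library; the helper name is_unc_path is hypothetical and is not part of the ArcGIS API.

```python
from pathlib import PureWindowsPath


def is_unc_path(path: str) -> bool:
    """Return True for UNC paths (server and share), False for drive-letter paths."""
    # For a UNC path the "drive" is the \\server\share prefix;
    # for a local path it is a drive letter such as "C:".
    return PureWindowsPath(path).drive.startswith("\\\\")


for candidate in (r"\\myserver\share\bigdata", r"C:\bigdata"):
    print(candidate, is_unc_path(candidate))
```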

Input
# data_item = bigdata_datastore_manager.add_bigdata("Chicago_Crime_2001_2020", r"\\machine_name\data\chicago")
Created Big Data file share for Chicago_Crime_2001_2020
Input
bigdata_fileshares = bigdata_datastore_manager.search(id='0e7a861d-c1c5-4acc-869d-05d2cebbdbee')
bigdata_fileshares
Output
[<Datastore title:"/bigDataFileShares/GA_Data" type:"bigDataFileShare">]
Input
file_share_folder = bigdata_fileshares[0]

Once a big data file share is created, the GeoAnalytics server samples the datasets to generate a manifest, which outlines the data schema and specifies any time and geometry fields. This process can take a few minutes depending on the size of your data. Once processed, querying the manifest property returns the schema of each dataset in your big data file share.

Input
manifest = file_share_folder.manifest['datasets'][1]
manifest
Output
{'name': 'crime',
 'format': {'quoteChar': '"',
  'fieldDelimiter': ',',
  'hasHeaderRow': True,
  'encoding': 'UTF-8',
  'escapeChar': '"',
  'recordTerminator': '\n',
  'type': 'delimited',
  'extension': 'csv'},
 'schema': {'fields': [{'name': 'ID', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'Case Number', 'type': 'esriFieldTypeString'},
   {'name': 'Date', 'type': 'esriFieldTypeString'},
   {'name': 'Block', 'type': 'esriFieldTypeString'},
   {'name': 'IUCR', 'type': 'esriFieldTypeString'},
   {'name': 'Primary Type', 'type': 'esriFieldTypeString'},
   {'name': 'Description', 'type': 'esriFieldTypeString'},
   {'name': 'Location Description', 'type': 'esriFieldTypeString'},
   {'name': 'Arrest', 'type': 'esriFieldTypeString'},
   {'name': 'Domestic', 'type': 'esriFieldTypeString'},
   {'name': 'Beat', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'District', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'Ward', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'Community Area', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'FBI Code', 'type': 'esriFieldTypeString'},
   {'name': 'X Coordinate', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'Y Coordinate', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'Year', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'Updated On', 'type': 'esriFieldTypeString'},
   {'name': 'Latitude', 'type': 'esriFieldTypeDouble'},
   {'name': 'Longitude', 'type': 'esriFieldTypeDouble'},
   {'name': 'Location', 'type': 'esriFieldTypeString'}]},
 'geometry': {'geometryType': 'esriGeometryPoint',
  'spatialReference': {'wkid': 4326},
  'fields': [{'name': 'Location', 'formats': ['({y},{x})']}]},
 'time': {'timeType': 'instant',
  'timeReference': {'timeZone': 'UTC'},
  'fields': [{'name': 'Date', 'formats': ['MM/dd/yyyy hh:mm:ss a']}]}}
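
To make the geometry and time entries of the manifest concrete, here is a minimal sketch that parses one Location value and one Date value by hand. The strptime pattern is an assumed Python equivalent of the manifest's 'MM/dd/yyyy hh:mm:ss a' format, and the helper names are hypothetical; the server performs this parsing itself.

```python
from datetime import datetime

# Assumed Python equivalent of the manifest pattern 'MM/dd/yyyy hh:mm:ss a'.
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_location(value: str) -> dict:
    """Parse a '({y},{x})' string into x (longitude) and y (latitude)."""
    y_text, x_text = value.strip().strip("()").split(",")
    return {"x": float(x_text), "y": float(y_text)}


def parse_date(value: str) -> datetime:
    """Parse a Date string using the 12-hour clock pattern above."""
    return datetime.strptime(value, TIME_FORMAT)


point = parse_location("(41.750808511, -87.572308641)")
instant = parse_date("08/04/2011 02:10:00 AM")
print(point, instant)
```

Note that the format lists y before x, so latitude comes first in the Location string.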

Get data for analysis

Adding a big data file share to the GeoAnalytics server adds a corresponding big data file share item on the portal. We can search for these types of items using the item_type parameter.

Input
search_result = gis.content.search("bigDataFileShares_GA_Data", item_type = "big data file share")
search_result
Output
[<Item title:"bigDataFileShares_GA_Data" type:Big Data File Share owner:arcgis_python>]
Input
ga_item = search_result[0]
Input
ga_item
Output
bigDataFileShares_GA_Data
Big Data File Share by arcgis_python
Last Modified: May 27, 2021
0 comments, 0 views

Querying the layers property of the item returns one layer for each dataset in the big data file share. Each object is an API Layer object.

Input
ga_item.layers
Output
[<Layer url:"https://pythonapi.playground.esri.com/ga/rest/services/DataStoreCatalogs/bigDataFileShares_GA_Data/BigDataCatalogServer/air_quality">,
 <Layer url:"https://pythonapi.playground.esri.com/ga/rest/services/DataStoreCatalogs/bigDataFileShares_GA_Data/BigDataCatalogServer/crime">,
 <Layer url:"https://pythonapi.playground.esri.com/ga/rest/services/DataStoreCatalogs/bigDataFileShares_GA_Data/BigDataCatalogServer/calls">,
 <Layer url:"https://pythonapi.playground.esri.com/ga/rest/services/DataStoreCatalogs/bigDataFileShares_GA_Data/BigDataCatalogServer/analyze_new_york_city_taxi_data">]
Input
crime_lyr = ga_item.layers[1]
Input
illinois_blk_grps = agol_gis.content.get('a11d886be35149cb9dab0f7aac75a2af')
Input
illinois_blk_grps
Output
block_groups_illinois
block_groups_illinoisFeature Layer Collection by api_data_owner
Last Modified: May 27, 2021
0 comments, 2 views
Input
blk_lyr = illinois_blk_grps.layers[0]

We will filter the block groups by county code 031, the FIPS code for Cook County, which contains Chicago.

Input
blk_lyr.filter = "COUNTYFP = '031'"
Input
m2 = gis.map('chicago')
m2
Output
Input
m2.add_layer(blk_lyr)

Describe data

The describe_dataset method provides an overview of big data. By default, the tool outputs a table layer containing calculated field statistics and a dict outlining geometry and time settings for the input layer.

Optionally, the tool can output a feature layer representing a sample set of features using the sample_size parameter, or a single polygon feature layer representing the input feature layers' extent by setting the extent_output parameter to True.

Input
description = describe_dataset(input_layer=crime_lyr,
                               extent_output=True,
                               sample_size=1000,
                               output_name="Description of crime data" + str(dt.now().microsecond),
                               return_tuple=True)
Input
description.output_json
Output
{'datasetName': 'crime',
 'datasetSource': 'Big Data File Share - Chicago_Crime_2001_2020',
 'recordCount': 7061128,
 'geometry': {'geometryType': 'Point',
  'sref': {'wkid': 4326},
  'countNonEmpty': 6993512,
  'countEmpty': 67616,
  'spatialExtent': {'xmin': -91.686565684,
   'ymin': 36.619446395,
   'xmax': -87.524529378,
   'ymax': 42.022910333}},
 'time': {'timeType': 'Instant',
  'countNonEmpty': 7061128,
  'countEmpty': 67616,
  'temporalExtent': {'start': '2001-01-01 00:00:00.000',
   'end': '2020-01-26 23:40:00.000'}}}
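
The geometry counts in this output give a quick data quality check. The sketch below copies three numbers from the output above and computes the share of records without a geometry.

```python
# Values copied from the 'geometry' section of description.output_json above.
geometry_summary = {
    "recordCount": 7061128,
    "countNonEmpty": 6993512,
    "countEmpty": 67616,
}


def empty_share(summary: dict) -> float:
    """Fraction of records that have no geometry."""
    return summary["countEmpty"] / summary["recordCount"]


empty_pct = empty_share(geometry_summary) * 100
print(f"{empty_pct:.2f}% of records have no geometry")
```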
Input
sdf_desc_output = description.output.query(as_df=True)
sdf_desc_output.head()
Output
FIELD_NAME COUNT COUNT_NON_EMPTY AVG MIN MAX STDDEV RANGE SUM VAR ANY globalid OBJECTID
0 ID 7061128 7061128 6.468796e+06 634.0 11969378.0 3.180550e+06 11968744.0 4.567699e+13 1.011590e+13 None {46B95A04-F3C3-FA20-D745-B2C7C9E7AFAF} 1
1 Case Number 7061128 7061124 NaN NaN NaN NaN NaN NaN NaN JD114742 {7FCBD37F-459C-E78F-B873-CA734429AA9B} 2
2 Date 7061128 7061128 NaN NaN NaN NaN NaN NaN NaN 01/01/2001 12:00:00 AM {A7E0431E-0AD4-EC59-38A9-F71177ACDF45} 3
3 Block 7061128 7061128 NaN NaN NaN NaN NaN NaN NaN 061XX S FAIRFIELD AVE {FF3E7A5E-A887-D815-7812-AD995620C5A9} 4
4 IUCR 7061128 6761589 1.127044e+03 110.0 9901.0 8.126368e+02 9791.0 7.620611e+09 6.603785e+05 None {3A5F5858-F0FD-932D-DF6D-FF8355F9141B} 5
Input
description.sample_layer
Output
<FeatureLayer url:"https://ndhagsb01.esri.com/gis/rest/services/Hosted/Description_of_crime_data956049/FeatureServer/2">
Input
sdf_slyr = description.sample_layer.query(as_df=True)
sdf_slyr.head()
Output
ID Case_Number Date Block IUCR Primary_Type Description Location_Description Arrest Domestic ... Y_Coordinate Year Updated_On Latitude Longitude Location INSTANT_DATETIME globalid OBJECTID SHAPE
0 8196694 HT430829 08/04/2011 02:10:00 AM 079XX S MERRILL AVE 520.0 ASSAULT AGGRAVATED:KNIFE/CUTTING INSTR RESIDENCE true false ... 1852704.0 2011 02/10/2018 03:50:01 PM 41.750809 -87.572309 (41.750808511, -87.572308641) 2011-08-04 02:10:00 {25BA0BFD-A32B-802A-72C5-D8A698A3C06F} 1 {'x': -87.572308641, 'y': 41.750808511, 'spati...
1 5139385 HM736684 11/22/2006 09:00:00 PM 019XX N MOHAWK ST 1310.0 CRIMINAL DAMAGE TO PROPERTY OTHER false false ... 1913191.0 2006 02/10/2018 03:50:01 PM 41.917244 -87.642423 (41.917243909, -87.642422501) 2006-11-22 21:00:00 {A67F0D22-7EED-03EE-511A-49458AB189C7} 2 {'x': -87.642422501, 'y': 41.917243909, 'spati...
2 6257174 HP338636 05/16/2008 05:30:00 AM 108XX S LOWE AVE 915.0 MOTOR VEHICLE THEFT TRUCK, BUS, MOTOR HOME STREET false false ... 1832936.0 2008 02/28/2018 03:56:25 PM 41.696981 -87.638886 (41.696980545, -87.638886196) 2008-05-16 05:30:00 {5FE25286-201F-EF1D-3D6F-ECF7AC8DA402} 3 {'x': -87.638886196, 'y': 41.696980545, 'spati...
3 8518985 HV195817 01/20/2012 09:00:00 AM 047XX S KNOX AVE 840.0 THEFT FINANCIAL ID THEFT: OVER $300 RESIDENCE false false ... 1872783.0 2012 02/10/2018 03:50:01 PM 41.806897 -87.739467 (41.806896849, -87.739466549) 2012-01-20 09:00:00 {F475734C-7CC7-06DC-75F3-B1D9D6D91D8E} 4 {'x': -87.739466549, 'y': 41.806896849, 'spati...
4 3930218 HL301854 04/17/2005 11:40:00 PM 039XX W ARMITAGE AVE 1220.0 DECEPTIVE PRACTICE THEFT OF LOST/MISLAID PROP ALLEY true false ... 1912994.0 2005 02/28/2018 03:56:25 PM 41.917175 -87.725912 (41.917175309, -87.725912468) 2005-04-17 23:40:00 {862B9571-2761-454E-56E4-F19124DCC584} 5 {'x': -87.725912468, 'y': 41.917175309, 'spati...

5 rows × 26 columns

Input
m1 = gis.map('chicago')
m1
Output
Input
m1.add_layer(description.sample_layer)
Input
m1.legend = True

Analyze patterns

The GeoAnalytics Tools use a process spatial reference during execution. Analyses with square or hexagon bins require a projected coordinate system. We'll use WKID 26771 (NAD27 / Illinois East), as listed at http://epsg.io/?q=illinois%20kind%3APROJCRS.

Input
arcgis.env.process_spatial_reference = 26771 
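
One way to see why bins need a projected coordinate system: in a geographic coordinate system such as WGS 84, a degree of longitude covers less ground than a degree of latitude away from the equator, so a bin that is "square" in degrees is not square on the ground. The sketch below uses a spherical approximation; the 111.32 km constant and the latitude used for Chicago are approximations chosen for illustration.

```python
import math

KM_PER_DEGREE_LATITUDE = 111.32  # approximate value on a spherical Earth


def km_per_degree_longitude(latitude_deg: float) -> float:
    """Approximate ground length of one degree of longitude at a latitude."""
    return KM_PER_DEGREE_LATITUDE * math.cos(math.radians(latitude_deg))


chicago_latitude = 41.88  # approximate latitude of Chicago (assumption)
east_west_km = km_per_degree_longitude(chicago_latitude)
print(f"1 degree of latitude  ~ {KM_PER_DEGREE_LATITUDE:.1f} km")
print(f"1 degree of longitude ~ {east_west_km:.1f} km at {chicago_latitude} N")
```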

Aggregate points

We can use the aggregate_points method in the arcgis.geoanalytics.summarize_data submodule to group crime features into individual block group features. The output polygon feature layer summarizes attribute information for all crimes that fall within each block group. If no crimes fall within a block group, that block group will not appear in the output.

The analysis runs in the process spatial reference set above (WKID 26771). All results are stored in the spatiotemporal datastore of the Enterprise in the WGS 84 spatial reference.

See the GeoAnalytics Documentation for a full explanation of analysis environment settings.

Input
agg_result = aggregate_points(crime_lyr, 
                              polygon_layer=blk_lyr,
                              output_name="aggregate results of crime" + str(dt.now().microsecond))
{"messageCode":"BD_101189","message":"The GeoAnalytics job is waiting for resources and has not started yet. The job will automatically cancel after 10 minutes.","params":{"minutes":"10"}}
{"messageCode":"BD_101189","message":"The GeoAnalytics job is waiting for resources and has not started yet. The job will automatically cancel after 10 minutes.","params":{"minutes":"10"}}
{"messageCode":"BD_101051","message":"Possible issues were found while reading 'pointLayer'.","params":{"paramName":"pointLayer"}}
{"messageCode":"BD_101054","message":"Some records have either missing or invalid geometries."}
Input
agg_result
Output
aggregate_results_of_crime653441
aggregate_results_of_crime653441Feature Layer Collection by admin
Last Modified: April 09, 2020
0 comments, 0 views
Input
m3 = gis.map('chicago')
m3
Output
Input
m3.add_layer(agg_result)
Input
m3.legend = True
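
Conceptually, the aggregation counts the points that fall inside each polygon and omits polygons that receive none. The following is a minimal pure-Python sketch of that idea using a ray-casting test on toy data; it is not how the GeoAnalytics server implements the tool, and the polygon names and coordinates are made up for illustration.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_points_sketch(points, polygons):
    """Count points per polygon; polygons with no points are left out."""
    counts = Counter()
    for x, y in points:
        for name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break
    return dict(counts)


# Hypothetical block groups and incident locations.
polygons = {
    "block_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "block_b": [(2, 0), (4, 0), (4, 2), (2, 2)],
    "block_c": [(4, 0), (6, 0), (6, 2), (4, 2)],
}
points = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0)]
counts = aggregate_points_sketch(points, polygons)
print(counts)
```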

Calculate density

The calculate_density method creates a density map from point features by spreading known quantities of some phenomenon (represented as attributes of the points) across the map. The result is a layer of areas classified from least dense to most dense. In this example, we will create a density map by aggregating points into square bins of 1 kilometer. To learn more, please see here.

Input
cal_density = calculate_density(crime_lyr,
                                weight='Uniform',
                                bin_type='Square',
                                bin_size=1,
                                bin_size_unit="Kilometers",
                                time_step_interval=1,
                                time_step_interval_unit="Years",
                                time_step_repeat_interval=1,
                                time_step_repeat_interval_unit="Months",
                                time_step_reference=dt(2001, 1, 1),
                                radius=1000,
                                radius_unit="Meters",
                                area_units='SquareKilometers',
                                output_name="calculate density of crime" + str(dt.now().microsecond))
{"messageCode":"BD_101051","message":"Possible issues were found while reading 'inputLayer'.","params":{"paramName":"inputLayer"}}
{"messageCode":"BD_101054","message":"Some records have either missing or invalid geometries."}
Input
m4 = gis.map('chicago')
m4
Output
Input
m4.add_layer(cal_density)
Input
m4.legend = True
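
The binning step behind a density map can be sketched in a few lines. This simplified version only counts points per square bin and divides by the bin area; it ignores the neighborhood radius and the time stepping that calculate_density also applies. The coordinates are hypothetical projected values in meters.

```python
from collections import Counter


def bin_counts(points, bin_size):
    """Count points per square bin; points are (x, y) in projected meters."""
    counts = Counter()
    for x, y in points:
        # Floor division assigns each point to the bin that contains it.
        counts[(int(x // bin_size), int(y // bin_size))] += 1
    return counts


def density_per_sq_km(counts, bin_size_m):
    """Convert per-bin counts to counts per square kilometer."""
    bin_area_sq_km = (bin_size_m / 1000.0) ** 2
    return {cell: n / bin_area_sq_km for cell, n in counts.items()}


# Hypothetical incident locations in meters.
points = [(100, 200), (900, 950), (1500, 300), (1200, 800), (1999, 999), (2500, 2500)]
density = density_per_sq_km(bin_counts(points, 1000), 1000)
print(density)
```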

Find hot spots

The find_hot_spots tool analyzes point data and finds statistically significant spatial clustering of high (hot spots) and low (cold spots) numbers of incidents relative to the overall distribution of the data.

Input
hot_spots = find_hot_spots(crime_lyr, 
                           bin_size=100,
                           bin_size_unit='Meters',
                           neighborhood_distance=250,
                           neighborhood_distance_unit='Meters',
                           output_name="get hot spot areas of crime" + str(dt.now().microsecond))
{"messageCode":"BD_101051","message":"Possible issues were found while reading 'pointLayer'.","params":{"paramName":"pointLayer"}}
{"messageCode":"BD_101054","message":"Some records have either missing or invalid geometries."}
Input
m5 = gis.map('chicago')
m5
Output
Input
m5.add_layer(hot_spots)
Input
m5.legend = True

The darkest red features indicate areas where you can state with 99 percent confidence that the clustering of crime features is not the result of random chance but rather of some other variable that might be worth investigating. Similarly, the darkest blue features indicate, with 99 percent confidence, that the lack of crime incidents in those locations is not just random. Features that are beige do not represent statistically significant clustering; the number of crimes in those areas could very likely be the result of random processes.
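
Hot spot analysis in ArcGIS is based on the Getis-Ord Gi* statistic, which yields a z-score per bin. The sketch below is a simplified version with binary weights on a toy 5 x 5 grid, followed by the usual mapping from z-scores to 90, 95, and 99 percent confidence levels. The grid values and function names are illustrative assumptions; the server's implementation may differ in detail.

```python
import math


def gi_star(values, neighbors):
    """Getis-Ord Gi* z-score per cell using binary weights.

    values: dict cell -> count; neighbors: dict cell -> list of cells
    (including the cell itself) inside the neighborhood distance.
    """
    n = len(values)
    mean = sum(values.values()) / n
    s = math.sqrt(sum(v * v for v in values.values()) / n - mean * mean)
    scores = {}
    for cell, near in neighbors.items():
        k = len(near)  # sum of binary weights (and of squared weights)
        local_sum = sum(values[c] for c in near)
        denom = s * math.sqrt((n * k - k * k) / (n - 1))
        scores[cell] = (local_sum - mean * k) / denom
    return scores


def confidence_bin(z):
    """Map a z-score to a signed confidence level: +/-99, +/-95, +/-90 or 0."""
    for threshold, level in ((2.58, 99), (1.96, 95), (1.65, 90)):
        if abs(z) > threshold:
            return level if z > 0 else -level
    return 0


# Toy grid: a cluster of high counts in one corner, low counts elsewhere.
size = 5
values = {(r, c): (10 if r < 2 and c < 2 else 1)
          for r in range(size) for c in range(size)}
neighbors = {
    (r, c): [(rr, cc)
             for rr in range(r - 1, r + 2)
             for cc in range(c - 1, c + 2)
             if 0 <= rr < size and 0 <= cc < size]
    for (r, c) in values
}
scores = gi_star(values, neighbors)
hottest = max(scores, key=scores.get)
print(hottest, confidence_bin(scores[hottest]))
```

Positive levels correspond to hot spots and negative levels to cold spots; 0 means the bin is not statistically significant.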

Use Spark Dataframe and Run Python Script

The run_python_script method executes a Python script directly in an ArcGIS GeoAnalytics server site. The script can create an analysis pipeline by chaining together multiple GeoAnalytics tools without writing intermediate results to a data store. The tool can also distribute Python functionality across the GeoAnalytics server site.

GeoAnalytics Server installs a Python 3.6 environment that this tool uses. The environment includes Spark 2.2.0, the compute platform that distributes analysis across multiple cores of one or more machines in your GeoAnalytics Server site. The environment includes the pyspark module, which provides a collection of distributed analysis tools for data management, clustering, regression, and more. The run_python_script task automatically imports the pyspark module so you can directly interact with it.

When using the geoanalytics and pyspark packages, most functions return analysis results as Spark DataFrame memory structures. You can write these data frames to a data store or process them in a script. This lets you chain multiple geoanalytics and pyspark tools while only writing out the final result, eliminating the need to create any bulky intermediate result layers. For more details, click here.

The Location Description field records the type of place where each crime occurred. We will write a function that groups our data by location description and counts the number of crimes occurring at each location type.
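
Before running this on the server, it may help to see what a group-and-count produces on a handful of rows. The sketch below is a local, pure-Python analogue of groupBy(...).count() built with collections.Counter; it uses the Location Description values of the five sample rows shown earlier and does not involve Spark.

```python
from collections import Counter

# Location Description values of the five sample rows shown earlier.
sample_rows = [
    {"Primary Type": "ASSAULT", "Location Description": "RESIDENCE"},
    {"Primary Type": "CRIMINAL DAMAGE", "Location Description": "OTHER"},
    {"Primary Type": "MOTOR VEHICLE THEFT", "Location Description": "STREET"},
    {"Primary Type": "THEFT", "Location Description": "RESIDENCE"},
    {"Primary Type": "DECEPTIVE PRACTICE", "Location Description": "ALLEY"},
]


def group_count(rows, field):
    """Count how many rows share each value of the given field."""
    return Counter(row[field] for row in rows)


location_counts = group_count(sample_rows, "Location Description")
print(location_counts.most_common())
```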

Input
def groupby_description():
    from datetime import datetime as dt
    # crime data is stored in a feature service and accessed as a DataFrame via the layers object
    df = layers[0]
    # group the dataframe by Location Description field and count the number of crimes for each Location Description. 
    out = df.groupBy('Location Description').count()
    # Write the final result to our datastore.
    out.write.format("webgis").save("groupby_location_description" + str(dt.now().microsecond))
Input
run_python_script(code=groupby_description, layers=[crime_lyr])
Output
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def groupby_description():\\n    from datetime import datetime as dt\\n    # crime data is stored in a feature service and accessed as a DataFrame via the layers object\\n    df = layers[0]\\n    # group the dataframe by Location Description field and count the number of crimes for each Location Description. \\n    out = df.groupBy(\'Location Description\').count()\\n    # Write the final result to our datastore.\\n    out.write.format("webgis").save("groupby_location_description" + str(dt.now().microsecond))\\n\\ngroupby_description()" https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 18:21:15 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 259 tasks.","params":{"totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/259 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/259 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"18/259 distributed tasks completed.","params":{"completedTasks":"18","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"41/259 distributed tasks completed.","params":{"completedTasks":"41","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"60/259 distributed tasks completed.","params":{"completedTasks":"60","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"259/259 distributed tasks completed.","params":{"completedTasks":"259","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 181","params":{"resultCount":"181"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/groupby_location_description595817/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/groupby_location_description595817/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 18:22:03 2020 (Elapsed Time: 48.18 seconds)'}]

The result is saved as a feature layer. We can find the saved item using the search() method. Searching with the same name we used when writing the result retrieves the layer.

Input
groupby_description = gis.content.search('groupby_location_description')[0]
groupby_description_lyr = groupby_description.tables[0] #retrieve table from the item
groupby_description_df = groupby_description_lyr.query(as_df=True) #read layer as dataframe
groupby_description_df.sort_values(by='count', ascending=False, inplace=True) #sort count field in decreasing order

Location of crime

Input
groupby_description_df[:10].plot(x='Location_Description', 
                                 y='count', kind='barh')
plt.xticks(
    rotation=45,
    horizontalalignment='center',
    fontweight='light',
    fontsize='medium',
);

Street is the most frequent location for crime occurrence.

The Primary Type field contains the type of each crime. Let's investigate the most frequent type of crime in Chicago by writing our own function:

Input
def groupby_texttype():
    from datetime import datetime as dt
    # crime data is stored in a feature service and accessed as a DataFrame via the layers object
    df = layers[0]
    # group the dataframe by the Primary Type field and count the crime incidents for each crime type. 
    out = df.groupBy('Primary Type').count()
    # Write the final result to our datastore.
    out.write.format("webgis").save("groupby_type_of_crime" + str(dt.now().microsecond))
Input
run_python_script(code=groupby_texttype, layers=[crime_lyr])
Output
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def groupby_texttype():\\n    from datetime import datetime as dt\\n    # Calls data is stored in a feature service and accessed as a DataFrame via the layers object\\n    df = layers[0]\\n    # group the dataframe by TextType field and count the number of calls for each call type. \\n    out = df.groupBy(\'Primary Type\').count()\\n    # Write the final result to our datastore.\\n    out.write.format("webgis").save("groupby_type_of_crime" + str(dt.now().microsecond))\\n\\ngroupby_texttype()" https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 18:55:46 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 259 tasks.","params":{"totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/259 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/259 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"22/259 distributed tasks completed.","params":{"completedTasks":"22","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"44/259 distributed tasks completed.","params":{"completedTasks":"44","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"259/259 distributed tasks completed.","params":{"completedTasks":"259","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 35","params":{"resultCount":"35"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/groupby_type_of_crime538317/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/groupby_type_of_crime538317/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 18:56:26 2020 (Elapsed Time: 39.68 seconds)'}]
Input
groupby_texttype = gis.content.search('groupby_type_of_crime')[0]
Input
groupby_texttype
Output
groupby_type_of_crime538317
Table Layer by admin
Last Modified: April 09, 2020
0 comments, 0 views
Input
groupby_texttype_df = groupby_texttype.tables[0].query(as_df=True)
Input
groupby_texttype_df.head()
Output
Primary_Type count globalid OBJECTID
0 OFFENSE INVOLVING CHILDREN 48412 {4120ABC0-FE3A-BBE0-ABC7-B885EEB2D5D2} 9
1 STALKING 3644 {52CC61CF-D8DE-D67B-4FC3-C8DD5DB175DE} 20
2 PUBLIC PEACE VIOLATION 49583 {4E902E77-D398-72D5-1E3C-EECF4A77B90E} 26
3 OBSCENITY 650 {A304388C-A37A-505E-D403-A90F83B04A77} 34
4 ARSON 11603 {E65FA2C6-9678-F283-A7B3-E61A40B12674} 52
Input
groupby_texttype_df.sort_values(by='count', ascending=False, inplace=True)

Type of crime

Input
groupby_texttype_df.head(10).plot(x='Primary_Type', y='count', kind='barh')
plt.xticks(
    rotation=45,
    horizontalalignment='center',
    fontweight='light',
    fontsize='medium',
);

Theft is the most common type of crime in the city of Chicago.

Input
theft = groupby_texttype_df[groupby_texttype_df['Primary_Type'] == 'THEFT']
Input
theft
Output
Primary_Type count globalid OBJECTID
12 THEFT 1493302 {0CBB34E2-58C8-7D0B-01B4-D3E9CE832DC9} 102
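
To put the theft count in context, the sketch below ranks the per-type counts displayed in the outputs above and computes theft's share of all records. The numbers are copied from those outputs (only six of the 35 crime types are listed there), and the total is the recordCount reported by describe_dataset.

```python
# Counts copied from the outputs above (a subset of the 35 crime types).
type_counts = {
    "OFFENSE INVOLVING CHILDREN": 48412,
    "STALKING": 3644,
    "PUBLIC PEACE VIOLATION": 49583,
    "OBSCENITY": 650,
    "ARSON": 11603,
    "THEFT": 1493302,
}
record_count = 7061128  # recordCount reported by describe_dataset

# Rank crime types from most to least frequent.
ranked = sorted(type_counts.items(), key=lambda item: item[1], reverse=True)
theft_share = type_counts["THEFT"] / record_count

print(ranked[0])
print(f"Theft share of all records: {theft_share:.1%}")
```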
Input
def theft_description():
    from datetime import datetime as dt
    # crime data is stored in a feature service and accessed as a DataFrame via the layers object
    df = layers[0]
    # keep only theft records; the filtered DataFrame must be assigned, otherwise the filter has no effect
    theft_df = df.filter(df['Primary Type'] == 'THEFT')
    # count theft incidents for each Location Description
    out = theft_df.groupBy('Location Description').count()
    # Write the final result to our datastore.
    out.write.format("webgis").save("theft_description" + str(dt.now().microsecond))
Input
run_python_script(code=theft_description, layers=[crime_lyr])
Output
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def theft_description():\\n    from datetime import datetime as dt\\n    # Calls data is stored in a feature service and accessed as a DataFrame via the layers object\\n    df = layers[0]\\n    df[df[\'Primary Type\'] == \'THEFT\']\\n    out = df.groupBy(\'Location Description\').count()\\n    # Write the final result to our datastore.\\n    out.write.format("webgis").save("theft_description" + str(dt.now().microsecond))\\n\\ntheft_description()" https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 18:56:30 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 259 tasks.","params":{"totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/259 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/259 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"24/259 distributed tasks completed.","params":{"completedTasks":"24","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"45/259 distributed tasks completed.","params":{"completedTasks":"45","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"101/259 distributed tasks completed.","params":{"completedTasks":"101","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"259/259 distributed tasks completed.","params":{"completedTasks":"259","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 181","params":{"resultCount":"181"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/theft_description406470/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/theft_description406470/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 18:57:11 2020 (Elapsed Time: 41.21 seconds)'}]
Input
theft_description = gis.content.search('theft_description')[0]
Input
theft_description_df = theft_description.tables[0].query(as_df=True)
Input
theft_description_df.sort_values(by='count', ascending=False, inplace=True)

Location of theft

Input
theft_description_df[:10].plot(x='Location_Description', y='count', kind='barh')
plt.xticks(
    rotation=45,
    horizontalalignment='center',
    fontweight='light',
    fontsize='medium',
);

This plot shows the ten location types with the highest number of recorded thefts. It indicates that thefts occur most often on streets.
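The filter-then-count pattern behind this chart can be tried locally with pandas. This is a minimal sketch: the rows are made-up sample data, not records from the crime layer, and only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real layer holds millions of records.
sample = pd.DataFrame({
    'Primary Type': ['THEFT', 'THEFT', 'BATTERY', 'THEFT', 'ASSAULT'],
    'Location Description': ['STREET', 'RESIDENCE', 'STREET', 'STREET', 'SIDEWALK'],
})

# Keep only theft records, then count incidents per location type.
theft = sample[sample['Primary Type'] == 'THEFT']
theft_counts = (theft.groupby('Location Description')
                     .size()
                     .sort_values(ascending=False))
print(theft_counts)
```

Because the filter result is assigned to a new variable before grouping, locations that only appear for other crime types (SIDEWALK in this sample) do not show up in the counts.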

Input
def grpby_type_blkgrp():
    from datetime import datetime as dt
    # Load the big data file share layer into a DataFrame
    df = layers[0]
    out = df.groupBy('Primary Type', 'Block').count()
    out.write.format("webgis").save("grpby_type_blkgrp" + str(dt.now().microsecond))
Input
run_python_script(code=grpby_type_blkgrp, layers=[crime_lyr])
Output
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def grpby_type_blkgrp():\\n    from datetime import datetime as dt\\n    # Load the big data file share layer into a DataFrame\\n    df = layers[0]\\n    out = df.groupBy(\'Primary Type\', \'Block\').count()\\n    out.write.format("webgis").save("grpby_type_blkgrp" + str(dt.now().microsecond))\\n\\ngrpby_type_blkgrp()" https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 18:57:14 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 259 tasks.","params":{"totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/259 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/259 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"25/259 distributed tasks completed.","params":{"completedTasks":"25","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"46/259 distributed tasks completed.","params":{"completedTasks":"46","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"60/259 distributed tasks completed.","params":{"completedTasks":"60","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/259 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"163/259 distributed tasks completed.","params":{"completedTasks":"163","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"206/259 distributed tasks completed.","params":{"completedTasks":"206","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"245/259 distributed tasks completed.","params":{"completedTasks":"245","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"259/259 distributed tasks completed.","params":{"completedTasks":"259","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 571108","params":{"resultCount":"571108"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/grpby_type_blkgrp322476/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/grpby_type_blkgrp322476/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 18:58:17 2020 (Elapsed Time: 1 minutes 3 seconds)'}]
Input
grpby_cat_blk = gis.content.search('grpby_type_blkgrp')[0]
Input
grpby_cat_blk
Output
grpby_type_blkgrp322476
Table Layer by admin
Last Modified: April 09, 2020
0 comments, 0 views
Input
grpby_cat_blk_df = grpby_cat_blk.tables[0].query(as_df=True)
Input
grpby_cat_blk_df.head()
Output
Block OBJECTID Primary_Type count globalid
0 096XX S MICHIGAN AV 1 BATTERY 34 {AC1470E0-614D-BB2A-901E-034720D62910}
1 070XX S LAFAYETTE ST 2 ASSAULT 1 {5876F9DE-F728-22A9-270C-710E48FA17FB}
2 061XX S COTTAGE GROVE 3 CRIMINAL TRESPASS 75 {3D7FC5D0-F670-53B9-7135-5DB84B1048E4}
3 014XX W MONTROSE AV 4 OTHER OFFENSE 1 {478D77DC-DAFA-C59D-08B0-29911B4F50FD}
4 055XX N LAKE SHORE DR 5 OTHER OFFENSE 1 {69281C9B-5158-8A73-7AEF-1EB4102276C2}

Count of crime incidents by block group

Input
grpby_cat_blk_df.sort_values(by='count', ascending=False, inplace=True)
Input
grpby_cat_blk_df.head(10).plot(x='Block', y='count', kind='barh')
plt.xticks(
    rotation=45,
    horizontalalignment='center',
    fontweight='light',
    fontsize='medium',
);

Get crime types for a particular block group

Input
blk_addr_high = grpby_cat_blk_df[grpby_cat_blk_df['Block'] == '001XX N STATE ST']
Input
blk_addr_high.Primary_Type.sort_values(ascending=False).head()
Output
143115    WEAPONS VIOLATION
766                   THEFT
122685             STALKING
94954           SEX OFFENSE
28868               ROBBERY
Name: Primary_Type, dtype: object
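Note that the listing above orders the crime type names alphabetically, not by frequency. To rank the crime types on a block by how often they occur, one option is to sort on the count column instead. A minimal sketch, assuming a DataFrame shaped like grpby_cat_blk_df; the counts below are hypothetical:

```python
import pandas as pd

# Hypothetical rows shaped like grpby_cat_blk_df; counts are illustrative only.
blk = pd.DataFrame({
    'Block': ['001XX N STATE ST'] * 4,
    'Primary_Type': ['THEFT', 'ROBBERY', 'STALKING', 'BATTERY'],
    'count': [120, 15, 2, 40],
})

# Sort by incident count so the most frequent crime types come first.
top_types = blk.sort_values(by='count', ascending=False)[['Primary_Type', 'count']]
print(top_types.head())
```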
Input
def crime_by_datetime():
    from datetime import datetime as dt
    # Load the big data file share layer into a DataFrame
    from pyspark.sql import functions as F
    df = layers[0]
    out = df.withColumn('datetime', F.unix_timestamp('Date', 'dd/MM/yyyy hh:mm:ss a').cast('timestamp'))
    out.write.format("webgis").save("crime_by_datetime" + str(dt.now().microsecond))
Input
run_python_script(code=crime_by_datetime, layers=[crime_lyr])
Output
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def crime_by_datetime():\\n    from datetime import datetime as dt\\n    # Load the big data file share layer into a DataFrame\\n    from pyspark.sql import functions as F\\n    df = layers[0]\\n    out = df.withColumn(\'datetime\', F.unix_timestamp(\'Date\', \'dd/MM/yyyy hh:mm:ss a\').cast(\'timestamp\'))\\n    out.write.format("webgis").save("crime_by_datetime" + str(dt.now().microsecond))\\n\\ncrime_by_datetime()" https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 19:39:44 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/59 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/59 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"5/59 distributed tasks completed.","params":{"completedTasks":"5","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"6/59 distributed tasks completed.","params":{"completedTasks":"6","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"13/59 distributed tasks completed.","params":{"completedTasks":"13","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"19/59 distributed tasks completed.","params":{"completedTasks":"19","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"24/59 distributed tasks completed.","params":{"completedTasks":"24","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"25/59 distributed tasks completed.","params":{"completedTasks":"25","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"26/59 distributed tasks completed.","params":{"completedTasks":"26","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"29/59 distributed tasks completed.","params":{"completedTasks":"29","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"33/59 distributed tasks completed.","params":{"completedTasks":"33","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"37/59 distributed tasks completed.","params":{"completedTasks":"37","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"40/59 distributed tasks completed.","params":{"completedTasks":"40","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"43/59 distributed tasks completed.","params":{"completedTasks":"43","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"46/59 distributed tasks completed.","params":{"completedTasks":"46","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 7061128","params":{"resultCount":"7061128"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = {\\"xmin\\":-91.686565684,\\"ymin\\":36.619446395,\\"xmax\\":-87.524529378,\\"ymax\\":42.022910333}","params":{"extent":"{\\"xmin\\":-91.686565684,\\"ymin\\":36.619446395,\\"xmax\\":-87.524529378,\\"ymax\\":42.022910333}"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = Interval(MutableInstant(2001-01-01 00:00:00.000),MutableInstant(2020-01-26 23:40:00.000))","params":{"extent":"Interval(MutableInstant(2001-01-01 00:00:00.000),MutableInstant(2020-01-26 23:40:00.000))"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_by_datetime650380/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_by_datetime650380/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 19:42:31 2020 (Elapsed Time: 2 minutes 46 seconds)'}]
Input
calls_with_datetime = gis.content.search('crime_by_datetime')[0]
Input
calls_with_datetime_lyr = calls_with_datetime.layers[0]
Input
def crime_with_added_date_time_cols():
    from datetime import datetime as dt
    from pyspark.sql.functions import month, hour
    # Load the layer that carries the parsed 'datetime' column into a DataFrame
    df = layers[0]
    df = df.withColumn('month', month(df['datetime']))
    out = df.withColumn('hour', hour(df['datetime']))
    out.write.format("webgis").save("crime_with_added_date_time_cols" + str(dt.now().microsecond))
Input
run_python_script(code=crime_with_added_date_time_cols, layers=[calls_with_datetime_lyr])
Output
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def crime_with_added_date_time_cols():\\n    from datetime import datetime as dt\\n    # Load the big data file share layer into a DataFrame\\n    from pyspark.sql.functions import year, month, hour\\n    df = layers[0]\\n    df = df.withColumn(\'month\', month(df[\'datetime\']))\\n    out = df.withColumn(\'hour\', hour(df[\'datetime\']))\\n    out.write.format("webgis").save("crime_with_added_date_time_cols" + str(dt.now().microsecond))\\n\\ncrime_with_added_date_time_cols()" https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_by_datetime650380/FeatureServer/0 "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 19:42:34 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_by_datetime650380/FeatureServer/0'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 66 tasks.","params":{"totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/66 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/66 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"3/66 distributed tasks completed.","params":{"completedTasks":"3","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"10/66 distributed tasks completed.","params":{"completedTasks":"10","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"15/66 distributed tasks completed.","params":{"completedTasks":"15","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"19/66 distributed tasks completed.","params":{"completedTasks":"19","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"22/66 distributed tasks completed.","params":{"completedTasks":"22","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"24/66 distributed tasks completed.","params":{"completedTasks":"24","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"29/66 distributed tasks completed.","params":{"completedTasks":"29","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"32/66 distributed tasks completed.","params":{"completedTasks":"32","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"36/66 distributed tasks completed.","params":{"completedTasks":"36","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"37/66 distributed tasks completed.","params":{"completedTasks":"37","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"38/66 distributed tasks completed.","params":{"completedTasks":"38","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"42/66 distributed tasks completed.","params":{"completedTasks":"42","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"43/66 distributed tasks completed.","params":{"completedTasks":"43","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"46/66 distributed tasks completed.","params":{"completedTasks":"46","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"51/66 distributed tasks completed.","params":{"completedTasks":"51","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"55/66 distributed tasks completed.","params":{"completedTasks":"55","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"56/66 distributed tasks completed.","params":{"completedTasks":"56","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"60/66 distributed tasks completed.","params":{"completedTasks":"60","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"66/66 distributed tasks completed.","params":{"completedTasks":"66","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 7061128","params":{"resultCount":"7061128"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = {\\"xmin\\":-91.686565684,\\"ymin\\":36.619446395,\\"xmax\\":-87.524529378,\\"ymax\\":42.022910333}","params":{"extent":"{\\"xmin\\":-91.686565684,\\"ymin\\":36.619446395,\\"xmax\\":-87.524529378,\\"ymax\\":42.022910333}"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = Interval(MutableInstant(2001-01-01 00:00:00.000),MutableInstant(2020-01-26 23:40:00.000))","params":{"extent":"Interval(MutableInstant(2001-01-01 00:00:00.000),MutableInstant(2020-01-26 23:40:00.000))"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_with_added_date_time_cols749239/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_with_added_date_time_cols749239/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 19:47:06 2020 (Elapsed Time: 4 minutes 32 seconds)'}]
Input
date_time_added_item = gis.content.search('crime_with_added_date_time_cols')
Input
date_time_added_lyr = date_time_added_item[0].layers[0]
Input
def grp_crime_by_hour():
    from datetime import datetime as dt
    # Load the layer with the added 'month' and 'hour' columns into a DataFrame
    df = layers[0]
    out = df.groupBy('hour').count()
    out.write.format("webgis").save("grp_crime_by_hour" + str(dt.now().microsecond))
Input
run_python_script(code=grp_crime_by_hour, layers=[date_time_added_lyr])
Output
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def grp_crime_by_hour():\\n    from datetime import datetime as dt\\n    # Load the big data file share layer into a DataFrame\\n    df = layers[0]\\n    out = df.groupBy(\'hour\').count()\\n    out.write.format("webgis").save("grp_crime_by_hour" + str(dt.now().microsecond))\\n\\ngrp_crime_by_hour()" https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_with_added_date_time_cols749239/FeatureServer/0 "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 19:47:09 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_with_added_date_time_cols749239/FeatureServer/0'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 266 tasks.","params":{"totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/266 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/266 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"7/266 distributed tasks completed.","params":{"completedTasks":"7","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"11/266 distributed tasks completed.","params":{"completedTasks":"11","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"19/266 distributed tasks completed.","params":{"completedTasks":"19","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"25/266 distributed tasks completed.","params":{"completedTasks":"25","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"27/266 distributed tasks completed.","params":{"completedTasks":"27","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"32/266 distributed tasks completed.","params":{"completedTasks":"32","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"36/266 distributed tasks completed.","params":{"completedTasks":"36","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"41/266 distributed tasks completed.","params":{"completedTasks":"41","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"45/266 distributed tasks completed.","params":{"completedTasks":"45","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"52/266 distributed tasks completed.","params":{"completedTasks":"52","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"58/266 distributed tasks completed.","params":{"completedTasks":"58","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"62/266 distributed tasks completed.","params":{"completedTasks":"62","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"266/266 distributed tasks completed.","params":{"completedTasks":"266","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 25","params":{"resultCount":"25"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/grp_crime_by_hour391644/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/grp_crime_by_hour391644/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 19:49:15 2020 (Elapsed Time: 2 minutes 5 seconds)'}]
Input
hour = gis.content.search('grp_crime_by_hour')[0]
Input
grp_hour = hour.tables[0]
Input
df_hour = grp_hour.query(as_df=True)

Crime distribution by the hour

Input
(df_hour
 .dropna()
 .sort_values(by='hour')
 .astype({'hour' : int})
 .plot(x='hour', y='count', kind='bar'))
plt.xticks(
    rotation=45,
    horizontalalignment='center',
    fontweight='light',
    fontsize='medium',
);

This graph shows that reported crime peaks at 12 A.M. (midnight) and 12 P.M. (noon).
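For reference, the hour column is on a 24-hour scale, so 12 A.M. is hour 0 and 12 P.M. is hour 12. The standard-library sketch below shows how 12-hour timestamps map to hour values and how they can be counted per hour. The timestamp strings and their month/day order are assumptions made for illustration, not values taken from the crime layer.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamp strings; the month/day order here is an assumption.
stamps = [
    '01/15/2019 12:05:00 AM',
    '01/15/2019 12:40:00 PM',
    '02/03/2019 12:10:00 PM',
    '02/03/2019 03:30:00 PM',
]

# %I is the 12-hour clock and %p the AM/PM marker, so 12 AM parses to hour 0.
hours = [datetime.strptime(s, '%m/%d/%Y %I:%M:%S %p').hour for s in stamps]

# Count how many records fall into each hour of the day.
per_hour = Counter(hours)
print(sorted(per_hour.items()))
```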

Big data machine learning using pyspark.ml

Find the optimal number of clusters

The average silhouette approach measures the quality of a clustering by determining how well each object lies within its cluster. Silhouette values range from -1 to 1, and a high average silhouette width indicates a good clustering.
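To make the measure concrete: for each point, a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; the point's silhouette value is (b - a) / max(a, b). The pure-Python sketch below averages this value over a handful of made-up points. It uses plain Euclidean distance and assumes at least two clusters. The function name mean_silhouette is hypothetical, and pyspark's ClusteringEvaluator defaults to squared Euclidean distance, so its scores are not directly comparable with these.

```python
from math import dist

def mean_silhouette(points, labels):
    """Average silhouette width using Euclidean distance (illustrative helper)."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Distances to the other members of the point's own cluster.
        same = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        # Smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 2-D points forming two well-separated groups.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(pts, [0, 0, 1, 1])  # labels follow the two groups
bad = mean_silhouette(pts, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

The labeling that follows the natural groups scores close to 1, while the labeling that cuts across them scores below 0.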

Input
def optimal_k():
    import numpy as np
    import pandas as pd
    from datetime import datetime as dt
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.evaluation import ClusteringEvaluator
    from pyspark.sql.types import StructType, StructField, IntegerType, FloatType

    silh_lst = []
    # Candidate cluster counts: 3 through 69
    k_lst = np.arange(3, 70)

    # Combine the x and y columns into a single "features" vector column,
    # skipping rows with missing coordinates
    crime_locations = layers[0]
    assembler = VectorAssembler(inputCols=["X Coordinate", "Y Coordinate"], outputCol="features")
    crime_locations = assembler.setHandleInvalid("skip").transform(crime_locations)

    evaluator = ClusteringEvaluator()
    for k in k_lst:
        silh_val = []
        # Fit twice per k with random seeds and average the two scores
        for run in range(2):
            # Train a k-means model
            kmeans = KMeans().setK(int(k)).setSeed(int(np.random.randint(100)))
            model = kmeans.fit(crime_locations.select("features"))

            # Assign each record to a cluster
            predictions = model.transform(crime_locations)

            # Evaluate the clustering by computing the silhouette score
            silh_val.append(evaluator.evaluate(predictions))

        silh_lst.append(float(np.mean(silh_val)))

    silhouette = pd.DataFrame(list(zip(k_lst, silh_lst)), columns=['k', 'silhouette'])
    schema = StructType([StructField('k', IntegerType(), True), StructField('silhouette', FloatType(), True)])
    out = spark.createDataFrame(silhouette, schema)
    # Write the result DataFrame to the relational data store
    out.write.format("webgis").option("dataStore", "relational").save("optimalKmeans" + str(dt.now().microsecond))
Input
run_python_script(code=optimal_k, layers=[crime_lyr])
Input
optimal_k = gis.content.search('optimalKmeans')[0]
Input
optimal_k_tbl = optimal_k.tables[0]
Input
k_df = optimal_k_tbl.query().sdf
Input
k_df.sort_values(by='silhouette', ascending=False)
Output
objectid k silhouette
54 58 15 0.556612
22 23 19 0.556012
2 3 9 0.555995
39 40 14 0.552853
38 39 11 0.551726
... ... ... ...
24 25 25 0.527496
19 20 7 0.527266
26 27 34 0.525585
37 38 8 0.507064
36 37 5 0.492071

67 rows × 3 columns

Input
num_clusters = int(k_df.sort_values(by='silhouette', ascending=False).iloc[0]['k'])
num_clusters
Output
15
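When picking the best row from a sorted DataFrame, keep in mind that sorting preserves the original index labels, so the top row is generally not the one labeled 0. A small sketch of selecting the best k by label with idxmax, using a few (k, silhouette) rows copied from the table above; the variable names k_sample, best_label and best_k are illustrative:

```python
import pandas as pd

# A few (k, silhouette) rows copied from the table above, with their
# original index labels, which do not start at the best row.
k_sample = pd.DataFrame(
    {'k': [15, 19, 9, 8, 5],
     'silhouette': [0.556612, 0.556012, 0.555995, 0.507064, 0.492071]},
    index=[54, 22, 2, 37, 36],
)

# idxmax returns the index label of the highest score; .loc then reads that row.
best_label = k_sample['silhouette'].idxmax()
best_k = int(k_sample.loc[best_label, 'k'])
print(best_k)
```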

K-Means Clustering

Input
def cluster_crimes():
    
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans
    from datetime import datetime as dt
    # Crime data is stored in a feature service and accessed as a DataFrame via the layers object
    crime_locations = layers[0]
    
    # Combine the x and y columns in the DataFrame into a single column called "features"
    assembler = VectorAssembler(inputCols=["X Coordinate", "Y Coordinate"], outputCol="features")
    crime_locations = assembler.setHandleInvalid("skip").transform(crime_locations)

    # Fit a k-means model with 15 clusters using the "features" column of the crime locations
    kmeans = KMeans(k=15)
    model = kmeans.fit(crime_locations.select("features"))
    
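    # Compute the k-means cost (sum of squared distances of points to their nearest center)
    cost = model.computeCost(crime_locations)
    print('cost', cost)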
    # Add the cluster labels from the k-means model to the original DataFrame
    crime_locations_clusters = model.transform(crime_locations)
    # Write the result DataFrame to the relational data store
    crime_locations_clusters.write.format("webgis").save("Crime_Clusters_KMeans" + str(dt.now().microsecond))
Input
run_python_script(code=cluster_crimes, layers=[crime_lyr])
{"messageCode":"BD_101231","message":"The following fields are not supported and will be dropped: features","params":{"fields":"features"}}
Output
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def cluster_crimes():\\n    \\n    from pyspark.ml.feature import VectorAssembler\\n    from pyspark.ml.clustering import KMeans\\n    from datetime import datetime as dt\\n    # Crime data is stored in a feature service and accessed as a DataFrame via the layers object\\n    crime_locations = layers[0]\\n    \\n    # Combine the x and y columns in the DataFrame into a single column called "features"\\n    assembler = VectorAssembler(inputCols=["X Coordinate", "Y Coordinate"], outputCol="features")\\n    crime_locations = assembler.setHandleInvalid("skip").transform(crime_locations)\\n\\n    # Fit a k-means model with 50 clusters using the "features" column of the crime locations\\n    kmeans = KMeans(k=15)\\n    model = kmeans.fit(crime_locations.select("features"))\\n    \\n    cost = model.computeCost(crime_locations)\\n    print(\'cost\', cost)\\n    # Add the cluster labels from the k-means model to the original DataFrame\\n    crime_locations_clusters = model.transform(crime_locations)\\n    # Write the result DataFrame to the relational data store\\n    crime_locations_clusters.write.format("webgis").save("Crime_Clusters_KMeans" + str(dt.now().microsecond))\\n\\ncluster_crimes()" https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Fri Apr 10 08:14:38 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/59 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/59 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"23/59 distributed tasks completed.","params":{"completedTasks":"23","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"52/59 distributed tasks completed.","params":{"completedTasks":"52","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 259 tasks.","params":{"totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"6/259 distributed tasks completed.","params":{"completedTasks":"6","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"30/259 distributed tasks completed.","params":{"completedTasks":"30","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"53/259 distributed tasks completed.","params":{"completedTasks":"53","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"259/259 distributed tasks completed.","params":{"completedTasks":"259","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"17/59 distributed tasks completed.","params":{"completedTasks":"17","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"42/59 distributed tasks completed.","params":{"completedTasks":"42","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101138","message":"[Python] cost 450444847551758.0","params":{"text":"cost 450444847551758.0"}}'},
 {'type': 'esriJobMessageTypeWarning',
  'description': '{"messageCode":"BD_101231","message":"The following fields are not supported and will be dropped: features","params":{"fields":"features"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/59 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/59 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"5/59 distributed tasks completed.","params":{"completedTasks":"5","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"6/59 distributed tasks completed.","params":{"completedTasks":"6","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"12/59 distributed tasks completed.","params":{"completedTasks":"12","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"18/59 distributed tasks completed.","params":{"completedTasks":"18","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"22/59 distributed tasks completed.","params":{"completedTasks":"22","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"25/59 distributed tasks completed.","params":{"completedTasks":"25","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"26/59 distributed tasks completed.","params":{"completedTasks":"26","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"28/59 distributed tasks completed.","params":{"completedTasks":"28","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"31/59 distributed tasks completed.","params":{"completedTasks":"31","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"35/59 distributed tasks completed.","params":{"completedTasks":"35","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"38/59 distributed tasks completed.","params":{"completedTasks":"38","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"40/59 distributed tasks completed.","params":{"completedTasks":"40","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"43/59 distributed tasks completed.","params":{"completedTasks":"43","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"45/59 distributed tasks completed.","params":{"completedTasks":"45","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"46/59 distributed tasks completed.","params":{"completedTasks":"46","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"53/59 distributed tasks completed.","params":{"completedTasks":"53","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 6993512","params":{"resultCount":"6993512"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = {\\"xmin\\":-91.686565684,\\"ymin\\":36.619446395,\\"xmax\\":-87.524529378,\\"ymax\\":42.022910333}","params":{"extent":"{\\"xmin\\":-91.686565684,\\"ymin\\":36.619446395,\\"xmax\\":-87.524529378,\\"ymax\\":42.022910333}"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = Interval(MutableInstant(2001-01-01 00:00:00.000),MutableInstant(2020-01-26 23:40:00.000))","params":{"extent":"Interval(MutableInstant(2001-01-01 00:00:00.000),MutableInstant(2020-01-26 23:40:00.000))"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/Crime_Clusters_KMeans540499/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/Crime_Clusters_KMeans540499/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Fri Apr 10 08:19:30 2020 (Elapsed Time: 4 minutes 52 seconds)'}]
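The cost value reported in the job messages is the k-means objective: the sum of squared distances from each point to its nearest cluster center. The figure is large most likely because it is summed over roughly 7 million features and the X Coordinate and Y Coordinate fields appear to be in projected units. A minimal pure-Python sketch of that quantity, using made-up points and centers (within_set_sse is a hypothetical helper written for illustration, not part of pyspark):

```python
def within_set_sse(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    total = 0.0
    for px, py in points:
        # Squared distance to the closest center
        total += min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
    return total

# Made-up coordinates, for illustration only
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
centers = [(0.5, 0.0), (10.5, 10.0)]

sse = within_set_sse(points, centers)
print(sse)
```

A lower value for the same k indicates tighter clusters, but the value always decreases as k grows, which is why the silhouette score rather than the cost was used to choose k.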
Input
clusters = gis.content.search('Crime_Clusters_KMeans')[0]
Input
clusters
Output
Crime_Clusters_KMeans540499
Feature Layer Collection by admin
Last Modified: April 10, 2020
0 comments, 5 views

By symbolizing on the predictions made by the k-means model, we can visualize the clustered crime events, as shown in the screenshot above.
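One way to symbolize on the cluster labels is a unique-value renderer with one color per cluster. The sketch below only builds the renderer dictionary. It assumes the output layer keeps the default pyspark.ml KMeans output column name, prediction, and follows the general shape of a unique-value renderer in the ArcGIS web map JSON. build_cluster_renderer is a hypothetical helper, and how the dictionary is attached to a map depends on the API version in use.

```python
import colorsys

def build_cluster_renderer(num_clusters, field="prediction"):
    """Build a unique-value renderer dictionary with one color per cluster id."""
    infos = []
    for cluster_id in range(num_clusters):
        # Spread hues evenly around the color wheel so clusters are distinguishable
        r, g, b = colorsys.hsv_to_rgb(cluster_id / num_clusters, 0.8, 0.9)
        infos.append({
            "value": str(cluster_id),
            "label": "Cluster {}".format(cluster_id),
            "symbol": {
                "type": "esriSMS",
                "style": "esriSMSCircle",
                "color": [int(r * 255), int(g * 255), int(b * 255), 255],
                "size": 4,
            },
        })
    return {"type": "uniqueValue", "field1": field, "uniqueValueInfos": infos}

# 15 clusters, matching the k chosen above
renderer = build_cluster_renderer(15)
print(len(renderer["uniqueValueInfos"]))
```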

Conclusion

In this sample, we covered how to chain together geoanalytics and pyspark tools to analyze big data while writing only the final result to a data store, eliminating the need to create intermediate result layers. We also saw how data mining and clustering can help manage large volumes of crime data and extract useful information from them.
