Crime analysis and clustering using geoanalytics and pyspark.ml

Introduction

Many of the poorest neighborhoods in the City of Chicago face violent crime. As crime increases, so does the volume of crime data, and with it the need to identify patterns that can help reduce crime. The ArcGIS API for Python provides powerful tools for analyzing and detecting patterns in such data. In this sample, we demonstrate several GeoAnalytics tools, including find_hot_spots, aggregate_points, and calculate_density, to visualize geographic patterns.

The pyspark module, available through the run_python_script tool, provides a collection of distributed analysis tools for data management, clustering, regression, and more. The run_python_script task automatically imports the pyspark module so you can directly interact with it. By calling the pyspark.ml implementation of k-means in the run_python_script tool, we will cluster crime data into a predefined number of clusters. Such clusters are also useful in identifying crime patterns.

Further, based on the results of the analysis, the segmented crime map can be used to help efficiently dispatch officers throughout a city.

Necessary Imports

%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
from datetime import datetime as dt

import arcgis
import arcgis.geoanalytics
from arcgis.gis import GIS
from arcgis.geoanalytics.summarize_data import describe_dataset, aggregate_points
from arcgis.geoanalytics.analyze_patterns import calculate_density, find_hot_spots
from arcgis.geoanalytics.manage_data import clip_layer, run_python_script

Connect to your ArcGIS Enterprise Organization

agol_gis = GIS('home')
gis = GIS('https://pythonapi.playground.esri.com/portal', 'arcgis_python', 'amazing_arcgis_123')

Ensure your GIS supports GeoAnalytics

Before executing a tool, we need to ensure that the ArcGIS Enterprise GIS is set up with a licensed GeoAnalytics server. To do so, call the is_supported() method after connecting to your Enterprise portal. See the Components of ArcGIS URLs documentation for details on the URLs to enter in the GIS parameters based on your particular Enterprise configuration.

arcgis.geoanalytics.is_supported()
True

Prepare the data

To register a file share or an HDFS, we need to format datasets as subfolders within a single parent folder and register the parent folder. This parent folder becomes a datastore, and each subfolder becomes a dataset. Our folder hierarchy would look like below:

Learn more about preparing your big data file share datasets here.

Register a big data file share

The get_datastores() method of the geoanalytics module returns a DatastoreManager object that lets you search for and manage the big data file share items as Python API Datastore objects on your GeoAnalytics server.

bigdata_datastore_manager = arcgis.geoanalytics.get_datastores()
bigdata_datastore_manager
<DatastoreManager for https://pythonapi.playground.esri.com/ga/admin>

We will register the Chicago crime data as a big data file share using the add_bigdata() function on a DatastoreManager object.

When we register a directory, all subdirectories under the specified folder are also registered with the server. Always register the parent folder (for example, \\machinename\mydatashare) that contains one or more individual dataset folders as the big data file share item. To learn more, see register a big data file share.

Note: You cannot browse directories in ArcGIS Server Manager. You must provide the full path to the folder you want to register, for example, \\myserver\share\bigdata. Avoid using local paths, such as C:\bigdata, unless the same data folder is available on all nodes of the server site.
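
Because the registered path must be reachable from every node, it can be useful to check whether a path is a network share before registering it. The following is a minimal sketch that uses only the Python standard library; the helper name is_unc_path is hypothetical and is not part of the ArcGIS API.

```python
from pathlib import PureWindowsPath


def is_unc_path(path: str) -> bool:
    """Return True when `path` is a UNC share path (two leading backslashes, server, share)."""
    # For UNC paths, PureWindowsPath reports the drive as the "\\server\share" part.
    return PureWindowsPath(path).drive.startswith("\\\\")


# Raw strings keep the backslashes literal.
print(is_unc_path(r"\\myserver\share\bigdata"))
print(is_unc_path(r"C:\bigdata"))
```

The first path is a network share and the second is a local drive path, which is only suitable when the same folder exists on every node.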

# data_item = bigdata_datastore_manager.add_bigdata("Chicago_Crime_2001_2020", r"\\machine_name\data\chicago")
Created Big Data file share for Chicago_Crime_2001_2020
bigdata_fileshares = bigdata_datastore_manager.search(id='0e7a861d-c1c5-4acc-869d-05d2cebbdbee')
bigdata_fileshares
[<Datastore title:"/bigDataFileShares/GA_Data" type:"bigDataFileShare">]
file_share_folder = bigdata_fileshares[0]

Once a big data file share is created, the GeoAnalytics server samples the datasets to generate a manifest, which outlines the data schema and specifies any time and geometry fields. This process can take a few minutes depending on the size of your data. Once processing completes, querying the manifest property returns the schema of each dataset in your big data file share.

manifest = file_share_folder.manifest['datasets'][1]
manifest
{'name': 'crime',
 'format': {'quoteChar': '"',
  'fieldDelimiter': ',',
  'hasHeaderRow': True,
  'encoding': 'UTF-8',
  'escapeChar': '"',
  'recordTerminator': '\n',
  'type': 'delimited',
  'extension': 'csv'},
 'schema': {'fields': [{'name': 'ID', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'Case Number', 'type': 'esriFieldTypeString'},
   {'name': 'Date', 'type': 'esriFieldTypeString'},
   {'name': 'Block', 'type': 'esriFieldTypeString'},
   {'name': 'IUCR', 'type': 'esriFieldTypeString'},
   {'name': 'Primary Type', 'type': 'esriFieldTypeString'},
   {'name': 'Description', 'type': 'esriFieldTypeString'},
   {'name': 'Location Description', 'type': 'esriFieldTypeString'},
   {'name': 'Arrest', 'type': 'esriFieldTypeString'},
   {'name': 'Domestic', 'type': 'esriFieldTypeString'},
   {'name': 'Beat', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'District', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'Ward', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'Community Area', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'FBI Code', 'type': 'esriFieldTypeString'},
   {'name': 'X Coordinate', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'Y Coordinate', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'Year', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'Updated On', 'type': 'esriFieldTypeString'},
   {'name': 'Latitude', 'type': 'esriFieldTypeDouble'},
   {'name': 'Longitude', 'type': 'esriFieldTypeDouble'},
   {'name': 'Location', 'type': 'esriFieldTypeString'}]},
 'geometry': {'geometryType': 'esriGeometryPoint',
  'spatialReference': {'wkid': 4326},
  'fields': [{'name': 'Location', 'formats': ['({y},{x})']}]},
 'time': {'timeType': 'instant',
  'timeReference': {'timeZone': 'UTC'},
  'fields': [{'name': 'Date', 'formats': ['MM/dd/yyyy hh:mm:ss a']}]}}
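
Since a manifest entry is a plain dictionary, we can pull out the parts that matter for analysis, such as which fields supply geometry and time. The sketch below assumes the dictionary layout printed above; the function name summarize_manifest and the abbreviated example dictionary are illustrative, not part of the API.

```python
def summarize_manifest(dataset: dict) -> dict:
    """Collect the field count plus the geometry and time source fields of one manifest entry."""
    fields = [f["name"] for f in dataset["schema"]["fields"]]
    geometry = dataset.get("geometry", {})
    time = dataset.get("time", {})
    return {
        "name": dataset["name"],
        "field_count": len(fields),
        "geometry_fields": [f["name"] for f in geometry.get("fields", [])],
        "time_fields": [f["name"] for f in time.get("fields", [])],
        "wkid": geometry.get("spatialReference", {}).get("wkid"),
    }


# Abbreviated stand-in for the manifest entry shown above (only two schema fields kept).
example = {
    "name": "crime",
    "schema": {"fields": [{"name": "Date", "type": "esriFieldTypeString"},
                          {"name": "Location", "type": "esriFieldTypeString"}]},
    "geometry": {"geometryType": "esriGeometryPoint",
                 "spatialReference": {"wkid": 4326},
                 "fields": [{"name": "Location", "formats": ["({y},{x})"]}]},
    "time": {"timeType": "instant",
             "fields": [{"name": "Date", "formats": ["MM/dd/yyyy hh:mm:ss a"]}]},
}
print(summarize_manifest(example))
```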

Get data for analysis

Adding a big data file share to the GeoAnalytics server adds a corresponding big data file share item on the portal. We can search for these types of items using the item_type parameter.

search_result = gis.content.search("bigDataFileShares_GA_Data", item_type = "big data file share")
search_result
[<Item title:"bigDataFileShares_GA_Data" type:Big Data File Share owner:arcgis_python>]
ga_item = search_result[0]
ga_item
bigDataFileShares_GA_Data
Big Data File Share by arcgis_python
Last Modified: May 27, 2021
0 comments, 0 views

Querying the layers property of the item returns one layer for each dataset in the big data file share. Each object is an API Layer object.

ga_item.layers
[<Layer url:"https://pythonapi.playground.esri.com/ga/rest/services/DataStoreCatalogs/bigDataFileShares_GA_Data/BigDataCatalogServer/air_quality">,
 <Layer url:"https://pythonapi.playground.esri.com/ga/rest/services/DataStoreCatalogs/bigDataFileShares_GA_Data/BigDataCatalogServer/crime">,
 <Layer url:"https://pythonapi.playground.esri.com/ga/rest/services/DataStoreCatalogs/bigDataFileShares_GA_Data/BigDataCatalogServer/calls">,
 <Layer url:"https://pythonapi.playground.esri.com/ga/rest/services/DataStoreCatalogs/bigDataFileShares_GA_Data/BigDataCatalogServer/analyze_new_york_city_taxi_data">]
crime_lyr = ga_item.layers[1]
illinois_blk_grps = agol_gis.content.get('a11d886be35149cb9dab0f7aac75a2af')
illinois_blk_grps
block_groups_illinois
Feature Layer Collection by api_data_owner
Last Modified: May 27, 2021
0 comments, 2 views
blk_lyr = illinois_blk_grps.layers[0]

We will filter the block groups by county FIPS code 031, which identifies Cook County, the county that contains Chicago.

blk_lyr.filter = "COUNTYFP = '031'"
m2 = gis.map('chicago')
m2
m2.add_layer(blk_lyr)

Describe data

The describe_dataset method provides an overview of big data. By default, the tool outputs a table layer containing calculated field statistics and a dict outlining geometry and time settings for the input layer.

Optionally, the tool can output a feature layer representing a sample set of features using the sample_size parameter, or a single polygon feature layer representing the input feature layers' extent by setting the extent_output parameter to True.

description = describe_dataset(input_layer=crime_lyr,
                               extent_output=True,
                               sample_size=1000,
                               output_name="Description of crime data" + str(dt.now().microsecond),
                               return_tuple=True)
description.output_json
{'datasetName': 'crime',
 'datasetSource': 'Big Data File Share - Chicago_Crime_2001_2020',
 'recordCount': 7061128,
 'geometry': {'geometryType': 'Point',
  'sref': {'wkid': 4326},
  'countNonEmpty': 6993512,
  'countEmpty': 67616,
  'spatialExtent': {'xmin': -91.686565684,
   'ymin': 36.619446395,
   'xmax': -87.524529378,
   'ymax': 42.022910333}},
 'time': {'timeType': 'Instant',
  'countNonEmpty': 7061128,
  'countEmpty': 67616,
  'temporalExtent': {'start': '2001-01-01 00:00:00.000',
   'end': '2020-01-26 23:40:00.000'}}}
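
The counts in this output also tell us how much of the data cannot be placed on a map. As a small sketch, assuming the dictionary layout shown above (the function name empty_geometry_share is illustrative), we can compute the share of records with empty geometry:

```python
def empty_geometry_share(description: dict) -> float:
    """Fraction of records whose geometry is empty."""
    geometry = description["geometry"]
    return geometry["countEmpty"] / description["recordCount"]


# Values copied from the describe_dataset output above.
summary = {
    "recordCount": 7061128,
    "geometry": {"countNonEmpty": 6993512, "countEmpty": 67616},
}
share = empty_geometry_share(summary)
print(f"{share:.2%} of records have no geometry")
```

These are the records behind the "missing or invalid geometries" messages that the analysis tools report later.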
sdf_desc_output = description.output.query(as_df=True)
sdf_desc_output.head()
  | FIELD_NAME | COUNT | COUNT_NON_EMPTY | AVG | MIN | MAX | STDDEV | RANGE | SUM | VAR | ANY | globalid | OBJECTID
0 | ID | 7061128 | 7061128 | 6.468796e+06 | 634.0 | 11969378.0 | 3.180550e+06 | 11968744.0 | 4.567699e+13 | 1.011590e+13 | None | {46B95A04-F3C3-FA20-D745-B2C7C9E7AFAF} | 1
1 | Case Number | 7061128 | 7061124 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | JD114742 | {7FCBD37F-459C-E78F-B873-CA734429AA9B} | 2
2 | Date | 7061128 | 7061128 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 01/01/2001 12:00:00 AM | {A7E0431E-0AD4-EC59-38A9-F71177ACDF45} | 3
3 | Block | 7061128 | 7061128 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 061XX S FAIRFIELD AVE | {FF3E7A5E-A887-D815-7812-AD995620C5A9} | 4
4 | IUCR | 7061128 | 6761589 | 1.127044e+03 | 110.0 | 9901.0 | 8.126368e+02 | 9791.0 | 7.620611e+09 | 6.603785e+05 | None | {3A5F5858-F0FD-932D-DF6D-FF8355F9141B} | 5
description.sample_layer
<FeatureLayer url:"https://ndhagsb01.esri.com/gis/rest/services/Hosted/Description_of_crime_data956049/FeatureServer/2">
sdf_slyr = description.sample_layer.query(as_df=True)
sdf_slyr.head()
  | ID | Case_Number | Date | Block | IUCR | Primary_Type | Description | Location_Description | Arrest | Domestic | ... | Y_Coordinate | Year | Updated_On | Latitude | Longitude | Location | INSTANT_DATETIME | globalid | OBJECTID | SHAPE
0 | 8196694 | HT430829 | 08/04/2011 02:10:00 AM | 079XX S MERRILL AVE | 520.0 | ASSAULT | AGGRAVATED:KNIFE/CUTTING INSTR | RESIDENCE | true | false | ... | 1852704.0 | 2011 | 02/10/2018 03:50:01 PM | 41.750809 | -87.572309 | (41.750808511, -87.572308641) | 2011-08-04 02:10:00 | {25BA0BFD-A32B-802A-72C5-D8A698A3C06F} | 1 | {'x': -87.572308641, 'y': 41.750808511, 'spati...
1 | 5139385 | HM736684 | 11/22/2006 09:00:00 PM | 019XX N MOHAWK ST | 1310.0 | CRIMINAL DAMAGE | TO PROPERTY | OTHER | false | false | ... | 1913191.0 | 2006 | 02/10/2018 03:50:01 PM | 41.917244 | -87.642423 | (41.917243909, -87.642422501) | 2006-11-22 21:00:00 | {A67F0D22-7EED-03EE-511A-49458AB189C7} | 2 | {'x': -87.642422501, 'y': 41.917243909, 'spati...
2 | 6257174 | HP338636 | 05/16/2008 05:30:00 AM | 108XX S LOWE AVE | 915.0 | MOTOR VEHICLE THEFT | TRUCK, BUS, MOTOR HOME | STREET | false | false | ... | 1832936.0 | 2008 | 02/28/2018 03:56:25 PM | 41.696981 | -87.638886 | (41.696980545, -87.638886196) | 2008-05-16 05:30:00 | {5FE25286-201F-EF1D-3D6F-ECF7AC8DA402} | 3 | {'x': -87.638886196, 'y': 41.696980545, 'spati...
3 | 8518985 | HV195817 | 01/20/2012 09:00:00 AM | 047XX S KNOX AVE | 840.0 | THEFT | FINANCIAL ID THEFT: OVER $300 | RESIDENCE | false | false | ... | 1872783.0 | 2012 | 02/10/2018 03:50:01 PM | 41.806897 | -87.739467 | (41.806896849, -87.739466549) | 2012-01-20 09:00:00 | {F475734C-7CC7-06DC-75F3-B1D9D6D91D8E} | 4 | {'x': -87.739466549, 'y': 41.806896849, 'spati...
4 | 3930218 | HL301854 | 04/17/2005 11:40:00 PM | 039XX W ARMITAGE AVE | 1220.0 | DECEPTIVE PRACTICE | THEFT OF LOST/MISLAID PROP | ALLEY | true | false | ... | 1912994.0 | 2005 | 02/28/2018 03:56:25 PM | 41.917175 | -87.725912 | (41.917175309, -87.725912468) | 2005-04-17 23:40:00 | {862B9571-2761-454E-56E4-F19124DCC584} | 5 | {'x': -87.725912468, 'y': 41.917175309, 'spati...

5 rows × 26 columns
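
The INSTANT_DATETIME and SHAPE columns in this sample are derived from the Date and Location strings using the formats declared in the manifest ('MM/dd/yyyy hh:mm:ss a' and '({y},{x})'). As a rough local equivalent, the sketch below parses the same strings with the Python standard library; the function names are illustrative, and the strptime pattern is our own translation of the manifest's time format.

```python
from datetime import datetime


def parse_crime_date(text: str) -> datetime:
    """Parse a Date value such as '08/04/2011 02:10:00 AM'."""
    # 'MM/dd/yyyy hh:mm:ss a' is a 12-hour clock with an AM/PM marker.
    return datetime.strptime(text, "%m/%d/%Y %I:%M:%S %p")


def parse_location(text: str) -> tuple:
    """Parse a '({y},{x})' string into an (x, y) pair, that is (longitude, latitude)."""
    y, x = (float(part) for part in text.strip("() ").split(","))
    return x, y


print(parse_crime_date("11/22/2006 09:00:00 PM"))
print(parse_location("(41.750808511, -87.572308641)"))
```

Note that the Location string lists latitude first, so the parser swaps the order to return x (longitude) before y (latitude).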

m1 = gis.map('chicago')
m1
m1.add_layer(description.sample_layer)
m1.legend = True

Analyze patterns

The GeoAnalytics Tools use a process spatial reference during execution. Analyses with square or hexagon bins require a projected coordinate system. We'll use WKID 26771 (NAD27 / Illinois East), found by searching http://epsg.io/?q=illinois%20kind%3APROJCRS.

arcgis.env.process_spatial_reference = 26771 
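
To see why bins measured in meters or kilometers call for a projected coordinate system, consider how much ground one degree covers. The sketch below is only an illustration under a spherical-Earth assumption (mean radius 6,371 km); the function name km_per_degree is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation (assumption)


def km_per_degree(latitude_deg: float) -> tuple:
    """Return (km per degree of longitude, km per degree of latitude) at a latitude."""
    per_deg_lat = math.pi * EARTH_RADIUS_KM / 180.0
    # Meridians converge toward the poles, so a degree of longitude shrinks with cos(latitude).
    per_deg_lon = per_deg_lat * math.cos(math.radians(latitude_deg))
    return per_deg_lon, per_deg_lat


lon_km, lat_km = km_per_degree(41.88)  # approximate latitude of Chicago
print(round(lon_km, 1), round(lat_km, 1))
```

At Chicago's latitude a degree of longitude is noticeably shorter than a degree of latitude, so a bin defined in degrees would not be square on the ground.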

Aggregate points

We can use the aggregate_points method in the arcgis.geoanalytics.summarize_data submodule to group crime features into individual block group features. The output polygon feature layer summarizes attribute information for all crimes that fall within each block group. If no crimes fall within a block group, that block group will not appear in the output.

The analysis runs in the process spatial reference set earlier (WKID 26771). All results are stored in the spatiotemporal datastore of the Enterprise in the WGS 84 Spatial Reference.

See the GeoAnalytics Documentation for a full explanation of analysis environment settings.

agg_result = aggregate_points(crime_lyr, 
                              polygon_layer=blk_lyr,
                              output_name="aggregate results of crime" + str(dt.now().microsecond))
{"messageCode":"BD_101189","message":"The GeoAnalytics job is waiting for resources and has not started yet. The job will automatically cancel after 10 minutes.","params":{"minutes":"10"}}
{"messageCode":"BD_101189","message":"The GeoAnalytics job is waiting for resources and has not started yet. The job will automatically cancel after 10 minutes.","params":{"minutes":"10"}}
{"messageCode":"BD_101051","message":"Possible issues were found while reading 'pointLayer'.","params":{"paramName":"pointLayer"}}
{"messageCode":"BD_101054","message":"Some records have either missing or invalid geometries."}
agg_result
aggregate_results_of_crime653441
Feature Layer Collection by admin
Last Modified: April 09, 2020
0 comments, 0 views
m3 = gis.map('chicago')
m3
m3.add_layer(agg_result)
m3.legend = True
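
Conceptually, aggregating points into polygons is a point-in-polygon test followed by a count per polygon. The following small local sketch illustrates that idea with a ray-casting test on made-up square "block groups"; it is not how the GeoAnalytics server implements the tool, and the names point_in_polygon and aggregate_points_local are hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_points_local(points, polygons):
    """Count points per polygon; polygons without points are left out of the result."""
    counts = Counter()
    for x, y in points:
        for name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break
    return dict(counts)


# Made-up data: three adjacent squares and four points, one of them outside all squares.
polygons = {
    "block_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "block_b": [(2, 0), (4, 0), (4, 2), (2, 2)],
    "block_c": [(4, 0), (6, 0), (6, 2), (4, 2)],
}
points = [(1, 1), (0.5, 1.5), (3, 1), (9, 9)]
print(aggregate_points_local(points, polygons))
```

As with the tool's output, a polygon that contains no points (block_c here) does not appear in the result.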

Calculate density

The calculate_density method creates a density map from point features by spreading known quantities of some phenomenon (represented as attributes of the points) across the map. The result is a layer of areas classified from least dense to most dense. In this example, we will create a density map by aggregating points into square bins 1 kilometer on a side. To learn more, see the calculate_density documentation.

cal_density = calculate_density(crime_lyr,
                                weight='Uniform',
                                bin_type='Square',
                                bin_size=1,
                                bin_size_unit="Kilometers",
                                time_step_interval=1,
                                time_step_interval_unit="Years",
                                time_step_repeat_interval=1,
                                time_step_repeat_interval_unit="Months",
                                time_step_reference=dt(2001, 1, 1),
                                radius=1000,
                                radius_unit="Meters",
                                area_units='SquareKilometers',
                                output_name="calculate density of crime" + str(dt.now().microsecond))
{"messageCode":"BD_101051","message":"Possible issues were found while reading 'inputLayer'.","params":{"paramName":"inputLayer"}}
{"messageCode":"BD_101054","message":"Some records have either missing or invalid geometries."}
m4 = gis.map('chicago')
m4
m4.add_layer(cal_density)
m4.legend = True
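
The core of a binned density surface is counting points per bin and dividing by the bin area. The sketch below shows only that step on made-up coordinates. It assumes projected coordinates already expressed in kilometers and deliberately ignores the neighborhood radius and time stepping that the tool also applies; the function name square_bin_density is illustrative.

```python
from collections import Counter
import math


def square_bin_density(points, bin_size_km=1.0):
    """Count points per square bin and convert the counts to points per square kilometer."""
    counts = Counter(
        (math.floor(x / bin_size_km), math.floor(y / bin_size_km)) for x, y in points
    )
    bin_area = bin_size_km ** 2
    return {cell: count / bin_area for cell, count in counts.items()}


# Made-up projected coordinates in kilometers.
points_km = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.1), (0.4, 0.4)]
print(square_bin_density(points_km))
```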

Find hot spots

The find_hot_spots tool analyzes point data and finds statistically significant spatial clustering of high (hot spots) and low (cold spots) numbers of incidents relative to the overall distribution of the data.

hot_spots = find_hot_spots(crime_lyr, 
                           bin_size=100,
                           bin_size_unit='Meters',
                           neighborhood_distance=250,
                           neighborhood_distance_unit='Meters',
                           output_name="get hot spot areas of crime" + str(dt.now().microsecond))
{"messageCode":"BD_101051","message":"Possible issues were found while reading 'pointLayer'.","params":{"paramName":"pointLayer"}}
{"messageCode":"BD_101054","message":"Some records have either missing or invalid geometries."}
m5 = gis.map('chicago')
m5
m5.add_layer(hot_spots)
m5.legend = True

The darkest red features indicate areas where you can state with 99 percent confidence that the clustering of crime features is not the result of random chance but rather of some other variable that might be worth investigating. Similarly, the darkest blue features indicate, with 99 percent confidence, that the lack of crime incidents is not random but is due to some variable in those locations. Beige features do not represent statistically significant clustering; the number of crimes in those areas could very likely be the result of random chance.
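
Hot spot analysis of this kind is based on the Getis-Ord Gi* statistic, which yields a z-score per bin that is then mapped to a confidence level. The following is a simplified sketch, assuming binary weights (every bin in a neighborhood counts equally) and a made-up 5 x 5 grid; it is not the server's implementation, and the names gi_star and confidence_level are hypothetical. The thresholds 1.65, 1.96, and 2.58 are the usual two-sided z-score cutoffs for 90, 95, and 99 percent confidence.

```python
import math


def gi_star(counts, neighbors):
    """Getis-Ord Gi* z-scores with binary weights.

    counts: dict mapping cell -> incident count (must not all be equal)
    neighbors: dict mapping cell -> cells in its neighborhood, including itself
    """
    n = len(counts)
    values = list(counts.values())
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = {}
    for cell, nbrs in neighbors.items():
        w = len(nbrs)  # with binary weights, sum of weights equals sum of squared weights
        local_sum = sum(counts[j] for j in nbrs)
        spread = std * math.sqrt((n * w - w * w) / (n - 1))
        scores[cell] = (local_sum - mean * w) / spread
    return scores


def confidence_level(z):
    """Map a z-score to a hot or cold spot label with its confidence level."""
    magnitude = abs(z)
    if magnitude >= 2.58:
        level = 99
    elif magnitude >= 1.96:
        level = 95
    elif magnitude >= 1.65:
        level = 90
    else:
        return "not significant"
    return f"{'hot' if z > 0 else 'cold'} spot, {level}% confidence"


# Made-up 5 x 5 grid with incidents concentrated in one corner.
counts = {(r, c): 1 for r in range(5) for c in range(5)}
for cell in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    counts[cell] = 20
neighbors = {
    (r, c): [(i, j) for (i, j) in counts if abs(i - r) <= 1 and abs(j - c) <= 1]
    for (r, c) in counts
}
scores = gi_star(counts, neighbors)
print(confidence_level(scores[(0, 0)]), "|", confidence_level(scores[(4, 4)]))
```

A bin is flagged as hot only when its whole neighborhood has a high total relative to the study area, which is why an isolated high count does not necessarily produce a hot spot.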

Use Spark Dataframe and Run Python Script

The run_python_script method executes a Python script directly in an ArcGIS GeoAnalytics server site. The script can create an analysis pipeline by chaining together multiple GeoAnalytics tools without writing intermediate results to a data store. The tool can also distribute Python functionality across the GeoAnalytics server site.

GeoAnalytics Server installs a Python 3.6 environment that this tool uses. The environment includes Spark 2.2.0, the compute platform that distributes analysis across multiple cores of one or more machines in your GeoAnalytics Server site. The environment includes the pyspark module which provides a collection of distributed analysis tools for data management, clustering, regression, and more. The run_python_script task automatically imports the pyspark module so you can directly interact with it.

When using the geoanalytics and pyspark packages, most functions return analysis results as Spark DataFrame memory structures. You can write these data frames to a data store or process them in a script. This lets you chain multiple geoanalytics and pyspark tools while only writing out the final result, eliminating the need to create any bulky intermediate result layers. For more details, click here.

The Location Description field records the type of place where each crime occurred. We will write a function to group our data by location description. This will help us count the number of crimes occurring at each location type.

def groupby_description():
    from datetime import datetime as dt
    # crime data is stored in a feature service and accessed as a DataFrame via the layers object
    df = layers[0]
    # group the dataframe by Location Description field and count the number of crimes for each Location Description. 
    out = df.groupBy('Location Description').count()
    # Write the final result to our datastore.
    out.write.format("webgis").save("groupby_location_description" + str(dt.now().microsecond))
run_python_script(code=groupby_description, layers=[crime_lyr])
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def groupby_description():\\n    from datetime import datetime as dt\\n    # crime data is stored in a feature service and accessed as a DataFrame via the layers object\\n    df = layers[0]\\n    # group the dataframe by Location Description field and count the number of crimes for each Location Description. \\n    out = df.groupBy(\'Location Description\').count()\\n    # Write the final result to our datastore.\\n    out.write.format("webgis").save("groupby_location_description" + str(dt.now().microsecond))\\n\\ngroupby_description()" https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 18:21:15 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 259 tasks.","params":{"totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/259 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/259 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"18/259 distributed tasks completed.","params":{"completedTasks":"18","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"41/259 distributed tasks completed.","params":{"completedTasks":"41","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"60/259 distributed tasks completed.","params":{"completedTasks":"60","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"259/259 distributed tasks completed.","params":{"completedTasks":"259","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 181","params":{"resultCount":"181"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/groupby_location_description595817/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/groupby_location_description595817/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 18:22:03 2020 (Elapsed Time: 48.18 seconds)'}]
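
To make the group-and-count step concrete, here is a small local sketch of what groupBy(...).count() computes, using only the standard library. The rows are taken from the five-record sample shown earlier; the function name count_by is illustrative and is not a Spark or ArcGIS function.

```python
from collections import Counter


def count_by(rows, field):
    """Count rows per distinct value of `field`, most frequent first."""
    return Counter(row[field] for row in rows).most_common()


# The five sample records shown earlier, reduced to two fields.
sample_rows = [
    {"Primary Type": "ASSAULT", "Location Description": "RESIDENCE"},
    {"Primary Type": "CRIMINAL DAMAGE", "Location Description": "OTHER"},
    {"Primary Type": "MOTOR VEHICLE THEFT", "Location Description": "STREET"},
    {"Primary Type": "THEFT", "Location Description": "RESIDENCE"},
    {"Primary Type": "DECEPTIVE PRACTICE", "Location Description": "ALLEY"},
]
print(count_by(sample_rows, "Location Description"))
```

On the server, Spark performs the same computation in parallel across the distributed tasks listed in the job messages.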

The result is saved as a feature layer. We can find the saved item with the search() method; searching for the name we used when writing the result retrieves it.

groupby_description = gis.content.search('groupby_location_description')[0]
groupby_description_lyr = groupby_description.tables[0] #retrieve table from the item
groupby_description_df = groupby_description_lyr.query(as_df=True) #read layer as dataframe
groupby_description_df.sort_values(by='count', ascending=False, inplace=True) #sort count field in decreasing order

Location of crime

groupby_description_df[:10].plot(x='Location_Description', 
                                 y='count', kind='barh')
plt.xticks(
    rotation=45,
    horizontalalignment='center',
    fontweight='light',
    fontsize='medium',
);
<Figure size 432x288 with 1 Axes>

Street is the most frequent location for crime occurrence.

The Primary Type field contains the type of each crime. Let's investigate the most frequent type of crime in Chicago by writing our own function:

def groupby_texttype():
    from datetime import datetime as dt
    # crime data is stored in a feature service and accessed as a DataFrame via the layers object
    df = layers[0]
    # group the dataframe by Primary Type field and count the crime incidents for each crime type. 
    out = df.groupBy('Primary Type').count()
    # Write the final result to our datastore.
    out.write.format("webgis").save("groupby_type_of_crime" + str(dt.now().microsecond))
run_python_script(code=groupby_texttype, layers=[crime_lyr])
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def groupby_texttype():\\n    from datetime import datetime as dt\\n    # Calls data is stored in a feature service and accessed as a DataFrame via the layers object\\n    df = layers[0]\\n    # group the dataframe by TextType field and count the number of calls for each call type. \\n    out = df.groupBy(\'Primary Type\').count()\\n    # Write the final result to our datastore.\\n    out.write.format("webgis").save("groupby_type_of_crime" + str(dt.now().microsecond))\\n\\ngroupby_texttype()" https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 18:55:46 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 259 tasks.","params":{"totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/259 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/259 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"22/259 distributed tasks completed.","params":{"completedTasks":"22","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"44/259 distributed tasks completed.","params":{"completedTasks":"44","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"259/259 distributed tasks completed.","params":{"completedTasks":"259","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 35","params":{"resultCount":"35"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/groupby_type_of_crime538317/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/groupby_type_of_crime538317/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 18:56:26 2020 (Elapsed Time: 39.68 seconds)'}]
groupby_texttype = gis.content.search('groupby_type_of_crime')[0]
groupby_texttype
groupby_type_of_crime538317
Table Layer by admin
Last Modified: April 09, 2020
0 comments, 0 views
groupby_texttype_df = groupby_texttype.tables[0].query(as_df=True)
groupby_texttype_df.head()
  | Primary_Type | count | globalid | OBJECTID
0 | OFFENSE INVOLVING CHILDREN | 48412 | {4120ABC0-FE3A-BBE0-ABC7-B885EEB2D5D2} | 9
1 | STALKING | 3644 | {52CC61CF-D8DE-D67B-4FC3-C8DD5DB175DE} | 20
2 | PUBLIC PEACE VIOLATION | 49583 | {4E902E77-D398-72D5-1E3C-EECF4A77B90E} | 26
3 | OBSCENITY | 650 | {A304388C-A37A-505E-D403-A90F83B04A77} | 34
4 | ARSON | 11603 | {E65FA2C6-9678-F283-A7B3-E61A40B12674} | 52
groupby_texttype_df.sort_values(by='count', ascending=False, inplace=True)

Type of crime

groupby_texttype_df.head(10).plot(x='Primary_Type', y='count', kind='barh')
plt.xticks(
    rotation=45,
    horizontalalignment='center',
    fontweight='light',
    fontsize='medium',
);
<Figure size 432x288 with 1 Axes>

Theft is the most common type of crime in the city of Chicago.

theft = groupby_texttype_df[groupby_texttype_df['Primary_Type'] == 'THEFT']
theft
   | Primary_Type | count | globalid | OBJECTID
12 | THEFT | 1493302 | {0CBB34E2-58C8-7D0B-01B4-D3E9CE832DC9} | 102
def theft_description():
    from datetime import datetime as dt
    # crime data is stored in a feature service and accessed as a DataFrame via the layers object
    df = layers[0]
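    # keep only theft incidents; the filtered DataFrame must be assigned to take effect
    theft_df = df.filter(df['Primary Type'] == 'THEFT')
    # count the thefts for each Location Description
    out = theft_df.groupBy('Location Description').count()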
    # Write the final result to our datastore.
    out.write.format("webgis").save("theft_description" + str(dt.now().microsecond))
run_python_script(code=theft_description, layers=[crime_lyr])
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def theft_description():\\n    from datetime import datetime as dt\\n    # Calls data is stored in a feature service and accessed as a DataFrame via the layers object\\n    df = layers[0]\\n    df[df[\'Primary Type\'] == \'THEFT\']\\n    out = df.groupBy(\'Location Description\').count()\\n    # Write the final result to our datastore.\\n    out.write.format("webgis").save("theft_description" + str(dt.now().microsecond))\\n\\ntheft_description()" https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 18:56:30 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 259 tasks.","params":{"totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/259 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/259 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"24/259 distributed tasks completed.","params":{"completedTasks":"24","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"45/259 distributed tasks completed.","params":{"completedTasks":"45","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"101/259 distributed tasks completed.","params":{"completedTasks":"101","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"259/259 distributed tasks completed.","params":{"completedTasks":"259","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 181","params":{"resultCount":"181"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/theft_description406470/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/theft_description406470/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 18:57:11 2020 (Elapsed Time: 41.21 seconds)'}]
theft_description = gis.content.search('theft_description')[0]
theft_description_df = theft_description.tables[0].query(as_df=True)
theft_description_df.sort_values(by='count', ascending=False, inplace=True)

Location of theft

theft_description_df[:10].plot(x='Location_Description', y='count', kind='barh')
plt.xticks(
    rotation=45,
    horizontalalignment='center',
    fontweight='light',
    fontsize='medium',
);
<Figure size 432x288 with 1 Axes>

This plot shows the ten location descriptions with the highest incident counts. It indicates that thefts occur most often on streets.

def grpby_type_blkgrp():
    from datetime import datetime as dt
    # Load the big data file share layer into a DataFrame
    df = layers[0]
    out = df.groupBy('Primary Type', 'Block').count()
    out.write.format("webgis").save("grpby_type_blkgrp" + str(dt.now().microsecond))
run_python_script(code=grpby_type_blkgrp, layers=[crime_lyr])
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def grpby_type_blkgrp():\\n    from datetime import datetime as dt\\n    # Load the big data file share layer into a DataFrame\\n    df = layers[0]\\n    out = df.groupBy(\'Primary Type\', \'Block\').count()\\n    out.write.format("webgis").save("grpby_type_blkgrp" + str(dt.now().microsecond))\\n\\ngrpby_type_blkgrp()" https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 18:57:14 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 259 tasks.","params":{"totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/259 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/259 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"25/259 distributed tasks completed.","params":{"completedTasks":"25","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"46/259 distributed tasks completed.","params":{"completedTasks":"46","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"60/259 distributed tasks completed.","params":{"completedTasks":"60","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/259 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"163/259 distributed tasks completed.","params":{"completedTasks":"163","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"206/259 distributed tasks completed.","params":{"completedTasks":"206","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"245/259 distributed tasks completed.","params":{"completedTasks":"245","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"259/259 distributed tasks completed.","params":{"completedTasks":"259","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 571108","params":{"resultCount":"571108"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/grpby_type_blkgrp322476/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/grpby_type_blkgrp322476/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 18:58:17 2020 (Elapsed Time: 1 minutes 3 seconds)'}]
grpby_cat_blk = gis.content.search('grpby_type_blkgrp')[0]
grpby_cat_blk
grpby_type_blkgrp322476
Table Layer by admin
Last Modified: April 09, 2020
0 comments, 0 views
grpby_cat_blk_df = grpby_cat_blk.tables[0].query(as_df=True)
grpby_cat_blk_df.head()
   Block                  OBJECTID  Primary_Type       count  globalid
0  096XX S MICHIGAN AV    1         BATTERY            34     {AC1470E0-614D-BB2A-901E-034720D62910}
1  070XX S LAFAYETTE ST   2         ASSAULT            1      {5876F9DE-F728-22A9-270C-710E48FA17FB}
2  061XX S COTTAGE GROVE  3         CRIMINAL TRESPASS  75     {3D7FC5D0-F670-53B9-7135-5DB84B1048E4}
3  014XX W MONTROSE AV    4         OTHER OFFENSE      1      {478D77DC-DAFA-C59D-08B0-29911B4F50FD}
4  055XX N LAKE SHORE DR  5         OTHER OFFENSE      1      {69281C9B-5158-8A73-7AEF-1EB4102276C2}

Count of crime incidents by block group

grpby_cat_blk_df.sort_values(by='count', ascending=False, inplace=True)
grpby_cat_blk_df.head(10).plot(x='Block', y='count', kind='barh')
plt.xticks(
    rotation=45,
    horizontalalignment='center',
    fontweight='light',
    fontsize='medium',
);
<Figure size 432x288 with 1 Axes>

Get crime types for a particular block group

blk_addr_high = grpby_cat_blk_df[grpby_cat_blk_df['Block'] == '001XX N STATE ST']
blk_addr_high.Primary_Type.sort_values(ascending=False).head()
143115    WEAPONS VIOLATION
766                   THEFT
122685             STALKING
94954           SEX OFFENSE
28868               ROBBERY
Name: Primary_Type, dtype: object
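
The call above sorts the crime type names alphabetically in descending order, so it lists types rather than ranking them by frequency. If the goal is to see which crime types are most frequent on a block, one option is to sort on the count column instead. The following is a minimal sketch on a small hand-made table; the rows and counts are invented for illustration, and only the column names (Block, Primary_Type, count) come from the table shown earlier.

```python
import pandas as pd

# Hypothetical rows shaped like grpby_cat_blk_df; the counts are made up.
toy_df = pd.DataFrame({
    "Block": ["001XX N STATE ST"] * 3 + ["096XX S MICHIGAN AV"],
    "Primary_Type": ["THEFT", "ROBBERY", "STALKING", "BATTERY"],
    "count": [120, 15, 2, 34],
})

# Keep one block, then rank its crime types by incident count.
one_block = toy_df[toy_df["Block"] == "001XX N STATE ST"]
top_types = (one_block
             .sort_values(by="count", ascending=False)
             [["Primary_Type", "count"]]
             .head())
print(top_types.to_string(index=False))
```

The same sort_values(by='count') pattern would apply to blk_addr_high if a frequency ranking is what is wanted.
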
def crime_by_datetime():
    from datetime import datetime as dt
    # Load the big data file share layer into a DataFrame
    from pyspark.sql import functions as F
    df = layers[0]
    out = df.withColumn('datetime', F.unix_timestamp('Date', 'dd/MM/yyyy hh:mm:ss a').cast('timestamp'))
    out.write.format("webgis").save("crime_by_datetime" + str(dt.now().microsecond))
run_python_script(code=crime_by_datetime, layers=[crime_lyr])
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def crime_by_datetime():\\n    from datetime import datetime as dt\\n    # Load the big data file share layer into a DataFrame\\n    from pyspark.sql import functions as F\\n    df = layers[0]\\n    out = df.withColumn(\'datetime\', F.unix_timestamp(\'Date\', \'dd/MM/yyyy hh:mm:ss a\').cast(\'timestamp\'))\\n    out.write.format("webgis").save("crime_by_datetime" + str(dt.now().microsecond))\\n\\ncrime_by_datetime()" https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 19:39:44 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/59 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/59 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"5/59 distributed tasks completed.","params":{"completedTasks":"5","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"6/59 distributed tasks completed.","params":{"completedTasks":"6","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"13/59 distributed tasks completed.","params":{"completedTasks":"13","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"19/59 distributed tasks completed.","params":{"completedTasks":"19","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"24/59 distributed tasks completed.","params":{"completedTasks":"24","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"25/59 distributed tasks completed.","params":{"completedTasks":"25","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"26/59 distributed tasks completed.","params":{"completedTasks":"26","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"29/59 distributed tasks completed.","params":{"completedTasks":"29","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"33/59 distributed tasks completed.","params":{"completedTasks":"33","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"37/59 distributed tasks completed.","params":{"completedTasks":"37","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"40/59 distributed tasks completed.","params":{"completedTasks":"40","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"43/59 distributed tasks completed.","params":{"completedTasks":"43","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"46/59 distributed tasks completed.","params":{"completedTasks":"46","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 7061128","params":{"resultCount":"7061128"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = {\\"xmin\\":-91.686565684,\\"ymin\\":36.619446395,\\"xmax\\":-87.524529378,\\"ymax\\":42.022910333}","params":{"extent":"{\\"xmin\\":-91.686565684,\\"ymin\\":36.619446395,\\"xmax\\":-87.524529378,\\"ymax\\":42.022910333}"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = Interval(MutableInstant(2001-01-01 00:00:00.000),MutableInstant(2020-01-26 23:40:00.000))","params":{"extent":"Interval(MutableInstant(2001-01-01 00:00:00.000),MutableInstant(2020-01-26 23:40:00.000))"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_by_datetime650380/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_by_datetime650380/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 19:42:31 2020 (Elapsed Time: 2 minutes 46 seconds)'}]
calls_with_datetime = gis.content.search('crime_by_datetime')[0]
calls_with_datetime_lyr = calls_with_datetime.layers[0]
def crime_with_added_date_time_cols():
    from datetime import datetime as dt
    # Load the big data file share layer into a DataFrame
    from pyspark.sql.functions import year, month, hour
    df = layers[0]
    df = df.withColumn('month', month(df['datetime']))
    out = df.withColumn('hour', hour(df['datetime']))
    out.write.format("webgis").save("crime_with_added_date_time_cols" + str(dt.now().microsecond))
run_python_script(code=crime_with_added_date_time_cols, layers=[calls_with_datetime_lyr])
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def crime_with_added_date_time_cols():\\n    from datetime import datetime as dt\\n    # Load the big data file share layer into a DataFrame\\n    from pyspark.sql.functions import year, month, hour\\n    df = layers[0]\\n    df = df.withColumn(\'month\', month(df[\'datetime\']))\\n    out = df.withColumn(\'hour\', hour(df[\'datetime\']))\\n    out.write.format("webgis").save("crime_with_added_date_time_cols" + str(dt.now().microsecond))\\n\\ncrime_with_added_date_time_cols()" https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_by_datetime650380/FeatureServer/0 "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 19:42:34 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_by_datetime650380/FeatureServer/0'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 66 tasks.","params":{"totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/66 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/66 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"3/66 distributed tasks completed.","params":{"completedTasks":"3","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"10/66 distributed tasks completed.","params":{"completedTasks":"10","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"15/66 distributed tasks completed.","params":{"completedTasks":"15","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"19/66 distributed tasks completed.","params":{"completedTasks":"19","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"22/66 distributed tasks completed.","params":{"completedTasks":"22","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"24/66 distributed tasks completed.","params":{"completedTasks":"24","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"29/66 distributed tasks completed.","params":{"completedTasks":"29","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"32/66 distributed tasks completed.","params":{"completedTasks":"32","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"36/66 distributed tasks completed.","params":{"completedTasks":"36","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"37/66 distributed tasks completed.","params":{"completedTasks":"37","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"38/66 distributed tasks completed.","params":{"completedTasks":"38","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"42/66 distributed tasks completed.","params":{"completedTasks":"42","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"43/66 distributed tasks completed.","params":{"completedTasks":"43","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"46/66 distributed tasks completed.","params":{"completedTasks":"46","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"51/66 distributed tasks completed.","params":{"completedTasks":"51","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"55/66 distributed tasks completed.","params":{"completedTasks":"55","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"56/66 distributed tasks completed.","params":{"completedTasks":"56","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"60/66 distributed tasks completed.","params":{"completedTasks":"60","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"66/66 distributed tasks completed.","params":{"completedTasks":"66","totalTasks":"66"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 7061128","params":{"resultCount":"7061128"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = {\\"xmin\\":-91.686565684,\\"ymin\\":36.619446395,\\"xmax\\":-87.524529378,\\"ymax\\":42.022910333}","params":{"extent":"{\\"xmin\\":-91.686565684,\\"ymin\\":36.619446395,\\"xmax\\":-87.524529378,\\"ymax\\":42.022910333}"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = Interval(MutableInstant(2001-01-01 00:00:00.000),MutableInstant(2020-01-26 23:40:00.000))","params":{"extent":"Interval(MutableInstant(2001-01-01 00:00:00.000),MutableInstant(2020-01-26 23:40:00.000))"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_with_added_date_time_cols749239/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_with_added_date_time_cols749239/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 19:47:06 2020 (Elapsed Time: 4 minutes 32 seconds)'}]
date_time_added_item = gis.content.search('crime_with_added_date_time_cols')
date_time_added_lyr = date_time_added_item[0].layers[0]
def grp_crime_by_hour():
    from datetime import datetime as dt
    # Load the big data file share layer into a DataFrame
    df = layers[0]
    out = df.groupBy('hour').count()
    out.write.format("webgis").save("grp_crime_by_hour" + str(dt.now().microsecond))
run_python_script(code=grp_crime_by_hour, layers=[date_time_added_lyr])
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def grp_crime_by_hour():\\n    from datetime import datetime as dt\\n    # Load the big data file share layer into a DataFrame\\n    df = layers[0]\\n    out = df.groupBy(\'hour\').count()\\n    out.write.format("webgis").save("grp_crime_by_hour" + str(dt.now().microsecond))\\n\\ngrp_crime_by_hour()" https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_with_added_date_time_cols749239/FeatureServer/0 "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Thu Apr  9 19:47:09 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhagsb01.esri.com/gis/rest/services/Hosted/crime_with_added_date_time_cols749239/FeatureServer/0'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 266 tasks.","params":{"totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/266 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/266 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"7/266 distributed tasks completed.","params":{"completedTasks":"7","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"11/266 distributed tasks completed.","params":{"completedTasks":"11","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"19/266 distributed tasks completed.","params":{"completedTasks":"19","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"25/266 distributed tasks completed.","params":{"completedTasks":"25","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"27/266 distributed tasks completed.","params":{"completedTasks":"27","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"32/266 distributed tasks completed.","params":{"completedTasks":"32","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"36/266 distributed tasks completed.","params":{"completedTasks":"36","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"41/266 distributed tasks completed.","params":{"completedTasks":"41","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"45/266 distributed tasks completed.","params":{"completedTasks":"45","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"52/266 distributed tasks completed.","params":{"completedTasks":"52","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"58/266 distributed tasks completed.","params":{"completedTasks":"58","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"62/266 distributed tasks completed.","params":{"completedTasks":"62","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"266/266 distributed tasks completed.","params":{"completedTasks":"266","totalTasks":"266"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 25","params":{"resultCount":"25"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = None","params":{"extent":"None"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/grp_crime_by_hour391644/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/grp_crime_by_hour391644/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Thu Apr  9 19:49:15 2020 (Elapsed Time: 2 minutes 5 seconds)'}]
hour = gis.content.search('grp_crime_by_hour')[0]
grp_hour = hour.tables[0]
df_hour = grp_hour.query(as_df=True)

Crime distribution by the hour

(df_hour
 .dropna()
 .sort_values(by='hour')
 .astype({'hour' : int})
 .plot(x='hour', y='count', kind='bar'))
plt.xticks(
    rotation=45,
    horizontalalignment='center',
    fontweight='light',
    fontsize='medium',
);
<Figure size 432x288 with 1 Axes>

This graph shows that crime incidents peak at 12 A.M. and 12 P.M.
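
To read the busiest hours off the table rather than the chart, the hourly counts can be ranked directly. This is a minimal sketch, assuming a table with the same hour and count columns as df_hour; the values below are invented for illustration and include a row with a missing hour, like the one that dropna() removes above.

```python
import pandas as pd

# Hypothetical hourly counts; real values come from the grp_crime_by_hour table.
toy_hours = pd.DataFrame({
    "hour": [0.0, 5.0, 12.0, 19.0, None],
    "count": [400, 60, 390, 350, 7],
})

# Drop the row with a missing hour, cast hour to int, and keep the three busiest hours.
peak = (toy_hours
        .dropna(subset=["hour"])
        .astype({"hour": int})
        .nlargest(3, "count"))
print(peak["hour"].tolist())
```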

Big data machine learning using pyspark.ml

Find the optimal number of clusters

The average silhouette approach measures the quality of a clustering by determining how well each object lies within its cluster. A high average silhouette width indicates a good clustering.
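
As a concrete illustration, the silhouette of a single point compares a, its mean distance to the other members of its own cluster, with b, its mean distance to the members of the nearest other cluster: s = (b - a) / max(a, b). The sketch below computes the average silhouette for a tiny hand-made dataset using NumPy and plain Euclidean distance. The points, labels, and the helper name mean_silhouette are invented for illustration. Note that pyspark's ClusteringEvaluator uses squared Euclidean distance by default, so its scores will not match this sketch exactly.

```python
import numpy as np

def mean_silhouette(points, labels):
    """Average silhouette width using Euclidean distance (illustrative only)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = []
    for i in range(len(points)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = dists[i, same].mean()
        b = min(dists[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

pts = [[0, 0], [0, 1], [10, 10], [10, 11]]
good = mean_silhouette(pts, [0, 0, 1, 1])  # labels follow the two obvious groups
bad = mean_silhouette(pts, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

A labelling that follows the two obvious groups scores close to 1, while a labelling that cuts across them scores below 0.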

def optimal_k():
    import time
    import numpy as np
    import pandas as pd
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans
    from datetime import datetime as dt
    from pyspark.ml.evaluation import ClusteringEvaluator
    from pyspark.sql.context import SQLContext
    from pyspark.sql.types import StructType, StructField, DoubleType, IntegerType, FloatType

    silh_lst = []
    k_lst = np.arange(3, 70)

    crime_locations = layers[0]
    assembler = VectorAssembler(inputCols=["X Coordinate", "Y Coordinate"], outputCol="features")
    crime_locations = assembler.setHandleInvalid("skip").transform(crime_locations)
    
    for k in k_lst:
        silh_val = []
        # Repeat the fit twice with different random seeds and average the scores
        for run in np.arange(1, 3):
            # Train a k-means model
            kmeans = KMeans().setK(int(k)).setSeed(int(np.random.randint(100)))
            model = kmeans.fit(crime_locations.select("features"))

            # Make predictions
            predictions = model.transform(crime_locations)

            # Evaluate clustering by computing the silhouette score
            evaluator = ClusteringEvaluator()
            silhouette = evaluator.evaluate(predictions)
            silh_val.append(silhouette)

        # Average silhouette over the runs for this k
        silh_array = np.asanyarray(silh_val)
        silh_lst.append(silh_array.mean())

    silhouette = pd.DataFrame(list(zip(k_lst,silh_lst)),columns = ['k', 'silhouette'])
    schema = StructType([StructField('k',IntegerType(),True), StructField('silhouette',FloatType(),True)])
    out = SQLContext(sparkContext=spark.sparkContext, sparkSession=spark).createDataFrame(silhouette, schema)
    # Write the result DataFrame to the relational data store
    out.write.format("webgis").option("dataStore","relational").save("optimalKmeans" + str(dt.now().microsecond))
run_python_script(code=optimal_k, layers=[crime_lyr])
optimal_k_item = gis.content.search('optimalKmeans')[0]
optimal_k_tbl = optimal_k_item.tables[0]
k_df = optimal_k_tbl.query().sdf
k_df.sort_values(by='silhouette', ascending=False)
     objectid  k    silhouette
54   58        15   0.556612
22   23        19   0.556012
2    3         9    0.555995
39   40        14   0.552853
38   39        11   0.551726
...  ...       ...  ...
24   25        25   0.527496
19   20        7    0.527266
26   27        34   0.525585
37   38        8    0.507064
36   37        5    0.492071

67 rows × 3 columns

# Take k from the first row after sorting (positional), not from the row labelled 0
num_clusters = int(k_df.sort_values(by='silhouette', ascending=False)['k'].iloc[0])
num_clusters
15
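
One subtlety when reading the best row off a sorted table: sort_values keeps the original index labels, so .loc[0] returns the row labelled 0, while .iloc[0] returns the first row in sorted order. The following is a minimal sketch with invented silhouette values; only the column names k and silhouette mirror the table above.

```python
import pandas as pd

# Hypothetical scores; the real ones come from the optimalKmeans table.
toy_k = pd.DataFrame({"k": [3, 9, 15], "silhouette": [0.49, 0.55, 0.56]})
ranked = toy_k.sort_values(by="silhouette", ascending=False)

by_label = int(ranked.loc[0, "k"])       # row whose index label is 0
by_position = int(ranked["k"].iloc[0])   # first row after sorting
best_k = int(toy_k.loc[toy_k["silhouette"].idxmax(), "k"])  # direct lookup, no sort

print(by_label, by_position, best_k)
```

Here the label-based lookup returns the k of the original first row, whereas the positional lookup and idxmax both return the k with the highest silhouette.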

K-Means Clustering

def cluster_crimes():
    
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans
    from datetime import datetime as dt
    # Crime data is stored in a feature service and accessed as a DataFrame via the layers object
    crime_locations = layers[0]
    
    # Combine the x and y columns in the DataFrame into a single column called "features"
    assembler = VectorAssembler(inputCols=["X Coordinate", "Y Coordinate"], outputCol="features")
    crime_locations = assembler.setHandleInvalid("skip").transform(crime_locations)

    # Fit a k-means model with 15 clusters using the "features" column of the crime locations
    kmeans = KMeans(k=15)
    model = kmeans.fit(crime_locations.select("features"))
    
    # Within-cluster sum of squared distances; note that computeCost is deprecated as of Spark 2.4 and removed in Spark 3.0
    cost = model.computeCost(crime_locations)
    print('cost', cost)
    # Add the cluster labels from the k-means model to the original DataFrame
    crime_locations_clusters = model.transform(crime_locations)
    # Write the result DataFrame to a hosted feature service layer
    crime_locations_clusters.write.format("webgis").save("Crime_Clusters_KMeans" + str(dt.now().microsecond))
run_python_script(code=cluster_crimes, layers=[crime_lyr])
{"messageCode":"BD_101231","message":"The following fields are not supported and will be dropped: features","params":{"fields":"features"}}
[{'type': 'esriJobMessageTypeInformative',
  'description': 'Executing (RunPythonScript): RunPythonScript "def cluster_crimes():\\n    \\n    from pyspark.ml.feature import VectorAssembler\\n    from pyspark.ml.clustering import KMeans\\n    from datetime import datetime as dt\\n    # Crime data is stored in a feature service and accessed as a DataFrame via the layers object\\n    crime_locations = layers[0]\\n    \\n    # Combine the x and y columns in the DataFrame into a single column called "features"\\n    assembler = VectorAssembler(inputCols=["X Coordinate", "Y Coordinate"], outputCol="features")\\n    crime_locations = assembler.setHandleInvalid("skip").transform(crime_locations)\\n\\n    # Fit a k-means model with 50 clusters using the "features" column of the crime locations\\n    kmeans = KMeans(k=15)\\n    model = kmeans.fit(crime_locations.select("features"))\\n    \\n    cost = model.computeCost(crime_locations)\\n    print(\'cost\', cost)\\n    # Add the cluster labels from the k-means model to the original DataFrame\\n    crime_locations_clusters = model.transform(crime_locations)\\n    # Write the result DataFrame to the relational data store\\n    crime_locations_clusters.write.format("webgis").save("Crime_Clusters_KMeans" + str(dt.now().microsecond))\\n\\ncluster_crimes()" https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime "{"defaultAggregationStyles": false, "processSR": {"wkid": 26771}}"'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Start Time: Fri Apr 10 08:14:38 2020'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Using URL based GPRecordSet param: https://ndhga01.esri.com/gis/rest/services/DataStoreCatalogs/bigDataFileShares_Chicago_Crime_2001_2020/BigDataCatalogServer/crime'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/59 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/59 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"23/59 distributed tasks completed.","params":{"completedTasks":"23","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"52/59 distributed tasks completed.","params":{"completedTasks":"52","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 118 tasks.","params":{"totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"118/118 distributed tasks completed.","params":{"completedTasks":"118","totalTasks":"118"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 259 tasks.","params":{"totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"6/259 distributed tasks completed.","params":{"completedTasks":"6","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"30/259 distributed tasks completed.","params":{"completedTasks":"30","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"53/259 distributed tasks completed.","params":{"completedTasks":"53","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"259/259 distributed tasks completed.","params":{"completedTasks":"259","totalTasks":"259"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"17/59 distributed tasks completed.","params":{"completedTasks":"17","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"42/59 distributed tasks completed.","params":{"completedTasks":"42","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101138","message":"[Python] cost 450444847551758.0","params":{"text":"cost 450444847551758.0"}}'},
 {'type': 'esriJobMessageTypeWarning',
  'description': '{"messageCode":"BD_101231","message":"The following fields are not supported and will be dropped: features","params":{"fields":"features"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101028","message":"Starting new distributed job with 59 tasks.","params":{"totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"0/59 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"1/59 distributed tasks completed.","params":{"completedTasks":"1","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"5/59 distributed tasks completed.","params":{"completedTasks":"5","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"6/59 distributed tasks completed.","params":{"completedTasks":"6","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"12/59 distributed tasks completed.","params":{"completedTasks":"12","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"18/59 distributed tasks completed.","params":{"completedTasks":"18","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"22/59 distributed tasks completed.","params":{"completedTasks":"22","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"25/59 distributed tasks completed.","params":{"completedTasks":"25","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"26/59 distributed tasks completed.","params":{"completedTasks":"26","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"28/59 distributed tasks completed.","params":{"completedTasks":"28","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"31/59 distributed tasks completed.","params":{"completedTasks":"31","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"35/59 distributed tasks completed.","params":{"completedTasks":"35","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"38/59 distributed tasks completed.","params":{"completedTasks":"38","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"40/59 distributed tasks completed.","params":{"completedTasks":"40","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"43/59 distributed tasks completed.","params":{"completedTasks":"43","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"45/59 distributed tasks completed.","params":{"completedTasks":"45","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"46/59 distributed tasks completed.","params":{"completedTasks":"46","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"53/59 distributed tasks completed.","params":{"completedTasks":"53","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101029","message":"59/59 distributed tasks completed.","params":{"completedTasks":"59","totalTasks":"59"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101081","message":"Finished writing results:"}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101082","message":"* Count of features = 6993512","params":{"resultCount":"6993512"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101083","message":"* Spatial extent = {\\"xmin\\":-91.686565684,\\"ymin\\":36.619446395,\\"xmax\\":-87.524529378,\\"ymax\\":42.022910333}","params":{"extent":"{\\"xmin\\":-91.686565684,\\"ymin\\":36.619446395,\\"xmax\\":-87.524529378,\\"ymax\\":42.022910333}"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101084","message":"* Temporal extent = Interval(MutableInstant(2001-01-01 00:00:00.000),MutableInstant(2020-01-26 23:40:00.000))","params":{"extent":"Interval(MutableInstant(2001-01-01 00:00:00.000),MutableInstant(2020-01-26 23:40:00.000))"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': '{"messageCode":"BD_101226","message":"Feature service layer created: https://ndhagsb01.esri.com/gis/rest/services/Hosted/Crime_Clusters_KMeans540499/FeatureServer/0","params":{"serviceUrl":"https://ndhagsb01.esri.com/gis/rest/services/Hosted/Crime_Clusters_KMeans540499/FeatureServer/0"}}'},
 {'type': 'esriJobMessageTypeInformative',
  'description': 'Succeeded at Fri Apr 10 08:19:30 2020 (Elapsed Time: 4 minutes 52 seconds)'}]
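
The `[Python] cost ...` line in the job messages above is the value printed by the script from `model.computeCost`, which in Spark's k-means is the sum of squared distances from each point to its nearest cluster center. The following is a minimal pure-Python sketch of that quantity on toy data; the function name `kmeans_cost` and the coordinates are made up for illustration and are not part of the sample or of pyspark.

```python
def kmeans_cost(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    total = 0.0
    for p in points:
        # Squared distance from p to every center; keep the smallest one
        total += min(
            sum((pi - ci) ** 2 for pi, ci in zip(p, c))
            for c in centers
        )
    return total

# Toy data (hypothetical coordinates, not taken from the crime dataset)
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(0.0, 1.0), (10.0, 11.0)]

cost = kmeans_cost(points, centers)
print("cost", cost)  # prints: cost 4.0
```

Because the cost is expressed in squared units of the input coordinate fields and is summed over every record, a very large number is expected for millions of points in projected coordinates. The value is mainly useful for comparing runs with different values of `k` on the same data.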
clusters = gis.content.search('Crime_Clusters_KMeans')[0]
clusters
Crime_Clusters_KMeans540499
Feature Layer Collection by admin
Last Modified: April 10, 2020
0 comments, 5 views

Symbolizing the result layer by the cluster predictions from the k-means model lets us visualize the clustered crime events, as shown in the screenshot above.
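
As a rough sketch of what symbolizing by cluster could look like, the snippet below builds a unique-value renderer definition as a plain Python dictionary, with one color per cluster label. It assumes the output layer kept the `prediction` field (the default output column of the pyspark.ml k-means model) and that 15 clusters were requested, as in the script above. The helper name `unique_value_renderer` is hypothetical, and the dictionary layout is assumed to follow the unique-value renderer of the ArcGIS web map specification. How the definition is applied to a map widget depends on the API version and is not shown.

```python
import colorsys

def unique_value_renderer(field, n_classes):
    """Build a unique-value renderer dict with one evenly spaced hue per class."""
    infos = []
    for i in range(n_classes):
        # Spread hues evenly around the color wheel so clusters are distinguishable
        r, g, b = colorsys.hsv_to_rgb(i / n_classes, 0.8, 0.9)
        infos.append({
            "value": str(i),
            "label": f"Cluster {i}",
            "symbol": {
                "type": "esriSMS",          # simple marker symbol
                "style": "esriSMSCircle",
                "color": [round(r * 255), round(g * 255), round(b * 255), 255],
                "size": 4,
            },
        })
    return {"type": "uniqueValue", "field1": field, "uniqueValueInfos": infos}

# Assumption: cluster labels 0..14 are stored in a field named "prediction"
renderer = unique_value_renderer("prediction", 15)
print(len(renderer["uniqueValueInfos"]), "classes on field", renderer["field1"])
```

Generating the colors programmatically keeps the renderer in step with the number of clusters if `k` is changed later.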

Conclusion

In this sample, we covered how to chain together geoanalytics and pyspark tools to analyze big data while writing only the final result to a data store, which eliminates the need to create intermediate result layers. We also saw how data mining and clustering can help manage large amounts of crime data and extract useful patterns from it.
