
Using GeoAnalytics Tasks in Run Python Script

The Run Python Script task allows you to programmatically execute most GeoAnalytics Tools with Python using an API that is available when you run the task. A geoanalytics object is instantiated automatically and gives you access to each tool using the syntax shown in the example and table below. Each tool accepts input layers as Spark DataFrames and returns results as a Spark DataFrame or a collection of Spark DataFrames. To learn more, see Reading and Writing Layers in pyspark. DataFrames are held in memory and can be written to a data store at any time, so you can chain multiple GeoAnalytics Tools together without writing out intermediate results.

Note:

The API described in this topic can only be used within the Run Python Script task. It should not be confused with the ArcGIS API for Python, which uses a different syntax to execute standalone GeoAnalytics Tools and is intended for use outside of the Run Python Script task.

In the example below, the Detect Incidents task and the Find Hot Spots task are used together, and only the final DataFrame is written to a data store as a feature service layer. The input layer (represented in the example below by layers[0]) is a big data file share dataset of city bus locations recorded at 1-minute intervals for 15 days. To learn more about using layers, see Reading and Writing Layers in pyspark.

Chaining together GeoAnalytics Tools with DataFrames

import time

# Run Detect Incidents to find all bus locations where delay status has changed from False to True
exp = "var dly = $track.field[\"dly\"].history(-2); return dly[0]==\"False\" && dly[1]==\"True\""
delay_incidents = geoanalytics.detect_incidents(
    input_layer = layers[0],
    track_fields = ["vid"],
    start_condition_expression = exp,
    output_mode = "Incidents"
)

# Use the resulting DataFrame as input to the Find Hot Spots task
delay_hotspots = geoanalytics.find_hot_spots(
    point_layer = delay_incidents,
    bin_size = 0.1,
    bin_size_unit = "Miles",
    neighborhood_distance = 1,
    neighborhood_distance_unit = "Miles",
    time_step_interval = 1,
    time_step_interval_unit = "Days"
)

# Write the Find Hot Spots result to the spatiotemporal big data store
delay_hotspots.write.format("webgis").save("Bus_Delay_HS_{0}".format(time.time()))
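
The output name in the example above is built from time.time(), which returns a float, so the resulting layer name contains a decimal point. If you would rather have names made up only of letters, digits, and underscores, a helper along the following lines is one option. This is a minimal sketch: unique_layer_name is a hypothetical helper, not part of the geoanalytics API, and the character restriction it applies is an assumption rather than a documented requirement.

```python
import re
import time


def unique_layer_name(prefix, now=None):
    """Return the prefix followed by a whole-second Unix timestamp.

    `now` is optional and exists only so the result can be reproduced;
    by default the current time is used.
    """
    timestamp = int(time.time() if now is None else now)
    # Assumption: only letters, digits, and underscores are wanted in the name.
    safe_prefix = re.sub(r"[^A-Za-z0-9_]", "_", prefix)
    return "{0}_{1}".format(safe_prefix, timestamp)
```

Inside Run Python Script, the result could be passed to the writer in place of the formatted string, for example delay_hotspots.write.format("webgis").save(unique_layer_name("Bus_Delay_HS")).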

For more examples, see Examples: Scripting custom analysis with the Run Python Script task.

The table below describes the method signature for GeoAnalytics Tools in Run Python Script. All tools can be called except for Copy To Data Store and Append Data. The parameter syntax is the same as that of the REST API except where noted.

Note:
In every tool method that has them, the time_step_repeat and time_step_repeat_unit arguments correspond to the timeStepRepeatInterval and timeStepRepeatIntervalUnit REST parameters, respectively.
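
When you are cross-referencing the REST API documentation, it can help to translate a Python argument name into the REST parameter it stands for. The sketch below assumes that REST parameters are the camelCase form of the snake_case Python arguments, apart from the two exceptions named in the note above. Only those two mappings come from this topic; the general conversion rule and the to_rest_parameter helper are illustrative assumptions, so check the result against the REST reference for the tool in question.

```python
# The two exceptions called out in the note above.
REST_NAME_OVERRIDES = {
    "time_step_repeat": "timeStepRepeatInterval",
    "time_step_repeat_unit": "timeStepRepeatIntervalUnit",
}


def to_rest_parameter(python_name):
    """Return the (assumed) REST parameter name for a Python argument name."""
    if python_name in REST_NAME_OVERRIDES:
        return REST_NAME_OVERRIDES[python_name]
    # Assumption: every other name is a plain snake_case -> camelCase conversion.
    head, *tail = python_name.split("_")
    return head + "".join(part.capitalize() for part in tail)
```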

Tool

Syntax

Returns

Notes

Aggregate Points

aggregate_points(point_layer, bin_type = None, bin_size = None, bin_size_unit = None, polygon_layer = None, time_step_interval = None, time_step_interval_unit = None, time_step_repeat = None, time_step_repeat_unit = None, time_step_reference = None, summary_fields = None)

DataFrame

Build Multi-Variable Grid

build_multi_variable_grid(bin_type = "Square", bin_size = None, bin_size_unit = None, input_layers = None, variable_calculations = None)

DataFrame

input_layers should be a list of DataFrames.

Calculate Density

calculate_density(input_layer, fields = None, weight = "Uniform", bin_type = "Square", bin_size = None, bin_size_unit = None, time_step_interval = None, time_step_interval_unit = None, time_step_repeat = None, time_step_repeat_unit = None, time_step_reference = None, radius = None, radius_unit = None, area_units = "SquareKilometers")

DataFrame

Calculate Field

calculate_field(input_layer, field_name, data_type, expression, track_aware = None, track_fields = None, time_boundary_split = None, time_boundary_split_unit = None, time_boundary_reference = None)

DataFrame

Clip Layer

clip_layer(input_layer, clip_layer)

DataFrame

Create Buffers

create_buffers(input_layer, distance = None, distance_unit = None, field = None, method = "Planar", dissolve_option = "None", dissolve_fields = None, summary_fields = None, multipart = False)

DataFrame

Create Space Time Cube

create_space_time_cube(point_layer, bin_size, bin_size_unit, time_step_interval, time_step_interval_unit, time_step_alignment = None, time_step_reference = None, summary_fields = None, output_name = None)

String

Returns the local path to the resulting space time cube on an ArcGIS GeoAnalytics Server machine. The cube is written to a temp directory and will be deleted if not copied to a different location.

Describe Dataset

describe_dataset(input_layer, sample_size = None, extent_output = False)

Dictionary

Example result: {"output":<DataFrame>, "outputJSON":<string>, "extentLayer":<DataFrame>, "sampleLayer":<DataFrame>}

Detect Incidents

detect_incidents(input_layer, track_fields, start_condition_expression, end_condition_expression = None, output_mode = "AllFeatures", time_boundary_split = None, time_boundary_split_unit = None, time_boundary_reference = None)

DataFrame

Dissolve Boundaries

dissolve_boundaries(input_layer, dissolve_fields = None, summary_fields = None, multipart = False)

DataFrame

Enrich From Multi-Variable Grid

enrich_from_multi_variable_grid(input_features, grid_layer, enrich_attributes = None)

DataFrame

Find Hot Spots

find_hot_spots(point_layer, bin_size, bin_size_unit, neighborhood_distance, neighborhood_distance_unit, time_step_interval = None, time_step_interval_unit = None, time_step_alignment = None, time_step_reference = None)

DataFrame

Find Point Clusters

find_point_clusters(input_layer, cluster_method = "DBSCAN", min_features_cluster = None, search_distance = None, search_distance_unit = None)

DataFrame

Find Similar Locations

find_similar_locations(input_layer, search_layer, analysis_fields, most_or_least_similar = "MostSimilar", match_method = "AttributeValues", number_of_results = 10, append_fields = None)

Dictionary

Example result: {"output":<DataFrame>, "processInfo":<string>}

Forest-based Classification And Regression

forest_based_classification_and_regression(prediction_type = "Train", in_features = None, features_to_predict = None, variable_predict = None, explanatory_variables = None, number_of_trees = 100, minimum_leaf_size = None, maximum_tree_depth = None, sample_size = 100, random_variables = None, percentage_for_validation = 10, create_variable_importance_table = False, explanatory_variable_matching = None)

Dictionary

Example result: {"outputTrained":<DataFrame>, "variableOfImportance":<DataFrame>, "outputPredicted":<DataFrame>, "processInfo":<string>}

Generalized Linear Regression

generalized_linear_regression(input_layer, features_to_predict = None, dependent_variable = None, explanatory_variables = None, regression_family = "Continuous", generate_coefficient_table = False, explanatory_variable_matching = None)

Dictionary

Example result: {"output":<DataFrame>, "coefficientTable":<DataFrame>, "outputPredicted":<DataFrame>, "processInfo":<string>}

Geocode Locations

geocode_locations(input_layer, geocode_service_url, geocode_parameters, source_country = None, category = None, include_attributes = None, locator_parameters = None)

DataFrame

Join Features

join_features(target_layer, join_layer, join_operation = "JoinOneToOne", join_fields = None, summary_fields = None, spatial_relationship = None, spatial_near_distance = None, spatial_near_distance_unit = None, temporal_relationship = None, temporal_near_distance = None, temporal_near_distance_unit = None, attribute_relationship = None, join_condition = None)

DataFrame

Merge Layers

merge_layers(input_layer, merge_layer, merging_attributes = None)

DataFrame

Overlay Layers

overlay_layers(input_layer, overlay_layer, overlay_type = "Intersect", include_overlaps = True)

DataFrame

Reconstruct Tracks

reconstruct_tracks(input_layer, track_fields, method = "Planar", buffer_field = None, summary_fields = None, time_split = None, time_split_unit = None, distance_split = None, distance_split_unit = None, time_boundary_split = None, time_boundary_split_unit = None, time_boundary_reference = None)

DataFrame

Summarize Attributes

summarize_attributes(input_layer, fields, summary_fields = None)

DataFrame

Summarize Within

summarize_within(summary_polygons = None, bin_type = None, bin_size = None, bin_size_unit = None, summarized_layer = None, standard_summary_fields = None, weighted_summary_fields = None, sum_shape = True, shape_units = None, group_by_field = None, minority_majority = False, percent_shape = False)

Dictionary

Example result: {"output":<DataFrame>, "groupBySummary":<DataFrame>}
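
A few of the tools in the table do not return a single DataFrame: some return a dictionary of results, and Create Space Time Cube returns a path to a temporary file. The sketch below shows one way to handle both cases. The helper names write_primary_output and keep_space_time_cube are hypothetical, and destination_dir stands for any directory of your choosing that the GeoAnalytics Server machine can write to. Which keys are populated in a dictionary result may depend on the parameters you pass, so the first helper checks for the key before using it.

```python
import os
import shutil


def write_primary_output(result, output_name, key="output"):
    """Write one DataFrame from a dictionary result and return that DataFrame."""
    if key not in result:
        raise KeyError(
            "Result has no '{0}' entry; available keys: {1}".format(key, sorted(result))
        )
    df = result[key]
    # Same writer call as in the chaining example earlier in this topic.
    df.write.format("webgis").save(output_name)
    return df


def keep_space_time_cube(cube_path, destination_dir):
    """Copy the cube out of the temp directory and return the new file path."""
    os.makedirs(destination_dir, exist_ok=True)
    return shutil.copy2(cube_path, destination_dir)
```

For example, the dictionary returned by geoanalytics.summarize_within(...) could be passed to write_primary_output to save its "output" DataFrame, and the string returned by geoanalytics.create_space_time_cube(...) could be passed to keep_space_time_cube before the script finishes.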

In addition to the tools listed above, a project tool is provided with the geoanalytics package that allows you to project the geometry of a DataFrame into the specified spatial reference.

Tool

Syntax

Returns

Notes

Project

project(input_features, output_coord_system)

DataFrame

input_features is the DataFrame to project and output_coord_system is the WKT or WKID of the spatial reference to use.

Example: geoanalytics.project(df, 2796)
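
Because project returns a DataFrame, its result can be passed straight into another tool. The following is a minimal sketch that projects a layer and then buffers it, using only the project and create_buffers signatures listed in this topic. The project_and_buffer function is a hypothetical wrapper, and the geoanalytics object is passed in as an argument here only so that the function does not depend on a global name.

```python
def project_and_buffer(geoanalytics, df, wkid, distance, distance_unit):
    """Project a DataFrame, then buffer the projected features."""
    projected = geoanalytics.project(
        input_features=df,
        output_coord_system=wkid,
    )
    # The projected DataFrame stays in memory and feeds the next tool directly.
    return geoanalytics.create_buffers(
        input_layer=projected,
        distance=distance,
        distance_unit=distance_unit,
    )
```

Inside Run Python Script, a call might look like project_and_buffer(geoanalytics, layers[0], 2796, 100, "Meters"), where the distance and unit values are illustrative.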