geoanalytics.tools

Aggregate Points

class geoanalytics.tools.AggregatePoints

Aggregates points into square or hexagon bins, or existing polygons.

The tool first determines which points fall within each specified area. After determining this point-in-area spatial relationship, statistics about all points in the area are calculated and assigned to the area.
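As a rough illustration of the two steps described above (and not the tool's actual implementation), the following plain-Python sketch first assigns points to square bins and then summarizes each bin. The function name aggregate_to_square_bins and the planar, origin-aligned grid are assumptions made for this example.

```python
import math
from collections import defaultdict

def aggregate_to_square_bins(points, bin_size):
    """Group (x, y, value) points into square bins and summarize each bin.

    Bins are aligned to the origin of a planar coordinate system (an
    assumption of this sketch); each key is a bin's (column, row) index.
    """
    bins = defaultdict(list)
    # Step 1: decide which bin each point falls within.
    for x, y, value in points:
        key = (math.floor(x / bin_size), math.floor(y / bin_size))
        bins[key].append(value)
    # Step 2: calculate statistics for all points in each bin.
    return {key: {"count": len(vals), "sum": sum(vals)}
            for key, vals in bins.items()}

points = [(0.5, 0.5, 10), (0.9, 0.1, 5), (1.5, 0.2, 7), (-0.1, 0.3, 1)]
summary = aggregate_to_square_bins(points, bin_size=1.0)
```

Here the first two points share bin (0, 0), so that bin reports a count of 2 and a sum of 15.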

Refer to the GeoAnalytics Engine guide for examples and usage notes: Aggregate Points

addSummaryField(summary_field, statistic, alias=None)

Adds a summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters
  • summary_field (str) – The name of a field from the input DataFrame.

  • statistic (str) – Choose from Count, Sum, Mean, Max, Min, Range, Stddev, Var, or Any.

  • alias (str) – The name of the result field containing the statistic. The default is the field name and statistic separated by an underscore.

run(dataframe)

Runs the AggregatePoints tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a point column.

Returns

A DataFrame containing a polygon column, count of points within the polygon, and any summary statistics for each polygon.

Return type

DataFrame

setBins(bin_size, bin_size_unit, bin_type='square')

Sets the size and shape of the bins into which points are aggregated.

Note

This method will override setPolygons.

Parameters
  • bin_size (int/float) – Distance between parallel sides of a bin or H3 resolution.

  • bin_size_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards, or use H3Res for H3 bins.

  • bin_type (str) – Choose from Square, Hexagon or H3.

setPolygons(polygons)

Sets the DataFrame containing a column of polygons into which the input points will be aggregated.

Note

This method will override setBins.

Parameters

polygons (pyspark.sql.DataFrame) – A DataFrame containing a column of polygons.

setTimeStep(interval_duration, interval_unit, repeat_duration=None, repeat_unit=None, reference_time=None)

Sets the time step interval, time step repeat, and reference time. If set, points will be aggregated into each bin for each time step. The input DataFrame must have a datetime column to use this setter.

Parameters
  • interval_duration (int) – Duration of each time step.

  • interval_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

  • repeat_duration (int) – Time between the start of one time step and the start of the next time step.

  • repeat_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

  • reference_time (int/long/datetime.datetime) – A reference datetime to align the time steps to. The default is epoch time 0.
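The relationship between the interval, the repeat, and the reference time can be sketched in plain Python. This is an illustration under stated assumptions, not the tool's implementation: it assumes step k starts at reference + k * repeat and lasts for one interval, and that omitting the repeat makes the steps back to back. The name time_step_windows is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def time_step_windows(start, end, interval, repeat=None, reference=EPOCH):
    """Return half-open (step_start, step_end) windows overlapping [start, end).

    Assumption: step k begins at reference + k * repeat and lasts `interval`;
    when `repeat` is omitted, repeat == interval.
    """
    repeat = repeat or interval
    # Index of the first step whose window ends after `start`.
    k = (start - reference - interval) // repeat + 1
    windows = []
    while reference + k * repeat < end:
        step_start = reference + k * repeat
        windows.append((step_start, step_start + interval))
        k += 1
    return windows

utc = timezone.utc
windows = time_step_windows(
    start=datetime(2024, 1, 1, 0, 30, tzinfo=utc),
    end=datetime(2024, 1, 1, 3, 0, tzinfo=utc),
    interval=timedelta(hours=1),
)
```

Because the steps are aligned to the reference time rather than to the first observation, the first window in this example starts at 00:00, not at 00:30.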

Calculate Density

class geoanalytics.tools.CalculateDensity

Calculates the density of points and their attributes.

Each point represents the location of some event or incident, and the result calculation represents a count of incidents per unit area. A higher density value in a new location means that there are more points near that location.

In many cases, the result layer can be interpreted as a risk surface for future events. For example, if the input points represent locations of lightning strikes, the result layer can be interpreted as a risk surface for future lightning strikes.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Calculate Density

run(dataframe)

Runs the CalculateDensity tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a point column with a spatial reference.

Returns

A DataFrame of square or hexagon bins with a column of calculated density values.

Return type

DataFrame

setAreaUnit(area_unit)

Sets the desired output units of the density values. The default is SquareKilometers. If density values are very small, you can increase the scale of the area units to return larger values.

Parameters

area_unit (str) – Choose from SquareMeters, SquareKilometers, Hectares, SquareFeet, SquareYards, SquareMiles or Acres.

setBins(bin_size, bin_size_unit, bin_type='square')

Sets the size and shape of bins used to calculate density.

Parameters
  • bin_size (float) – Distance between parallel sides of a bin.

  • bin_size_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

  • bin_type (str) – Choose from Square or Hexagon.

setFields(*fields)

Sets one or more fields specifying the number of incidents at each location. You can calculate the density on multiple fields. The density of the count of points will always be calculated.

Parameters

fields (*str) – The names of one or more fields from the input DataFrame.

setNeighborhood(distance, distance_unit)

Sets the size of the neighborhood within which to calculate density. The distance must be larger than the bin size.

Parameters
  • distance (float) – Radius of the neighborhood, measured from each bin center.

  • distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setTimeStep(interval_duration, interval_unit, repeat_duration=None, repeat_unit=None, reference_time=None)

Sets the time step interval, time step repeat, and reference time. If set, density will be calculated for each time step at each bin location. The input DataFrame must have a datetime column to use this setter.

Parameters
  • interval_duration (int) – Duration of each time step.

  • interval_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

  • repeat_duration (int) – Time between the start of one time step and the start of the next time step.

  • repeat_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

  • reference_time (int/long/datetime.datetime) – A reference datetime to align the time steps to. The default is epoch time 0.

setWeightType(weight_type)

Sets the type of weighting applied to density calculations. This parameter supports two options:

  • Uniform: calculates density as magnitude-per-area. This is the default.

  • Kernel: calculates density by applying a kernel function to fit a smooth tapered surface to each point.

Parameters

weight_type (str) – Choose from Uniform or Kernel.
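The difference between the two weight types can be sketched as follows. This is a minimal illustration, assuming a circular neighborhood and, for the Kernel case, a quartic kernel; this reference does not state which kernel function the tool uses, so that choice and both function names are assumptions.

```python
import math

def uniform_density(distances, magnitudes, radius):
    """Magnitude-per-area: total magnitude inside the neighborhood / its area."""
    total = sum(m for d, m in zip(distances, magnitudes) if d <= radius)
    return total / (math.pi * radius ** 2)

def quartic_kernel_density(distances, magnitudes, radius):
    """Kernel-weighted density using a quartic kernel (an assumed choice).

    A point contributes more the closer it is to the bin center and
    contributes nothing at or beyond the neighborhood radius.
    """
    total = 0.0
    for d, m in zip(distances, magnitudes):
        if d < radius:
            weight = 3.0 / (math.pi * radius ** 2) * (1 - (d / radius) ** 2) ** 2
            total += m * weight
    return total

# Distances from one bin center to three points, each with magnitude 1.
distances = [0.0, 0.5, 2.0]
magnitudes = [1, 1, 1]
uniform = uniform_density(distances, magnitudes, radius=1.0)
kernel = quartic_kernel_density(distances, magnitudes, radius=1.0)
```

With Uniform weighting, the two points inside the radius count equally; with the kernel, the point at the bin center outweighs the one halfway to the edge.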

Calculate Field

class geoanalytics.tools.CalculateField

Creates and populates a new field or edits an existing field using ArcGIS Arcade.

Your calculation can optionally be track aware. Track-aware equations use Arcade expressions that include track functions. To include a track-aware calculation, setTrackFields must be called and the input DataFrame must have datetime and track ID columns.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Calculate Field

run(dataframe)

Runs the CalculateField tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame.

Returns

A copy of the input DataFrame with the calculated field appended or overwritten.

Return type

DataFrame

setExpression(expression)

Sets an Arcade expression used to calculate the new field values. You can use any of the Date, Logical, Mathematical, or Text functions available with Arcade expressions.

Parameters

expression (str) – An Arcade expression.

setField(field_name, field_type)

Sets the name and type of the new field. If the name already exists in the dataset the field will be overwritten.

Parameters
  • field_name (str) – The name of the column that will be appended to the input DataFrame.

  • field_type (str) – Choose from Date, Double, Integer, or String.
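A minimal usage sketch of setField, setExpression, and run, assuming the caller passes in an already constructed geoanalytics.tools.CalculateField instance. The helper name add_speed_kph and the column names speed_mps and speed_kph are hypothetical; only the method names and argument order come from this reference.

```python
def add_speed_kph(tool, dataframe):
    """Append a Double column converting metres per second to km/h.

    `tool` is expected to be a geoanalytics.tools.CalculateField instance;
    it is passed in so this sketch does not depend on how the library is
    installed or authorized. `speed_mps` is a hypothetical input column.
    """
    # Name and type of the new (or overwritten) field.
    tool.setField("speed_kph", "Double")
    # Arcade expression evaluated for each row of the input DataFrame.
    tool.setExpression("$feature.speed_mps * 3.6")
    return tool.run(dataframe)
```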

setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)

Sets boundaries to limit calculations to defined spans of time. For example, if you use a time boundary of 1 day starting on January 1, 1980, tracks will be analyzed one day at a time.

Parameters
  • time_boundary_split (int) – The scale of the time boundary.

  • time_boundary_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

  • time_boundary_reference (int/long/datetime.datetime) – A reference datetime to align the time boundaries to. The default is epoch time 0.
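To show what a time boundary split means for a single track, here is a plain-Python sketch. It assumes that boundaries fall at reference + k * split and that an observation belongs to the span it falls in; the function names are hypothetical and the tool's internal behavior may differ.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def boundary_index(timestamp, split, reference=EPOCH):
    """Index of the time span (aligned to `reference`) containing `timestamp`."""
    return (timestamp - reference) // split

def split_track(timestamps, split, reference=EPOCH):
    """Split one track's observations into segments, one per time span."""
    segments = {}
    for ts in sorted(timestamps):
        segments.setdefault(boundary_index(ts, split, reference), []).append(ts)
    return list(segments.values())

utc = timezone.utc
track = [
    datetime(1980, 1, 1, 23, 0, tzinfo=utc),
    datetime(1980, 1, 2, 1, 0, tzinfo=utc),
    datetime(1980, 1, 2, 5, 0, tzinfo=utc),
]
segments = split_track(track, timedelta(days=1),
                       reference=datetime(1980, 1, 1, tzinfo=utc))
```

With a 1-day boundary starting on January 1, 1980, the observation at 23:00 is analyzed separately from the two observations on January 2.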

setTrackFields(*track_fields)

Sets one or more fields used to identify distinct tracks.

Parameters

track_fields (*str) – The names of one or more fields from the input DataFrame.

Calculate Motion Statistics

class geoanalytics.tools.CalculateMotionStatistics

Calculates motion statistics and descriptors for time-enabled points that represent one or more moving entities.

Points are grouped together into tracks representing each entity using a unique identifier. Motion statistics are calculated at each point using one or more points in the track history. Calculations include summaries of distance traveled, duration, elevation, speed, acceleration, bearing, and idle status.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Calculate Motion Statistics

run(dataframe)

Runs the CalculateMotionStatistics tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a track ID column and a datetime column.

Returns

A copy of the input DataFrame with motion statistics appended to each row.

Return type

DataFrame

setDistanceMethod(distance_method)

Sets the method used to calculate distances between track observations. There are two methods to choose from:

  • Planar: measures distances using a Euclidean plane and will not calculate statistics across the date line.

  • Geodesic: calculations will cross the date line when appropriate. This is the default. If the spatial reference cannot be panned, calculations will be limited to the coordinate system extent and may not wrap.

Parameters

distance_method (str) – Choose from Planar or Geodesic.

setIdleTolerance(distance_tolerance, distance_tolerance_unit, time_tolerance, time_tolerance_unit)

Sets the tolerances to use to decide if an entity is idling. An entity is idling when it hasn’t moved more than the distance tolerance in at least the time tolerance.

Parameters
  • distance_tolerance (float) – Spatial idling tolerance.

  • distance_tolerance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

  • time_tolerance (int) – Temporal idling tolerance.

  • time_tolerance_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

setMotionStatistics(*motion_statistics)

Sets the statistic groups that will be calculated.

Parameters

motion_statistics (*str) – Choose from Distance, Speed, Acceleration, Duration, Elevation, Slope, Idle, and Bearing.

setStatisticUnits(distance_unit='Meters', duration_unit='Seconds', speed_unit='MetersPerSecond', acceleration_unit='MetersPerSecondSquared', elevation_unit='Meters')

Sets the output units for each statistic group.

Parameters
  • distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

  • duration_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

  • speed_unit (str) – Choose from MetersPerSecond, KilometersPerHour, FeetPerSecond, MilesPerHour, or NauticalMilesPerHour.

  • acceleration_unit (str) – Choose MetersPerSecondSquared or FeetPerSecondSquared.

  • elevation_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)

Sets boundaries to limit calculations to defined spans of time. For example, if you use a time boundary of 1 day starting on January 1, 1980, tracks will be analyzed one day at a time.

Parameters
  • time_boundary_split (int) – The scale of the time boundary.

  • time_boundary_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

  • time_boundary_reference (int/long/datetime.datetime) – A reference datetime to align the time boundaries to. The default is epoch time 0.

setTrackFields(*track_fields)

Sets one or more fields used to identify distinct tracks.

Parameters

track_fields (*str) – The names of one or more fields from the input DataFrame.

setTrackHistoryWindow(track_history_window)

Sets the number of observations (including the current observation) that will be used when calculating summary statistics that are not instantaneous. This includes minimum, maximum, average, and total statistics.

The default track history window is 3, which means that at each point in a track, summary statistics will be calculated using the current observation and the previous two observations.

Note

This setter does not affect instantaneous statistics or idle classification.

Parameters

track_history_window (int) – Number of observations.
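The effect of the track history window can be sketched with a sliding window over a track's values. This is a simplified illustration, assuming each observation carries one already computed value (for example, an instantaneous speed); the function name windowed_stats is hypothetical.

```python
from collections import deque

def windowed_stats(values, track_history_window=3):
    """Min, max, and average over the current and previous observations.

    The window holds at most `track_history_window` observations, counting
    the current one, so early observations use a shorter history.
    """
    window = deque(maxlen=track_history_window)
    results = []
    for value in values:
        window.append(value)  # the oldest value drops out automatically
        results.append({
            "min": min(window),
            "max": max(window),
            "avg": sum(window) / len(window),
        })
    return results

stats = windowed_stats([10, 20, 60, 30])
```

At the fourth observation the window contains 20, 60, and 30; the first value (10) is no longer part of the history.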

Clip

class geoanalytics.tools.Clip

Extracts geometries that overlay clip geometries.

Note

This tool operates on the entire input DataFrame and thus can be more performant than equivalent row-wise operations using SQL functions.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Clip

run(input_dataframe, clip_dataframe)

Runs the Clip tool using the provided DataFrames.

Parameters
  • input_dataframe (DataFrame) – A DataFrame containing a geometry column.

  • clip_dataframe (DataFrame) – A DataFrame containing a polygon column to clip with.

Returns

A DataFrame containing the result of the clip.

Return type

DataFrame

Create Routes

class geoanalytics.tools.CreateRoutes

Uses a network dataset to understand the connectivity of a transportation network in order to find the best route between a series of input points. The resulting DataFrame contains a linestring column with the routes that visit the input points.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Create Routes

run(dataframe)

Runs the CreateRoutes tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a column with an array of points representing the stops for which a route will be created.

Returns

A copy of the input DataFrame that will also contain the result route if specified, the total travel time for the route in minutes and the total distance in meters.

Return type

DataFrame

setNetwork(path)

Sets the network data source from a mobile map package or a mobile geodatabase.

Parameters

path (str) – The path to the network data source.

setRouteGeometry(route_geometry)

Sets the shape of the route between stops. The following options are supported:

  • AlongNetwork: returns a route that has the exact shape of the underlying network dataset.

  • StraightLines: returns a route that will be a straight line between the stops.

  • NoLines: doesn’t return any route geometry.

Parameters

route_geometry (str) – Choose from AlongNetwork, StraightLines or NoLines.

setSequence(find_best, preserve_first, preserve_last)

Determines the order in which the input points will be used to create the route.

Parameters
  • find_best (bool) – True to find the best sequence or False to use the current sequence of the provided points in the array.

  • preserve_first (bool) – True to preserve the first point in the array, False to honor the best sequence and not preserve the first point.

  • preserve_last (bool) – True to preserve the last point in the array, False to honor the best sequence and not preserve the last point.

setStops(*stops)

Sets the stops, which are locations that the returned route will visit.

Parameters

stops (*pyspark.sql.Column) – An array of points that will be used to create the route.

setTravelMode(travel_mode)

Sets the travel mode. A travel mode refers to the mode of transportation, such as driving or walking. By default, the tool uses the default travel mode in the network dataset.

Parameters

travel_mode (str) – The mode of transportation. The parameter accepts any travel mode that is defined on the network data source or a JSON format for a custom travel mode.

Create Service Areas

class geoanalytics.tools.CreateServiceAreas

Generates reachable service areas around facilities that contain all streets accessed within a specified travel distance or travel time.

For example, the 10-minute walk-time service area around a subway station indicates a region where residents can walk to the station within ten minutes.

This tool requires that the input DataFrame contains a point column representing the facilities around which the service areas will be created.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Create Service Areas

run(dataframe)

Runs the CreateServiceAreas tool using the provided DataFrame representing the facilities.

Parameters

dataframe (DataFrame) – A DataFrame containing a point column representing the facilities around which the service areas will be created.

Returns

A copy of the input DataFrame with service polygons representing the reachable service areas around the facilities.

Return type

DataFrame

setCutoffs(cutoffs, unit=None)

Sets impedance cutoffs to determine the extent of the service areas.

There are two types of cutoffs, distance and time cutoffs.

Distance cutoffs specify the maximum distance that can be traveled from or to the facilities.

Time cutoffs specify the maximum time allowed to travel from or to the facilities.

Parameters
  • cutoffs (*int/*float) – The impedance cutoffs used to calculate the extent of the service areas. It accepts a single cutoff value, or one or multiple values in an array format.

  • unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards for distance cutoffs. Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years for time cutoffs. By default, the cutoff values are in the units of the impedance attribute used by the selected travel mode.

setGeometryAtCutoff(geometry_at_cutoff)

Specifies whether concentric service area polygons will be created as rings or disks.

  • Rings: the polygons representing larger breaks will exclude the polygons of smaller breaks. This creates polygons between consecutive breaks. Use this option to find the area from one break to another. For instance, if you create 5- and 10-minute service areas, the 10-minute service area polygon will exclude the area under the 5-minute service area polygon. This is the default.

  • Disks: the polygons will be created from the facility to the break. For instance, if you create 5- and 10-minute service areas, the 10-minute service area polygon will include the area under the 5-minute service area polygon.

Parameters

geometry_at_cutoff (str) – Choose from Rings (default) or Disks.
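The relationship between the two options can be sketched with sets. In this illustration each service area is modeled as a set of reachable grid-cell IDs rather than a polygon, which is an assumption made purely to keep the example small; the function name rings_from_disks is hypothetical.

```python
def rings_from_disks(disks):
    """Convert cumulative service areas (disks) into non-overlapping rings.

    `disks` maps each cutoff to the set of cell IDs reachable within it.
    Each ring keeps only the cells not already covered by a smaller cutoff.
    """
    rings = {}
    covered = set()
    for cutoff in sorted(disks):
        rings[cutoff] = disks[cutoff] - covered
        covered |= disks[cutoff]
    return rings

# 5- and 10-minute areas: the 10-minute disk includes the 5-minute disk.
disks = {5: {"a", "b"}, 10: {"a", "b", "c", "d"}}
rings = rings_from_disks(disks)
```

In this example the 10-minute ring contains only cells c and d, while the 10-minute disk still contains all four cells.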

setNetwork(path)

Sets the network data source from a mobile map package or a mobile geodatabase.

Parameters

path (str) – The path to the network data source.

setPolygonDetail(polygon_detail)

Sets the level of detail for the output polygons representing the reachable areas within the specified impedance cutoffs. Supported options are “Standard” and “High”.

  • Standard: polygons will be created with a standard level of detail. Standard polygons are generated quickly and are fairly accurate, but quality deteriorates as you move closer to the borders of the service area polygons. This is the default.

  • High: polygons will be created with the highest level of detail. Holes in the polygon may exist; they represent islands of network elements, such as streets, that couldn’t be reached without exceeding the cutoff impedance or due to travel restrictions. Use this option for applications in which precise results are important.

Parameters

polygon_detail (str) – Choose from Standard (default) or High.

setTravelDirection(travel_direction)

Sets the direction of travel to or from the facilities. This parameter supports two options:

  • FromFacilities: the service area is calculated starting from the facilities and extending outward to the periphery. It means that the tool calculates how far you can travel from the facilities within the specified impedance cutoffs.

  • ToFacilities: the service area is calculated in the opposite direction, from the periphery to the facilities, within the specified impedance cutoffs.

Parameters

travel_direction (str) – The direction of travel to or from the facilities. Choose from ‘FromFacilities’ (default) or ‘ToFacilities’.

setTravelMode(travel_mode)

Sets the travel mode. A travel mode refers to the mode of transportation, such as driving or walking. By default, the tool uses the default travel mode in the network dataset.

Parameters

travel_mode (str) – The mode of transportation. The parameter accepts any travel mode that is defined on the network data source or a JSON format for a custom travel mode.

Detect Incidents

class geoanalytics.tools.DetectIncidents

Determines which observations are incidents of interest using a specified condition.

Rows in the input DataFrame are grouped using a track ID and ordered sequentially before an incident condition is applied. Rows that meet the starting incident condition are marked as an incident. An ending incident condition can be applied; when the end condition is true, the track is no longer in an incident. You can return all input rows or only rows that are incidents.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Detect Incidents

run(dataframe)

Runs the DetectIncidents tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a track ID column and a datetime column.

Returns

A copy of the input DataFrame with incident status appended to each row.

Return type

DataFrame

setEndConditionExpression(end_condition_expression)

Sets the condition used to end incidents. If there is an end condition, any feature that meets the start condition expression and does not meet the end condition expression is an incident.

Parameters

end_condition_expression (str) – Arcade expression used to identify incidents.

setOutputMode(output_mode)

Sets which observations are returned. There are two options:

  • All: all of the input observations are returned. This is the default.

  • Incidents: only observations that were found to be incidents are returned.

Parameters

output_mode (str) – Choose from All or Incidents.

setStartConditionExpression(start_condition_expression)

Sets the condition used to start incidents. If there is no end condition expression specified, any feature that meets this condition is an incident. If there is an end condition, any feature that meets the start condition expression and does not meet the end condition expression is an incident.

Parameters

start_condition_expression (str) – Arcade expression used to identify incidents.
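How start and end conditions interact can be sketched as a small state machine over one ordered track. This is an interpretation of the description above rather than the tool's implementation: it assumes that, when an end condition is set, rows stay flagged from the row that meets the start condition until a row meets the end condition. Plain Python callables stand in for Arcade expressions, and the name flag_incidents is hypothetical.

```python
def flag_incidents(track, start_condition, end_condition=None):
    """Return one True/False incident flag per row of an ordered track."""
    flags = []
    in_incident = False
    for row in track:
        if end_condition is None:
            # No end condition: each row is judged on its own.
            in_incident = start_condition(row)
        elif in_incident:
            # An open incident continues until the end condition is met.
            if end_condition(row):
                in_incident = False
        else:
            in_incident = start_condition(row) and not end_condition(row)
        flags.append(in_incident)
    return flags

speeds = [50, 80, 70, 65, 40, 90]
with_end = flag_incidents(speeds, lambda s: s > 75, lambda s: s < 60)
without_end = flag_incidents(speeds, lambda s: s > 75)
```

With the end condition, the rows at 70 and 65 remain part of the incident that started at 80; without it, only the rows above 75 are flagged.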

setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)

Sets boundaries to limit calculations to defined spans of time. For example, if you use a time boundary of 1 day starting on January 1, 1980, tracks will be analyzed one day at a time.

Parameters
  • time_boundary_split (int) – The scale of the time boundary.

  • time_boundary_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

  • time_boundary_reference (int/long/datetime.datetime) – A reference datetime to align the time boundaries to. The default is epoch time 0.

setTrackFields(*track_fields)

Sets one or more fields used to identify distinct tracks.

Parameters

track_fields (*str) – The names of one or more fields from the input DataFrame.

Find Closest Facilities

class geoanalytics.tools.FindClosestFacilities

Finds a given number of the closest facilities to each incident within a specified travel time or travel distance, and returns the best routes between the incidents and the chosen facilities. When finding closest facilities, you can specify whether the direction of travel is to or away from the facilities. Examples include finding the closest fire stations to fire incidents, the closest healthcare providers to residents’ addresses, or the nearest hospitals for emergency responses.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Find Closest Facilities

accumulateAttributes(*attributes)

Accumulates cost attributes along the network between the incident and the identified facility. No accumulated cost is returned by default.

Parameters

attributes (*str) – The cost attributes to accumulate.

run(incidents_df, facilities_df)

Runs the FindClosestFacilities tool using the provided DataFrames.

Parameters
  • incidents_df (DataFrame) – A DataFrame containing points that represent the incidents.

  • facilities_df (DataFrame) – A DataFrame containing points that represent the facilities.

Returns

A copy of the inputs combined into one DataFrame that will also contain the rank of the closest facilities, the travel time in minutes, the travel distance in meters between the incident and the identified facility, and if specified, the resulting route and accumulated cost attributes.

Return type

DataFrame

setCutoff(cutoff, unit=None)

Sets the maximum travel distance or travel time when searching for facilities for each incident. Its unit should match the travel mode. For example, if the travel mode is set in the units of distance, the impedance cutoff must be set in distance.

There are two types of cutoffs, distance and time cutoffs.

Distance cutoffs specify the maximum travel distance between incidents and facilities. For example, when analyzing walking distance from schools (incident DataFrame) to subway stations (facility DataFrame), a cutoff value of 1 mile (e.g., setCutoff(1, "Miles")) means that the tool will search for the closest subway stations within a 1-mile walk of each school.

Time cutoffs specify the maximum travel time between incidents and facilities. For example, when analyzing driving time from fire stations (facility DataFrame) to fire incidents (incident DataFrame), a cutoff value of 15 minutes (e.g., setCutoff(15, "Minutes")) means that the tool will search for the closest fire stations within a 15-minute drive of the fire incidents.

Parameters
  • cutoff (int/float) – The impedance cutoff used to calculate the maximum travel distance or travel time when searching for facilities for each incident. It accepts a positive value.

  • unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards for distance cutoffs. Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years for time cutoffs. If the unit is missing, the tool will use the distance or time unit defined in the travel mode.

setNetwork(path)

Sets the network data source from a mobile map package or a mobile geodatabase.

Parameters

path (str) – The path to the network data source.

setNumFacilities(count)

Specifies the maximum number of closest facilities to find for each incident. If there are multiple facilities with an equal travel cost to an incident, the tool will break ties by randomly selecting one or more records from the equidistant facilities to ensure the specified number of closest facilities.

Parameters

count (int) – The number of facilities to find. The default is 1.

setRouteGeometry(route_geometry)

Sets the shape of the route between the incident and the identified facility. You can also choose not to return the line geometry for better performance. The following options are supported:

  • AlongNetwork: returns the true shape of the result route that is based on the streets along the network.

  • StraightLines: returns a straight line between the incident and the identified facility.

  • NoLines: doesn’t return any route geometry.

Parameters

route_geometry (str) – Choose from AlongNetwork, StraightLines or NoLines.

setTravelDirection(travel_direction)

Sets the direction of travel to or from the facilities. This parameter supports two options:

  • FromFacilities: the closest facilities are searched along the network from the facilities to the incidents within the specified impedance cutoff. This is the default.

  • ToFacilities: the closest facilities are searched along the network from the incidents to the facilities within the specified impedance cutoff.

Parameters

travel_direction (str) – The direction of travel to or from the facilities. Choose from ‘FromFacilities’ (default) or ‘ToFacilities’.

setTravelMode(travel_mode)

Sets the travel mode. A travel mode refers to the mode of transportation, such as driving or walking. By default, the tool uses the default travel mode in the network dataset.

Parameters

travel_mode (str) – The mode of transportation. The parameter accepts any travel mode that is defined on the network data source or a JSON format for a custom travel mode.
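A minimal usage sketch that combines the setters above, assuming the caller passes in an already constructed geoanalytics.tools.FindClosestFacilities instance. The helper name find_nearest_stations and the network_path argument are hypothetical; the method names, argument order, and option values are taken from this reference.

```python
def find_nearest_stations(tool, incidents_df, facilities_df, network_path):
    """Find up to two facilities within 15 minutes of each incident.

    `tool` is expected to be a geoanalytics.tools.FindClosestFacilities
    instance; it is passed in so this sketch does not depend on how the
    library is installed or authorized.
    """
    tool.setNetwork(network_path)              # mobile map package or geodatabase
    tool.setNumFacilities(2)                   # the default is 1
    tool.setCutoff(15, "Minutes")              # time cutoff
    tool.setTravelDirection("FromFacilities")  # the documented default
    tool.setRouteGeometry("NoLines")           # skip route geometry
    return tool.run(incidents_df, facilities_df)
```

The travel mode is left unset here, so the tool would use the default travel mode of the network dataset, as noted under setTravelMode.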

Find Dwell Locations

class geoanalytics.tools.FindDwellLocations

Finds where entities dwell within a specific distance and duration using a record of their location through time.

Dwell locations are determined using time and distance tolerances. First, the tool groups points into tracks representing each entity using a track identifier and orders them sequentially. Next, the distance between the first point in a track and the next is calculated. If two temporally consecutive points stay within the given distance for at least the given duration, they are considered part of a dwell. When two points are found to be part of a dwell, the first point in the dwell is used as a reference point, and the tool finds consecutive points that are within the specified distance of the reference point in the dwell.

Once all points within the specified distance are found, the tool collects the dwell points and calculates their mean center. Features before and after the current dwell are added to the dwell if they are within the given distance of the dwell location’s mean center. This process continues until the end of the track.
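The first part of this procedure can be sketched in plain Python. This is a simplified illustration rather than the tool's algorithm: it works on a single track in planar coordinates, uses the first point of each candidate dwell as the reference point, and omits the final step of extending a dwell around its mean center. The name find_dwells is hypothetical.

```python
import math

def find_dwells(track, max_distance, min_duration):
    """Find dwells in one track of (time_seconds, x, y) tuples sorted by time."""
    dwells = []
    i, n = 0, len(track)
    while i < n:
        t0, x0, y0 = track[i]  # reference point of the candidate dwell
        j = i
        # Extend while the next point stays within max_distance of the reference.
        while j + 1 < n and math.hypot(track[j + 1][1] - x0,
                                       track[j + 1][2] - y0) <= max_distance:
            j += 1
        if j > i and track[j][0] - t0 >= min_duration:
            pts = track[i:j + 1]
            dwells.append({
                "start": t0,
                "end": track[j][0],
                "count": len(pts),
                # Mean center of the collected dwell points.
                "center": (sum(p[1] for p in pts) / len(pts),
                           sum(p[2] for p in pts) / len(pts)),
            })
            i = j + 1
        else:
            i += 1
    return dwells

track = [(0, 0, 0), (60, 1, 0), (120, 0, 1), (180, 50, 50), (240, 51, 50)]
dwells = find_dwells(track, max_distance=5, min_duration=100)
```

The first three points stay within 5 units of the reference point for 120 seconds and form one dwell; the last two points are close together but span only 60 seconds, so they do not.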

Refer to the GeoAnalytics Engine guide for examples and usage notes: Find Dwell Locations

addSummaryField(summary_field, statistic, alias=None)

Adds a summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters
  • summary_field (str) – The name of a field from the input DataFrame.

  • statistic (str) – Choose from First, Last, Count, Sum, Mean, Max, Min, Range, Stddev, Var, or Any.

  • alias (str) – The name of the result field containing the statistic. The default is the field name and statistic separated by an underscore.

run(dataframe)

Runs the FindDwellLocations tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a point column with a spatial reference, a track ID column, and a datetime column.

Return type

DataFrame

setDistanceMethod(distance_method)

Sets the method used to calculate distances between track observations. There are two methods to choose from:

  • Planar: measures distances using a Euclidean plane and will not calculate statistics across the date line.

  • Geodesic: calculations will cross the date line when appropriate. This is the default. If the spatial reference cannot be panned, calculations will be limited to the coordinate system extent and may not wrap.

Parameters

distance_method (str) – Choose from Planar or Geodesic.

setDwellMaxDistance(max_distance, max_distance_unit)

Sets the maximum distance between points for them to be considered part of a single dwell event.

Note

This method is used along with setDwellMinDuration to define dwell criteria.

Parameters
  • max_distance (float) – The maximum distance between points to be considered in a single dwell location.

  • max_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setDwellMinDuration(min_duration, min_duration_unit)

Sets the minimum time between points for them to be considered part of a single dwell event.

Note

This method is used along with setDwellMaxDistance to define dwell criteria.

Parameters
  • min_duration (int) – The minimum time duration of a dwell to be considered in a single dwell location.

  • min_duration_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
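
To illustrate how a distance limit and a duration limit can combine into dwell criteria, here is a simplified sketch. It is an assumption about one reasonable interpretation, not the algorithm the tool uses: a run grows while each new point stays within the maximum distance of the run's first point, and the run counts as a dwell if it lasts at least the minimum duration. The function name, the tuple layout, and the use of planar coordinates are all hypothetical.

```python
import math

def find_dwells(track, max_distance, min_duration):
    """Return (start_index, end_index) pairs of runs that qualify as dwells.

    track: list of (t_seconds, x, y) tuples sorted by time, planar coordinates.
    """
    dwells = []
    start = 0
    while start < len(track):
        t0, x0, y0 = track[start]
        end = start
        # Extend the run while the next point stays close to the run's first point.
        while (end + 1 < len(track)
               and math.hypot(track[end + 1][1] - x0,
                              track[end + 1][2] - y0) <= max_distance):
            end += 1
        if end > start and track[end][0] - t0 >= min_duration:
            dwells.append((start, end))
            start = end + 1
        else:
            start += 1
    return dwells

track = [
    (0, 0.0, 0.0), (60, 1.0, 0.0), (120, 0.5, 0.5), (180, 1.0, 1.0),  # lingering
    (240, 50.0, 0.0), (300, 100.0, 0.0),                              # moving
]
dwells = find_dwells(track, max_distance=5.0, min_duration=120)
```

With these inputs the first four observations form a single dwell; raising min_duration above the 180 seconds they span removes it.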

setOutputType(output_type)

Sets the output type.

  • DwellMeanCenters: A point representing the centroid of each discovered dwell location. This is the default.

  • DwellConvexHulls: Polygons representing the convex hull of each dwell group.

  • DwellPoints: All of the input points determined to belong to a dwell are returned.

  • AllPoints: All of the input points are returned.

Parameters

output_type (str) – Choose from DwellMeanCenters, DwellConvexHulls, DwellPoints, or AllPoints.

Returns

The result DataFrame specified by output_type

setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)

Sets boundaries to limit calculations to defined spans of time. For example, if you use a time boundary of 1 day, starting on January 1, 1980, tracks will be analyzed one day at a time.

Parameters
  • time_boundary_split (int) – The scale of the time boundary.

  • time_boundary_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

  • time_boundary_reference (int/long/datetime.datetime) – A reference datetime to align the time boundaries to. The default is epoch time 0.
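
The way a reference time aligns boundaries can be sketched with plain datetime arithmetic. This is a minimal illustration under stated assumptions, not the tool's implementation: it only covers fixed-length units (calendar-aware units such as Months and Years are not handled), and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def boundary_window(timestamp, split, reference):
    """Return the [start, end) window of length `split` containing `timestamp`,
    with windows aligned to `reference`."""
    # Floor division also gives the right window for times before the reference.
    index = (timestamp - reference) // split
    start = reference + index * split
    return start, start + split

reference = datetime(1980, 1, 1, tzinfo=timezone.utc)
one_day = timedelta(days=1)

observation = datetime(1980, 1, 3, 14, 30, tzinfo=timezone.utc)
start, end = boundary_window(observation, one_day, reference)
```

Observations that share a window would be analyzed together; observations on either side of a boundary would not.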

setTrackFields(*track_fields)

Sets one or more fields used to identify distinct tracks.

Parameters

track_fields (*str) – The names of one or more fields from the input DataFrame.

Find Hot Spots

class geoanalytics.tools.FindHotSpots

Aggregates points into square bins and finds statistically significant bins of high incidents (hot spots) and low incidents (cold spots).

This tool finds hot and cold spots using the Getis-Ord Gi* statistic. The local counts of points for a bin and its neighbors are compared proportionally to the sum of points in all bins. A local sum is considered statistically significant (larger z-score) when it is very different from the expected local sum and when that difference is too large to be the result of random chance.
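
As a rough illustration of the statistic described above, the sketch below computes a Gi* z-score for one bin using the standard published form of the Getis-Ord Gi* formula. It assumes binary weights (1.0 for bins inside the neighborhood, including the target bin, and 0.0 otherwise); the function name and the sample counts are hypothetical, and the tool's own weighting may differ.

```python
import math

def getis_ord_gi_star(counts, weights_row):
    """z-score of one bin given all bin counts and that bin's weight row.

    counts:      point count x_j for every bin j (n bins in total)
    weights_row: w_ij for the target bin i
    """
    n = len(counts)
    mean = sum(counts) / n
    s = math.sqrt(sum(x * x for x in counts) / n - mean * mean)
    w_sum = sum(weights_row)
    w_sq_sum = sum(w * w for w in weights_row)
    local_sum = sum(w * x for w, x in zip(weights_row, counts))
    expected = mean * w_sum  # local sum expected if counts were spread evenly
    spread = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return (local_sum - expected) / spread

counts = [9, 8, 1, 1, 1, 1, 1, 1, 1]
z_hot = getis_ord_gi_star(counts, [1, 1, 0, 0, 0, 0, 0, 0, 0])   # bins 0 and 1
z_cold = getis_ord_gi_star(counts, [0, 0, 0, 0, 0, 0, 0, 1, 1])  # bins 7 and 8
```

A large positive z-score marks a neighborhood whose local sum is well above the expected value; a negative z-score marks one below it.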

Refer to the GeoAnalytics Engine guide for examples and usage notes: Find Hot Spots

run(dataframe)

Runs the FindHotSpots tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a point column with a projected spatial reference.

Returns

A DataFrame of square bins assigned a z-score, p-value, and confidence level.

Return type

DataFrame

setBins(bin_size, bin_size_unit)

Sets the size of square bins used to find hot spots.

Parameters
  • bin_size (float) – Distance between parallel sides of a bin.

  • bin_size_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setNeighborhood(distance, distance_unit)

Sets the size of the neighborhood used to find hot spots. The neighborhood size must be larger than the bin size.

Parameters
  • distance (float) – Radius of the neighborhood, measured from each bin center.

  • distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setTimeStep(interval_duration, interval_unit, reference_time=None, alignment=None)

Sets the time step interval, reference time, and alignment. If set, hot spots will be calculated for each time step at each bin location. The input DataFrame must have a datetime column to use this setter.

Parameters
  • interval_duration (int) – Duration of each time step.

  • interval_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

  • reference_time (int/long/datetime.datetime) – A reference datetime to align the time steps to if alignment is ReferenceTime. The default is epoch time 0.

  • alignment (str) – Defines how aggregation will occur based on a given interval duration. Choose from StartTime, EndTime, or ReferenceTime.

Find Point Clusters

class geoanalytics.tools.FindPointClusters

Finds clusters of points within surrounding noise based on their spatial or spatiotemporal distribution.

Two clustering methods are supported: DBSCAN or HDBSCAN. Both methods can find clusters in space, while DBSCAN can find spatiotemporal clusters in time-enabled point layers.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Find Point Clusters

run(dataframe)

Runs the FindPointClusters tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a point column with a projected spatial reference.

Returns

A copy of the input DataFrame with a cluster ID assigned to each point.

Return type

DataFrame

setClusterMethod(cluster_method)

Sets the algorithm used for cluster analysis. Supported options are “DBSCAN” and “HDBSCAN”.

The DBSCAN algorithm uses a specified distance to separate dense clusters from sparser noise. DBSCAN is faster than HDBSCAN, but is only appropriate if there is a clear search distance to use that works well to define all clusters that may be present.

DBSCAN finds clusters that have similar densities. The HDBSCAN algorithm allows for clusters with varying densities based on cluster probability (or stability).

HDBSCAN is data-driven and does not use a search distance, but is a more time-consuming calculation than DBSCAN. The DBSCAN algorithm finds clusters in two-dimensional space by default. When setTimeMethod is called, DBSCAN will discover clusters in both space and time.

Parameters

cluster_method (str) – Choose from DBSCAN or HDBSCAN.
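
For readers unfamiliar with DBSCAN, the following is a minimal, unoptimized sketch of the textbook algorithm in two dimensions. It is not the tool's implementation: it assumes planar coordinates, counts a point as its own neighbor when testing the minimum-points threshold, and uses hypothetical names throughout.

```python
import math

NOISE = -1

def dbscan(points, search_distance, min_points):
    """Label each (x, y) point with a cluster ID, or NOISE."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= search_distance]

    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = NOISE           # may be claimed later as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # core point: keep expanding
        cluster_id += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),   # dense group
          (10, 10), (10, 11), (11, 10),     # second dense group
          (50, 50)]                         # isolated point
labels = dbscan(points, search_distance=1.5, min_points=3)
```

The two dense groups receive separate cluster IDs and the isolated point is labeled as noise, which mirrors the role of the search distance and minimum point count described above.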

setMinPointsCluster(min_points_cluster)

This setter is used differently depending on the clustering method chosen. For DBSCAN, min_points_cluster specifies the number of points that must be found within a search range of a point for that point to start forming a cluster. The results may include clusters with fewer points than this value.

For HDBSCAN, min_points_cluster specifies the number of points neighboring each point (including the point itself) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.

Parameters

min_points_cluster (int) – Number of points.

setSearchDistance(search_distance, search_distance_unit)

Sets the search distance within which the number of points specified by setMinPointsCluster must be found (in addition to being within the search duration, if applicable) to form a cluster using the DBSCAN algorithm. No search distance is used by HDBSCAN.

Parameters
  • search_distance (float) – Distance within which min_points_cluster must be found to start forming a cluster. Results may include clusters with fewer points than min_points_cluster.

  • search_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards if the input DataFrame has a spatial reference. Use None if the input DataFrame has no spatial reference.

setSearchDuration(search_duration, search_duration_unit)

Sets the search duration within which the number of points specified by setMinPointsCluster must be found (in addition to being within the search distance) to form a cluster using the DBSCAN algorithm.

Warning

The input DataFrame must have a datetime column to use this setter.

Note

This method is not used by HDBSCAN.

Parameters
  • search_duration (int) – Duration within which min_points_cluster must be found to start forming a cluster. Results may include clusters with fewer points than min_points_cluster.

  • search_duration_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

Find Similar Locations

class geoanalytics.tools.FindSimilarLocations

Measures the similarity of candidate locations to one or more reference locations.

This tool requires two DataFrames, one containing the reference locations and one containing candidate locations. Using specified fields representing the criteria to match, the tool will rank all of the candidate locations by how closely they match the reference locations.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Find Similar Locations

run(reference_dataframe, search_dataframe)

Runs the FindSimilarLocations tool using the provided DataFrames.

Parameters
  • reference_dataframe (DataFrame) – A DataFrame containing one or more reference rows with attributes.

  • search_dataframe (DataFrame) – A DataFrame containing candidate locations that will be evaluated for similarity to the reference rows.

Returns

The similarity statistics with appended fields.

Return type

DataFrame

setAnalysisFields(*analysis_fields)

Sets the fields that will be used to determine similarity. They must be numeric fields, and the fields must exist on both input DataFrames. Depending on the match method selected, the tool will find rows that are most similar based on values or profiles of the fields.

Parameters

analysis_fields (*str) – The names of one or more fields from the input DataFrames.

setAppendFields(*append_fields)

Sets which fields from the search DataFrame are included in the result. By default, all fields from the search DataFrame are appended.

Parameters

append_fields (*str) – The names of one or more fields from the search DataFrame.

setMatchMethod(match_method)

Sets the method that specifies how matching is determined. There are two options:

  • AttributeValues: uses the squared differences of standardized values. This is the default.

  • AttributeProfiles: uses cosine similarity mathematics to compare the profile of standardized values. This option requires the use of at least two analysis fields.

Parameters

match_method (str) – Choose from AttributeValues or AttributeProfiles.
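
The two match methods can rank the same candidates differently. The sketch below shows why, using generic formulas for a sum of squared differences and for cosine similarity. It assumes the input vectors are already standardized, and the function names and sample values are hypothetical; the tool's exact scoring may differ.

```python
import math

def sum_squared_differences(a, b):
    """AttributeValues-style score: smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """AttributeProfiles-style score: closer to 1.0 means a more similar profile."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

reference = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]   # same proportions as the reference, larger magnitude
close = [1.0, 2.5, 2.5]    # values near the reference, different proportions

ssd_scaled = sum_squared_differences(reference, scaled)
ssd_close = sum_squared_differences(reference, close)
cos_scaled = cosine_similarity(reference, scaled)
cos_close = cosine_similarity(reference, close)
```

By squared differences the candidate named close is the better match, while by cosine similarity the candidate named scaled is, because only the relative shape of the values matters for a profile comparison.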

setMostOrLeastSimilar(most_or_least_similar)

Sets the rows that will be returned. Options include returning the rows that are most similar to the reference, the rows that are least similar, or both.

Parameters

most_or_least_similar (str) – Choose from MostSimilar, LeastSimilar, or Both.

setNumberOfResults(number_of_results)

Sets the number of ranked candidate rows to return. The default is 10 and the maximum allowed is 10000.

Parameters

number_of_results (int) – Number of most or least similar locations to return.

Generate OD Matrix

class geoanalytics.tools.GenerateODMatrix

Creates an origin-destination cost matrix from multiple origins to multiple destinations. It returns a table that contains the travel cost, including travel time and travel distance from each origin to each destination within the specified impedance cutoff.

This tool accepts two point DataFrames as the input, representing the origins and destinations.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Generate OD Matrix

accumulateAttributes(*attributes)

Accumulates cost attributes along the network between the associated origin and destination. No accumulated cost is returned by default.

Parameters

attributes (*str) – The cost attributes to accumulate.

run(origins_df, destinations_df)

Runs the GenerateODMatrix tool using the provided DataFrames.

Parameters
  • origins_df (DataFrame) – A DataFrame containing points that represent the origins.

  • destinations_df (DataFrame) – A DataFrame containing points that represent the destinations.

Returns

A copy of the inputs combined into one DataFrame that will also contain the rank of the destinations, the travel time in minutes, the travel distance in meters between the origin and the destination, and if specified, the resulting straight lines and accumulated cost attributes.

Return type

DataFrame

setCutoff(cutoff, unit=None)

Sets the maximum travel distance or travel time when searching for destinations for each origin. Its unit should match the travel mode. For example, if the travel mode is set in the units of distance, the impedance cutoff must be set in distance.

There are two types of cutoffs, distance and time cutoffs.

Distance cutoffs specify the maximum travel distance between origins and destinations. For example, when analyzing walking distance, a cutoff value of 1 mile (e.g. setCutoff(1, "miles")) means that the tool will search for destinations within 1 mile of walking distance from the origin.

Time cutoffs specify the maximum travel time between origins and destinations. For example, when analyzing driving time, a cutoff value of 15 minutes (e.g. setCutoff(15, "minutes")) means that the tool will search for destinations within 15 minutes of driving time from the origin.

Parameters
  • cutoff (int/float) – The impedance cutoff used to calculate the maximum travel distance or travel time from an origin to a destination. It accepts a positive value.

  • unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards for distance cutoffs. Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years for time cutoffs. By default, it is in the unit of the impedance attribute used by the travel mode.

setNetwork(path)

Sets the network data source from a mobile map package or a mobile geodatabase.

Parameters

path (str) – The path to the network data source.

setNumDestinations(count)

Sets the number of destinations to find for each origin.

Parameters

count (int) – The number of destinations to find for each origin. The default is returning all destinations within the impedance cutoff.
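
The combined effect of an impedance cutoff and a destination count can be sketched without a network. The example below assumes the travel costs are already known and simply filters, ranks, and truncates them per origin. The function name, the row layout, and the choice to treat the cutoff as inclusive are assumptions for illustration only.

```python
def od_rows(costs, cutoff=None, num_destinations=None):
    """Return (origin, destination, rank, cost) rows.

    costs: {(origin, destination): travel_cost}
    """
    by_origin = {}
    for (origin, destination), cost in costs.items():
        if cutoff is None or cost <= cutoff:
            by_origin.setdefault(origin, []).append((cost, destination))
    rows = []
    for origin, candidates in by_origin.items():
        candidates.sort()  # cheapest destination first
        # Slicing with None keeps every candidate within the cutoff.
        for rank, (cost, destination) in enumerate(candidates[:num_destinations], start=1):
            rows.append((origin, destination, rank, cost))
    return rows

costs = {
    ("o1", "d1"): 12.0, ("o1", "d2"): 5.0, ("o1", "d3"): 20.0,
    ("o2", "d1"): 3.0,
}
all_within_cutoff = od_rows(costs, cutoff=15)
nearest_only = od_rows(costs, cutoff=15, num_destinations=1)
```

Leaving the count unset keeps every destination within the cutoff, matching the default described above.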

setRouteGeometry(route_geometry)

Specifies whether to return the straight line between the origins and the destinations. The tool does not output the true shape of routes for performance reasons, but the travel time and travel distance are calculated along the network.

The following options are supported:

  • StraightLines: returns a straight line from the origin to the destination.

  • NoLines: doesn’t return any geometry.

Parameters

route_geometry (str) – Choose from StraightLines or NoLines.

setTravelMode(travel_mode)

Sets the travel mode. A travel mode refers to the mode of transportation, such as driving or walking. By default, the tool uses the default travel mode in the network dataset.

Parameters

travel_mode (str) – The mode of transportation. The parameter accepts any travel mode that is defined on the network data source or a JSON format for a custom travel mode.

Geocode

class geoanalytics.tools.Geocode

Converts addresses into geographic coordinates.

This tool requires an input DataFrame that contains one or more columns that store the string addresses to be geocoded and a locator accessible to all nodes in the Spark cluster.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Geocode

run(dataframe)

Runs the Geocode tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing string addresses that will be geocoded.

Returns

A copy of the input DataFrame with output fields specified in setOutFields(), including the geocoded locations as point geometries.

Return type

DataFrame

setAddressFields(*address_fields)

Sets one or more input address fields used by the locator to geocode addresses.

Parameters

address_fields (*str) – The names of one or more address fields from the input DataFrame.

setCountryCode(country_code)

Sets the country to search the geocoded addresses in.

Parameters

country_code (str) – A two-letter or three-letter country code defined in ISO 3166-1.

setLocator(path)

Sets the address locator that will be used to geocode the addresses. The locator must be accessible to all nodes in your Spark cluster. For more information, read about Staging the locators.

Parameters

path (str) – The file path of a locator (.loc) or a mobile map package (.mmpk).

setMinScore(min_score)

Sets the minimum score of the records that will be matched in the output.

Parameters

min_score (int/float) – The value of the minimum score. The value should be greater than 0 and less than 100.

setOutFields(predefined_set)

Sets the output fields.

  • LocationOnly: geocode_location is returned.

  • Minimal: geocode_location, Status, Score, Match_addr, and Addr_type are returned. This is the default.

  • MinimalAndUserFields: geocode_location, Status, Score, Match_addr, Addr_type, and any custom output fields available in the locator are returned.

  • All: All fields are returned including any custom fields defined in your locator.

Parameters

predefined_set (str) – Choose from LocationOnly, Minimal, MinimalAndUserFields or All.

GWR

class geoanalytics.tools.GWR

Performs Geographically Weighted Regression (GWR), a local form of linear regression used to model spatially varying relationships.

GWR provides a local model of a variable by fitting a regression equation to every row in the input DataFrame using the geometry and any specified explanatory variables.

Refer to the GeoAnalytics Engine guide for examples and usage notes: GWR

run(dataframe)

Runs the GWR tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a point column with a projected spatial reference, a dependent variable, and explanatory variables.

Returns

A copy of the input DataFrame with model attributes appended to each row.

Return type

DataFrame

setDependentVariable(dependent_variable)

Sets the numeric field containing the observed values to model.

Parameters

dependent_variable (str) – The name of a field in the input DataFrame.

setDistanceBand(distance_band=None, distance_band_unit=None)

Sets the neighborhood size as a fixed distance for each feature.

Note

This method will override setNumNeighbors if called last.

Parameters
  • distance_band (float) – The distance for the spatial extent of the neighborhood.

  • distance_band_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setExplanatoryVariables(*explanatory_variables)

Sets one or more fields to represent independent explanatory variables in the model.

Parameters

explanatory_variables (*str) – The names of one or more fields from the input DataFrame.

setLocalWeightingScheme(local_weighting_scheme)

Sets the kernel type that will be used to provide the spatial weighting in the model. The kernel defines how each point is related to other points within its neighborhood. Two options are supported:

  • Bisquare: assigns a weight of 0 to any geometry outside the neighborhood. This is the default.

  • Gaussian: assigns weights to all geometries, but weights become exponentially smaller the farther away they are from the target geometry.

Parameters

local_weighting_scheme (str) – Choose from Bisquare or Gaussian.
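
The behavior of the two kernels can be illustrated with their common textbook forms. This is a sketch under assumptions: the exact formulas and the way the tool derives the bandwidth from the neighborhood are not stated here, and the function names are hypothetical.

```python
import math

def bisquare_weight(distance, bandwidth):
    """Weight falls smoothly to 0 at the bandwidth and stays 0 beyond it."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2

def gaussian_weight(distance, bandwidth):
    """Weight decays with distance but never reaches exactly 0."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

bandwidth = 100.0
inside = bisquare_weight(50.0, bandwidth)        # partial weight inside the neighborhood
outside = bisquare_weight(150.0, bandwidth)      # no influence outside it
far_gaussian = gaussian_weight(150.0, bandwidth)  # small but non-zero
```

Both kernels give full weight to a point at distance zero; they differ in whether distant points are excluded outright or merely down-weighted.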

setNumNeighbors(number_of_neighbors)

Sets the neighborhood size as a function of a specified number of neighbors included in calculations for each point. Where points are dense, the spatial extent of the neighborhood is smaller; where points are sparse, the spatial extent of the neighborhood is larger.

Note

This method will override setDistanceBand if called last.

Parameters

number_of_neighbors (int) – The number of neighbors included in calculations.

Group By Proximity

class geoanalytics.tools.GroupByProximity

Groups geometries that are within spatial or spatiotemporal proximity of each other.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Group By Proximity

run(dataframe)

Runs the GroupByProximity tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a geometry column.

Returns

A copy of the input DataFrame with a column of group IDs appended.

Return type

DataFrame

setAttributeRelationship(expression, expression_type='sql')

Sets the attribute relationship expression to further refine groupings.

Parameters
  • expression (str) – Expression representing the attribute relationship.

  • expression_type (str) – Choose from Arcade or SQL.

setSpatialRelationship(spatial_relationship='Intersects', near_distance=None, near_distance_unit=None)

Sets the type of spatial relationship to group by.

Parameters
  • spatial_relationship (str) – Choose from Intersects, Touches, NearGeodesic, or NearPlanar.

  • near_distance (float) – The search distance to determine if geometries are near one another. This is only applied if NearGeodesic or NearPlanar are set as the spatial relationship.

  • near_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setTemporalRelationship(temporal_relationship='Intersects', temporal_distance=None, temporal_distance_unit=None)

Sets the type of temporal relationship to group by.

Parameters
  • temporal_relationship (str) – Choose from Intersects or Near.

  • temporal_distance (int) – Sets the temporal search distance to determine if geometries are near one another.

  • temporal_distance_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

Nearest Neighbors

class geoanalytics.tools.NearestNeighbors

Searches for the given number of neighbors to a record in a DataFrame from records in another DataFrame. The records from the input DataFrames are matched based on closest proximity.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Nearest Neighbors

run(query_dataframe, data_dataframe=None)

Runs the NearestNeighbors tool using the provided DataFrames.

If you only provide a query_dataframe, the DataFrame is used as both the query_dataframe and the data_dataframe. In this case, each record will be joined with other nearby records, excluding itself.

Parameters
  • query_dataframe (DataFrame) – A DataFrame containing geometries whose nearest neighbors will be found.

  • data_dataframe (DataFrame) – A DataFrame containing the neighbor candidates.

Returns

A DataFrame containing the result of the join.

Return type

DataFrame

setDistanceMethod(distance_method)

Sets the distance method used to measure relative nearness. There are two methods:

  • Planar: this is the default when the input DataFrame is in a projected coordinate system.

  • Geodesic: this is the default when the input DataFrame is in a geographic coordinate system.

Parameters

distance_method (str) – Choose from Planar or Geodesic.

setNumNeighbors(k)

Sets the number of neighbors to find that are nearest to each query record.

Parameters

k (int) – The number of nearest neighbors. The number must be greater than 0.

setOutputUnit(distance_unit)

Sets the desired output unit of the distance values. The default is meters.

Parameters

distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards if the provided DataFrame has a spatial reference. Inapplicable if the input has no spatial reference.

setResultLayout(layout='long')

Sets the layout format for the result DataFrame. There are two options:

  • long: Each row represents a query record with a single nearest neighbor, and the output is organized by stacking all paired records. This is the default.

  • wide: Each row represents a query record with all nearest neighbors, with the fields in data_dataframe consolidated into one column for each nearest neighbor.

Parameters

layout (str) – Choose from long or wide.
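
The relationship between the two layouts can be shown with plain Python dictionaries. The sketch below reshapes hypothetical long-format rows into a wide format; the column names (query_id, rank, neighbor_id, distance, neighbor_1, and so on) are invented for illustration and are not the tool's actual output schema.

```python
from collections import defaultdict

# Long layout: one row per (query record, neighbor) pair.
long_rows = [
    {"query_id": "q1", "rank": 1, "neighbor_id": "a", "distance": 12.0},
    {"query_id": "q1", "rank": 2, "neighbor_id": "b", "distance": 30.5},
    {"query_id": "q2", "rank": 1, "neighbor_id": "c", "distance": 4.2},
    {"query_id": "q2", "rank": 2, "neighbor_id": "a", "distance": 9.9},
]

def to_wide(rows):
    """Collapse long rows into one row per query record,
    with one column per neighbor rank."""
    wide = defaultdict(dict)
    for row in rows:
        neighbor = {"neighbor_id": row["neighbor_id"], "distance": row["distance"]}
        wide[row["query_id"]][f"neighbor_{row['rank']}"] = neighbor
    return [{"query_id": qid, **cols} for qid, cols in wide.items()]

wide_rows = to_wide(long_rows)
```

Four long rows become two wide rows, each holding the data for both of its neighbors in separate columns.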

setSearchDistance(search_distance, search_distance_unit)

Sets a distance bound within which to search for nearest neighbors.

Parameters
  • search_distance (float) – The search distance to determine if geometries are near one another based on the distance method in use.

  • search_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards if the provided DataFrame has a spatial reference. Use None if the input has no spatial reference.

Overlay

class geoanalytics.tools.Overlay

Combines two or more geometry columns into a single column using a spatial overlay operation.

Note

This tool operates on the entire input DataFrame and thus can be more performant than equivalent row-wise operations using SQL functions.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Overlay

run(input_dataframe, overlay_dataframe)

Runs the Overlay tool using the provided DataFrames.

Parameters
  • input_dataframe (DataFrame) – A DataFrame containing a geometry column.

  • overlay_dataframe (DataFrame) – A DataFrame containing a geometry column to overlay.

Returns

A DataFrame containing the result of the overlay.

Return type

DataFrame

setOverlayType(overlay_type)

Sets the type of overlay to be performed.

Parameters

overlay_type (str) – Choose from Intersect, Erase, Union, Identity, or SymmetricalDifference.

Reconstruct Tracks

class geoanalytics.tools.ReconstructTracks

Creates a line or polygon representing an entity’s path of movement over time using points or polygons with associated timestamps.

This tool groups input rows into tracks representing unique entities using a track identifier field. It then creates a linestring by connecting the point observations for each entity sequentially. The linestring can be buffered with a variable distance using a field from the input DataFrame.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Reconstruct Tracks

addSummaryField(summary_field, statistic, alias=None)

Adds a summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters
  • summary_field (str) – The name of a field from the input DataFrame.

  • statistic (str) – Choose from First, Last, Count, Sum, Mean, Max, Min, Range, Stddev, Var, or Any.

  • alias (str) – The name of the result field containing the statistic. The default is the field name and statistic separated by an underscore.

run(dataframe)

Runs the ReconstructTracks tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a point or polygon column, a track ID column, and a datetime column.

Returns

A DataFrame containing the result linestrings or polygons.

Return type

DataFrame

setArcadeSplit(arcade_split)

Sets an Arcade expression to split tracks with. The expression will be evaluated for each point in a track and the track will be split if the expression equals True.

Parameters

arcade_split (str) – An Arcade expression.

setBufferField(buffer_field)

Sets a field in the input DataFrame that contains a buffer distance or a buffer expression. A buffer expression must begin with an equal sign (=).

Parameters

buffer_field (str) – The name of a field from the input DataFrame.

setDistanceMethod(distance_method)

Sets the method used to calculate distances between track observations. There are two methods to choose from:

  • Planar: measures distances using a Euclidean plane and will not calculate statistics across the date line.

  • Geodesic: calculations will cross the date line when appropriate. This is the default. If the spatial reference cannot be panned, calculations will be limited to the coordinate system extent and may not wrap.

Parameters

distance_method (str) – Choose from Planar or Geodesic.

setDistanceSplit(distance_split, distance_split_unit)

Sets the distance used to split tracks. Any rows in the input DataFrame that are in the same track and are farther apart than this distance will be split into a new track. If both the distance split and the time split are used, the track is split when at least one condition is met.

Parameters
  • distance_split (float) – The distance used to split tracks.

  • distance_split_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setSplitBoundaryOption(split_boundary_option)

Sets how the track segment between two points is created when a track is split. The split type is applied to split expressions, distance splits, and time splits. There are three options:

  • Gap: no segment is created between the two points (this is the default).

  • FinishLast: a segment is created between the two points that ends after the split.

  • StartNext: a segment is created between the two points that ends before the split.

Parameters

split_boundary_option (str) – Choose from Gap, FinishLast, or StartNext.

setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)

Sets boundaries to limit calculations to defined spans of time. For example, if you use a time boundary of 1 day, starting on January 1, 1980, tracks will be analyzed one day at a time.

Parameters
  • time_boundary_split (int) – The scale of the time boundary.

  • time_boundary_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

  • time_boundary_reference (int/long/datetime.datetime) – A reference datetime to align the time boundaries to. The default is epoch time 0.

setTimeSplit(time_split, time_split_unit)

Sets the time duration used to split tracks. Any rows in the input DataFrame that are in the same track and are farther apart than this time will be split into a new track. If both the distance split and time split are used, a track is split when at least one condition is met.

Parameters
  • time_split (int) – The time duration used to split tracks.

  • time_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
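
The "at least one condition" rule for time and distance splits can be sketched as follows. This is an illustration only: it assumes time-ordered observations with planar coordinates, compares each observation with the one immediately before it, and uses hypothetical names.

```python
import math

def split_track(observations, time_split=None, distance_split=None):
    """Split time-ordered (t_seconds, x, y) observations into sub-tracks.

    A new sub-track starts when the gap to the previous observation exceeds
    time_split or distance_split (either condition is enough).
    """
    tracks = []
    for obs in observations:
        if tracks:
            prev = tracks[-1][-1]
            too_late = time_split is not None and obs[0] - prev[0] > time_split
            too_far = (distance_split is not None
                       and math.hypot(obs[1] - prev[1], obs[2] - prev[2]) > distance_split)
            if not (too_late or too_far):
                tracks[-1].append(obs)
                continue
        tracks.append([obs])
    return tracks

observations = [(0, 0, 0), (60, 10, 0), (4000, 20, 0), (4060, 500, 0)]
by_time = split_track(observations, time_split=3600)
by_either = split_track(observations, time_split=3600, distance_split=100)
```

With only the time split, the long pause produces two tracks; adding the distance split also breaks the track at the final large jump, giving three.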

setTrackFields(*track_fields)

Sets one or more fields used to identify distinct tracks.

Parameters

track_fields (*str) – The names of one or more fields from the input DataFrame.

Reverse Geocode

class geoanalytics.tools.ReverseGeocode

Creates addresses from point geometries and returns them as string values.

This tool requires an input DataFrame that contains a column of point geometries and a locator accessible to all nodes in the Spark cluster.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Reverse Geocode

run(dataframe)

Runs the ReverseGeocode tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a column of point geometries with a spatial reference.

Returns

A copy of the input DataFrame with output fields specified in setOutFields(), including the matched reverse-geocoded addresses as string values.

Return type

DataFrame

setFeatureTypes(*feature_types)

Sets one or more match types that reverse geocoded addresses are returned with.

Parameters

feature_types (*str) – Specifies the possible match types from Subaddress, PointAddress, StreetAddress, DistanceMarker, StreetName, StreetInt, Postal, Locality, and POI.

setLanguageCode(language_code)

Sets the language in which reverse geocoded addresses are returned.

Parameters

language_code (str) – A two-letter or three-letter language code defined in ISO 639.

setLocator(path)

Sets the address locator that will be used to reverse geocode the points. The locator must be accessible to all nodes in your Spark cluster. For more information, read about Staging the locators.

Parameters

path (str) – The file path of a locator (.loc) or a mobile map package (.mmpk).

setOutFields(predefined_set)

Sets the output fields.

  • Minimal: Match_addr and Addr_type are returned. This is the default.

  • MinimalAndUserFields: Match_addr, Addr_type, and any custom output fields available in the locator are returned.

  • All: All fields are returned including any custom fields defined in your locator.

Parameters

predefined_set (str) – Choose from Minimal, MinimalAndUserFields or All.

Snap Tracks

class geoanalytics.tools.SnapTracks

Snaps input track points to lines. The points DataFrame must have a timestamp column where each row represents an instant in time. The lines DataFrame must also contain fields indicating the from and to nodes for analysis.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Snap Tracks

run(points_dataframe, lines_dataframe)

Runs the SnapTracks tool using the provided DataFrames.

Parameters
  • points_dataframe (DataFrame) – A DataFrame containing points that will be matched to lines.

  • lines_dataframe (DataFrame) – A DataFrame containing lines to which points will be matched. The input must contain fields with values indicating the from and to nodes of the line.

Returns

The snapped points DataFrame with appended fields.

Return type

DataFrame

setAppendFields(*line_fields)

Sets one or more fields from the input lines DataFrame that will be included in the output result.

Parameters

line_fields (*str) – The names of one or more fields from the line DataFrame.

setConnectivityFields(from_node, to_node)

Sets the line DataFrame fields that will be used to define the connectivity of the input lines.

Parameters
  • from_node (str) – The field that represents the from_node, the node that the travel along a line is moving away from.

  • to_node (str) – The field that represents the to_node, the node that the travel along a line is moving to.

setDirectionFieldMatching(direction_field, forward_value=None, backward_value=None, both_value=None, none_value=None)

Sets the line field and attribute values that will be used to define the direction of the input lines.

Parameters
  • direction_field (str) – The field from the line DataFrame that describes the direction of travel.

  • forward_value (str) – The value from the direction_field that indicates the supported direction of travel is forward along a line.

  • backward_value (str) – The value from the direction_field that indicates the supported direction of travel is backward along a line.

  • both_value (str) – The value from the direction_field that indicates both forward and backward directions of travel are supported along a line.

  • none_value (str) – The value from the direction_field that indicates there are no supported directions of travel along a line.

setDistanceMethod(distance_method)

Sets the method used to calculate distances. There are two methods to choose from: ‘Planar’ or ‘Geodesic’ (default).

Parameters

distance_method (str) – Choose from Planar or Geodesic.

setDistanceSplit(distance_split, distance_split_unit)

Sets the distance used to split tracks. Any observations in the input DataFrame that are in the same track and are farther apart than this distance will be split into a new track. If both the distance split and the time split are used, the track is split when at least one condition is met.

Parameters
  • distance_split (float) – The distance used to split tracks.

  • distance_split_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setOutputMode(output_mode)

Sets the result type. There are two options:

  • AllPoints: All input points are returned. This is the default.

  • MatchedPoints: Only input points that matched to a line are returned.

Parameters

output_mode (str) – Choose from AllPoints or MatchedPoints.

setSearchDistance(search_distance, search_distance_unit)

Sets the maximum distance allowed between a point and any line for the two to be considered a match.

Parameters
  • search_distance (float) – Maximum distance between any point and a line.

  • search_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)

Sets boundaries to limit calculations to defined spans of time. For example, if you use a time boundary of 1 day, starting on January 1, 1980, tracks will be analyzed one day at a time.

Parameters
  • time_boundary_split (int) – The scale of the time boundary.

  • time_boundary_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

  • time_boundary_reference (int, long, datetime.datetime) – A reference datetime to align the time boundaries to. The default is epoch time 0.
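As a rough illustration of how a time boundary partitions observations, the sketch below assigns each timestamp to a span counted from the reference time. It is not the tool's implementation: `boundary_index` is a hypothetical helper, and it covers only fixed-length units, on the assumption that Months and Years would need calendar arithmetic.

```python
from datetime import datetime, timedelta, timezone

# Fixed-length units only; calendar units (Months, Years) are left out of this sketch.
UNIT_LENGTH = {
    "Milliseconds": timedelta(milliseconds=1),
    "Seconds": timedelta(seconds=1),
    "Minutes": timedelta(minutes=1),
    "Hours": timedelta(hours=1),
    "Days": timedelta(days=1),
    "Weeks": timedelta(weeks=1),
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def boundary_index(timestamp, split, unit, reference=EPOCH):
    """Return the index of the time-boundary span a timestamp falls in (hypothetical helper)."""
    span = split * UNIT_LENGTH[unit]
    # Floor division of two timedeltas yields an int and rounds toward negative
    # infinity, so timestamps before the reference get negative indices.
    return (timestamp - reference) // span


reference = datetime(1980, 1, 1, tzinfo=timezone.utc)
morning = datetime(1980, 1, 1, 8, 0, tzinfo=timezone.utc)
evening = datetime(1980, 1, 1, 20, 0, tzinfo=timezone.utc)
next_day = datetime(1980, 1, 2, 0, 30, tzinfo=timezone.utc)

# With a 1-day boundary, the first two observations share a span; the third does not.
same_span = boundary_index(morning, 1, "Days", reference) == boundary_index(evening, 1, "Days", reference)
```

Observations with different indices would be analyzed separately, which matches the "one day at a time" behavior described for a 1-day boundary.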

setTimeSplit(time_split, time_split_unit)

Sets the time duration used to split tracks. Any observations in the point DataFrame that are in the same track and are farther apart than this time will be split into a new track. If both the distance split and time split are used, a track is split when at least one condition is met.

Parameters
  • time_split (int) – The time duration used to split tracks.

  • time_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
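The "at least one condition is met" rule for distance and time splits can be sketched in plain Python. This is a simplified illustration, not the tool's implementation: `split_track` is a hypothetical helper, distances are planar, and observations are assumed to be sorted by time already.

```python
from datetime import datetime, timedelta
from math import hypot


def split_track(observations, max_gap_distance=None, max_gap_time=None):
    """Split one time-ordered track into segments (hypothetical helper).

    observations: list of (x, y, timestamp) tuples sorted by timestamp.
    A new segment starts when the gap to the previous observation exceeds
    either threshold that was provided.
    """
    segments = []
    for obs in observations:
        if segments:
            prev_x, prev_y, prev_t = segments[-1][-1]
            too_far = (
                max_gap_distance is not None
                and hypot(obs[0] - prev_x, obs[1] - prev_y) > max_gap_distance
            )
            too_late = max_gap_time is not None and obs[2] - prev_t > max_gap_time
            if not (too_far or too_late):
                segments[-1].append(obs)
                continue
        segments.append([obs])
    return segments


start = datetime(2024, 1, 1, 12, 0)
track = [
    (0.0, 0.0, start),
    (10.0, 0.0, start + timedelta(minutes=1)),
    (500.0, 0.0, start + timedelta(minutes=2)),   # distance gap of 490
    (505.0, 0.0, start + timedelta(minutes=30)),  # time gap of 28 minutes
]
segments = split_track(track, max_gap_distance=100.0, max_gap_time=timedelta(minutes=10))
```

With both thresholds set, the example track breaks once on the distance gap and once on the time gap.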

setTrackFields(*track_fields)

Sets one or more fields used to identify distinct tracks.

Parameters

track_fields (*str) – The names of one or more fields from the input points DataFrame.
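The SnapTracks setters documented above might be combined as in the following sketch. The function takes an already constructed tool instance so that it has no import-time dependency; every field name and direction value ("vehicle_id", "from_node", "F", and so on) is a hypothetical placeholder, and the search distance is an arbitrary illustrative value.

```python
def configure_snap_tracks(tool):
    """Apply an example configuration to a SnapTracks instance.

    All field names and direction values below are hypothetical placeholders
    for columns and codes in your own points and lines DataFrames.
    """
    tool.setTrackFields("vehicle_id")
    tool.setSearchDistance(30.0, "Meters")
    tool.setConnectivityFields("from_node", "to_node")
    tool.setDirectionFieldMatching(
        "direction",
        forward_value="F",
        backward_value="T",
        both_value="B",
        none_value="N",
    )
    tool.setDistanceMethod("Geodesic")
    tool.setOutputMode("MatchedPoints")
    tool.setAppendFields("street_name")
    return tool


# Intended use, assuming GeoAnalytics Engine is installed and authorized:
#   from geoanalytics.tools import SnapTracks
#   result = configure_snap_tracks(SnapTracks()).run(points_df, lines_df)
```

Each setter is called as a separate statement, so the sketch does not rely on the setters returning the tool for chaining.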

Spatiotemporal Join

class geoanalytics.tools.SpatiotemporalJoin

Joins attributes from one DataFrame to another based on spatial, temporal, and attribute relationships or some combination of the three.

The tool determines all input rows that meet the specified join conditions and joins the second DataFrame to the first. You can optionally join all rows to the matching rows or summarize the matching rows.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Spatiotemporal Join

addSummaryField(summary_field, statistic, alias=None)

Adds a summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters
  • summary_field (str) – The name of a field from the input DataFrame.

  • statistic (str) – Choose from Count, Sum, Mean, Max, Min, Range, Stddev, Var, or Any.

  • alias (str) – The name of the result field containing the statistic. The default is the field name and statistic separated by an underscore.

includeDistance(include=True, distance_unit=None)

Specifies whether to include spatial distance and/or temporal difference in the columns of the result DataFrame (new in version 1.2.0).

Parameters
  • include (bool) – True to include, or False to exclude, spatial distance and/or temporal difference.

  • distance_unit (str) – The desired output unit of the spatial distance values. The default is meters. Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards if the input DataFrames have a spatial reference; otherwise use None.

run(target_dataframe, join_dataframe)

Runs the SpatiotemporalJoin tool using the provided DataFrames.

Parameters
  • target_dataframe (DataFrame) – A DataFrame.

  • join_dataframe (DataFrame) – A DataFrame to join.

Returns

A DataFrame containing the result of the join.

Return type

DataFrame

setAttributeRelationship(attribute_relationship)

Sets a target field, relationship, and join field used to join equal attributes.

Two relationships are supported. An equals relationship is written as equal in JSON or = in the string format. To compare join strings while ignoring case and leading and trailing white space, use equalIgnoreCaseTrimWhiteSpace in JSON or ~= in the string format.

Parameters

attribute_relationship (str) – Expression representing the attribute relationship.
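The difference between the two relationships can be illustrated with ordinary string comparison. This sketch only mirrors the described semantics; the function names are hypothetical, and using `casefold` for case-insensitive comparison is an assumption, since the exact case handling is not specified here.

```python
def equal(target_value, join_value):
    """Plain equality, as described for the equal (=) relationship."""
    return target_value == join_value


def equal_ignore_case_trim_whitespace(target_value, join_value):
    """Equality after trimming surrounding white space and normalizing case."""
    return target_value.strip().casefold() == join_value.strip().casefold()


strict = equal("  Main St ", "main st")
relaxed = equal_ignore_case_trim_whitespace("  Main St ", "main st")
```

Here `strict` is False while `relaxed` is True, because only the second comparison ignores case and surrounding white space.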

setJoinCondition(join_condition)

Sets a condition on specified fields using an Arcade expression. Only rows with columns that meet this condition will be joined.

Parameters

join_condition (str) – An Arcade expression.

setJoinOneToMany()

Sets the join operation to one to many. If multiple join rows are found that have the same relationships with a single target row, the result DataFrame will contain multiple copies of the target row.

For example, if a single point in the target DataFrame is found within two separate polygons in the join DataFrame, the result DataFrame will contain two copies of the target row: one row with the attributes of one polygon and another row with the attributes of the other polygon. There are no summary statistics available with this method.

Note

This method will override setJoinOneToOne.

setJoinOneToOne()

Sets the join operation to one to one. If multiple join rows are found that have the same relationships with a single target row, the fields from the multiple join rows will be aggregated using the specified summary statistics.

For example, if a point is found within two separate polygons, the fields associated with the two polygons will be aggregated before being returned in the result DataFrame. If one polygon has an attribute value of 3 and the other has a value of 7, and a summary statistic of sum is specified, the aggregated value in the output DataFrame will be 10. A Count field is always calculated; in this example its value is 2, the number of join rows matched.

Note

This method will override setJoinOneToMany.
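The two join operations can be contrasted with a small worked example in plain Python, using the values 3 and 7 from the description above. The dictionaries stand in for DataFrame rows, and the output field names ("value_Sum", "Count") are illustrative rather than the tool's guaranteed column names.

```python
target = {"point_id": 1}
matches = [{"zone": "A", "value": 3}, {"zone": "B", "value": 7}]

# One to many: one output row per matching join row, no summary statistics.
one_to_many = [{**target, **match} for match in matches]

# One to one: matching join rows are collapsed into a single row,
# here with a Sum statistic plus the count of matched rows.
one_to_one = {
    **target,
    "value_Sum": sum(match["value"] for match in matches),
    "Count": len(matches),
}
```

The one-to-many result holds two copies of the target row, while the one-to-one result holds a single row with a sum of 10 and a count of 2.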

setLeftJoin(left_join=True)

Specifies whether all target rows will be returned in the result DataFrame (known as a left or left outer join) or only those that have the specified relationships with the join rows (inner join). Left join can be used with a one-to-one join or a one-to-many join (new in version 1.1.0).

Parameters

left_join (bool) – If True, a left outer join is used; if False, an inner join is used.

setSpatialRelationship(spatial_relationship, near_distance=None, near_distance_unit=None)

Sets the spatial relationship used to spatially join rows.

Parameters
  • spatial_relationship (str) – Choose from Equals, Intersects, Contains, Within, Crosses, Touches, Overlaps, NearPlanar, or NearGeodesic.

  • near_distance (float) – A double value used for the search distance to determine if a target geometry is near a join geometry. This is only applied if NearPlanar or NearGeodesic is the specified spatial relationship.

  • near_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards if the input DataFrames have a spatial reference; otherwise use None.

setTemporalRelationship(temporal_relationship, near_duration=None, near_duration_unit=None)

Sets the temporal relationship used to temporally join rows.

Parameters
  • temporal_relationship (str) – Choose from Equals, Intersects, During, Contains, Finishes, FinishedBy, Meets, MetBy, Overlaps, OverlappedBy, Starts, StartedBy, Near, NearBefore, or NearAfter.

  • near_duration (int) – An integer value used for the temporal search distance to determine if a target geometry is temporally near a join geometry.

  • near_duration_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
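A near-in-space-and-time join could be configured with the setters above roughly as follows. The function accepts an existing SpatiotemporalJoin instance; the 100-meter distance, 5-minute duration, and the "speed" field with its alias are hypothetical values chosen only for illustration.

```python
def configure_near_join(tool):
    """Configure a SpatiotemporalJoin instance for a near-in-space-and-time join.

    The distance, duration, and field name are illustrative values only.
    """
    tool.setSpatialRelationship(
        "NearGeodesic", near_distance=100.0, near_distance_unit="Meters"
    )
    tool.setTemporalRelationship("Near", near_duration=5, near_duration_unit="Minutes")
    tool.setJoinOneToOne()
    tool.addSummaryField("speed", "Mean", alias="mean_speed")
    tool.setLeftJoin(True)
    return tool


# Intended use, assuming GeoAnalytics Engine is installed and authorized:
#   from geoanalytics.tools import SpatiotemporalJoin
#   result = configure_near_join(SpatiotemporalJoin()).run(target_df, join_df)
```

The near distance is passed only because a Near spatial relationship is selected; the parameter description states that it is not applied otherwise.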

Summarize Within

class geoanalytics.tools.SummarizeWithin

Summarizes geometries from the input DataFrame where they intersect summary polygons or bins using statistics.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Summarize Within

Result

alias of SummarizeWithinResult

addRateField(rate_field)

Marks a numeric field in the input DataFrame as having quantity type rate/index (rather than count/sum).

Parameters

rate_field (str) – The name of a field from the input DataFrame.

addStandardSummaryField(summary_field, statistic, alias=None)

Adds a summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters
  • summary_field (str) – The name of a field from the input DataFrame.

  • statistic (str) – Choose from Count, Sum, Mean, Max, Min, Range, Stddev, Var, or Any.

  • alias (str) – The name of the result field containing the statistic. The default is the field name and statistic separated by an underscore.

addWeightedSummaryField(summary_field, statistic, alias=None)

Adds a weighted summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters
  • summary_field (str) – The name of a field from the input DataFrame.

  • statistic (str) – Choose from Mean, Stddev, or Var.

  • alias (str) – The name of the result field containing the statistic. The default is ‘p’, the field name, underscore, and statistic.

includeShapeSummary(include=True, units=None)

Sets the inclusion of calculated statistics based on the geometry type of the primary geometry column in the input DataFrame, such as the length of lines or areas of polygons within each summary polygon.

Parameters
  • include (bool) – If True, geometry summary statistics will be included in the result.

  • units (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, Yards, SquareMeters, SquareKilometers, Hectares, SquareFeet, SquareYards, SquareMiles or Acres.

run(dataframe)

Runs the SummarizeWithin tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a geometry column.

Returns

A named tuple with a DataFrame containing the summary polygons and a DataFrame containing the group-by summary (if applicable).

Return type

namedtuple

setGroupBy(group_by_field, include_minor_major_fields=True, include_group_percentages=True)

Sets a field from the input DataFrame that will be used to calculate statistics for each unique value.

When setGroupBy is called, the tool will return a DataFrame containing the statistics in addition to a DataFrame containing the summaries.

For example, suppose the input DataFrame contains parcels and the polygons set by setSummaryPolygons are city boundaries. One of the fields of the parcels is Status, which contains two values: VACANT and OCCUPIED. To calculate the total area of vacant and occupied parcels within the boundaries of cities, use Status as the group-by field.

Parameters
  • group_by_field (str) – The name of a field from the input DataFrame.

  • include_minor_major_fields (bool) – If True, the minority (least dominant) or the majority (most dominant) attribute values for each group will be included in the result.

  • include_group_percentages (bool) – If True, the percentage of each unique field value is calculated for each summary polygon.

setSummaryBins(bin_size, bin_size_unit, bin_type='square')

Sets the size and shape of bins that the input DataFrame will be summarized into.

Note

This method overrides setSummaryPolygons. Use setSummaryPolygons if summarizing into an existing column of polygons.

Parameters
  • bin_size (float) – Distance between parallel sides of a bin.

  • bin_size_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

  • bin_type (str) – Choose from Square or Hexagon.

setSummaryPolygons(summary_polygons)

Sets the DataFrame containing a column of polygons that the input DataFrame will be summarized into.

Note

This method overrides setSummaryBins. Use setSummaryBins instead if summarizing into square or hexagon bins that are generated when the tool runs.

Parameters

summary_polygons (pyspark.sql.DataFrame) – A DataFrame containing a polygon column.

Trace Proximity Events

class geoanalytics.tools.TraceProximityEvents

Analyzes points representing moving entities. The tool will follow entities of interest in space (location) and time to see which other entities the entities of interest have interacted with. The trace will continue from entity to entity to a configurable maximum degrees of separation from the original entity of interest.

For example, suppose an organization monitors company-issued devices carried by workers. The company is interested in determining which employees were near an individual known to have COVID-19. Using the point layer representing device locations and time, they can identify devices that have been within 6 meters and 5 minutes of the contagious person and other possibly contagious employees.

Refer to the GeoAnalytics Engine guide for examples and usage notes: Trace Proximity Events

Result

alias of TraceProximityEventsResult

includeTracksDataFrame()

Includes a second DataFrame with the points used in the trace.

run(dataframe)

Runs the TraceProximityEvents tool using the provided DataFrame.

Parameters

dataframe (DataFrame) – A DataFrame containing a point column, timestamp column, and entity ID column.

Returns

A named tuple containing a copy of the input DataFrame with proximity event info appended and a DataFrame containing only points used in the trace.

Return type

namedtuple

setAttributeMatchCriteria(*attribute_match_criteria)

Sets one or more fields used to constrain the proximity events. Entities will only be considered near when the spatial search distance and temporal search distance criteria are met and the two entities have equal values of the fields specified.

Parameters

attribute_match_criteria (*str) – The names of one or more fields from the input DataFrame.

setDistanceMethod(distance_method)

Sets the method used to calculate distances between track observations. There are two methods to choose from:

  • Planar: measures distances using a Euclidean plane and will not calculate statistics across the date line.

  • Geodesic: calculations will cross the date line when appropriate. This is the default. If the spatial reference cannot be panned, calculations will be limited to the coordinate system extent and may not wrap.

Parameters

distance_method (str) – Choose from Planar or Geodesic.

setEntitiesOfInterestIds(entities_of_interest_ids)

Sets one or more entities that you are interested in tracing from, as well as a time to start tracing from. If you do not specify a time, January 1, 1970, at 12:00 a.m. will be used.

Parameters

entities_of_interest_ids (str) – A stringified list of dictionaries containing entity IDs and times in epoch ms.

Example

'[{"entityID": "user5", "epochTimeStamp": 1598390663000}, {"entityID": "user9", "epochTimeStamp": None}]'
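A string in the shape of the example above can be assembled from entity IDs and datetimes, as in this sketch. `epoch_ms` and `entities_of_interest` are hypothetical helpers; the output format simply copies the documented example, including the literal None for a missing start time.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms(moment):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def entities_of_interest(entries):
    """Build the string from (entity_id, datetime-or-None) pairs (hypothetical helper)."""
    parts = []
    for entity_id, moment in entries:
        stamp = "None" if moment is None else str(epoch_ms(moment))
        # json.dumps adds the double quotes and escapes the ID if needed.
        parts.append(f'{{"entityID": {json.dumps(entity_id)}, "epochTimeStamp": {stamp}}}')
    return "[" + ", ".join(parts) + "]"


ids = entities_of_interest([
    ("user5", datetime(2020, 8, 25, 21, 24, 23, tzinfo=timezone.utc)),
    ("user9", None),
])
```

The first entry's timestamp, 2020-08-25 21:24:23 UTC, corresponds to the epoch value 1598390663000 used in the documented example.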

setEntityIdField(entity_id_field)

Sets the field used to identify distinct entities.

Parameters

entity_id_field (str) – The name of a field from the input DataFrame.

setMaxTraceDepth(max_trace_depth)

Sets the maximum degrees of separation between an entity of interest and an entity further down the trace.

Parameters

max_trace_depth (int) – Degrees of separation.
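Degrees of separation can be pictured as a depth-limited breadth-first search over "was near" relationships. The sketch below is a simplification and not the tool's algorithm: it ignores the time ordering of events, `trace_depths` is a hypothetical helper, and numbering the entity of interest as depth 0 is an assumption.

```python
from collections import deque


def trace_depths(contacts, entity_of_interest, max_trace_depth):
    """Return {entity: degrees of separation} reachable within the depth limit.

    contacts: dict mapping an entity ID to the set of entities it was near.
    """
    depths = {entity_of_interest: 0}
    queue = deque([entity_of_interest])
    while queue:
        current = queue.popleft()
        if depths[current] == max_trace_depth:
            continue  # do not trace beyond the configured depth
        for neighbor in contacts.get(current, ()):
            if neighbor not in depths:
                depths[neighbor] = depths[current] + 1
                queue.append(neighbor)
    return depths


contacts = {
    "user5": {"user7"},
    "user7": {"user5", "user9"},
    "user9": {"user7", "user12"},
}
within_two = trace_depths(contacts, "user5", max_trace_depth=2)
```

With a maximum depth of 2, user7 and user9 are reached from user5, while user12 (three steps away) is left out.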

setSearchDistance(search_distance, search_distance_unit)

Sets the maximum distance between two points to be considered in proximity. Points closer together in space and that also meet the search duration criteria are considered in proximity of each other.

Note

This method is used along with setSearchDuration to define proximity.

Parameters
  • search_distance (float) – The search distance used to determine if points are in proximity.

  • search_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setSearchDuration(search_duration, search_duration_unit)

Sets the maximum duration between two points that are considered in proximity. Points closer together in time and that also meet the search distance criteria are considered in proximity of each other.

Note

This method is used along with setSearchDistance to define proximity.

Parameters
  • search_duration (int) – The search duration used to determine if points are in proximity.

  • search_duration_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
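The combined distance and duration test can be sketched as a single predicate. This is an illustration under stated assumptions rather than the tool's implementation: `in_proximity` is a hypothetical helper, distance is planar in the units of the coordinates, and treating the thresholds as inclusive is a guess.

```python
from datetime import datetime, timedelta
from math import hypot


def in_proximity(a, b, search_distance, search_duration):
    """True when two (x, y, timestamp) observations are near in both space and time."""
    close_in_space = hypot(a[0] - b[0], a[1] - b[1]) <= search_distance
    close_in_time = abs(a[2] - b[2]) <= search_duration
    return close_in_space and close_in_time


noon = datetime(2024, 1, 1, 12, 0)
device_a = (0.0, 0.0, noon)
device_b = (3.0, 4.0, noon + timedelta(minutes=4))   # 5 m away, 4 min later
device_c = (3.0, 4.0, noon + timedelta(minutes=20))  # 5 m away, 20 min later

# Thresholds of 6 meters and 5 minutes, as in the tool description's example.
near_b = in_proximity(device_a, device_b, 6.0, timedelta(minutes=5))
near_c = in_proximity(device_a, device_c, 6.0, timedelta(minutes=5))
```

Only device_b counts as in proximity of device_a: device_c is close enough in space but falls outside the search duration.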