geoanalytics_fabric.tools¶

Aggregate Points¶

class geoanalytics_fabric.tools.AggregatePoints¶

Aggregates points into square or hexagon bins, or existing polygons.

The tool first determines which points fall within each specified area. After determining this point-in-area spatial relationship, statistics about all points in the area are calculated and assigned to the area.
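The two steps described above (assign each point to an area, then summarize the points in each area) can be sketched in plain Python. This is only an illustration of the idea for square bins, not the tool's implementation: the function name, the origin-aligned grid, and the `(x, y, value)` tuple layout are assumptions made for the example.

```python
import math
from collections import defaultdict


def aggregate_to_square_bins(points, bin_size):
    """Group (x, y, value) tuples into square bins and summarize each bin.

    Assumption: bins are aligned to the coordinate origin, so the bin that
    contains a point is found by flooring x / bin_size and y / bin_size.
    """
    bins = defaultdict(list)
    for x, y, value in points:
        # Step 1: determine which bin the point falls within.
        key = (math.floor(x / bin_size), math.floor(y / bin_size))
        bins[key].append(value)

    # Step 2: calculate statistics for all points in each bin.
    return {
        key: {"count": len(values), "mean": sum(values) / len(values)}
        for key, values in bins.items()
    }


summary = aggregate_to_square_bins(
    [(0.5, 0.5, 10), (0.9, 0.1, 20), (1.5, 0.5, 30)], bin_size=1
)
```

Here `summary` holds one entry per non-empty bin, keyed by the bin's column and row index.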

Refer to the GeoAnalytics guide for examples and usage notes: Aggregate Points

addSummaryField(summary_field, statistic, alias=None)¶

Adds a summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters:

summary_field (str) – The name of a field from the input DataFrame.
statistic (str) – Choose from Count, Sum, Mean, Max, Min, Range, Stddev, Var, or Any.
alias (str) – The name of the result field containing the statistic. The default is the field name and statistic separated by an underscore.

run(dataframe)¶

Runs the AggregatePoints tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame containing a point column.
Returns:: A DataFrame containing a polygon column, count of points within the polygon, and any summary statistics for each polygon.
Return type:: DataFrame

setBins(bin_size, bin_size_unit, bin_type='square')¶

Sets the size and shape of the bins into which points are aggregated.

Note

This method will override setPolygons.

Parameters:

bin_size (int/float) – Distance between parallel sides of a bin or H3 resolution.
bin_size_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards. Use H3Res for H3 bins.
bin_type (str) – Choose from Square, Hexagon or H3.

setPolygons(polygons)¶

Sets the DataFrame containing a column of polygons into which the input points will be aggregated.

Note

This method will override setBins.

Parameters:: polygons (pyspark.sql.DataFrame) – A DataFrame containing a column of polygons.

setTimeStep(interval_duration, interval_unit, repeat_duration=None, repeat_unit=None, reference_time=None)¶

Sets the time step interval, time step repeat, and reference time. If set, points will be aggregated into each bin for each time step. The input DataFrame must have a datetime column to use this setter.

Parameters:

interval_duration (int) – Duration of each time step.
interval_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
repeat_duration (int) – Time between one time step and the next time step.
repeat_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
reference_time (int/datetime.datetime) – A reference timestamp to which to align the time steps. The reference timestamp can be either a datetime object or an integer Unix timestamp in milliseconds. The default reference time is the Unix epoch time, which is 1970-01-01 00:00:00 UTC.
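As a hedged usage sketch, the setters documented above might be combined as follows. Only the method signatures listed in this section are used. The column name `speed`, the alias `mean_speed`, and the 500-meter bin size are hypothetical, and `points_df` is assumed to be a Spark DataFrame with a point column. The import sits inside the function so the snippet can be loaded without the library installed.

```python
def aggregate_into_hexagons(points_df):
    """Count points and average a numeric column per 500-meter hexagon bin."""
    # Imported lazily; the geoanalytics_fabric package is needed at call time.
    from geoanalytics_fabric.tools import AggregatePoints

    tool = AggregatePoints()
    # setBins(bin_size, bin_size_unit, bin_type)
    tool.setBins(500, "Meters", "Hexagon")
    # addSummaryField(summary_field, statistic, alias); "speed" is hypothetical.
    tool.addSummaryField("speed", "Mean", "mean_speed")
    return tool.run(points_df)
```

Each setter is called as a separate statement so the sketch does not rely on what the setters return.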

Calculate Density¶

class geoanalytics_fabric.tools.CalculateDensity¶

Calculates the density of points and their attributes.

Each point represents the location of some event or incident, and the result calculation represents a count of incidents per unit area. A higher density value in a new location means that there are more points near that location.

In many cases, the result layer can be interpreted as a risk surface for future events. For example, if the input points represent locations of lightning strikes, the result layer can be interpreted as a risk surface for future lightning strikes.

Refer to the GeoAnalytics guide for examples and usage notes: Calculate Density

run(dataframe)¶

Runs the CalculateDensity tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame containing a point column with a spatial reference.
Returns:: A DataFrame of square or hexagon bins with a column of calculated density values.
Return type:: DataFrame

setAreaUnit(area_unit)¶

Sets the desired output units of the density values. The default is SquareKilometers. If density values are very small, you can increase the scale of the area units to return larger values.

Parameters:: area_unit (str) – Choose from SquareMeters, SquareKilometers, Hectares, SquareFeet, SquareYards, SquareMiles or Acres.

setBins(bin_size, bin_size_unit, bin_type='square')¶

Sets the size and shape of bins used to calculate density.

Parameters:

bin_size (float) – Distance between parallel sides of a bin.
bin_size_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.
bin_type (str) – Choose from Square or Hexagon.

setFields(*fields)¶

Sets one or more fields specifying the number of incidents at each location. You can calculate the density on multiple fields. The density of the count of points will always be calculated.

Parameters:: fields (*str) – The names of one or more fields from the input DataFrame.

setNeighborhood(distance, distance_unit)¶

Sets the size of the neighborhood within which to calculate density. The distance must be larger than the bin size.

Parameters:

distance (float) – Radius of the neighborhood, measured from each bin center.
distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setTimeStep(interval_duration, interval_unit, repeat_duration=None, repeat_unit=None, reference_time=None)¶

Sets the time step interval, time step repeat, and reference time. If set, density will be calculated for each time step at each bin location. The input DataFrame must have a datetime column to use this setter.

Parameters:

interval_duration (int) – Duration of each time step.
interval_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
repeat_duration (int) – Time between one time step and the next time step.
repeat_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
reference_time (int/datetime.datetime) – A reference timestamp to which to align the time steps. The reference timestamp can be either a datetime object or an integer Unix timestamp in milliseconds. The default reference time is the Unix epoch time, which is 1970-01-01 00:00:00 UTC.
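To make the interaction between the interval, the repeat, and the reference time concrete, the following plain-Python sketch lists the time steps that contain a given timestamp. It assumes that step k starts at reference + k × repeat, lasts for one interval, and is closed at its start and open at its end. Those semantics and the function name are assumptions for illustration, not a statement of how the tool is implemented.

```python
from datetime import datetime, timedelta, timezone

# Default reference time named in the parameter description above.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_steps_containing(t, interval, repeat=None, reference=EPOCH):
    """Return the (start, end) time steps that contain timestamp t.

    Assumption: step k covers [reference + k*repeat, reference + k*repeat + interval).
    When repeat is omitted, steps are back to back (repeat == interval).
    """
    if repeat is None:
        repeat = interval
    # Index of the latest step that starts at or before t.
    k = (t - reference) // repeat
    steps = []
    # Walk backwards while the step still extends past t (overlapping steps).
    while reference + k * repeat + interval > t:
        start = reference + k * repeat
        steps.append((start, start + interval))
        k -= 1
    return list(reversed(steps))


hour = timedelta(hours=1)
t = datetime(1970, 1, 1, 5, 30, tzinfo=timezone.utc)
single = time_steps_containing(t, hour)
overlapping = time_steps_containing(t, 2 * hour, repeat=hour)
```

With a repeat shorter than the interval, steps overlap and a timestamp can fall into more than one step. With a repeat longer than the interval, some timestamps fall into no step at all.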

setWeightType(weight_type)¶

Sets the type of weighting applied to density calculations. This parameter supports two options:

Uniform: calculates density as magnitude-per-area. This is the default.
Kernel: calculates density by applying a kernel function to fit a smooth tapered surface to each point.

Parameters:: weight_type (str) – Choose from Uniform or Kernel.
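The difference between the two weighting options can be illustrated with a small plain-Python sketch. The uniform version divides the total within the neighborhood by the neighborhood's area. The kernel version uses a quartic kernel, which is an assumption chosen for illustration because the text above does not name the kernel function. Function names and the `(x, y, weight)` layout are likewise hypothetical.

```python
import math


def uniform_density(points, center, radius):
    """Magnitude per area: sum of weights within the radius / circle area."""
    area = math.pi * radius ** 2
    total = sum(w for x, y, w in points if math.dist((x, y), center) <= radius)
    return total / area


def kernel_density(points, center, radius):
    """Quartic-kernel density (assumed kernel): nearby points count more.

    Each point contributes w * 3 / (pi * r^2) * (1 - (d / r)^2)^2, which
    tapers smoothly to zero at distance r and integrates to w over the disc.
    """
    total = 0.0
    for x, y, w in points:
        d = math.dist((x, y), center)
        if d < radius:
            total += w * 3.0 / (math.pi * radius ** 2) * (1 - (d / radius) ** 2) ** 2
    return total


pts = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]  # second point lies outside radius 1
u = uniform_density(pts, center=(0.0, 0.0), radius=1.0)
k = kernel_density(pts, center=(0.0, 0.0), radius=1.0)
```

For a single point located exactly at the bin center, the kernel value is three times the uniform value, which shows how the kernel concentrates density near each point.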

Calculate Field¶

class geoanalytics_fabric.tools.CalculateField¶

Creates and populates a new field or edits an existing field using ArcGIS Arcade.

Your calculation can optionally be track aware. Track-aware equations use Arcade expressions that include track functions. To include a track-aware calculation, setTrackFields must be called and the input DataFrame must have datetime and track ID columns.

Refer to the GeoAnalytics guide for examples and usage notes: Calculate Field

run(dataframe)¶

Runs the CalculateField tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame.
Returns:: A copy of the input DataFrame with the calculated field appended or overwritten.
Return type:: DataFrame

setExpression(expression)¶

Sets an Arcade expression used to calculate the new field values. You can use any of the Date, Logical, Mathematical, or Text functions available with Arcade expressions.

Parameters:: expression (str) – An Arcade expression.

setField(field_name, field_type)¶

Sets the name and type of the new field. If the name already exists in the dataset, the field will be overwritten.

Parameters:

field_name (str) – The name of the column that will be appended to the input DataFrame.
field_type (str) – Choose from Date, Double, Integer, or String.

setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)¶

Sets boundaries to limit calculations to defined spans of time. For example, if you use a time boundary of 1 day starting on January 1, 1980, tracks will be analyzed one day at a time.

Parameters:

time_boundary_split (int) – The scale of the time boundary.
time_boundary_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
time_boundary_reference (int/datetime.datetime) – A reference timestamp to which to align the time steps. The reference timestamp can be either a datetime object or an integer Unix timestamp in milliseconds. The default reference time is the Unix epoch time, which is 1970-01-01 00:00:00 UTC.

setTrackFields(*track_fields)¶

Sets one or more fields used to identify distinct tracks.

Parameters:: track_fields (*str) – The names of one or more fields from the input DataFrame.

Calculate Motion Statistics¶

class geoanalytics_fabric.tools.CalculateMotionStatistics¶

Calculates motion statistics and descriptors for time-enabled points that represent one or more moving entities.

Points are grouped together into tracks representing each entity using a unique identifier. Motion statistics are calculated at each point using one or more points in the track history. Calculations include summaries of distance traveled, duration, elevation, speed, acceleration, bearing, and idle status.

Refer to the GeoAnalytics guide for examples and usage notes: Calculate Motion Statistics

run(dataframe)¶

Runs the CalculateMotionStatistics tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame containing a track ID column and a datetime column.
Returns:: A copy of the input DataFrame with motion statistics appended to each row.
Return type:: DataFrame

setDistanceMethod(distance_method)¶

Sets the method used to calculate distances between track observations. There are two methods to choose from:

Planar: measures distances using a Euclidean plane and will not calculate statistics across the date line.
Geodesic: calculations will cross the date line when appropriate. This is the default. If the spatial reference cannot be panned, calculations will be limited to the coordinate system extent and may not wrap.

Parameters:: distance_method (str) – Choose from Planar or Geodesic.
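The date-line behavior of the two methods can be illustrated with a short sketch. The planar function is plain Euclidean distance in coordinate units. The second function uses the haversine formula on a sphere as a stand-in for a true geodesic calculation, which is a simplifying assumption (an ellipsoidal geodesic would differ slightly). The function names and the mean Earth radius value are choices made for this example.

```python
import math


def planar_distance(x1, y1, x2, y2):
    """Euclidean distance in the units of the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def haversine_m(lon1, lat1, lon2, lat2, radius=6371008.8):
    """Great-circle distance in meters on a sphere (assumed stand-in for geodesic)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


# Two observations on the equator, one degree apart across the date line.
planar_deg = planar_distance(179.5, 0.0, -179.5, 0.0)
spherical_m = haversine_m(179.5, 0.0, -179.5, 0.0)
```

The planar result spans 359 degrees of longitude because it cannot cross the date line, while the spherical result is the length of one degree of arc.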

setIdleTolerance(distance_tolerance, distance_tolerance_unit, time_tolerance, time_tolerance_unit)¶

Sets the tolerances to use to decide if an entity is idling. An entity is idling when it hasn’t moved more than the distance tolerance in at least the time tolerance.

Parameters:

distance_tolerance (float) – Spatial idling tolerance.
distance_tolerance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.
time_tolerance (int) – Temporal idling tolerance.
time_tolerance_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

setMotionStatistics(*motion_statistics)¶

Sets the statistic groups that will be calculated.

Parameters:: motion_statistics (*str) – Choose from Distance, Speed, Acceleration, Duration, Elevation, Slope, Idle, and Bearing.

setStatisticUnits(distance_unit='Meters', duration_unit='Seconds', speed_unit='MetersPerSecond', acceleration_unit='MetersPerSecondSquared', elevation_unit='Meters')¶

Sets the output units for each statistic group.

Parameters:

distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.
duration_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
speed_unit (str) – Choose from MetersPerSecond, KilometersPerHour, FeetPerSecond, MilesPerHour, or NauticalMilesPerHour.
acceleration_unit (str) – Choose from MetersPerSecondSquared or FeetPerSecondSquared.
elevation_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)¶

Sets boundaries to limit calculations to defined spans of time. For example, if you use a time boundary of 1 day starting on January 1, 1980, tracks will be analyzed one day at a time.

Parameters:

time_boundary_split (int) – The scale of the time boundary.
time_boundary_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
time_boundary_reference (int/datetime.datetime) – A reference timestamp to which to align the time steps. The reference timestamp can be either a datetime object or an integer Unix timestamp in milliseconds. The default reference time is the Unix epoch time, which is 1970-01-01 00:00:00 UTC.

setTrackFields(*track_fields)¶

Sets one or more fields used to identify distinct tracks.

Parameters:: track_fields (*str) – The names of one or more fields from the input DataFrame.

setTrackHistoryWindow(track_history_window)¶

Sets the number of observations (including the current observation) that will be used when calculating summary statistics that are not instantaneous. This includes minimum, maximum, average, and total statistics.

The default track history window is 3, which means that at each point in a track, summary statistics will be calculated using the current observation and the previous two observations.

Note

This setter does not affect instantaneous statistics or idle classification.

Parameters:: track_history_window (int) – Number of observations.
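The effect of the window can be pictured as a rolling calculation over the most recent observations. The sketch below computes a windowed average over a list of per-observation values in plain Python. The function name and the input values are hypothetical, and the tool's actual statistics may be derived differently.

```python
from collections import deque


def windowed_mean(values, track_history_window=3):
    """Average of the current value and the preceding window - 1 values."""
    window = deque(maxlen=track_history_window)  # oldest value drops out automatically
    result = []
    for value in values:
        window.append(value)
        # Early observations have fewer than `track_history_window` values available.
        result.append(sum(window) / len(window))
    return result


averages = windowed_mean([1, 2, 3, 4], track_history_window=3)
```

The first two results use fewer than three observations because the track history is still shorter than the window.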

Clip¶

class geoanalytics_fabric.tools.Clip¶

Extracts geometries that overlay clip geometries.

Note

This tool operates on the entire input DataFrame and thus can be more performant than equivalent row-wise operations using SQL functions.

Refer to the GeoAnalytics guide for examples and usage notes: Clip

run(input_dataframe, clip_dataframe)¶

Runs the Clip tool using the provided DataFrames.

Parameters:

input_dataframe (DataFrame) – A DataFrame containing a geometry column.
clip_dataframe (DataFrame) – A DataFrame containing a polygon column to clip with.

Returns:

A DataFrame containing the result of the clip.

Return type:

DataFrame

Detect Incidents¶

class geoanalytics_fabric.tools.DetectIncidents¶

Determines which observations are incidents of interest using a specified condition.

Rows in the input DataFrame are grouped using a track ID and ordered sequentially before an incident condition is applied. Rows that meet the starting incident condition are marked as an incident. An ending incident condition can be applied; when the end condition is true, the track is no longer in an incident. You can return all input rows or only rows that are incidents.

Refer to the GeoAnalytics guide for examples and usage notes: Detect Incidents

run(dataframe)¶

Runs the DetectIncidents tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame containing a track ID column and a datetime column.
Returns:: A copy of the input DataFrame with incident status appended to each row.
Return type:: DataFrame

setEndConditionExpression(end_condition_expression)¶

Sets the condition used to end incidents. If there is an end condition, any feature that meets the start condition expression and does not meet the end condition expression is an incident.

Parameters:: end_condition_expression (str) – Arcade expression used to end incidents.

setOutputMode(output_mode)¶

Sets which observations are returned. There are two options:

All: all of the input observations are returned. This is the default.
Incidents: only observations that were found to be incidents are returned.

Parameters:: output_mode (str) – Choose from All or Incidents.

setStartConditionExpression(start_condition_expression)¶

Sets the condition used to start incidents. If there is no end condition expression specified, any feature that meets this condition is an incident. If there is an end condition, any feature that meets the start condition expression and does not meet the end condition expression is an incident.

Parameters:: start_condition_expression (str) – Arcade expression used to identify incidents.
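The interplay of the start and end conditions can be sketched as a small state machine over the ordered observations of one track. In this illustration, Python callables stand in for the Arcade expressions, and the function name is hypothetical. It is a reading of the rules stated above, not the tool's implementation.

```python
def flag_incidents(values, start, end=None):
    """Return one True/False incident flag per observation of a single track.

    `start` and `end` are Python predicates standing in for Arcade expressions.
    """
    flags = []
    active = False
    for value in values:
        if end is None:
            # No end condition: only observations meeting the start condition.
            active = bool(start(value))
        elif active:
            # Already in an incident: it continues until the end condition is true.
            active = not end(value)
        else:
            # Not in an incident: begin one if start is met and end is not.
            active = bool(start(value)) and not end(value)
        flags.append(active)
    return flags


readings = [1, 6, 4, 3, 1, 7]
without_end = flag_incidents(readings, start=lambda v: v > 5)
with_end = flag_incidents(readings, start=lambda v: v > 5, end=lambda v: v < 2)
```

With an end condition, the observations with values 4 and 3 remain part of the incident that began at 6, because the end condition has not yet been met.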

setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)¶

Sets boundaries to limit calculations to defined spans of time. For example, if you use a time boundary of 1 day starting on January 1, 1980, tracks will be analyzed one day at a time.

Parameters:

time_boundary_split (int) – The scale of the time boundary.
time_boundary_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
time_boundary_reference (int/datetime.datetime) – A reference timestamp to which to align the time steps. The reference timestamp can be either a datetime object or an integer Unix timestamp in milliseconds. The default reference time is the Unix epoch time, which is 1970-01-01 00:00:00 UTC.

setTrackFields(*track_fields)¶

Sets one or more fields used to identify distinct tracks.

Parameters:: track_fields (*str) – The names of one or more fields from the input DataFrame.

Find Dwell Locations¶

class geoanalytics_fabric.tools.FindDwellLocations¶

Finds where entities dwell within a specific distance and duration using a record of their location through time.

Dwell locations are determined using time and distance tolerances. First, the tool groups points into tracks representing each entity using a track identifier and orders them sequentially. Next, the distance between the first point in a track and the next is calculated. If two temporally consecutive points stay within the given distance for at least the given duration, they are considered part of a dwell. When two points are found to be part of a dwell, the first point in the dwell is used as a reference point, and the tool finds consecutive points that are within the specified distance of the reference point in the dwell.

Once all points within the specified distance are found, the tool collects the dwell points and calculates their mean center. Features before and after the current dwell are added to the dwell if they are within the given distance of the dwell location’s mean center. This process continues until the end of the track.
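A simplified version of the reference-point search described above can be sketched in plain Python. It covers only the first phase (growing a dwell from a reference point and computing its mean center) and omits the later step that adds neighboring features around the mean center. The function name, the planar distance, and the `(time, x, y)` layout are assumptions for illustration.

```python
import math


def find_dwells(track, max_distance, min_duration):
    """Find dwells in a time-ordered list of (time, x, y) observations.

    Simplified sketch: a dwell is a run of consecutive points that all stay
    within max_distance of the run's first point and span at least min_duration.
    """
    dwells = []
    i = 0
    while i < len(track):
        t0, x0, y0 = track[i]
        j = i
        # Extend the run while the next point stays near the reference point.
        while j + 1 < len(track) and math.dist((x0, y0), track[j + 1][1:]) <= max_distance:
            j += 1
        if j > i and track[j][0] - t0 >= min_duration:
            members = track[i:j + 1]
            mean_center = (
                sum(p[1] for p in members) / len(members),
                sum(p[2] for p in members) / len(members),
            )
            dwells.append({"start": i, "end": j, "mean_center": mean_center})
            i = j + 1  # continue after the dwell
        else:
            i += 1  # no dwell here; try the next point as reference
    return dwells


track = [(0, 0, 0), (10, 1, 0), (20, 0, 1), (30, 50, 50), (40, 100, 100)]
dwells = find_dwells(track, max_distance=5, min_duration=15)
```

In this example the first three observations form one dwell, and the last two are too far apart to dwell anywhere.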

Refer to the GeoAnalytics guide for examples and usage notes: Find Dwell Locations

addSummaryField(summary_field, statistic, alias=None)¶

Adds a summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters:

summary_field (str) – The name of a field from the input DataFrame.
statistic (str) – Choose from First, Last, Count, Sum, Mean, Max, Min, Range, Stddev, Var, or Any.
alias (str) – The name of the result field containing the statistic. The default is the field name and statistic separated by an underscore.

run(dataframe)¶

Runs the FindDwellLocations tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame containing a point column with a spatial reference, a track ID column, and a datetime column.
Return type:: DataFrame

setDistanceMethod(distance_method)¶

Sets the method used to calculate distances between track observations. There are two methods to choose from:

Planar: measures distances using a Euclidean plane and will not calculate statistics across the date line.
Geodesic: calculations will cross the date line when appropriate. This is the default. If the spatial reference cannot be panned, calculations will be limited to the coordinate system extent and may not wrap.

Parameters:: distance_method (str) – Choose from Planar or Geodesic.

setDwellMaxDistance(max_distance, max_distance_unit)¶

Sets the maximum distance between points for them to be considered part of a single dwell event.

Note

This method is used along with setDwellMinDuration to define dwell criteria.

Parameters:

max_distance (float) – The maximum distance between points to be considered in a single dwell location.
max_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setDwellMinDuration(min_duration, min_duration_unit)¶

Sets the minimum time between points for them to be considered part of a single dwell event.

Note

This method is used along with setDwellMaxDistance to define dwell criteria.

Parameters:

min_duration (int) – The minimum time duration of a dwell to be considered in a single dwell location.
min_duration_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

setOutputType(output_type)¶

Sets the output type. There are five options:

DwellMeanCenters: A point representing the centroid of each discovered dwell location. This is the default.
DwellConvexHulls: Polygons representing the convex hull of each dwell group.
DwellPoints: All the input points determined to belong to a dwell are returned.
AllPoints: All the input points are returned.
CollapseDwellPoints: All the non-dwell points along with the start and end points of a dwell are returned.

Parameters:: output_type – Choose from DwellMeanCenters, DwellConvexHulls, DwellPoints, AllPoints, or CollapseDwellPoints.
Returns:: The result DataFrame specified by output_type

setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)¶

Sets boundaries to limit calculations to defined spans of time. For example, if you use a time boundary of 1 day starting on January 1, 1980, tracks will be analyzed one day at a time.

Parameters:

time_boundary_split (int) – The scale of the time boundary.
time_boundary_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
time_boundary_reference (int/datetime.datetime) – A reference timestamp to which to align the time steps. The reference timestamp can be either a datetime object or an integer Unix timestamp in milliseconds. The default reference time is the Unix epoch time, which is 1970-01-01 00:00:00 UTC.

setTrackFields(*track_fields)¶

Sets one or more fields used to identify distinct tracks.

Parameters:: track_fields (*str) – The names of one or more fields from the input DataFrame.

Find Hot Spots¶

class geoanalytics_fabric.tools.FindHotSpots¶

Aggregates points into square bins and finds statistically significant bins of high incidents (hot spots) and low incidents (cold spots).

This tool finds hot and cold spots using the Getis-Ord Gi* statistic. The local counts of points for a bin and its neighbors are compared proportionally to the sum of points in all bins. A local sum is considered statistically significant (larger z-score) when it is very different from the expected local sum and when that difference is too large to be the result of random chance.
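The comparison of a local sum against its expected value can be illustrated with the standard Getis-Ord Gi* formula. The sketch below uses a one-dimensional row of bin counts and binary weights (1 for bins within `radius` positions, including the bin itself, and 0 otherwise). Both simplifications and the function name are assumptions for the example, since the tool itself works on a two-dimensional grid of square bins.

```python
import math


def gi_star(counts, radius=1):
    """Gi* z-score for each bin in a 1D row of counts, using binary weights."""
    n = len(counts)
    if n < 2:
        return [0.0] * n
    mean = sum(counts) / n
    # Global standard deviation; max() guards against tiny negative rounding.
    s = math.sqrt(max(0.0, sum(c * c for c in counts) / n - mean ** 2))
    if s == 0:
        return [0.0] * n  # all bins identical: nothing is significant

    scores = []
    for i in range(n):
        neighbors = range(max(0, i - radius), min(n, i + radius + 1))
        w_sum = len(neighbors)  # with binary weights, sum(w) == sum(w^2)
        local_sum = sum(counts[j] for j in neighbors)
        expected = mean * w_sum
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append((local_sum - expected) / denom if denom else 0.0)
    return scores


z = gi_star([0, 0, 10, 0, 0], radius=1)
```

Bins whose neighborhood includes the high count receive a positive z-score, and bins far from it receive a negative one. A sample this small is far too short to reach statistical significance and only shows the arithmetic.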

Refer to the GeoAnalytics guide for examples and usage notes: Find Hot Spots

run(dataframe)¶

Runs the FindHotSpots tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame containing a point column with a projected spatial reference.
Returns:: A DataFrame of square bins assigned a z-score, p-value, and confidence level.
Return type:: DataFrame

setBins(bin_size, bin_size_unit)¶

Sets the size of square bins used to find hot spots.

Parameters:

bin_size (float) – Distance between parallel sides of a bin.
bin_size_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setNeighborhood(distance, distance_unit)¶

Sets the size of the neighborhood used to find hot spots. The neighborhood size must be larger than the bin size.

Parameters:

distance (float) – Radius of the neighborhood, measured from each bin center.
distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setTimeStep(interval_duration, interval_unit, reference_time=None, alignment=None)¶

Sets the time step interval, reference time, and alignment. If set, hot spots will be calculated for each time step at each bin location. The input DataFrame must have a datetime column to use this setter.

Parameters:

interval_duration (int) – Duration of each time step.
interval_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
reference_time (int/datetime.datetime) – A reference datetime to which to align the time steps, if alignment is ReferenceTime. The default reference time is the Unix epoch time, which is 1970-01-01 00:00:00 UTC.
alignment (str) – Defines how aggregation will occur based on a given interval duration. Choose from StartTime, EndTime, or ReferenceTime.

Find Point Clusters¶

class geoanalytics_fabric.tools.FindPointClusters¶

Finds clusters of points within surrounding noise based on their spatial or spatiotemporal distribution.

Two clustering methods are supported: DBSCAN and HDBSCAN. Both methods can find clusters in space, while only DBSCAN can also find spatiotemporal clusters in time-enabled point layers.

Refer to the GeoAnalytics guide for examples and usage notes: Find Point Clusters

run(dataframe)¶

Runs the FindPointClusters tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame containing a point column with a projected spatial reference.
Returns:: A copy of the input DataFrame with a cluster ID assigned to each point.
Return type:: DataFrame

setClusterMethod(cluster_method)¶

Sets the algorithm used for cluster analysis. Supported options are “DBSCAN” and “HDBSCAN”.

The DBSCAN algorithm uses a specified distance to separate dense clusters from sparser noise. DBSCAN is faster than HDBSCAN, but is only appropriate if there is a clear search distance to use that works well to define all clusters that may be present.

DBSCAN finds clusters that have similar densities. The HDBSCAN algorithm allows for clusters with varying densities based on cluster probability (or stability).

HDBSCAN is data-driven and does not use a search distance, but is a more time-consuming calculation than DBSCAN. The DBSCAN algorithm finds clusters in two-dimensional space by default. When setTimeMethod is called, DBSCAN will discover clusters in both space and time.

Parameters:: cluster_method (str) – Choose from DBSCAN or HDBSCAN.

setMinPointsCluster(min_points_cluster)¶

This setter is used differently depending on the clustering method chosen. For DBSCAN, min_points_cluster specifies the number of points that must be found within a search range of a point for that point to start forming a cluster. The results may include clusters with fewer points than this value.

For HDBSCAN, min_points_cluster specifies the number of points neighboring each point (including the point itself) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.

Parameters:: min_points_cluster (int) – Number of points.

setSearchDistance(search_distance, search_distance_unit)¶

Sets the search distance within which the number of points specified by setMinPointsCluster must be found (in addition to being within the search duration, if applicable) to form a cluster using the DBSCAN algorithm. No search distance is used by HDBSCAN.

Parameters:

search_distance (float) – Distance within which min_points_cluster must be found to start forming a cluster. Results may include clusters with fewer points than min_points_cluster.
search_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards if the input DataFrame has a spatial reference. Use None if the input DataFrame has no spatial reference.
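How the search distance and the minimum number of points work together in DBSCAN can be shown with a compact, textbook-style implementation in plain Python. This is a generic sketch of the algorithm, not the tool's code. The function name, the brute-force neighbor search, and the convention that a point counts as its own neighbor are assumptions.

```python
import math

NOISE = -1


def dbscan(points, search_distance, min_points):
    """Label each (x, y) point with a cluster ID (1, 2, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force search; includes point i itself (assumed convention).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= search_distance]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reachable = neighbors(j)
            if len(reachable) >= min_points:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (10, 10.5), (50, 50)]
labels = dbscan(pts, search_distance=1.5, min_points=3)
```

The isolated point at (50, 50) has no neighbors within the search distance and is labeled as noise.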

setSearchDuration(search_duration, search_duration_unit)¶

Sets the search duration within which the number of points specified by setMinPointsCluster must be found (in addition to being within the search distance) to form a cluster using the DBSCAN algorithm.

Warning

The input DataFrame must have a datetime column to use this setter.

Note

This method is not used by HDBSCAN.

Parameters:

search_duration (int) – Duration within which min_points_cluster must be found to start forming a cluster. Results may include clusters with fewer points than min_points_cluster.
search_duration_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

Find Similar Locations¶

class geoanalytics_fabric.tools.FindSimilarLocations¶

Measures the similarity of candidate locations to one or more reference locations.

This tool requires two DataFrames, one containing the reference locations and one containing candidate locations. Using specified fields representing the criteria to match, the tool will rank all of the candidate locations by how closely they match the reference locations.

Refer to the GeoAnalytics guide for examples and usage notes: Find Similar Locations

run(reference_dataframe, search_dataframe)¶

Runs the FindSimilarLocations tool using the provided DataFrames.

Parameters:

reference_dataframe (DataFrame) – A DataFrame containing one or more reference rows with attributes.
search_dataframe (DataFrame) – A DataFrame containing candidate locations that will be evaluated for similarity to the reference rows.

Returns:

The similarity statistics with appended fields.

Return type:

DataFrame

setAnalysisFields(*analysis_fields)¶

Sets the fields that will be used to determine similarity. They must be numeric fields, and the fields must exist on both input DataFrames. Depending on the match method selected, the tool will find rows that are most similar based on values or profiles of the fields.

Parameters:: analysis_fields (*str) – The names of one or more fields from the input DataFrames.

setAppendFields(*append_fields)¶

Sets which fields from the search DataFrame are included in the result. By default, all fields from the search DataFrame are appended.

Parameters:: append_fields (*str) – The names of one or more fields from the search DataFrame.

setMatchMethod(match_method)¶

Sets the method that specifies how matching is determined. There are two options:

AttributeValues: uses the squared differences of standardized values. This is the default.
AttributeProfiles: uses cosine similarity mathematics to compare the profile of standardized values. This option requires the use of at least two analysis fields.

Parameters:: match_method (str) – Choose from AttributeValues or AttributeProfiles.
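The two match methods rest on different measures, which the following plain-Python sketch contrasts. Values are standardized per field, then compared either by the sum of squared differences or by cosine similarity. The function names and the choice to standardize with the population standard deviation over the rows passed in are assumptions for illustration.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Convert each column of `rows` to z-scores (population standard deviation)."""
    columns = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in columns]
    return [
        [(v - m) / s if s else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]


def squared_difference(a, b):
    """AttributeValues-style measure: smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """AttributeProfiles-style measure: 1.0 means identical profile shape."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


z_rows = standardize([[1, 10], [3, 30]])
value_gap = squared_difference([1, 2], [2, 4])
profile_match = cosine_similarity([1, 2], [2, 4])
```

The vectors `[1, 2]` and `[2, 4]` differ in magnitude, so their squared difference is nonzero, yet they share the same profile, so their cosine similarity is 1. Cosine similarity compares the shape across fields, which is why the profile method needs at least two analysis fields.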

setMostOrLeastSimilar(most_or_least_similar)¶

Sets the rows that will be returned. Options include returning the rows that are most similar to the reference, least similar, or both.

Parameters:: most_or_least_similar (str) – Choose from MostSimilar, LeastSimilar, or Both.

setNumberOfResults(number_of_results)¶

Sets the number of ranked candidate rows to return. The default is 10 and the maximum allowed is 10000.

Parameters:: number_of_results (int) – Number of most or least similar locations to return.

GWR¶

class geoanalytics_fabric.tools.GWR¶

Performs Geographically Weighted Regression (GWR), a local form of linear regression used to model spatially varying relationships.

GWR provides a local model of a variable by fitting a regression equation to every row in the input DataFrame using the geometry and any specified explanatory variables.

Refer to the GeoAnalytics guide for examples and usage notes: GWR

Result¶: alias of GeographicallyWeightedRegressionResult

run(dataframe)¶

Runs the GWR tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame containing a point column with a projected spatial reference, dependent variables, and explanatory variables.
Returns:: A copy of the input DataFrame with model attributes appended to each row.
Return type:: DataFrame

runIncludeDiagnostics(dataframe)¶

Runs the GWR tool using the provided DataFrame and returns the model diagnostics along with the result.

Parameters:: dataframe (DataFrame) – A DataFrame containing a point column with a projected spatial reference, dependent variables, and explanatory variables.
Returns:: A named tuple containing: outputTrained, a copy of the input DataFrame with model attributes appended to each row; and modelDiagnostics, a dictionary containing the model diagnostics.
Return type:: namedtuple

setDependentVariable(dependent_variable)¶

Sets the numeric field containing the observed values to model.

Parameters:: dependent_variable (str) – The name of a field in the input DataFrame.

setDistanceBand(distance_band=None, distance_band_unit=None)¶

Sets the neighborhood size as a fixed distance for each feature.

Note

This method will override setNumNeighbors if called last.

Parameters:

distance_band (float) – The distance for the spatial extent of the neighborhood.
distance_band_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setExplanatoryVariables(*explanatory_variables)¶

Sets one or more fields to represent independent explanatory variables in the model.

Parameters:: explanatory_variables (*str) – The names of one or more fields from the input DataFrame.

setLocalWeightingScheme(local_weighting_scheme)¶

Sets the kernel type that will be used to provide the spatial weighting in the model. The kernel defines how each point is related to other points within its neighborhood. Two options are supported:

Bisquare: assigns a weight of 0 to any geometry outside the neighborhood. This is the default.
Gaussian: assigns weights to all geometries, but weights become exponentially smaller the farther away they are from the target geometry.

Parameters:: local_weighting_scheme (str) – Choose from Bisquare or Gaussian.
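
The difference between the two kernels can be sketched with their common textbook forms. This is only an illustration: the function names are hypothetical, and the exact formulas used by the tool are not stated in this reference.

```python
import math

def bisquare_weight(distance, bandwidth):
    """Bisquare kernel (textbook form): weight is 0 at or beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    ratio = distance / bandwidth
    return (1.0 - ratio ** 2) ** 2

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel (textbook form): weight decays with distance but stays positive."""
    ratio = distance / bandwidth
    return math.exp(-0.5 * ratio ** 2)

# Compare the two kernels inside and outside a neighborhood of size 10.
for d in (0.0, 5.0, 10.0, 20.0):
    print(d, bisquare_weight(d, 10.0), gaussian_weight(d, 10.0))
```

With a bandwidth of 10, a geometry at distance 20 gets a bisquare weight of 0 but still contributes a small Gaussian weight.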

setNumNeighbors(number_of_neighbors)¶

Sets the neighborhood size as a function of a specified number of neighbors included in calculations for each point. Where points are dense, the spatial extent of the neighborhood is smaller; where points are sparse, the spatial extent of the neighborhood is larger.

Note

This method will override setDistanceBand if called last.

Parameters:: number_of_neighbors (int) – The number of neighbors included in calculations.
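
To see why the spatial extent changes with point density, the following sketch computes the distance from a target point to its k-th nearest neighbor. It is a plain-Python illustration using planar distances and a hypothetical helper name, not the tool's implementation.

```python
import math

def neighborhood_radius(target, points, number_of_neighbors):
    """Planar distance from target to its k-th nearest other point."""
    distances = sorted(math.dist(target, p) for p in points if p != target)
    return distances[number_of_neighbors - 1]

dense = [(0, 0), (1, 0), (0, 1), (1, 1)]
sparse = [(0, 0), (10, 0), (0, 10), (10, 10)]

# The same number of neighbors covers a smaller extent where points are dense.
dense_radius = neighborhood_radius((0, 0), dense, 2)
sparse_radius = neighborhood_radius((0, 0), sparse, 2)
print(dense_radius, sparse_radius)
```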

Group By Proximity¶

class geoanalytics_fabric.tools.GroupByProximity¶

Groups geometries that are within spatial or spatiotemporal proximity of each other.

Refer to the GeoAnalytics guide for examples and usage notes: Group By Proximity

run(dataframe)¶

Runs the GroupByProximity tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame containing a geometry column.
Returns:: A copy of the input DataFrame with a column of group IDs appended.
Return type:: DataFrame

setAttributeRelationship(expression, expression_type='sql')¶

Sets the attribute relationship expression to further refine groupings.

Parameters:

expression (str) – Expression representing the attribute relationship.
expression_type (str) – Choose from Arcade or SQL.

setSpatialRelationship(spatial_relationship='Intersects', near_distance=None, near_distance_unit=None)¶

Sets the type of spatial relationship to group by.

Parameters:

spatial_relationship (str) – Choose from Intersects, Touches, NearGeodesic, NearBinnedProject, or NearPlanar.
near_distance (float) – The search distance to determine if geometries are near one another. This is applied only if NearGeodesic, NearBinnedProject, or NearPlanar is set as the spatial relationship.
near_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.
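
As a conceptual sketch of grouping with a near relationship, the code below assigns group IDs to planar points with a small union-find. It assumes that groups are transitive (if A is near B and B is near C, all three share a group) and that the distance comparison is inclusive; neither detail is stated in this reference, and the function name is hypothetical.

```python
import math

def group_by_near_planar(points, near_distance):
    """Return one group ID per point; points within near_distance are linked."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= near_distance:
                parent[find(j)] = find(i)  # merge the two groups

    return [find(i) for i in range(len(points))]

points = [(0, 0), (1, 0), (2, 0), (10, 10)]
groups = group_by_near_planar(points, near_distance=1.5)
print(groups)
```

The first three points end up in one group even though the first and third are 2 units apart, because each is near the second point.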

setTemporalRelationship(temporal_relationship='Intersects', temporal_distance=None, temporal_distance_unit=None)¶

Sets the type of temporal relationship to group by.

Parameters:

temporal_relationship (str) – Choose from Intersects or Near.
temporal_distance (int) – Sets the temporal search distance to determine if geometries are near one another.
temporal_distance_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

Nearest Neighbors¶

class geoanalytics_fabric.tools.NearestNeighbors¶

Searches for the given number of neighbors to a record in a DataFrame from records in another DataFrame. The records from the input DataFrames are matched based on closest proximity.

Refer to the GeoAnalytics guide for examples and usage notes: Nearest Neighbors

addSummaryField(summary_field, statistic, alias=None)¶

Adds a summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters:

summary_field (str) – The name of a field from the input DataFrame.
statistic (str) – Choose from Count, Sum, Mean, Max, Min, Range, Stddev, Var, or Any.
alias (str) – The name of the result field containing the statistic. The default is the field name and statistic separated by an underscore.

run(query_dataframe, data_dataframe=None)¶

Runs the NearestNeighbors tool using the provided DataFrames.

If you provide only a query_dataframe, the DataFrame is used as both the query_dataframe and the data_dataframe. In this case, each record will be joined with other nearby records, excluding itself.

Parameters:

query_dataframe (DataFrame) – A DataFrame containing geometries whose nearest neighbors will be found.
data_dataframe (DataFrame) – A DataFrame containing the neighbor candidates.

Returns:

A DataFrame containing the result of the join.

Return type:

DataFrame

setDistanceMethod(distance_method)¶

Specify the distance method category for relative nearness. There are two methods:

Planar: this is the default when the input DataFrame is in a projected coordinate system.
Geodesic: this is the default when the input DataFrame is in a geographic coordinate system.

Parameters:: distance_method (str) – Choose from Planar or Geodesic.

setNumNeighbors(k)¶

The number of neighbors to find that are nearest to each query record.

Parameters:: k (int) – The number of nearest neighbors. The number must be greater than 0.

setOutputUnit(distance_unit)¶

Sets the desired output unit of the distance values. The default is meters.

Parameters:: distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards if the provided DataFrame has a spatial reference. Inapplicable if the input has no spatial reference.

setResultLayout(layout='long')¶

Sets the layout format for the result DataFrame. There are two options:

long: Each row represents a query record with a single nearest neighbor, and the output is organized by stacking all paired records. This is the default (when summary statistics are not in use).
wide: Each row represents a query record with all nearest neighbors, with the fields in data_dataframe consolidated into one column for each nearest neighbor.

Parameters:: layout (str) – Choose from long or wide.
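
The two layouts can be pictured with plain Python data. The match tuples and the column names below (neighbor1, distance1, and so on) are hypothetical and are not the tool's actual output schema.

```python
# Hypothetical neighbor matches: (query_id, neighbor_id, distance)
matches = [("q1", "a", 12.0), ("q1", "b", 30.5), ("q2", "c", 7.2), ("q2", "a", 18.9)]

# Long layout: one row per (query record, neighbor) pair.
long_rows = [{"query": q, "neighbor": n, "distance": d} for q, n, d in matches]

# Wide layout: one row per query record, with one set of columns per neighbor.
wide_by_query = {}
for q, n, d in matches:
    row = wide_by_query.setdefault(q, {"query": q})
    rank = (len(row) - 1) // 2 + 1  # two columns are added per neighbor
    row[f"neighbor{rank}"] = n
    row[f"distance{rank}"] = d
wide_rows = list(wide_by_query.values())

print(len(long_rows), len(wide_rows))
```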

setSearchDistance(search_distance, search_distance_unit)¶

Sets a distance bound within which to search for nearest neighbors.

Parameters:

search_distance (float) – The search distance to determine if geometries are near one another according to the distance method in use.
search_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards if the provided DataFrame has a spatial reference. Otherwise use None if the input has no spatial reference.

Overlay¶

class geoanalytics_fabric.tools.Overlay¶

Combines two or more geometry columns into a single column using a spatial overlay operation.

Note

This tool operates on the entire input DataFrame and thus can be more performant than equivalent row-wise operations using SQL functions.

Refer to the GeoAnalytics guide for examples and usage notes: Overlay

run(input_dataframe, overlay_dataframe)¶

Runs the Overlay tool using the provided DataFrames.

Parameters:

input_dataframe (DataFrame) – A DataFrame containing a geometry column.
overlay_dataframe (DataFrame) – A DataFrame containing a geometry column to overlay.

Returns:

A DataFrame containing the result of the overlay.

Return type:

DataFrame

setOverlayType(overlay_type)¶

Sets the type of overlay to be performed.

Parameters:: overlay_type (str) – Choose from Intersect, Erase, Union, Identity, or SymmetricalDifference.

Reconstruct Tracks¶

class geoanalytics_fabric.tools.ReconstructTracks¶

Creates a line or polygon representing an entity’s path of movement over time using points or polygons with associated timestamps.

This tool groups input rows into tracks representing unique entities using a track identifier field. It then creates a linestring by connecting the point observations for each entity sequentially. The linestring can be buffered with a variable distance using a field from the input DataFrame.

Refer to the GeoAnalytics guide for examples and usage notes: Reconstruct Tracks

addSummaryField(summary_field, statistic, alias=None)¶

Adds a summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters:

summary_field (str) – The name of a field from the input DataFrame.
statistic (str) – Choose from First, Last, Count, Sum, Mean, Max, Min, Range, Stddev, Var, or Any.
alias (str) – The name of the result field containing the statistic. The default is the field name and statistic separated by an underscore.

run(dataframe)¶

Runs the ReconstructTracks tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame containing a point or polygon column, a track ID column, and a datetime column.
Returns:: A DataFrame containing the result linestrings or polygons.
Return type:: DataFrame

setArcadeSplit(arcade_split)¶

Sets an Arcade expression to split tracks with. The expression will be evaluated for each point in a track and the track will be split if the expression equals True.

Parameters:: arcade_split (str) – An Arcade expression.

setBufferField(buffer_field)¶

Sets a field in the input DataFrame that contains a buffer distance or a buffer expression. A buffer expression must begin with an equal sign (=).

Parameters:: buffer_field (str) – The name of a field from the input DataFrame.

setDistanceMethod(distance_method)¶

Sets the method used to calculate distances between track observations. There are two methods to choose from:

Planar: measures distances using a Euclidean plane and will not calculate statistics across the date line.
Geodesic: calculations will cross the antimeridian when appropriate. This is the default. If the spatial reference cannot be panned, calculations will be limited to the coordinate system extent and may not wrap.

Parameters:: distance_method (str) – Choose from Planar or Geodesic.

setDistanceSplit(distance_split, distance_split_unit)¶

Sets the distance used to split tracks. Any rows in the input DataFrame that are in the same track and are farther apart than this distance will be split into a new track. If both the distance split and the time split are used, the track is split when at least one condition is met.

Parameters:

distance_split (float) – The distance used to split tracks.
distance_split_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setSplitBoundaryOption(split_boundary_option)¶

Sets how the track segment between two points is created when a track is split. The split type is applied to split expressions, distance splits, and time splits. There are three options:

Gap: no segment is created between the two points (this is the default).
FinishLast: a segment is created between the two points that ends after the split.
StartNext: a segment is created between the two points that ends before the split.

Parameters:: split_boundary_option (str) – Choose from Gap, FinishLast, or StartNext.

setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)¶

Sets boundaries to limit calculations to defined spans of time. For example, if you use a time boundary of 1 day, starting on January 1, 1980, tracks will be analyzed one day at a time.

Parameters:

time_boundary_split (int) – The scale of the time boundary.
time_boundary_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
time_boundary_reference (int/datetime.datetime) – A reference timestamp to which to align the time steps. The reference timestamp can be either a datetime object or an integer Unix timestamp in milliseconds. The default reference time is the Unix epoch time, which is 1970-01-01 00:00:00 UTC.
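
The alignment of time boundaries to a reference time can be sketched as integer arithmetic on Unix timestamps in milliseconds. The helper names are hypothetical and the sketch only illustrates the idea of aligned spans, not the tool's internals.

```python
from datetime import datetime, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000

def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to integer Unix time in milliseconds."""
    return int(dt.timestamp() * 1000)

def boundary_index(timestamp_ms, boundary_ms, reference_ms=0):
    """Index of the time span containing timestamp_ms, aligned to reference_ms."""
    return (timestamp_ms - reference_ms) // boundary_ms

reference = to_epoch_ms(datetime(1980, 1, 1, tzinfo=timezone.utc))
late_jan_1 = to_epoch_ms(datetime(1980, 1, 1, 23, 59, tzinfo=timezone.utc))
early_jan_2 = to_epoch_ms(datetime(1980, 1, 2, 0, 1, tzinfo=timezone.utc))

# Two observations two minutes apart fall into different one-day spans.
print(boundary_index(late_jan_1, MS_PER_DAY, reference))
print(boundary_index(early_jan_2, MS_PER_DAY, reference))
```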

setTimeSplit(time_split, time_split_unit)¶

Sets the time duration used to split tracks. Any rows in the input DataFrame that are in the same track and are farther apart than this time will be split into a new track. If both the distance split and time split are used, a track is split when at least one condition is met.

Parameters:

time_split (int) – The time duration used to split tracks.
time_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
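
The combined effect of a distance split and a time split can be sketched as follows. The sketch assumes time-ordered observations given as (time in seconds, x, y) tuples, planar distances, and no segment across a split (the Gap option); the function name is hypothetical.

```python
import math

def split_track(observations, time_split=None, distance_split=None):
    """Split ordered (t, x, y) observations wherever either split condition is met."""
    segments = []
    for obs in observations:
        if segments:
            prev = segments[-1][-1]
            too_late = time_split is not None and obs[0] - prev[0] > time_split
            too_far = (distance_split is not None
                       and math.dist(obs[1:], prev[1:]) > distance_split)
            if not (too_late or too_far):
                segments[-1].append(obs)  # continue the current track
                continue
        segments.append([obs])  # start a new track
    return segments

track = [(0, 0.0, 0.0), (10, 1.0, 0.0), (100, 2.0, 0.0), (110, 50.0, 0.0)]
segments = split_track(track, time_split=60, distance_split=10.0)
print(len(segments))
```

Here the 90-second gap triggers the time split and the 48-unit jump triggers the distance split, so the four observations form three tracks.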

setTrackFields(*track_fields)¶

Sets one or more fields used to identify distinct tracks.

Parameters:: track_fields (*str) – The names of one or more fields from the input DataFrame.

Snap Tracks¶

class geoanalytics_fabric.tools.SnapTracks¶

Snaps input track points to lines. The points DataFrame must have a timestamp column where each row represents an instant in time. The lines DataFrame must also contain fields indicating the from and to nodes for analysis.

Refer to the GeoAnalytics guide for examples and usage notes: Snap Tracks

run(points_dataframe, lines_dataframe)¶

Runs the SnapTracks tool using the provided DataFrames.

Parameters:

points_dataframe (DataFrame) – A DataFrame containing points that will be matched to lines.
lines_dataframe (DataFrame) – A DataFrame containing lines to which points will be matched. The input must contain fields with values indicating the from and to nodes of the line.

Returns:

The snapped points DataFrame with appended fields.

Return type:

DataFrame

setAppendFields(*line_fields)¶

Sets one or more fields from the input lines DataFrame that will be included in the output result.

Parameters:: line_fields (*str) – The names of one or more fields from the line DataFrame.

setConnectivityFields(from_node, to_node)¶

The line DataFrame fields that will be used to define the connectivity of the input lines.

Parameters:

from_node (str) – The field that represents the from_node, the node that the travel along a line is moving away from.
to_node (str) – The field that represents the to_node, the node that the travel along a line is moving to.

setDirectionFieldMatching(direction_field, forward_value=None, backward_value=None, both_value=None, none_value=None)¶

The line field and attribute values that will be used to define the direction of the input lines.

Parameters:

direction_field (str) – The field from the line DataFrame that describes the direction of travel.
forward_value (str) – The value from the direction_field that indicates the supported direction of travel is forward along a line.
backward_value (str) – The value from the direction_field that indicates the supported direction of travel is backward along a line.
both_value (str) – The value from the direction_field that indicates both forward and backward directions of travel are supported along a line.
none_value (str) – The value from the direction_field that indicates there are no supported directions of travel along a line.

setDistanceMethod(distance_method)¶

Sets the method used to calculate distances. There are two methods to choose from: ‘Planar’ or ‘Geodesic’ (default).

Parameters:: distance_method (str) – Choose from Planar or Geodesic.

setDistanceSplit(distance_split, distance_split_unit)¶

Sets the distance used to split tracks. Any observations in the input DataFrame that are in the same track and are farther apart than this distance will be split into a new track. If both the distance split and the time split are used, the track is split when at least one condition is met.

Parameters:

distance_split (float) – The distance used to split tracks.
distance_split_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setOutputMode(output_mode)¶

Sets the result type. There are two options:

AllPoints: All input points are returned. This is the default.
MatchedPoints: Only input points that matched to a line are returned.

Parameters:: output_mode (str) – Choose from AllPoints or MatchedPoints.

setSearchDistance(search_distance, search_distance_unit)¶

The maximum distance allowed between a point and any line to be considered a match.

Parameters:

search_distance (float) – Maximum distance between any point and a line.
search_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)¶

Sets boundaries to limit calculations to defined spans of time. For example, if you use a time boundary of 1 day, starting on January 1, 1980, tracks will be analyzed one day at a time.

Parameters:

time_boundary_split (int) – The scale of the time boundary.
time_boundary_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
time_boundary_reference (int/datetime.datetime) – A reference timestamp to which to align the time steps. The reference timestamp can be either a datetime object or an integer Unix timestamp in milliseconds. The default reference time is the Unix epoch time, which is 1970-01-01 00:00:00 UTC.

setTimeSplit(time_split, time_split_unit)¶

Sets the time duration used to split tracks. Any observations in the point DataFrame that are in the same track and are farther apart than this time will be split into a new track. If both the distance split and time split are used, a track is split when at least one condition is met.

Parameters:

time_split (int) – The time duration used to split tracks.
time_split_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

setTrackFields(*track_fields)¶

Sets one or more fields used to identify distinct tracks.

Parameters:: track_fields (*str) – The names of one or more fields from the input points DataFrame.

Spatiotemporal Join¶

class geoanalytics_fabric.tools.SpatiotemporalJoin¶

Joins attributes from one DataFrame to another based on spatial, temporal, and attribute relationships or some combination of the three.

The tool determines all input rows that meet the specified join conditions and joins the second DataFrame to the first. You can optionally join all rows to the matching rows or summarize the matching rows.

Refer to the GeoAnalytics guide for examples and usage notes: Spatiotemporal Join

addSummaryField(summary_field, statistic, alias=None)¶

Adds a summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters:

summary_field (str) – The name of a field from the input DataFrame.
statistic (str) – Choose from Count, Sum, Mean, Max, Min, Range, Stddev, Var, or Any.
alias (str) – The name of the result field containing the statistic. The default is the field name and statistic separated by an underscore.

includeDistance(include=True, distance_unit=None)¶

Specifies whether to include spatial distance and/or temporal difference in the columns of the result DataFrame (new in version 1.2.0).

Parameters:

include (bool) – True to include, or False to exclude, spatial distance and/or temporal difference.
distance_unit (str) – The desired output unit of the spatial distance values. The default is meters. Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards if the input DataFrames have a spatial reference. Otherwise use None if the input DataFrames have no spatial reference.

run(target_dataframe, join_dataframe)¶

Runs the SpatiotemporalJoin tool using the provided DataFrames.

Parameters:

target_dataframe (DataFrame) – A DataFrame.
join_dataframe (DataFrame) – A DataFrame to join.

Returns:

A DataFrame containing the result of the join.

Return type:

DataFrame

setAttributeRelationship(attribute_relationship)¶

Sets a target field, relationship, and join field used to join equal attributes.

Use an equals relationship (equal in JSON, or = in the string format) to join rows whose values match exactly. To match strings without comparing case or leading and trailing white space, use equalIgnoreCaseTrimWhiteSpace in JSON or ~= in the string format.

Parameters:: attribute_relationship (str) – Expression representing the attribute relationship.
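
The comparison described for equalIgnoreCaseTrimWhiteSpace can be approximated in plain Python as shown below. This is a sketch of the described behavior with a hypothetical function name; the tool's handling of edge cases such as non-ASCII case folding is not covered by this reference.

```python
def equal_ignore_case_trim_white_space(target_value, join_value):
    """Compare two strings, ignoring case and leading/trailing white space."""
    return target_value.strip().lower() == join_value.strip().lower()

# A strict equals comparison fails here, while the relaxed comparison matches.
strict_match = "  Main St " == "main st"
relaxed_match = equal_ignore_case_trim_white_space("  Main St ", "main st")
print(strict_match, relaxed_match)
```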

setJoinCondition(join_condition)¶

Sets a condition on specified fields using an Arcade expression. Only rows with columns that meet this condition will be joined.

Parameters:: join_condition (str) – An Arcade expression.

setJoinOneToMany()¶

Sets the join operation to one to many. If multiple join rows are found that have the same relationships with a single target row, the result DataFrame will contain multiple copies of the target row.

For example, if a single point in the target DataFrame is found within two separate polygons in the join DataFrame, the result DataFrame will contain two copies of the target row: one row with the attributes of one polygon and another row with the attributes of the other polygon. There are no summary statistics available with this method.

Note

This method will override setJoinOneToOne.

setJoinOneToOne()¶

Sets the join operation to one to one. If multiple join rows are found that have the same relationships with a single target row, the fields from the multiple join rows will be aggregated using the specified summary statistics.

For example, if a point is found within two separate polygons, the fields associated with the two polygons will be aggregated before being returned in the result DataFrame. If one polygon has an attribute value of 3 and the other has a value of 7, and a summary statistic of sum is specified, the aggregated value in the output DataFrame will be 10. A Count field is always calculated; in this example it has a value of 2, the number of join rows matched.

Note

This method will override setJoinOneToMany.
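
The difference between the two join operations can be sketched with plain dictionaries, using the example of a point found within two polygons with values 3 and 7. The helper names and the result field names (such as value_Sum) are hypothetical and may not match the tool's actual output schema.

```python
def join_one_to_many(target_row, join_rows):
    """One result row per matching join row, each a copy of the target row."""
    return [{**target_row, **join_row} for join_row in join_rows]

def join_one_to_one(target_row, join_rows, field):
    """One result row per target row, with join rows aggregated by sum."""
    return {
        **target_row,
        "Count": len(join_rows),
        f"{field}_Sum": sum(row[field] for row in join_rows),
    }

point = {"point_id": 1}
polygons = [{"value": 3}, {"value": 7}]

many = join_one_to_many(point, polygons)
one = join_one_to_one(point, polygons, "value")
print(many)
print(one)
```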

setLeftJoin(left_join=True)¶

Specifies whether all target rows will be returned in the result DataFrame (known as a left or left outer join) or only those that have the specified relationships with the join rows (inner join). Left join can be used with a one-to-one join or a one-to-many join (new in version 1.1.0).

Parameters:: left_join (bool) – If True, a left outer join will be used; if False, an inner join will be used.

setSpatialRelationship(spatial_relationship, near_distance=None, near_distance_unit=None)¶

Sets the spatial relationship used to spatially join rows.

Parameters:

spatial_relationship (str) – Choose from Equals, Intersects, Contains, Within, Crosses, Touches, Overlaps, NearPlanar, NearGeodesic.
near_distance (float) – A double value used for the search distance to determine if a target geometry is near a join geometry. This is only applied if NearPlanar or NearGeodesic is the specified spatial relationship.
near_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards if the input DataFrames have a spatial reference. Otherwise use None if the input DataFrames have no spatial reference.

setTemporalRelationship(temporal_relationship, near_duration=None, near_duration_unit=None)¶

Sets the temporal relationship used to temporally join rows.

Parameters:

temporal_relationship (str) – Choose from Equals, Intersects, During, Contains, Finishes, FinishedBy, Meets, MetBy, Overlaps, OverlappedBy, Starts, StartedBy, Near, NearBefore, or NearAfter.
near_duration (int) – An integer value used for the temporal search distance to determine if a target geometry is temporally near a join geometry.
near_duration_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.

Summarize Within¶

class geoanalytics_fabric.tools.SummarizeWithin¶

Summarizes geometries from the input DataFrame where they intersect summary polygons or bins using statistics.

Refer to the GeoAnalytics guide for examples and usage notes: Summarize Within

Result¶: alias of SummarizeWithinResult

addRateField(rate_field)¶

Marks a numeric field in the input DataFrame as having quantity type rate/index (rather than count/sum).

Parameters:: rate_field (str) – The name of a field from the input DataFrame.

addStandardSummaryField(summary_field, statistic, alias=None)¶

Adds a summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters:

summary_field (str) – The name of a field from the input DataFrame.
statistic (str) – Choose from Count, Sum, Mean, Max, Min, Range, Stddev, Var, or Any.
alias (str) – The name of the result field containing the statistic. The default is the field name and statistic separated by an underscore.

addWeightedSummaryField(summary_field, statistic, alias=None)¶

Adds a weighted summary statistic of a field in the input DataFrame to the result DataFrame.

Parameters:

summary_field (str) – The name of a field from the input DataFrame.
statistic (str) – Choose from Mean, Stddev, or Var.
alias (str) – The name of the result field containing the statistic. The default is ‘p’, the field name, underscore, and statistic.

includeShapeSummary(include=True, units=None)¶

Sets the inclusion of calculated statistics based on the geometry type of the primary geometry column in the input DataFrame, such as the length of lines or areas of polygons within each summary polygon.

Parameters:

include (bool) – If True, geometry summary statistics will be included in the result.
units (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, Yards, SquareMeters, SquareKilometers, Hectares, SquareFeet, SquareYards, SquareMiles or Acres.

run(dataframe)¶

Runs the SummarizeWithin tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame containing a geometry column.
Returns:: A named tuple with a DataFrame containing the summary polygons and a DataFrame containing the group-by summary (if applicable).
Return type:: namedtuple

setGroupBy(group_by_field, include_minor_major_fields=True, include_group_percentages=True)¶

Sets a field from the input DataFrame that will be used to calculate statistics for each unique value.

When setGroupBy is called, the tool will return a DataFrame containing the statistics in addition to a DataFrame containing the summaries.

For example, suppose the input DataFrame contains parcels and the polygons set by setSummaryPolygons are city boundaries. One of the fields of the parcels is Status, which contains two values: VACANT and OCCUPIED. To calculate the total area of vacant and occupied parcels within the boundaries of cities, use Status as the group-by field.

Parameters:

group_by_field (str) – The name of a field from the input DataFrame.
include_minor_major_fields (bool) – If True, the minority (least dominant) or the majority (most dominant) attribute values for each group will be included in the result.
include_group_percentages (bool) – If True, the percentage of each unique field value is calculated for each summary polygon.

setSummaryBins(bin_size, bin_size_unit, bin_type='square')¶

Sets the size and shape of bins that the input DataFrame will be summarized into.

Note

This method overrides setSummaryPolygons. Use setSummaryPolygons if summarizing into an existing column of polygons.

Parameters:

bin_size (float) – Distance between parallel sides of a bin.
bin_size_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.
bin_type (str) – Choose from Square or Hexagon.

setSummaryPolygons(summary_polygons)¶

Sets the DataFrame containing a column of polygons that the input DataFrame will be summarized into.

Note

This method overrides setSummaryBins. Use setSummaryBins instead if summarizing into square or hexagon bins that are generated when the tool runs.

Parameters:: summary_polygons (pyspark.sql.DataFrame) – A DataFrame containing a polygon column.

Trace Proximity Events¶

class geoanalytics_fabric.tools.TraceProximityEvents¶

Analyzes points representing moving entities. The tool will follow entities of interest in space (location) and time to see which other entities the entities of interest have interacted with. The trace will continue from entity to entity to a configurable maximum degrees of separation from the original entity of interest.

For example, suppose an organization monitors company-issued devices carried by workers. The company is interested in determining which employees were near an individual known to have COVID-19. Using the point layer representing device locations and time, they can identify devices that have been within 6 meters and 5 minutes of the contagious person and other possibly contagious employees.

Refer to the GeoAnalytics guide for examples and usage notes: Trace Proximity Events

Result¶: alias of TraceProximityEventsResult

includeTracksDataFrame()¶: Includes a second DataFrame with the points used in the trace.

run(dataframe)¶

Runs the TraceProximityEvents tool using the provided DataFrame.

Parameters:: dataframe (DataFrame) – A DataFrame containing a point column, timestamp column, and entity ID column.
Returns:: A named tuple containing a copy of the input DataFrame with proximity event info appended and a DataFrame containing only points used in the trace.
Return type:: namedtuple

setAttributeMatchCriteria(*attribute_match_criteria)¶

One or more fields used to constrain the proximity events. Entities will only be considered near when the spatial search distance and temporal search distance criteria are met and the two entities have equal values of the fields specified.

Parameters:: attribute_match_criteria (*str) – The names of one or more fields from the input DataFrame.

setDistanceMethod(distance_method)¶

Sets the method used to calculate distances between track observations. There are two methods to choose from:

Planar: measures distances using a Euclidean plane and will not calculate statistics across the date line.
Geodesic: calculations will cross the date line when appropriate. This is the default. If the spatial reference cannot be panned, calculations will be limited to the coordinate system extent and may not wrap.

Parameters:: distance_method (str) – Choose from Planar or Geodesic.

setEntitiesOfInterestIds(entities_of_interest_ids)¶

Sets one or more entities that you are interested in tracing from, as well as a time to start tracing from. If you do not specify a time, January 1, 1970, at 12:00 a.m. will be used.

Parameters:: entities_of_interest_ids (str) – A stringified list of dictionaries containing entity IDs and times in epoch ms.
Example:: '[{"entityID": "user5", "epochTimeStamp": 1598390663000}, {"entityID": "user9", "epochTimeStamp": None}]'
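
One way to build such a string from Python objects is sketched below. It covers only entities with an explicit start time, because this reference does not state how a missing time should be serialized; the helper name is hypothetical and the use of json.dumps is an assumption about an acceptable format.

```python
import json
from datetime import datetime, timezone

def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to integer Unix time in milliseconds."""
    return int(dt.timestamp() * 1000)

# Start tracing user5 from 2020-08-25 21:24:23 UTC.
start = datetime(2020, 8, 25, 21, 24, 23, tzinfo=timezone.utc)
entities = [{"entityID": "user5", "epochTimeStamp": to_epoch_ms(start)}]

# Stringify the list of dictionaries before passing it to the setter.
entities_of_interest_ids = json.dumps(entities)
print(entities_of_interest_ids)
```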

setEntityIdField(entity_id_field)¶

Sets the field used to identify distinct entities.

Parameters:: entity_id_field (str) – The name of a field from the input DataFrame.

setMaxTraceDepth(max_trace_depth)¶

Sets the maximum degrees of separation between an entity of interest and an entity further down the trace.

Parameters:: max_trace_depth (int) – Degrees of separation.
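
Degrees of separation can be pictured as a breadth-first search over proximity events that stops at the maximum depth. The sketch below uses a hypothetical, precomputed contact mapping and ignores the time ordering of events, which the actual trace takes into account.

```python
from collections import deque

def trace(contacts, entity_of_interest, max_trace_depth):
    """Return {entity: degrees of separation} up to max_trace_depth."""
    depth = {entity_of_interest: 0}
    queue = deque([entity_of_interest])
    while queue:
        current = queue.popleft()
        if depth[current] == max_trace_depth:
            continue  # do not trace beyond the configured depth
        for other in contacts.get(current, ()):
            if other not in depth:
                depth[other] = depth[current] + 1
                queue.append(other)
    return depth

# Hypothetical proximity events between entities.
contacts = {
    "user5": ["user7"],
    "user7": ["user5", "user9"],
    "user9": ["user7", "user2"],
}
result = trace(contacts, "user5", max_trace_depth=2)
print(result)
```

With a maximum depth of 2, user2 is not reached because it is three degrees away from user5.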

setSearchDistance(search_distance, search_distance_unit)¶

Sets the maximum distance between two points to be considered in proximity. Points that are within this distance of each other and that also meet the search duration criteria are considered in proximity of each other.

Note

This method is used along with setSearchDuration to define proximity.

Parameters:

search_distance (float) – The search distance used to determine if points are in proximity.
search_distance_unit (str) – Choose from Meters, Kilometers, Feet, Miles, NauticalMiles, or Yards.

setSearchDuration(search_duration, search_duration_unit)¶

Sets the maximum duration between two points to be considered in proximity. Points that are within this duration of each other and that also meet the search distance criteria are considered in proximity of each other.

Note

This method is used along with setSearchDistance to define proximity.

Parameters:

search_duration (int) – The search duration used to determine if points are in proximity.
search_duration_unit (str) – Choose from Milliseconds, Seconds, Minutes, Hours, Days, Weeks, Months, or Years.
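
The way setSearchDistance and setSearchDuration combine can be sketched as a single check in which both criteria must hold. The sketch assumes observations given as (time in seconds, x in meters, y in meters), planar distances, and inclusive comparisons; the function name is hypothetical.

```python
import math

def in_proximity(a, b, search_distance, search_duration):
    """True only if a and b are close enough in both space and time."""
    close_in_space = math.dist(a[1:], b[1:]) <= search_distance
    close_in_time = abs(a[0] - b[0]) <= search_duration
    return close_in_space and close_in_time

# Proximity defined as within 6 meters and 5 minutes (300 seconds).
origin = (0, 0.0, 0.0)
near_both = in_proximity(origin, (120, 3.0, 4.0), 6.0, 300)
too_late = in_proximity(origin, (600, 3.0, 4.0), 6.0, 300)
too_far = in_proximity(origin, (120, 30.0, 40.0), 6.0, 300)
print(near_both, too_late, too_far)
```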