Nearest neighbors
Nearest Neighbors finds the given number of neighbors to a record in a DataFrame from records in another DataFrame. The records from the input DataFrames are matched based on closest proximity.
Usage notes
Nearest Neighbors supports point, line, and polygon geometry types.
Nearest Neighbors supports two formats for the output layout:
- Long: Each row represents a query record with a single nearest neighbor. The columns include the rank, the distance between the geometries of the two records, and all fields from `query_dataframe` and `data_dataframe`. The output is organized by stacking all paired records.
- Wide: Each row represents a query record with all of its nearest neighbors, with the fields in `data_dataframe` consolidated into one column together with the distance to the query record. The columns include all fields from `query_dataframe` and the information for each nearest neighbor.
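To make the difference between the two layouts concrete, here is a minimal plain-Python sketch. It does not use GeoAnalytics Engine; the sample points, the brute-force search, and the key names (`query_id`, `data_id`, `rank`, `distance`) are assumptions chosen for illustration only.

```python
from math import hypot

# Hypothetical sample records: (id, x, y) in a projected coordinate system.
query_points = [("q1", 0.0, 0.0), ("q2", 10.0, 0.0)]
data_points = [("a", 1.0, 0.0), ("b", 0.0, 3.0), ("c", 9.0, 0.0)]

def k_nearest(query, data, k):
    """Brute-force search: return (rank, distance, id) for the k closest data points."""
    _, qx, qy = query
    by_distance = sorted((hypot(x - qx, y - qy), pid) for pid, x, y in data)
    return [(rank, dist, pid) for rank, (dist, pid) in enumerate(by_distance[:k], start=1)]

def long_layout(queries, data, k):
    """Long layout: one row per (query record, neighbor) pair, stacked."""
    return [
        {"query_id": q[0], "rank": rank, "distance": dist, "data_id": pid}
        for q in queries
        for rank, dist, pid in k_nearest(q, data, k)
    ]

def wide_layout(queries, data, k):
    """Wide layout: one row per query record, one nested column per neighbor."""
    rows = []
    for q in queries:
        row = {"query_id": q[0]}
        for rank, dist, pid in k_nearest(q, data, k):
            row[f"near{rank}"] = {"distance": dist, "data_id": pid}
        rows.append(row)
    return rows

long_rows = long_layout(query_points, data_points, k=2)
wide_rows = wide_layout(query_points, data_points, k=2)
```

With two query records and two neighbors each, the long layout holds four rows while the wide layout holds two rows with two neighbor columns each.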
If you provide only one DataFrame, the DataFrame is used as both the `query_dataframe` and the `data_dataframe`. In the output, each record will be joined with other nearby records, excluding itself. For example, if you are interested in finding the nearby cities for each city in the United States, you can provide the US-city DataFrame as the input DataFrame without specifying separate `query_dataframe` and `data_dataframe`.
Nearest Neighbors uses planar distances for calculations. The geometry of the `query_dataframe` must have a projected coordinate system. You can transform your data to a projected coordinate system by using ST_Transform. If the `query_dataframe` and `data_dataframe` have different coordinate systems, analysis will be completed in the coordinate system of the `query_dataframe`. Because Nearest Neighbors uses planar calculations, it is not recommended for use on datasets that span a large extent. To learn more, see Coordinate systems and transformations.
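The single-DataFrame case and the planar distance calculation can be sketched together in plain Python. This is an illustration under assumptions, not the tool's implementation: the city names and coordinates are made up, and the coordinates are assumed to already be in a projected coordinate system with meters as the unit.

```python
from math import hypot

# Hypothetical city locations in projected coordinates (meters).
cities = {"A": (0.0, 0.0), "B": (3000.0, 4000.0), "C": (10000.0, 0.0)}

def nearest_others(points, k):
    """Use one collection as both query and data, never matching a record to itself."""
    result = {}
    for name, (x, y) in points.items():
        candidates = [
            (hypot(ox - x, oy - y), other)  # planar (Euclidean) distance
            for other, (ox, oy) in points.items()
            if other != name  # exclude the query record itself
        ]
        result[name] = sorted(candidates)[:k]
    return result

neighbors = nearest_others(cities, k=1)
```

Each city is paired with its closest other city; without the `other != name` filter, every record would trivially match itself at distance zero.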
If either DataFrame has a spatial reference, the other DataFrame must also have a spatial reference or the tool will fail. If both input DataFrames have no spatial references, Nearest Neighbors can be used to calculate the distance and find neighbors with the assumption that the search distance and output distances have the same unit as the input coordinates. In this case, the search distance unit should be set to `None`.
If there are multiple nearest neighbors with an equal distance to the query record, Nearest Neighbors will break ties by randomly selecting one or more records from the equidistant neighbors to ensure the specified number of closest neighbors. For example, if you are interested in finding two nearest neighbors when there are three records that are equidistant from the query record, two of the three records will be randomly selected and returned in the output.
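The tie-breaking behavior described above can be imitated in plain Python. How the tool selects among equidistant records internally is not documented here, so the shuffle-then-stable-sort approach below is an assumption, and the point names are hypothetical.

```python
import random
from math import hypot

def k_nearest_random_ties(query, data, k, rng):
    """Pick the k nearest points, ordering equidistant points randomly."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    qx, qy = query
    # sorted() is stable, so points at equal distance keep their shuffled order.
    ranked = sorted(shuffled, key=lambda p: hypot(p[1] - qx, p[2] - qy))
    return [p[0] for p in ranked[:k]]

# Three points are exactly 5 units from the origin; one is much farther away.
data = [("north", 0.0, 5.0), ("east", 5.0, 0.0), ("south", 0.0, -5.0), ("far", 50.0, 0.0)]
picked = k_nearest_random_ties((0.0, 0.0), data, k=2, rng=random.Random(0))
```

Two of the three equidistant records are returned; which two depends on the random generator, so results that rely on a specific pair should not be expected to be reproducible.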
Set a search distance to exclude neighbors further away than the search distance. This can result in fewer neighbors returned than the specified number of neighbors. For example, if you are interested in finding three nearest neighbors within a specified search distance when there are two records within the distance, only the two neighbors will be returned in the output.
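A small plain-Python sketch of this effect, assuming hypothetical points and assuming that a neighbor exactly at the search distance is still included (the tool's boundary handling is not stated here):

```python
from math import hypot

def k_nearest_within(query, data, k, search_distance=None):
    """Return up to k (distance, id) pairs, optionally bounded by a search distance."""
    qx, qy = query
    ranked = sorted((hypot(x - qx, y - qy), pid) for pid, x, y in data)
    if search_distance is not None:
        # Assumption: the bound is inclusive.
        ranked = [(d, pid) for d, pid in ranked if d <= search_distance]
    return ranked[:k]

data = [("a", 1.0, 0.0), ("b", 0.0, 2.0), ("c", 0.0, 8.0)]
unbounded = k_nearest_within((0.0, 0.0), data, k=3)
bounded = k_nearest_within((0.0, 0.0), data, k=3, search_distance=5.0)
```

Here three neighbors are requested in both calls, but the bounded call returns only the two records that lie within 5 units.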
When Nearest Neighbors finds fewer neighbors in the `data_dataframe` than the specified number of neighbors, it will return `Null` for the missing neighbors in a `wide`-format output, or only return rows that have a matched neighbor in a `long`-format output.
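The following plain-Python sketch shows how the two layouts differ when fewer neighbors are found than requested. The helper names and the `query_id`, `rank`, and `neighbor` keys are hypothetical; Python's `None` stands in for `Null`.

```python
def to_wide(query_id, neighbors, k):
    """Wide layout: always emit k neighbor slots; unused slots hold None."""
    row = {"query_id": query_id}
    for i in range(k):
        row[f"near{i + 1}"] = neighbors[i] if i < len(neighbors) else None
    return row

def to_long(query_id, neighbors):
    """Long layout: emit one row per neighbor actually found."""
    return [
        {"query_id": query_id, "rank": rank, "neighbor": n}
        for rank, n in enumerate(neighbors, start=1)
    ]

found = ["a", "b"]  # only two neighbors were found although three were requested
wide_row = to_wide("q1", found, k=3)
long_rows = to_long("q1", found)
no_match_long = to_long("q2", [])  # a query record with no neighbors at all
```

In the wide row the third slot is empty, while in the long layout a query record without any match contributes no rows.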
Limitations
- Nearest Neighbors only supports planar distance. Geodesic distance is not supported.
Results
The format of the output DataFrame differs depending on the output layout type. The two options are `long` and `wide`.
Long-format layout
The following fields are included in the output DataFrame with the long-format layout:
- All fields from the query DataFrame
- All fields from the data DataFrame
In addition, the following fields are included in the output records:
Field | Description |
---|---|
near_ | The rank of the nearest neighbor. Ranks are assigned in ascending order of distance. |
`near_distance` | The distance from the record in `query_dataframe` to the identified nearest neighbor in `data_dataframe`. |
Wide-format layout
The following fields are included in the output DataFrame with the wide-format layout:
- All fields from the query DataFrame
In addition, there is one column for each near record in the output DataFrame, with the following sub-fields:
- `near_distance`: The distance to the query record
- All fields from the data DataFrame
For example, if the number of neighbors is 3, three new fields will be appended to the result DataFrame: `near1`, `near2`, and `near3`. Each of the three fields includes `near_distance` and all fields from the data DataFrame.
Performance notes
Improve the performance of Nearest Neighbors by doing one or more of the following:
- Only analyze the records in your area of interest. You can pick the records of interest by using one of the following SQL functions:
  - ST_Intersection: Clip to an area of interest represented by a polygon. This will modify your input records.
  - ST_EnvIntersects: Select records that intersect an envelope.
  - ST_Intersects: Select records that intersect another dataset or area of interest represented by a polygon.
- Set a search distance with `setSearchDistance()`.
- Use smaller values for `setSearchDistance()` and `setNumNeighbors()`.
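The idea behind the first tip, reducing the candidate set before searching, can be sketched in plain Python with a bounding-box test. This is only a stand-in for the concept of an envelope filter and does not call ST_EnvIntersects; the record IDs, coordinates, and the `in_envelope` helper are hypothetical.

```python
def in_envelope(point, envelope):
    """Return True if the point lies inside or on the (xmin, ymin, xmax, ymax) box."""
    x, y = point
    xmin, ymin, xmax, ymax = envelope
    return xmin <= x <= xmax and ymin <= y <= ymax

# Hypothetical records and area of interest.
records = [("a", (1.0, 1.0)), ("b", (50.0, 50.0)), ("c", (4.0, 2.0))]
area_of_interest = (0.0, 0.0, 10.0, 10.0)

# Keep only the records inside the envelope before any neighbor search.
subset = [rid for rid, pt in records if in_envelope(pt, area_of_interest)]
```

Only the records inside the envelope would then be passed to the neighbor search, so fewer distance calculations are needed.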
Similar capabilities
Syntax
For more details, go to the GeoAnalytics Engine API reference for nearest neighbors.
Setter | Description | Required |
---|---|---|
`run(query_dataframe, data_dataframe)` | Runs the Nearest Neighbors tool using the provided DataFrames. `query_dataframe` is a DataFrame containing geometries whose nearest neighbors will be found, and `data_dataframe` is a DataFrame containing the neighbor candidates. | Yes |
`setNumNeighbors()` | Sets the number of neighbors (k) to find that are nearest to the query record. | Yes |
set | Sets the layout format for the result DataFrame. Choose from 'long' format (default) or 'wide' format. | No |
`setSearchDistance()` | Sets a distance bound within which to search for nearest neighbors. Choose the unit from 'Miles', 'Kilometers', 'Meters', 'Feet', 'Nautical , 'Yards', or None. | No |
Examples
Run Nearest Neighbors
Plot results
Version table
Release | Notes |
---|---|
1.1.0 | Tool introduced |