Nearest neighbors

For each record in one DataFrame, Nearest Neighbors finds a specified number of the closest records in another DataFrame. Records from the two input DataFrames are matched by proximity.


Usage notes

  • Nearest Neighbors supports point, line, and polygon geometry types.

  • Nearest Neighbors supports two formats for the output layout:

    • Long—Each row represents a query record with a single nearest neighbor, and the columns include rank, distance between geometries of two records, and all fields from query_dataframe and data_dataframe. The output is organized by stacking all paired records.

    • Wide—Each row represents a query record with all nearest neighbors, with the fields in data_dataframe consolidated into one column with distance to the query record. The columns include all fields from query_dataframe and the information for each nearest neighbor.

  • If you provide only one DataFrame, it is used as both the query_dataframe and the data_dataframe. In the output, each record is joined with other nearby records, excluding itself. For example, to find the nearby cities for each city in the United States, you can provide a single DataFrame of US cities without specifying separate query_dataframe and data_dataframe values.

  • Nearest Neighbors uses planar distances for calculations. The geometry of the query_dataframe must be in a projected coordinate system. You can transform your data to a projected coordinate system by using ST_Transform. If the query_dataframe and data_dataframe have different coordinate systems, the analysis is completed in the coordinate system of the query_dataframe. Because Nearest Neighbors uses planar calculations, it is not recommended for datasets that span a large extent.

    To learn more, see Coordinate systems and transformations.

  • If either DataFrame has a spatial reference, the other DataFrame must also have a spatial reference or the tool will fail. If both input DataFrames have no spatial references, Nearest Neighbors can be used to calculate the distance and find neighbors with the assumption that the search distance and output distances have the same unit as the input coordinates. In this case, the search distance unit should be set to None.

  • If multiple neighbors are at an equal distance from the query record, Nearest Neighbors breaks ties by randomly selecting from the equidistant neighbors until the specified number of closest neighbors is reached. For example, if you request two nearest neighbors and three records are equidistant from the query record, two of the three records are randomly selected and returned in the output.

  • Set a search distance to exclude neighbors that are farther away than the search distance. This can result in fewer neighbors being returned than the specified number of neighbors. For example, if you request three nearest neighbors within a search distance and only two records are within that distance, only those two neighbors are returned in the output.

  • When Nearest Neighbors finds fewer neighbors in the data_dataframe than the specified number of neighbors, it returns null for the missing neighbors in a wide-format output, and returns only the rows that have a matched neighbor in a long-format output.
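
The interaction between the number of neighbors and the search distance can be illustrated with a small plain-Python sketch. It does not use GeoAnalytics Engine and is not how the tool is implemented; the function `nearest_neighbors_sketch`, the sample IDs, and the coordinates are hypothetical, and the sketch assumes point geometries in a projected coordinate system with units of meters and an inclusive distance bound.

```python
import math

def nearest_neighbors_sketch(query_points, data_points, k, search_distance=None):
    """Brute-force planar k-nearest-neighbor search (illustrative only).

    query_points and data_points map an ID to an (x, y) tuple.
    Returns {query_id: [(data_id, distance), ...]} sorted by distance.
    """
    results = {}
    for q_id, (qx, qy) in query_points.items():
        candidates = []
        for d_id, (dx, dy) in data_points.items():
            dist = math.hypot(dx - qx, dy - qy)  # planar (Euclidean) distance
            # Drop candidates beyond the search distance, if one is set.
            if search_distance is None or dist <= search_distance:
                candidates.append((d_id, dist))
        candidates.sort(key=lambda pair: pair[1])
        results[q_id] = candidates[:k]  # may hold fewer than k entries
    return results

# Hypothetical sample data in meters.
schools = {"school_a": (0.0, 0.0)}
parks = {"park_1": (300.0, 400.0), "park_2": (0.0, 900.0), "park_3": (2000.0, 0.0)}

# Ask for 3 neighbors within 1,000 meters: only two parks qualify.
matches = nearest_neighbors_sketch(schools, parks, k=3, search_distance=1000.0)
print(matches)
```

Here three neighbors are requested, but only park_1 (500 meters) and park_2 (900 meters) fall within the search distance, so two neighbors are returned.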

Limitations

  • Nearest Neighbors only supports planar distance. Geodesic distance is not supported.
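
To see why planar distance on unprojected data can mislead, the following plain-Python sketch compares a flat Euclidean distance computed directly on longitude and latitude degrees with a great-circle distance. The haversine formula used here is a spherical approximation chosen for illustration; it is not part of Nearest Neighbors, and the function names and Earth radius value are assumptions of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch

def planar_distance(p, q):
    """Euclidean distance treating (lon, lat) degrees as flat x, y."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def haversine_km(p, q):
    """Great-circle distance in kilometers between (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Two pairs of points, each one degree of longitude apart.
equator_pair = ((0.0, 0.0), (1.0, 0.0))
northern_pair = ((0.0, 60.0), (1.0, 60.0))

for label, (p, q) in (("equator", equator_pair), ("60N", northern_pair)):
    print(label, planar_distance(p, q), round(haversine_km(p, q), 1))
```

Both pairs are exactly 1.0 apart in degrees, yet the ground distance at 60 degrees north is roughly half the distance at the equator. Projecting the data into a suitable local coordinate system before running the tool avoids comparing distances in units that vary across the extent.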

Results

The format of the output DataFrame differs depending on the output layout type. The two options are long and wide.

Long-format layout

  • The following fields are included in the output DataFrame with the long-format layout:

    • All fields from the query DataFrame
    • All fields from the data DataFrame

In addition, the following fields are included in the output records:

Field | Description
near_rank | The rank of the nearest neighbor. Ranks are assigned in ascending order of distance.
near_distance | The distance between the record in the query_dataframe and the identified nearest neighbor from the data_dataframe.
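
As a rough illustration of the long-format layout, the plain-Python sketch below stacks one row per query record and neighbor pair. The record IDs, the `school` and `park` field names, and the precomputed `matches` structure are hypothetical; only near_rank and near_distance come from the field list above.

```python
# Hypothetical query and data records; field names are made up for illustration.
query_records = {"s1": {"school": "School A"}, "s2": {"school": "School B"}}
data_records = {"p1": {"park": "Park 1"}, "p2": {"park": "Park 2"}}

# Assumed neighbor search result: query ID -> [(data ID, distance), ...],
# already sorted by ascending distance. "s2" has no neighbor in range.
matches = {"s1": [("p1", 120.5), ("p2", 480.0)], "s2": []}

long_rows = []
for q_id, neighbors in matches.items():
    for rank, (d_id, distance) in enumerate(neighbors, start=1):
        row = dict(query_records[q_id])   # all query fields
        row["near_rank"] = rank           # 1 = closest
        row["near_distance"] = distance
        row.update(data_records[d_id])    # all data fields
        long_rows.append(row)

for row in long_rows:
    print(row)
```

School A contributes two rows, one per neighbor, while School B contributes none because it has no matched neighbor.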

Wide-format layout

  • The following fields are included in the output DataFrame with the wide-format layout:

    • All fields from the query DataFrame
  • In addition, the output DataFrame has one column for each near record, with the following subfields:

    • near_distance—The distance to the query record
    • All fields from the data DataFrame

For example, if the number of neighbors is 3, three new fields are appended to the result DataFrame: near1, near2, and near3. Each of the three fields includes near_distance and all fields from the data DataFrame.
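
The wide-format layout can be sketched in plain Python as one row per query record with a nested value per neighbor rank. The record IDs, the `school` and `park` field names, and the precomputed `matches` structure are hypothetical; the near1 to near3 column names and the near_distance subfield follow the description above.

```python
# Hypothetical query and data records; field names are made up for illustration.
query_records = {"s1": {"school": "School A"}, "s2": {"school": "School B"}}
data_records = {"p1": {"park": "Park 1"}, "p2": {"park": "Park 2"}}

# Assumed neighbor search result: query ID -> [(data ID, distance), ...],
# already sorted by ascending distance. "s2" has no neighbor in range.
matches = {"s1": [("p1", 120.5), ("p2", 480.0)], "s2": []}
num_neighbors = 3

wide_rows = []
for q_id, fields in query_records.items():
    row = dict(fields)  # all query fields
    neighbors = matches.get(q_id, [])
    for i in range(num_neighbors):
        column = f"near{i + 1}"
        if i < len(neighbors):
            d_id, distance = neighbors[i]
            # Nested value: distance plus all data fields.
            row[column] = {"near_distance": distance, **data_records[d_id]}
        else:
            row[column] = None  # no neighbor found for this rank
    wide_rows.append(row)

for row in wide_rows:
    print(row)
```

Every query record keeps its row in this layout; ranks without a matched neighbor are left empty, which corresponds to the null values shown in wide-format results.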

Performance notes

Improve the performance of Nearest Neighbors by doing one or more of the following:

  • Only analyze the records in your area of interest. You can pick the records of interest by using one of the following SQL functions:

    • ST_Intersection—Clip to an area of interest represented by a polygon. This will modify your input records.
    • ST_EnvIntersects—Select records that intersect an envelope.
    • ST_Intersects—Select records that intersect another dataset or area of interest represented by a polygon.
  • Set a search distance with setSearchDistance().
  • Use smaller values for setSearchDistance() and setNumNeighbors().
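
The idea behind limiting the analysis to an area of interest can be sketched in plain Python for point data. This is not the implementation of ST_EnvIntersects; the function `envelope_intersects`, the sample IDs, and the coordinates are hypothetical and only show how an envelope test reduces the number of candidates before any distances are calculated.

```python
def envelope_intersects(point, xmin, ymin, xmax, ymax):
    """Return True when an (x, y) point falls inside or on the envelope."""
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax

# Hypothetical candidate points in projected coordinates (meters).
parks = {"park_1": (300.0, 400.0), "park_2": (0.0, 900.0), "park_3": (25000.0, 0.0)}

# Keep only candidates inside a 2 km by 2 km area of interest before searching.
area_of_interest = (-1000.0, -1000.0, 1000.0, 1000.0)
parks_in_area = {
    park_id: xy for park_id, xy in parks.items()
    if envelope_intersects(xy, *area_of_interest)
}
print(sorted(parks_in_area))
```

The far-away park_3 is dropped by a cheap bounds check, so a later neighbor search has fewer records to compare.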

Syntax

For more details, go to the GeoAnalytics Engine API reference for nearest neighbors.

Setter | Description | Required
run(query_dataframe, data_dataframe=None) | Runs the Nearest Neighbors tool using the provided DataFrames. query_dataframe is a DataFrame containing geometries whose nearest neighbors will be found, and data_dataframe is a DataFrame containing the neighbor candidates. | Yes
setNumNeighbors(k) | Sets the number of neighbors (k) to find that are nearest to the query record. | Yes
setResultLayout(layout='long') | Sets the layout format for the result DataFrame. Choose from 'long' format (default) or 'wide' format. | No
setSearchDistance(search_distance, search_distance_unit) | Sets a distance bound within which to search for nearest neighbors. Choose from 'Miles', 'Kilometers', 'Meters', 'Feet', 'NauticalMiles', 'Yards', or None. | No

Examples

Run Nearest Neighbors

Python
# Log in
import geoanalytics
geoanalytics.auth(username="myusername", password="mypassword")

# Imports
from geoanalytics.tools import NearestNeighbors, Clip
from geoanalytics.sql import functions as ST
from pyspark.sql import functions as F

# Path to the USA parks, public schools and county boundary data
parks_data_path = "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Parks/FeatureServer/0"
schools_data_path = "https://services1.arcgis.com/Ua5sjt3LWTPigjyD/arcgis/rest/services/Public_School_Location_201819/FeatureServer/0"
counties_data_path = "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0"

# Create DataFrames for park data and school data in Los Angeles County
la_df = spark.read.format("feature-service").load(counties_data_path) \
                 .where("NAME == 'Los Angeles County'") \
                 .withColumn("shape", ST.transform("shape", 6423))

schools_df = spark.read.format("feature-service").load(schools_data_path) \
                  .withColumn("shape", ST.transform("shape", 6423)) \
                  .select("NCESSCH","NAME","STREET","CITY","shape")

parks_df = spark.read.format("feature-service").load(parks_data_path) \
                .withColumn("shape", ST.transform("shape", 6423)) \
                .select("FID","NAME","SQMI","FEATTYPE","shape")

schools_la = Clip().run(schools_df, la_df).select("NCESSCH","NAME","STREET",F.col("clip_geometry").alias("shape"))
parks_la = Clip().run(parks_df, la_df).select("FID","NAME","FEATTYPE",F.col("clip_geometry").alias("shape"))

# Run Nearest Neighbors tool to identify the 4 closest parks near each school within 1 kilometer
print("This is the long-format layout for the output:")
result_long = NearestNeighbors() \
            .setNumNeighbors(4) \
            .setSearchDistance(1, "Kilometer") \
            .setResultLayout("long") \
            .run(schools_la, parks_la)
result_long.show(12)

print("This is the wide-format layout for the output:")
result_wide = NearestNeighbors() \
            .setNumNeighbors(4) \
            .setSearchDistance(1, "Kilometer") \
            .setResultLayout("wide") \
            .run(schools_la, parks_la)
result_wide.show(5, False)
Result
This is the long-format layout for the output:
+------------+--------------------+--------------------+--------------------+---------+------------------+----+--------------------+----------+--------------------+
|     NCESSCH|                NAME|              STREET|               shape|near_rank|     near_distance| FID|               NAME1|  FEATTYPE|              shape1|
+------------+--------------------+--------------------+--------------------+---------+------------------+----+--------------------+----------+--------------------+
|062271003020|John H. Francis P...|  12431 Roscoe Blvd.|{"x":1962779.8802...|        1| 461.4095044548353|7294|Fernangeles Recre...|Local park|{"rings":[[[19629...|
|062271003020|John H. Francis P...|  12431 Roscoe Blvd.|{"x":1962779.8802...|        2|  695.570144456635|7303|      Strathern Park|Local park|{"rings":[[[19626...|
|062271003020|John H. Francis P...|  12431 Roscoe Blvd.|{"x":1962779.8802...|        3| 916.0205307434085|7325|         Slavin Park|Local park|{"rings":[[[19626...|
|062271011621|Amanecer Primary ...|   832 S. E.man Ave.|{"x":1982834.2575...|        1|325.08124718431054|7184|        Salazar Park|Local park|{"rings":[[[19825...|
|060162000004|    Bragg Elementary|       11501 Bos St.|{"x":1991754.2577...|        1|  610.898594992709|7503|        Gridley Park|Local park|{"rings":[[[19916...|
|060162000004|    Bragg Elementary|       11501 Bos St.|{"x":1991754.2577...|        2|  974.661104723056|7511|        Liberty Park|Local park|{"rings":[[[19907...|
|060162000004|    Bragg Elementary|       11501 Bos St.|{"x":1991754.2577...|        3| 985.7022988089832|7523|        Artesia Park|Local park|{"rings":[[[19927...|
|062271003322|San Antonio Conti...|  2911 Belgrave Ave.|{"x":1979940.1123...|        1| 868.1204098829357|7200|          Miles Park|Local park|{"rings":[[[19797...|
|060172312852| Rise Kohyang Middle|3020 Wilshire Blv...|{"x":1973469.7063...|        1|195.68601441991382|7151|LAFAYETTE MULTIPU...|Local park|{"rings":[[[19738...|
|060172312852| Rise Kohyang Middle|3020 Wilshire Blv...|{"x":1973469.7063...|        2| 627.6240522661285|7156|Shatto Recreation...|Local park|{"rings":[[[19733...|
|060172312852| Rise Kohyang Middle|3020 Wilshire Blv...|{"x":1973469.7063...|        3| 721.8113938952536|7159|      MacArthur Park|Local park|{"rings":[[[19744...|
|060172312852| Rise Kohyang Middle|3020 Wilshire Blv...|{"x":1973469.7063...|        4| 797.0436162384274|7174|Robert F. Kennedy...|Local park|{"rings":[[[19726...|
+------------+--------------------+--------------------+--------------------+---------+------------------+----+--------------------+----------+--------------------+
only showing top 12 rows

This is the wide-format layout for the output:
+------------+---------------------------+---------------------------+----------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|NCESSCH     |NAME                       |STREET                     |shape                                         |near1                                                                                                                                                                                                                                                                                                      |near2                                                                                                                                                                                                                                                                                      |near3                                                                                                                                                                                                                                                                            |near4                                                                                                                                                                                                                                                                                               |
+------------+---------------------------+---------------------------+----------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|062271003020|John H. Francis Polytechnic|12431 Roscoe Blvd.         |{"x":1962779.880239428,"y":580374.5675968826} |{461.4095044548353, 7294, Fernangeles Recreation Center, Local park, {"rings":[[[1962941.7680235023,580806.6452243626],[1962915.1737325499,580837.8102185465],[1962822.816152859,580942.8921648934],[1962799.3457749686,580971.7155642249],[1962862.7847542353,581030.4721... (634 characters)}            |{695.570144456635, 7303, Strathern Park, Local park, {"rings":[[[1962634.06804393,579466.7073888276],[1962579.1494464332,579467.1287700906],[1962561.3930563936,579467.267080469],[1962510.8697688405,579467.6489166319],[1962382.519960966,579629.5628030... (666 characters)}            |{916.0205307434085, 7325, Slavin Park, Local park, {"rings":[[[1962656.7713297158,579411.3538131025],[1962657.119599543,579333.1736328024],[1962632.9169003346,579341.2577241734],[1962621.0821948028,579354.2835089527],[1962607.0480700093,579370.3131... (1133 characters)}   |null                                                                                                                                                                                                                                                                                                |
|062271011621|Amanecer Primary Center    |832 S. E.man Ave.          |{"x":1982834.2575726525,"y":558189.1313533243}|{325.08124718431054, 7184, Salazar Park, Local park, {"rings":[[[1982521.8649353,557916.3043995164],[1982427.9562155101,557928.3504643682],[1982336.312437017,557917.099288065],[1982336.2483571323,557931.9631249905],[1982345.5567998975,557969.99209129... (745 characters)}                            |null                                                                                                                                                                                                                                                                                       |null                                                                                                                                                                                                                                                                             |null                                                                                                                                                                                                                                                                                                |
|060162000004|Bragg Elementary           |11501 Bos St.              |{"x":1991754.2577179892,"y":539449.4404410813}|{610.898594992709, 7503, Gridley Park, Local park, {"rings":[[[1991600.2329460091,538786.2662924584],[1991599.9097188488,538838.2902596723],[1992222.5005864492,538839.3018269148],[1992401.0333279963,538841.8164618034],[1992401.201394395,538820.9624... (666 characters)}                              |{974.661104723056, 7511, Liberty Park, Local park, {"rings":[[[1990782.4531818037,538884.5434746109],[1990688.5034559802,538947.5313368663],[1990680.3472090436,538981.8817841727],[1990669.2342414903,539039.4628283754],[1990655.136398865,539092.1107... (1756 characters)}             |{985.7022988089832, 7523, Artesia Park, Local park, {"rings":[[[1992724.9863712233,539773.0580173396],[1992724.1952818076,539790.9840457998],[1992722.7205649102,539798.5835320968],[1992719.4869247105,539805.3635534644],[1992714.6432034194,539812.311... (1328 characters)}  |null                                                                                                                                                                                                                                                                                                |
|062271003322|San Antonio Continuation   |2911 Belgrave Ave.         |{"x":1979940.112360189,"y":554087.5503004752} |{868.1204098829357, 7200, Miles Park, Local park, {"rings":[[[1979756.4644370242,553217.6701141037],[1979793.1331284847,553218.1781630106],[1979819.0522357032,553217.5339162555],[1979895.5635881308,553220.573684318],[1979902.6737057995,553218.1179... (631 characters)}                               |null                                                                                                                                                                                                                                                                                       |null                                                                                                                                                                                                                                                                             |null                                                                                                                                                                                                                                                                                                |
|060172312852|Rise Kohyang Middle        |3020 Wilshire Blvd. 2nd Fl.|{"x":1973469.7063463225,"y":562288.8246625029}|{195.68601441991382, 7151, LAFAYETTE MULTIPURPOSE COMMUNITY CENTER, Local park, {"rings":[[[1973854.573665645,562312.4842551835],[1973849.748157746,562303.6239919495],[1973816.758214277,562321.5978512168],[1973807.8436627998,562326.0599403046],[1973799.9247356371,562330.020072... (2022 characters)}|{627.6240522661285, 7156, Shatto Recreation Center, Local park, {"rings":[[[1973368.5544766407,563009.9098818749],[1973370.0911705212,562956.7399581764],[1973370.1296526291,562950.8498231471],[1973370.266641756,562908.5211173408],[1973331.8733220005,562930.7605... (1135 characters)}|{721.8113938952536, 7159, MacArthur Park, Local park, {"rings":[[[1974449.1752743653,561769.189684933],[1974444.0727815202,561760.2189508509],[1974394.9948596985,561787.4417005107],[1974381.2689493923,561795.0556429271],[1974371.3224893385,561800.5627... (1980 characters)}|{797.0436162384274, 7174, Robert F. Kennedy Inspiration Park, Local park, {"rings":[[[1972608.5325487286,562226.1754599456],[1972519.8323969678,562226.1493158527],[1972519.8951714416,562247.3465748187],[1972473.839912956,562248.2151729874],[1972472.8675446312,562337.4447... (362 characters)}|
+------------+---------------------------+---------------------------+----------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
only showing top 5 rows

Plot results

Python
result_sample = result_long.where("NAME = 'Rise Kohyang Middle'") \
    .withColumn("buffer", ST.buffer("shape", 1000))

school_area = result_sample.st.plot(geometry="buffer",
                                    facecolor="none",
                                    edgecolor="lightblue",
                                    figsize=(16, 10))
school_area.set(xlim=(1.9715e6, 1.9755e6), ylim=(561000, 563750))

school_plot = result_sample.st.plot(geometry="shape", legend=True, label='Rise Kohyang Middle School', ax=school_area)

result_plot = result_sample.st.plot(geometry="shape1",
                                    is_categorical=True,
                                    cmap_values="NAME1",
                                    cmap="Greens",
                                    basemap="light",
                                    legend=True,
                                    label='Parks',
                                    ax=school_area)

result_plot.set_title("Four nearest parks within a 1 km search distance of Rise Kohyang Middle School")
result_plot.set_xlabel("X (Meters)")
result_plot.set_ylabel("Y (Meters)")

Plotting example for a Nearest Neighbors result. The closest parks near a school in Los Angeles County are shown.

Version table

Release | Notes
1.1.0 | Tool introduced
