
Find Point Clusters

This tool extracts clusters from your input point features and identifies any surrounding noise.

For example, a non-governmental organization is studying a particular pest-borne disease. It has a point dataset representing households in a study area, some of which are infested, and some of which are not. By using the Find Point Clusters tool, an analyst can determine clusters of infested households to help pinpoint an area to begin treatment and extermination of pests.

To learn more, see the ArcGIS Pro documentation on How Density-based Clustering works.

Note:

Find Point Clusters was introduced in ArcGIS Enterprise 10.6.1.

Request URL

http://<analysis url>/FindPointClusters/submitJob

Request parameters

Parameter | Description

inputLayer

(Required)

The point features that clusters will be found from.

Syntax: As described in Feature input, this parameter can be one of the following:

  • A URL to a feature service layer with an optional filter to select specific features
  • A URL to a big data catalog service layer with an optional filter to select specific features
  • A feature collection

REST web example:

  • {"url" : "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}

REST scripting example:

  • "inputLayer" : {"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}

clusterMethod

(Required)

The algorithm used for cluster analysis. This parameter must be specified as DBSCAN or HDBSCAN. The HDBSCAN option is available starting with ArcGIS Enterprise 10.7.

The DBSCAN algorithm uses a specified distance to separate dense clusters from sparser noise. DBSCAN is faster than HDBSCAN, but it is only appropriate when a single searchDistance value clearly works well for defining all clusters that may be present. DBSCAN finds clusters that have similar densities.

The HDBSCAN algorithm finds clusters of points in a way similar to DBSCAN, but it uses varying distances, allowing for clusters with varying densities based on cluster probability (or stability). HDBSCAN is very data-driven and does not require or use searchDistance, but it is a more time-consuming calculation than DBSCAN.

Note:

When using the HDBSCAN algorithm with an input layer containing more than 3 million features, the tool may fail unless you increase the value of the javaHeapSize parameter on the GeoAnalyticsTools GP Service. Roughly 2 GB of heap space is needed per 3 million features. The amount of RAM specified by javaHeapSize should be available on each GeoAnalytics Server machine in addition to the 16 GB normally required by GeoAnalytics Server. For example, if you want to cluster 9 million features with HDBSCAN you should set javaHeapSize to no less than 6144 MB, or 6 GB. In this case, each GeoAnalytics Server machine should have a total of at least 22 GB of RAM available.

REST web example: DBSCAN

REST scripting example: "clusterMethod" : "DBSCAN"
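
The heap sizing rule in the note above can be worked through as a small calculation. This is a rough sketch only: it assumes the "roughly 2 GB per 3 million features" guidance scales linearly, and the function and constant names are hypothetical, not part of any ArcGIS API.

```python
import math

# Figures taken from the note above; linear scaling is an assumption.
FEATURES_PER_STEP = 3_000_000  # roughly 2 GB of heap per 3 million features
HEAP_MB_PER_STEP = 2048        # 2 GB expressed in MB
BASE_RAM_GB = 16               # RAM normally required by GeoAnalytics Server


def estimate_hdbscan_memory(feature_count):
    """Return (java_heap_mb, total_ram_gb) as a rough estimate for HDBSCAN."""
    heap_mb = math.ceil(feature_count / FEATURES_PER_STEP * HEAP_MB_PER_STEP)
    total_ram_gb = BASE_RAM_GB + heap_mb / 1024
    return heap_mb, total_ram_gb


# The 9 million feature case from the note.
heap_mb, total_ram_gb = estimate_hdbscan_memory(9_000_000)
print(heap_mb, total_ram_gb)  # prints: 6144 22.0
```

The result matches the figures in the note: a javaHeapSize of at least 6144 MB and at least 22 GB of RAM per GeoAnalytics Server machine.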

minFeaturesCluster

(Required)

This parameter is used differently depending on the clustering method chosen. For DBSCAN the parameter specifies the number of features that must be found within a search range of a point for that point to start forming a cluster. The results may include clusters with fewer features than this value. The search range distance is set using the searchDistance parameter.

For HDBSCAN the minFeaturesCluster parameter specifies the number of features neighboring each point (including the point itself) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.

REST web example: 10

REST scripting example: "minFeaturesCluster" : 5

searchDistance

(Optional)

When using DBSCAN, this parameter is required and specifies the distance within which minFeaturesCluster features must be found. This parameter is not used when HDBSCAN is chosen as the clustering method.

REST web example: 108.3

REST scripting example: "searchDistance" : 100

searchDistanceUnit

(Optional)

The units used for the searchDistance parameter. This parameter is required when using DBSCAN but will not be used with HDBSCAN.

Values: Meters | Kilometers | Feet | Miles | NauticalMiles | Yards

REST web example: Meters

REST scripting example: "searchDistanceUnit" : "Miles"

outputName

(Required)

The task will create a feature service of the results. You define the name of the service.

REST web example: myOutput

REST scripting example: "outputName" : "myOutput"

context

The context parameter contains additional settings that affect task execution. For this task, there are four settings:

  • Extent (extent)—A bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed.
  • Processing spatial reference (processSR)—The features will be projected into this coordinate system for analysis.
  • Output spatial reference (outSR)—The features will be projected into this coordinate system after the analysis to be saved. The output spatial reference for the spatiotemporal big data store is always WGS84.
  • Data store (dataStore)—Results will be saved to the specified data store. The default is the spatiotemporal big data store.

Syntax:
{
"extent" : {extent},
"processSR" : {spatial reference},
"outSR" : {spatial reference},
"dataStore":{data store}
}

f

The response format. The default response format is html.

Values: html | json
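
The request parameters described above can be assembled in a script before being posted to the submitJob URL. The following is a minimal sketch using only the Python standard library. The helper name build_find_point_clusters_params is hypothetical, the context argument is passed through unchanged, and authentication (for example, a token) is assumed to be handled separately. Nothing is sent over the network here.

```python
import json
from urllib.parse import urlencode


def build_find_point_clusters_params(input_layer, cluster_method,
                                     min_features_cluster, output_name,
                                     search_distance=None,
                                     search_distance_unit=None,
                                     context=None, f="json"):
    """Build the form parameters for a FindPointClusters submitJob request."""
    if cluster_method not in ("DBSCAN", "HDBSCAN"):
        raise ValueError("clusterMethod must be DBSCAN or HDBSCAN")

    params = {
        "inputLayer": json.dumps(input_layer),  # JSON object sent as a string
        "clusterMethod": cluster_method,
        "minFeaturesCluster": min_features_cluster,
        "outputName": output_name,
        "f": f,
    }

    # The search distance and its unit apply to DBSCAN only; they are
    # left out for HDBSCAN even if the caller supplies them.
    if cluster_method == "DBSCAN":
        if search_distance is None or search_distance_unit is None:
            raise ValueError(
                "DBSCAN needs searchDistance and searchDistanceUnit")
        params["searchDistance"] = search_distance
        params["searchDistanceUnit"] = search_distance_unit

    if context is not None:
        params["context"] = json.dumps(context)
    return params


# Example values taken from the parameter descriptions above.
params = build_find_point_clusters_params(
    input_layer={
        "url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0",
        "filter": "Month = 'September'",
    },
    cluster_method="DBSCAN",
    min_features_cluster=5,
    output_name="myOutput",
    search_distance=100,
    search_distance_unit="Miles",
)
body = urlencode(params)  # URL-encoded body for the POST request
```

Validating the DBSCAN-only parameters before submitting avoids a round trip to the server for a request that is missing its search distance.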

Response

When you submit a request, the service assigns a unique job ID for the transaction.

Syntax:
{
"jobId": "<unique job identifier>",
"jobStatus": "<job status>"
}

After the initial request is submitted, you can use jobId to periodically check the status of the job and messages as described in Checking job status. Once the job has successfully completed, use jobId to retrieve the results. To track the status, you can make a request of the following form:

https://<analysis url>/FindPointClusters/jobs/<jobId>
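
Checking the status periodically can be sketched as a simple polling loop. This is an illustrative sketch, not the documented client: the function names are hypothetical, the failure statuses other than esriJobSucceeded are assumed from the usual ArcGIS job status values, and the fetch step assumes the job URL accepts f=json without further authentication (in practice a token is likely required).

```python
import json
import time
from urllib.request import urlopen

# Statuses treated as terminal failures (assumed, not listed on this page).
TERMINAL_FAILURES = {"esriJobFailed", "esriJobCancelled", "esriJobTimedOut"}


def fetch_job_status(job_url):
    """Request the job resource as JSON and return it as a dict."""
    with urlopen(f"{job_url}?f=json") as response:
        return json.load(response)


def wait_for_job(job_url, fetch=fetch_job_status, interval_seconds=5.0,
                 max_checks=120, sleep=time.sleep):
    """Poll the job until it succeeds, fails, or max_checks is reached."""
    for _ in range(max_checks):
        job = fetch(job_url)
        status = job.get("jobStatus")
        if status == "esriJobSucceeded":
            return job
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"Job ended with status {status}")
        sleep(interval_seconds)  # wait before the next status check
    raise TimeoutError(f"Job did not finish after {max_checks} checks")
```

A call such as wait_for_job("https://<analysis url>/FindPointClusters/jobs/<jobId>") would block until the job reaches a terminal state. The fetch and sleep arguments are injectable so the loop can be exercised without a server.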

Access results

When the status of the job request is esriJobSucceeded, you can access the results of the analysis by making a request of the following form:

http://<analysis url>/FindPointClusters/jobs/<jobId>/results/output?token=<your token>&f=json

Parameter | Description

output

The output parameter will contain the cluster results. Fields added to output include all the fields from the inputLayer and the following:

  • CLUSTER_ID—A numeric value showing you which cluster a feature falls into. A feature with a CLUSTER_ID of -1 does not fall into a cluster and is noise.
  • COLOR_ID—An ID value used for rendering results. Multiple clusters will each be assigned a different color. Colors will be assigned and repeated so that each cluster is visually distinct from its neighboring clusters.

When the HDBSCAN algorithm is used to find clusters, the following fields will also be added to output:

  • PROB—The probability that a feature belongs in its assigned cluster.
  • OUTLIER—The likelihood that a feature is an outlier within its own cluster. A larger value indicates that the feature is more likely to be an outlier.
  • EXEMPLAR—Indicates which features are most representative of each cluster. These features are indicated by a value of 1.
  • STABILITY—The persistence of each cluster across a range of scales. A larger score indicates that a cluster persists over a wider range of distance scales.
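
Once features have been queried from the output layer, the CLUSTER_ID field described above can be used to separate clustered features from noise. The sketch below assumes each feature is a dict with an "attributes" key, as in typical Esri JSON query responses. The function name and the sample values are made up for illustration.

```python
from collections import defaultdict

NOISE_CLUSTER_ID = -1  # a CLUSTER_ID of -1 marks a feature as noise


def split_clusters_and_noise(features):
    """Group features by CLUSTER_ID and collect noise features separately."""
    clusters = defaultdict(list)
    noise = []
    for feature in features:
        cluster_id = feature["attributes"]["CLUSTER_ID"]
        if cluster_id == NOISE_CLUSTER_ID:
            noise.append(feature)
        else:
            clusters[cluster_id].append(feature)
    return dict(clusters), noise


# Hypothetical sample features, not real tool output.
sample = [
    {"attributes": {"OBJECTID": 1, "CLUSTER_ID": 1, "COLOR_ID": 1}},
    {"attributes": {"OBJECTID": 2, "CLUSTER_ID": 1, "COLOR_ID": 1}},
    {"attributes": {"OBJECTID": 3, "CLUSTER_ID": 2, "COLOR_ID": 2}},
    {"attributes": {"OBJECTID": 4, "CLUSTER_ID": -1, "COLOR_ID": 0}},
]
clusters, noise = split_clusters_and_noise(sample)
print(sorted(clusters), len(noise))  # prints: [1, 2] 1
```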

Request example
{"url": 
"http://<analysis url>/FindPointClusters/jobs/<jobId>/results/output"}

The result has properties for parameter name, data type, and value. The contents of value depend on the outputName parameter provided in the initial request. The value contains the URL of the feature service layer.

{
"paramName":"output", 
"dataType":"GPRecordSet",
"value":{"url":"<hosted featureservice layer url>"}
}
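
Reading the layer URL out of a result of this shape takes only a few lines. The sketch below assumes the response text has already been retrieved with f=json. The function name and the sample URL are hypothetical.

```python
import json


def output_layer_url(result_text):
    """Return the feature service layer URL from an output result."""
    result = json.loads(result_text)
    if result.get("paramName") != "output":
        raise ValueError("unexpected result parameter")
    return result["value"]["url"]


# Hypothetical response text following the structure shown above.
sample_result = json.dumps({
    "paramName": "output",
    "dataType": "GPRecordSet",
    "value": {"url": "https://example.invalid/FeatureServer/0"},
})
layer_url = output_layer_url(sample_result)
```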

See Feature output for more information about how the result layer is accessed.