- Version Introduced: 10.6.1
The FindPointClusters operation extracts clusters from your input point features and identifies any surrounding noise.
Two clustering methods are available: DBSCAN and HDBSCAN. Both methods can find clusters in space, but only DBSCAN can also find spatiotemporal clusters in time-enabled point layers.
For example, a nongovernmental organization is studying a particular pest-borne disease. It has a point dataset representing households in a study area, some of which are infested, and some of which are not. By using the Find Point Clusters tool, an analyst can identify clusters of infested households to help pinpoint an area to begin treatment and extermination of pests.
To learn more, see the ArcGIS Pro documentation on How Density-based Clustering works.
The point features from which clusters will be found.
Syntax: As described in Feature input, this parameter can be one of the following:
The algorithm used for cluster analysis. This parameter must be specified as either DBSCAN or HDBSCAN.
The DBSCAN algorithm uses a specified distance to separate dense clusters from sparser noise, and it finds clusters that have similar densities. DBSCAN is faster than HDBSCAN but is only appropriate when a single, clear searchDistance works well to define all the clusters that may be present.
The HDBSCAN algorithm finds clusters of points in a similar way to DBSCAN but uses varying distances, which allows for clusters with varying densities based on cluster probability (or stability). HDBSCAN is data-driven and does not require or use searchDistance, but it is a more time-consuming calculation than DBSCAN.
By default, the DBSCAN algorithm finds clusters in two-dimensional space only. When timeMethod is set to Linear and inputLayer is time enabled and of type instant, DBSCAN will discover clusters in both space and time. When searching for cluster members, minFeaturesCluster features must be found within the specified search range and search duration to form a cluster. Temporal clustering is available at ArcGIS Enterprise 10.8. HDBSCAN currently supports spatial clustering only and will not use time to discover clusters.
When using the HDBSCAN algorithm with an input layer containing more than 3 million features, the tool may fail unless you increase the value of the javaHeapSize parameter on the GeoAnalyticsTools GP Service. Roughly 2 GB of heap size is needed per 3 million features. The amount of RAM specified by javaHeapSize should be available on each GeoAnalytics Server machine in addition to the 16 GB normally required by GeoAnalytics Server. For example, if you want to cluster 9 million features with HDBSCAN, you should set javaHeapSize to no less than 6144 MB (or 6 GB). In this case, each GeoAnalytics Server machine should have a total of at least 22 GB of RAM available.
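The sizing rule above can be worked through in a few lines of Python. This is only a rough sketch of the arithmetic in the text: the function names are hypothetical, and rounding up to whole 3-million-feature blocks is an assumption made here to stay on the safe side of "roughly 2 GB per 3 million features".

```python
import math

FEATURES_PER_BLOCK = 3_000_000  # features covered by each 2 GB of heap (from the text)
HEAP_MB_PER_BLOCK = 2048        # 2 GB expressed in MB
BASE_RAM_GB = 16                # RAM normally required by GeoAnalytics Server


def estimate_java_heap_mb(feature_count):
    """Rough minimum javaHeapSize (MB) for HDBSCAN on feature_count features."""
    # Assumption: round up to whole blocks of 3 million features.
    blocks = math.ceil(feature_count / FEATURES_PER_BLOCK)
    return blocks * HEAP_MB_PER_BLOCK


def estimate_total_ram_gb(feature_count):
    """Rough total RAM (GB) per machine: base requirement plus the heap."""
    return BASE_RAM_GB + estimate_java_heap_mb(feature_count) / 1024


print(estimate_java_heap_mb(9_000_000))  # 6144
print(estimate_total_ram_gb(9_000_000))  # 22.0
```

For 9 million features this reproduces the figures in the example above: 6144 MB of heap and 22 GB of RAM in total per machine.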
When this parameter is set to Linear and clusterMethod is DBSCAN, both space and time will be used to find point clusters. If clusterMethod is HDBSCAN, this parameter will be ignored and clusters will be found in space only. This parameter can only be used if inputLayer has time enabled and is of type instant. Temporal clustering is available at ArcGIS Enterprise 10.8.
REST web example: Linear
REST scripting example: "timeMethod" : "Linear"
This parameter is used differently depending on the clustering method chosen. For DBSCAN, this parameter specifies the number of features that must be found within a search range of a point for that point to start forming a cluster. The results may include clusters with fewer features than this value. The search range distance is set using the searchDistance parameter. For HDBSCAN, the minFeaturesCluster parameter specifies the number of features neighboring each point (including the point itself) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.
When using DBSCAN, this parameter is the distance within which minFeaturesCluster must be found. This parameter is not used when HDBSCAN is chosen as the clustering method.
The units used for the searchDistance parameter. This parameter is required when using DBSCAN but will not be used with HDBSCAN.
Values: Meters | Kilometers | Feet | FeetInt | FeetUS | Miles | MilesInt | MilesUS | NauticalMiles | NauticalMilesInt | NauticalMilesUS | Yards | YardsInt | YardsUS
When using DBSCAN with timeMethod set to Linear, this parameter is the time duration within which minFeaturesCluster must be found. This parameter is not used when HDBSCAN is chosen as the clustering method or when timeMethod is not used.
The units used for the searchDuration parameter. This parameter is required when using DBSCAN with timeMethod set to Linear, but it will not be used with HDBSCAN or with space-only DBSCAN.
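The rules above for which parameters apply to each clustering method can be summarized in a small helper. This is a minimal sketch, not part of the service: build_cluster_params is a hypothetical function, the unit parameter names searchDistanceUnit and searchDurationUnit are assumed here because they are not spelled out in this text, and the input layer is an opaque placeholder.

```python
def build_cluster_params(input_layer, cluster_method, min_features_cluster,
                         search_distance=None, search_distance_unit=None,
                         time_method=None, search_duration=None,
                         search_duration_unit=None):
    """Assemble FindPointClusters parameters following the rules described above."""
    if cluster_method not in ("DBSCAN", "HDBSCAN"):
        raise ValueError("clusterMethod must be DBSCAN or HDBSCAN")

    params = {
        "inputLayer": input_layer,
        "clusterMethod": cluster_method,
        "minFeaturesCluster": min_features_cluster,
    }

    if cluster_method == "HDBSCAN":
        # HDBSCAN does not use the search distance, the search duration, or time.
        return params

    if search_distance is None or search_distance_unit is None:
        raise ValueError("DBSCAN requires searchDistance and its unit")
    params["searchDistance"] = search_distance
    params["searchDistanceUnit"] = search_distance_unit  # assumed parameter name

    if time_method == "Linear":
        if search_duration is None or search_duration_unit is None:
            raise ValueError("Linear timeMethod requires searchDuration and its unit")
        params["timeMethod"] = "Linear"
        params["searchDuration"] = search_duration
        params["searchDurationUnit"] = search_duration_unit  # assumed parameter name

    return params


layer = {"url": "<feature layer URL>"}  # placeholder; the real shape is described in Feature input
print(build_cluster_params(layer, "DBSCAN", 10, 500, "Meters"))
print(build_cluster_params(layer, "HDBSCAN", 10, 500, "Meters"))
```

The second call shows that a search distance supplied alongside HDBSCAN is simply dropped, mirroring the statement that HDBSCAN does not use it.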
The task will create a feature service of the results. You define the name of the service.
The context parameter contains additional settings that affect task execution. For this task, there are four settings:
The response format. The default response format is html.
Values: html | json
Below is a sample request URL for FindPointClusters:
When you submit a request, the service assigns a unique job ID for the transaction.
{
"jobId": "<unique job identifier>",
"jobStatus": "<job status>"
}
After the initial request is submitted, you can use jobId to periodically check the status of the job and messages as described in Check job status. Once the job has successfully completed, use jobId to retrieve the results. To track the status, you can make a request of the following form:
When the status of the job request is esriJobSucceeded, you can access the results of the analysis by making a request of the following form:
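The exact request URLs are defined by the service. The sketch below illustrates only the polling logic described above, with the HTTP call abstracted behind a hypothetical fetch_status callable that returns the parsed JSON status response for a job ID. The failure status names (esriJobFailed, esriJobCancelled, esriJobTimedOut) and the polling interval are assumptions made for illustration.

```python
import time

# Assumed terminal failure statuses; only esriJobSucceeded is named in the text above.
FAILURE_STATUSES = {"esriJobFailed", "esriJobCancelled", "esriJobTimedOut"}


def wait_for_job(fetch_status, job_id, interval_seconds=5.0, max_polls=120,
                 sleep=time.sleep):
    """Poll a job until it succeeds, fails, or the poll budget is exhausted.

    fetch_status is a hypothetical callable: it takes a job ID and returns a
    dict containing at least "jobStatus".
    """
    for _ in range(max_polls):
        response = fetch_status(job_id)
        status = response["jobStatus"]
        if status == "esriJobSucceeded":
            return response
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"Job {job_id} ended with status {status}")
        sleep(interval_seconds)  # wait before checking again
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")
```

Passing the fetcher and the sleep function as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.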
The output parameter will contain the cluster results. Fields added to output include all the fields from the inputLayer and the following:
When the HDBSCAN algorithm is used to find clusters, the following fields will also be added to output:
The result has properties for parameter name, data type, and value. The contents of value depend on the outputName parameter provided in the initial request. The value contains the URL of the feature service layer.
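As a small illustration of reading such a result, the following sketch pulls the layer URL out of a parsed result. The key names paramName, dataType, value, and url are assumptions about the JSON shape, and the sample values are placeholders rather than real service output.

```python
def result_layer_url(result):
    """Return the feature service layer URL from a parsed result dict.

    Assumes the value property is an object with a "url" key.
    """
    value = result.get("value")
    if not isinstance(value, dict) or "url" not in value:
        raise KeyError("result does not contain a value with a url")
    return value["url"]


sample = {
    "paramName": "output",                      # assumed key and value
    "dataType": "<data type>",                  # placeholder
    "value": {"url": "<feature service layer URL>"},
}
print(result_layer_url(sample))
```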
See Feature output for more information about how the result layer is accessed.