Similarity measures

Line similarity measures calculate the resemblance between two lines. There are multiple methods to quantify line similarity, each returning a numeric value.

Line similarity measures can answer questions like the following:

  • Which of my lines are more similar?
  • Is Line A more similar to Line B or Line C?
  • What portion of my lines are most similar?

GeoAnalytics for Microsoft Fabric provides functions to calculate point-based similarity measures. Point-based methods compute similarity by finding matching observations (vertices) between a pair of lines. These methods consider observation locations and, depending on the measure, may also consider the timestamp associated with each location.

A similarity measure can be applied to linestrings or tracks. The difference is that line similarity ignores the timestamp value stored in each vertex, while track similarity takes it into account.

The table below summarizes the line and track similarity functions included in GeoAnalytics for Microsoft Fabric.

Function                Type of similarity    Input type
ST_EuclideanDistance    Line                  Linestring
ST_FréchetDistance      Line                  Linestring
ST_HausdorffDistance    Geometry              Point, linestring, polygon
TRK_LCSS                Track                 Track

Line similarity

ST_EuclideanDistance: Calculates the Euclidean distance between two linestrings. A distance is calculated from each vertex in the first input linestring to the corresponding vertex in the second input linestring. The Euclidean distance is the average of these distances.

The Euclidean distance is commonly used in vehicle routing problems and can be applied to route planning over short distances, such as open-space hiking. It can also be used to rank lines by their similarity to a query line or to find the most dissimilar lines in a road network.

For more information, see ST_EuclideanDistance.

Figure (ST_EuclideanDistance): Two linestrings (orange and blue). A distance (red line) is calculated for each vertex pair. The Euclidean distance is the average of these distances.
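The averaging described for ST_EuclideanDistance can be sketched in plain Python. This is an illustration of the idea only, not the GeoAnalytics implementation: the function name `euclidean_line_distance` is hypothetical, coordinates are assumed to be planar (x, y) tuples, and both lines are assumed to have the same number of vertices so that each vertex has a corresponding partner.

```python
import math


def euclidean_line_distance(line_a, line_b):
    """Average distance between corresponding vertices of two lines.

    Hypothetical helper for illustration; assumes planar (x, y) vertices
    and equal vertex counts in both lines.
    """
    if not line_a or len(line_a) != len(line_b):
        raise ValueError("this sketch assumes two non-empty lines with equal vertex counts")
    # Distance for each vertex pair (the red lines in the figure).
    distances = [math.dist(p, q) for p, q in zip(line_a, line_b)]
    return sum(distances) / len(distances)


line_a = [(0, 0), (1, 0), (2, 0)]
line_b = [(0, 3), (1, 4), (2, 5)]
# The vertex-pair distances are 3, 4 and 5, so the average is 4.0.
print(euclidean_line_distance(line_a, line_b))
```

A smaller value indicates that the two lines are more similar under this measure.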

ST_FréchetDistance: Calculates the discrete Fréchet distance between two linestrings. The calculation spatially aligns vertices in the first input linestring to the closest vertices in the second input linestring. The discrete Fréchet distance is the greatest distance among all aligned pairs of vertices.

The Fréchet distance is typically used to find the similarity between two pedestrian paths.

For more information, see ST_FréchetDistance.

Figure (ST_FréchetDistance): Two linestrings (orange and blue). A distance (red line) is calculated between the closest vertices in the two linestrings. The green line is the discrete Fréchet distance.
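As a rough illustration of the discrete Fréchet distance, the sketch below uses the standard dynamic-programming formulation, in which vertices are aligned in order along both lines and the result is the largest distance within the best such alignment. It is an assumption that this matches the library's alignment step in every detail; the function name `discrete_frechet` is hypothetical and coordinates are assumed to be planar (x, y) tuples.

```python
import math


def discrete_frechet(line_a, line_b):
    """Discrete Frechet distance via the standard dynamic-programming table.

    Hypothetical helper for illustration; assumes planar (x, y) vertices.
    """
    n, m = len(line_a), len(line_b)
    if n == 0 or m == 0:
        raise ValueError("both lines need at least one vertex")
    # table[i][j] holds the discrete Frechet distance between the
    # prefixes line_a[:i + 1] and line_b[:j + 1].
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(line_a[i], line_b[j])
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                best_previous = min(
                    table[i - 1][j], table[i - 1][j - 1], table[i][j - 1]
                )
                table[i][j] = max(best_previous, d)
    return table[n - 1][m - 1]


line_a = [(0, 0), (1, 0), (2, 0)]
line_b = [(0, 1), (1, 2), (2, 1)]
# The widest aligned pair is (1, 0) and (1, 2), so this prints 2.0.
print(discrete_frechet(line_a, line_b))
```

Unlike the averaged Euclidean measure, a single pair of far-apart aligned vertices determines the result here.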

ST_HausdorffDistance: Calculates the Hausdorff distance between two geometries. The Hausdorff distance is defined as the greatest of the distances from each vertex of a given geometry to its closest vertex in the reference geometry. This function works with all types of geometries.

Use cases for Hausdorff distance include finding the nearest entry point to a nature walking trail from a street or finding the nearest shelter.

For more information, see ST_HausdorffDistance.

Figure (ST_HausdorffDistance): Two linestrings (orange and blue). A distance (gray line) is calculated between every combination of vertices in the two linestrings. The red lines represent the distances between closest vertices. The Hausdorff distance is the green line, which is the greatest of these closest-vertex distances.
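The vertex-based definition above can be sketched as follows. This is a minimal illustration, not the library's implementation: the names `directed_hausdorff` and `hausdorff` are hypothetical, geometries are reduced to lists of planar (x, y) vertices, and whether ST_HausdorffDistance returns the one-directional value or the symmetric maximum of both directions is not stated in the text above, so both are shown.

```python
import math


def directed_hausdorff(geometry, reference):
    """Greatest distance from any vertex of `geometry` to its closest
    vertex in `reference` (hypothetical helper, planar coordinates)."""
    return max(
        min(math.dist(p, q) for q in reference)  # closest reference vertex
        for p in geometry
    )


def hausdorff(geom_a, geom_b):
    """Symmetric variant: the larger of the two directed distances."""
    return max(directed_hausdorff(geom_a, geom_b), directed_hausdorff(geom_b, geom_a))


line_a = [(0, 0), (4, 0)]
line_b = [(0, 1), (4, 1), (4, 5)]
print(directed_hausdorff(line_a, line_b))  # 1.0: every vertex of line_a has a neighbor 1 unit away
print(directed_hausdorff(line_b, line_a))  # 5.0: vertex (4, 5) is 5 units from its closest vertex
print(hausdorff(line_a, line_b))           # 5.0
```

The example shows why direction matters: the result depends on which geometry is treated as the reference.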

Track similarity

TRK_LCSS: Calculates the Longest Common Subsequence similarity between two tracks. This function is considered a track similarity measure because it takes timestamps (stored as m-values) into consideration.

The Longest Common Subsequence measure can be used to identify the total number of outlier points when comparing two tracks.

For more information, see TRK_LCSS.
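The idea behind a Longest Common Subsequence measure for tracks can be sketched as below. This is a generic formulation under stated assumptions, not the TRK_LCSS implementation: the function name `lcss_length` and the thresholds `max_dist` and `max_time_gap` are hypothetical, each track is a list of (x, y, t) tuples with planar coordinates and numeric timestamps, and two observations are treated as matching when they are close in both space and time.

```python
import math


def lcss_length(track_a, track_b, max_dist, max_time_gap):
    """Length of the longest common subsequence of matching observations.

    Hypothetical helper for illustration. Two observations match when
    their spatial distance is at most `max_dist` and their timestamps
    differ by at most `max_time_gap`.
    """
    n, m = len(track_a), len(track_b)
    # table[i][j] is the LCSS length of track_a[:i] and track_b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        xa, ya, ta = track_a[i - 1]
        for j in range(1, m + 1):
            xb, yb, tb = track_b[j - 1]
            close_in_space = math.dist((xa, ya), (xb, yb)) <= max_dist
            close_in_time = abs(ta - tb) <= max_time_gap
            if close_in_space and close_in_time:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


track_a = [(0, 0, 0), (1, 0, 10), (2, 0, 20), (3, 0, 30)]
track_b = [(0, 0.5, 1), (1, 0.5, 11), (2, 5, 21), (3, 0.5, 31)]
matched = lcss_length(track_a, track_b, max_dist=1.0, max_time_gap=5)
print(matched)                 # 3 observations match
print(len(track_a) - matched)  # 1 observation in track_a has no match
```

In this sketch, the count of unmatched observations plays the role of the outlier points mentioned above; dividing the matched count by the length of the shorter track is one common way to turn it into a similarity ratio.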
