Coordinate systems and transformations

Coordinate systems are arbitrary designations for spatial data. Their purpose is to provide a common basis for communication about a particular place or area on the Earth's surface.

There are a few critical considerations when choosing the correct coordinate system for your data or analysis. These include the units your data is measured in, where the data is on Earth, what the data's extent is, and the phenomena you are trying to analyze (areas, distances, angles, etc.).

For the same dataset, the most appropriate coordinate system may vary based on whether you are plotting, analyzing, or sharing data.

This topic reviews the different types of coordinate systems, and best practices for setting and transforming spatial references for your spatial data.

Geographic vs. projected coordinate systems

This guide outlines two types of coordinate systems:

A Geographic Coordinate System (GCS) specifies a datum, spheroid, and prime meridian. A GCS uses coordinates in angular units (e.g. degrees or grads) and is better imagined as a globe than as a flat map.
A Projected Coordinate System (PCS) is the result of applying a map projection to data in a GCS to create a flat map. The PCS contains the original GCS definition and additional projection information. A PCS uses coordinates in linear units (e.g. meters or feet).

A PCS uses a map projection to convert GCS data to a flat map. If your data is stored in a GCS and your intent is to draw or plot your data on a map, then projecting your data will be required. To learn more about plotting your data on a map, see visualize results.
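To make the difference between angular and linear coordinates concrete, the following is a minimal sketch of one map projection, the spherical Mercator formula that underlies Web Mercator (SRID: 3857). It is plain Python for illustration only and is not how GeoAnalytics Engine performs projections; the function name `lonlat_to_web_mercator` is hypothetical.

```python
import math

# WGS 1984 semi-major axis in meters, used as the sphere radius by Web Mercator
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in meters."""
    lon_rad = math.radians(lon_deg)
    lat_rad = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lon_rad
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat_rad / 2))
    return x, y


# Angular GCS coordinates (degrees) in, linear PCS coordinates (meters) out
x, y = lonlat_to_web_mercator(-116.545, 33.83)
print(f"x = {x:.1f} m, y = {y:.1f} m")
```

The input values are degrees on a globe, while the output values are meters on a flat plane, which is why the two kinds of coordinates cannot be mixed without projecting first.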

You won't arbitrarily choose a GCS, as your spatial data will have been collected and stored in one already. Some data sources, such as shapefiles or feature services, have a coordinate system set by default that is stored with the geometry field. Other data sources, such as delimited files or string definitions, do not have a spatial reference set by default.

Checking the spatial reference of geometries

To verify a spatial reference has been set for your geometry column, you can use the ST_SRID or the ST_SRText function. The following code sample uses ST_SRID to get the spatial reference set on the geometry column:

Python
from geoanalytics.sql import functions as ST

# Check the spatial reference of your geometry column
df.select(ST.srid("geometry")).show(1)

Result
+----------------+
|stsrid(geometry)|
+----------------+
|            4267|
+----------------+
only showing top 1 rows

You can also call get_spatial_reference on a DataFrame to return a SpatialReference object representing the spatial reference of the primary geometry column in the DataFrame. A SpatialReference is a named tuple containing the following fields:

srid—The spatial reference ID.
is_projected—True if the spatial reference includes a PCS. False if it only includes a GCS.
unit—The unit of the spatial reference.
wkt—The Well-Known Text representation of the spatial reference.

The following code sample shows an example of using get_spatial_reference on a DataFrame:

Python
# Check the spatial reference of your DataFrame
sr = df.st.get_spatial_reference()
print("SRID:", sr.srid)
print("Is Projected:", sr.is_projected)
print("Unit:", sr.unit)
print("WKT:", sr.wkt)

Result
SRID: 3857
Is Projected: True
Unit: Meter
WKT: PROJCS["WGS_1984_Web_Mercator_Auxiliary_Sphere",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Mercator_Auxiliary_Sphere"],PARAMETER["False_Easting",0.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",0.0],PARAMETER["Standard_Parallel_1",0.0],PARAMETER["Auxiliary_Sphere_Type",0.0],UNIT["Meter",1.0]]
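The WKT string itself indicates whether a spatial reference is projected: a PCS definition starts with `PROJCS` and wraps a `GEOGCS` definition, while a GCS definition starts with `GEOGCS`. The following plain-Python sketch reads that leading keyword from the WKT shown above. The helper `describe_wkt` is hypothetical and is not part of GeoAnalytics Engine; in practice the `is_projected` field already gives you this answer.

```python
import re

# The GCS portion of the WKT string shown in the result above
GCS_WKT = (
    'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",'
    'SPHEROID["WGS_1984",6378137.0,298.257223563]],'
    'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]'
)

# The full PCS definition wraps the GCS and adds projection information
PCS_WKT = (
    'PROJCS["WGS_1984_Web_Mercator_Auxiliary_Sphere",' + GCS_WKT + ","
    'PROJECTION["Mercator_Auxiliary_Sphere"],'
    'PARAMETER["False_Easting",0.0],PARAMETER["False_Northing",0.0],'
    'PARAMETER["Central_Meridian",0.0],PARAMETER["Standard_Parallel_1",0.0],'
    'PARAMETER["Auxiliary_Sphere_Type",0.0],UNIT["Meter",1.0]]'
)


def describe_wkt(wkt):
    """Return (kind, name) based on the leading keyword of a WKT string."""
    match = re.match(r'\s*(PROJCS|GEOGCS)\["([^"]+)"', wkt)
    if match is None:
        raise ValueError("Unrecognized WKT string")
    keyword, name = match.groups()
    kind = "Projected" if keyword == "PROJCS" else "Geographic"
    return kind, name


print(describe_wkt(PCS_WKT))
print(describe_wkt(GCS_WKT))
```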

Setting the spatial reference of geometries

When a spatial reference is not set on a geometry column, you need to set it to the spatial reference the data was collected in. To set a coordinate system for a geometry column, you can use the ST_SRID or ST_SRText function when defining the geometry of your DataFrame. For example, if you collected location data using a GPS that was set to use a GCS of NAD 1983 (CSRS) (SRID: 4617), you need to set the spatial reference (SRID) of your geometry column to 4617 using the ST_SRID function. You can also set the spatial reference of the primary geometry column in your DataFrame using set_spatial_reference.

If you are unsure of what spatial reference your data is in, here are a few hints to figure it out:

See if there is metadata or information stored with the file or where you downloaded the data.
Look for clues in the dataset:
- Look at the field names. If they are named "latitude" or "longitude", the data is most likely in a GCS. If you see field names that include "meters", "feet", or "utmzone", the data is most likely in a PCS.
- Look at the field values in the geometry columns. X values that are consistently between -180 and 180 and y values that are consistently between -90 and 90 are most likely in degrees, meaning that the data is likely in a GCS. Values less than -180 or greater than 180 fall outside the range of degrees on the globe and indicate that your data is most likely in a PCS.
Try visualizing your dataset with different spatial references with other datasets. Verify that geometries are where you expect.
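The value-range hint can be sketched as a quick check in plain Python. Valid longitudes fall within -180 to 180 and valid latitudes within -90 to 90, so coordinates outside those ranges cannot be degrees. The function name `looks_like_degrees` is hypothetical, and a passing check is only a hint: small projected coordinates can also fall inside these ranges.

```python
def looks_like_degrees(xs, ys):
    """Return True if all x values are plausible longitudes and all y values plausible latitudes."""
    lon_ok = all(-180.0 <= x <= 180.0 for x in xs)
    lat_ok = all(-90.0 <= y <= 90.0 for y in ys)
    return lon_ok and lat_ok


# Coordinates that fit the range of degrees, so the data is likely in a GCS
gcs_like_x = [-116.545, -116.45, -117.2]
gcs_like_y = [33.83, 33.90, 34.1]

# Coordinates far outside the range of degrees, so the data is likely in a PCS
pcs_like_x = [16862133.55, 15324947.32]
pcs_like_y = [-2331626.98, -2649943.15]

print(looks_like_degrees(gcs_like_x, gcs_like_y))
print(looks_like_degrees(pcs_like_x, pcs_like_y))
```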

The following code sample uses ST_SRID to set the spatial reference of a geometry column:

Python
# Set the spatial reference for your geometry column and verify the result
df_sref = df.withColumn("geometry", ST.srid("geometry", 4269))
df_sref.select(ST.srid("geometry")).show(1)

Result
+----------------+
|stsrid(geometry)|
+----------------+
|            4269|
+----------------+
only showing top 1 row

In addition to setting your coordinate system to match your input data, consider whether you should transform your data or use a different projection that is more appropriate for your analysis. For example, storing data in Web Mercator (SRID: 3857) is not recommended for analysis as it is known to distort spatial representations. For more information see Choosing a projected coordinate system below.

You can use the geoanalytics.system.spatial_references catalog in Spark SQL to list the well-known spatial references available in GeoAnalytics Engine. The catalog includes the following fields:

Name—The full name of the spatial reference.
Code—The numeric ID of the spatial reference (SRID).
Type—The type of the spatial reference, either "Projected" or "Geographic".
Authority—The organization that defined the spatial reference (typically EPSG or Esri).
WKT—The well-known text representation of the spatial reference.
Units—The units utilized by the spatial reference.
Tolerance—The minimum distance between coordinates supported by the spatial reference.
AreaOfUse—The minimum and maximum x and y coordinates of the geographic area for which the spatial reference is valid.
AreaOfUsePolygon—A polygon representing the geographic area for which the spatial reference is valid.

The following code sample shows an example of listing available spatial references using Spark SQL.

Python
# List the first 5 spatial references available in GeoAnalytics Engine
spark.sql("SELECT Name, Code, Type, Units, Tolerance, AreaOfUse " + \
          "FROM geoanalytics.system.spatial_references").show(5, truncate=False)

Result
+------------+----+----------+------+--------------------+------------------------------+
|Name        |Code|Type      |Units |Tolerance           |AreaOfUse                     |
+------------+----+----------+------+--------------------+------------------------------+
|GCS_HD1909  |3819|Geographic|Degree|8.984194981201908E-9|{16.11, 45.74, 22.9, 48.58}   |
|GCS_TWD_1967|3821|Geographic|Degree|8.983120447446023E-9|{119.25, 21.87, 122.06, 25.34}|
|GCS_TWD_1997|3824|Geographic|Degree|8.983152841195215E-9|{114.32, 17.36, 123.61, 26.96}|
|GCS_IGRS    |3889|Geographic|Degree|8.983152841195215E-9|{38.79, 29.06, 48.75, 37.39}  |
|GCS_MGI_1901|3906|Geographic|Degree|8.984194981201908E-9|{13.38, 40.85, 23.04, 46.88}  |
+------------+----+----------+------+--------------------+------------------------------+
only showing top 5 rows
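One way to use the AreaOfUse field is to shortlist spatial references whose valid area contains a location of interest. The sketch below does this in plain Python using the five rows shown above. It assumes the four AreaOfUse values are ordered as minimum x, minimum y, maximum x, maximum y, which is inferred from the values in the result rather than confirmed; the `AreaOfUse` class and `candidates` function are hypothetical helpers.

```python
from typing import NamedTuple


class AreaOfUse(NamedTuple):
    # Assumed field order: xmin, ymin, xmax, ymax (degrees)
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, lon, lat):
        """Return True if the point falls inside the bounding box."""
        return self.xmin <= lon <= self.xmax and self.ymin <= lat <= self.ymax


# Name and AreaOfUse values copied from the result above
catalog_rows = {
    "GCS_HD1909": AreaOfUse(16.11, 45.74, 22.9, 48.58),
    "GCS_TWD_1967": AreaOfUse(119.25, 21.87, 122.06, 25.34),
    "GCS_TWD_1997": AreaOfUse(114.32, 17.36, 123.61, 26.96),
    "GCS_IGRS": AreaOfUse(38.79, 29.06, 48.75, 37.39),
    "GCS_MGI_1901": AreaOfUse(13.38, 40.85, 23.04, 46.88),
}


def candidates(lon, lat):
    """List the spatial references whose area of use contains the point."""
    return [name for name, area in catalog_rows.items() if area.contains(lon, lat)]


print(candidates(19.04, 47.50))   # a point in Budapest
print(candidates(121.56, 25.03))  # a point in Taipei
```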

Geographic transformations

Because Earth is a lumpy, squished sphere, there are many GCS tailored to specific locations. Geographic (or datum) transformations convert geographic data between different GCS.

Transformations can be useful in aligning locations between datasets. For example, if you have boundary polygons in one GCS and observation points in another, it may be best to transform your data to the same GCS to attain the most accurate relative positioning prior to analysis.

Transformations can alter the locations of your data significantly. If possible, plot your data with the new GCS and verify the new locations using records where ground-truth is known.

Some analysis tools that use two different geometry columns as input will automatically transform your data to the same spatial reference for analysis. To avoid automatic transformations, it is recommended you transform your data prior to analysis for the most accurate results.

GeoAnalytics Engine includes many transformations that can be applied when transforming your data from one GCS to another using ST_Transform. If you do not specify the transformation to use, one will be chosen automatically. Your geometry column must have a spatial reference set before transforming it. The following code sample shows using ST_Transform to transform a geometry column into North American Datum of 1983 (CSRS) version 6 (SRID: 8252) using the default transformation.

Python
# Transform your spatial data and verify the new spatial reference
df_transformed = df.withColumn("geometry", ST.transform("geometry", 8252))
df_transformed.select(ST.srid("geometry")).show(1)

Result
+----------------+
|stsrid(geometry)|
+----------------+
|            8252|
+----------------+
only showing top 1 rows

You can check which transformation is being used by calling explain and looking at the datum_transform property in the physical plan. For example, the following code sample shows that WGS_1984_(ITRF00)_To_NAD_1983_2011 is the default transformation used between Web Mercator (SRID: 3857) and National Spatial Reference System 2011 (SRID: 6318).

Python
df = spark.read.format("feature-service") \
          .load("https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties/FeatureServer/0")

# The data is in Web Mercator (3857), which is based on the GCS 4326.
# Transform the GCS from 4326 to 6318
df_transformed = df.select(ST.transform("shape", 6318))
df_transformed.explain()

Result
== Physical Plan ==
*(1) Project [ST_Transform(shape#5989, in=WGS_1984_Web_Mercator_Auxiliary_Sphere:3857, out=GCS_NAD_1983_2011:6318, datum_transform="WGS_1984_(ITRF00)_To_NAD_1983_2011") AS ST_Transform(shape)#6053]
+- BatchScan[shape#5989] FeatureServiceLayerScan[f=json, query="returnM=true&outFields=&where=1=1&returnZ=true", paging=oid-range(field=FID,size=2000,count=2)] RuntimeFilters: []

You can specify a non-default transformation using the datum_transform property in ST_Transform, as shown in the following code sample where ITRF08 is used instead of the default ITRF00.

Python
df = spark.read.format("feature-service") \
          .load("https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties/FeatureServer/0")

# Transform the GCS from 4326 to 6318 using WGS_1984_(ITRF08)_To_NAD_1983_2011
df_transformed = df.select(ST.transform("shape", 6318, datum_transform="WGS_1984_(ITRF08)_To_NAD_1983_2011"))
df_transformed.explain()

Result
== Physical Plan ==
*(1) Project [ST_Transform(shape#6631, in=WGS_1984_Web_Mercator_Auxiliary_Sphere:3857, out=GCS_NAD_1983_2011:6318, datum_transform="WGS_1984_(ITRF08)_To_NAD_1983_2011") AS ST_Transform(shape)#6695]
+- BatchScan[shape#6631] FeatureServiceLayerScan[f=json, query="returnM=true&outFields=&where=1=1&returnZ=true", paging=oid-range(field=FID,size=2000,count=2)] RuntimeFilters: []

To change the default transformation used by ST_Transform, you can set the geoanalytics.sql.transforms.<FromCode>.<ToCode> property in the Spark Configuration, where FromCode is the SRID of the input data and ToCode is the SRID to transform to. For example, the following code sample shows changing the default transformation between 4326 and 6318.

Python
# Set the default transformation between 4326 and 6318 to WGS_1984_(ITRF08)_To_NAD_1983_2011
spark.conf.set("geoanalytics.sql.transforms.4326.6318", "WGS_1984_(ITRF08)_To_NAD_1983_2011")

df = spark.read.format("feature-service") \
          .load("https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties/FeatureServer/0")
df_transformed = df.select(ST.transform("shape", 6318))
df_transformed.explain()

Result
== Physical Plan ==
*(1) Project [ST_Transform(shape#7018, in=WGS_1984_Web_Mercator_Auxiliary_Sphere:3857, out=GCS_NAD_1983_2011:6318, datum_transform="WGS_1984_(ITRF08)_To_NAD_1983_2011") AS ST_Transform(shape)#7082]
+- BatchScan[shape#7018] FeatureServiceLayerScan[f=json, query="returnM=true&outFields=&where=1=1&returnZ=true", paging=oid-range(field=FID,size=2000,count=2)] RuntimeFilters: []

To list the transformations available in GeoAnalytics Engine, you can use the geoanalytics.system.transformations catalog in Spark SQL. Each transformation listed can be used in both the forward and reverse directions. The DataFrame returned includes the following fields:

Name—The full name of the transformation.
Code—The numeric ID of the transformation.
Authority—The organization that defined the transformation (typically EPSG or Esri).
Method—The methodology used to perform the transformation.
DataFiles—Specifies which supplemental projection data files are used in the transformation. null if none are required.
Usable—False if the transformation is unsupported for any reason, including if there are missing data files.
FromCode—The SRID of the starting GCS.
ToCode—The SRID of the GCS to transform to.
AreaOfUse—The minimum and maximum x and y coordinates of the geographic area for which the transformation is valid.
AreaOfUsePolygon—A polygon representing the geographic area for which the transformation is valid.

The following code sample shows an example of listing available transformations using Spark SQL.

Python
# List the first 5 transformations available in GeoAnalytics Engine
spark.sql("SELECT Name, Code, FromCode, ToCode, AreaOfUse " + \
          "FROM geoanalytics.system.transformations").show(5, truncate=False)

Result
+------------------------+----+--------+------+----------------------------+
|Name                    |Code|FromCode|ToCode|AreaOfUse                   |
+------------------------+----+--------+------+----------------------------+
|MGI_To_ETRS_1989_4      |1024|4312    |4258  |{13.58, 46.64, 16.17, 47.84}|
|Ain_el_Abd_To_WGS_1984_3|1055|4204    |4326  |{46.54, 28.53, 48.48, 30.09}|
|Ain_El_Abd_To_WGS_1984_4|1056|4204    |4326  |{46.54, 28.53, 48.48, 30.09}|
|Ain_El_Abd_To_WGS_1984_5|1057|4204    |4326  |{46.54, 29.1, 48.42, 30.09} |
|Ain_El_Abd_To_WGS_1984_6|1058|4204    |4326  |{46.54, 28.53, 48.48, 29.45}|
+------------------------+----+--------+------+----------------------------+
only showing top 5 rows

Use the list_transformations function to explore all transformation paths that could be used to transform between two GCS for a specified extent. The function returns a DataFrame with the following fields:

Path—The transformation name, or names if multiple transformations are required. Names beginning with a ~ indicate that the transformation will be used in reverse.
Percent—The percentage of the specified extent for which the transformation path is valid. If no extent is defined, the value represents the percentage of the world for which the path is valid.
Accuracy—The maximum accuracy guaranteed by the transformation path, in meters.

Transformation paths are sorted by Percent and then Accuracy. For example, the code sample below lists the top 5 available paths for transforming from WGS 1984 (SRID: 4326) to NAD83 / UTM zone 15N (SRID: 26915) for the extent of a USA Parks dataset. The first transformation path is valid for 80.87% of the dataset extent and guarantees an accuracy of 0.1 meters. The other four paths are valid for a smaller percentage of the dataset extent, guarantee less accuracy when transforming, or both.

Python
df = spark.read.format("feature-service") \
    .load("https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Parks/FeatureServer/0")
bb = df.st.get_extent()

# List available paths for transforming between 4326 and 26915 for the extent of a USA Parks dataset
geoanalytics.util.list_transformations(from_sr=4326, to_sr=26915, extent=bb).show(5, truncate=False)

Result
+-----------------------------------------------------------+------------------+--------+
|Path                                                       |Percent           |Accuracy|
+-----------------------------------------------------------+------------------+--------+
|WGS_1984_(ITRF00)_To_NAD_1983                              |80.87444335193237 |0.1     |
|~NAD_1983_To_WGS_1984_5                                    |78.52542646454121 |1.0     |
|~NAD_1983_To_WGS_1984_1                                    |66.13025783747702 |4.0     |
|~NAD_1927_To_WGS_1984_33 + NAD_1927_To_NAD_1983_NTv2_Canada|30.344173869840425|2.5     |
|~NAD_1927_To_WGS_1984_3 + NAD_1927_To_NAD_1983_NTv2_Canada |30.344173869840425|21.5    |
+-----------------------------------------------------------+------------------+--------+
only showing top 5 rows
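The ordering of the result can be reproduced in plain Python to show how the two sort keys interact. The sketch assumes, based on the rows above, that paths are ordered by largest Percent first and then by smallest Accuracy value (a smaller value in meters means a more accurate path). The `rank_paths` function is a hypothetical helper, not part of GeoAnalytics Engine.

```python
# (Path, Percent, Accuracy) rows copied from the result above, in shuffled order
paths = [
    ("~NAD_1983_To_WGS_1984_1", 66.13025783747702, 4.0),
    ("~NAD_1927_To_WGS_1984_3 + NAD_1927_To_NAD_1983_NTv2_Canada", 30.344173869840425, 21.5),
    ("WGS_1984_(ITRF00)_To_NAD_1983", 80.87444335193237, 0.1),
    ("~NAD_1927_To_WGS_1984_33 + NAD_1927_To_NAD_1983_NTv2_Canada", 30.344173869840425, 2.5),
    ("~NAD_1983_To_WGS_1984_5", 78.52542646454121, 1.0),
]


def rank_paths(rows):
    """Order paths by largest Percent first, then by smallest Accuracy value."""
    return sorted(rows, key=lambda row: (-row[1], row[2]))


for path, percent, accuracy in rank_paths(paths):
    print(f"{percent:6.2f}%  {accuracy:5.1f} m  {path}")
```

The two paths that cover the same 30.34% of the extent are separated only by the second key, so the one with an accuracy of 2.5 meters ranks ahead of the one with 21.5 meters.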

Some transformations are not included with the geoanalytics jar file and require you to install the supplementary Projection Engine jar files. The Projection Engine jars offer additional geographic transformations required for transforming to or from certain spatial references. For example, if you try to project from WGS 1984 (SRID: 4326) to NAD27 / UTM zone 11N (SRID: 26711), you may get the error "Can't perform requested spatial reference transformation due to Missing grid file". This error indicates that your transformation requires supplemental files and that you need to install the Projection Engine jar files to complete your workflow.

Choosing a projected coordinate system

When choosing a projected coordinate system (PCS), consider the information you want to maintain in the result. For example, let's say you are analyzing the area required for new wind turbines. Because area is important in this case, it is recommended that your data is either stored in or transformed to a PCS with an equal-area projection (and if possible, a planar calculation method should be used in analysis). Similarly, if you need to preserve distances or angles, then use a projection appropriate for the desired characteristic. To help determine which PCS you should use, refer to the USGS criteria guide.

In addition, you can use the create_optimal_sr utility to generate a spatial reference with specific properties for a specified extent, or the create_optimal_sr DataFrame accessor to generate a spatial reference with specific properties for a DataFrame. Both the utility and DataFrame accessor allow you to choose which of the following properties will be prioritized when creating a spatial reference:

EQUAL_AREA—Preserves the relative area of regions everywhere on earth. Shapes and distances will be distorted.
CONFORMAL—Preserves angles in small areas. Shapes, sizes, and distances will be distorted.
EQUIDISTANT_ONE_POINT—Preserves distances when measured through the center of the projection. Areas, shapes, and other distances will be distorted.
EQUIDISTANT_MERIDIANS—Preserves distances when measured along meridians. Areas, shapes, and other distances will be distorted.
COMPROMISE_WORLD—Does not preserve areas, shapes, or distances specifically, but creates a balance between these geometric properties. Compromise projections are only suggested for very large areas.

Spatial references generated using create_optimal_sr may have an SRID of 0. The following code sample shows an example of projecting geometries to a spatial reference that was generated using create_optimal_sr with the COMPROMISE_WORLD property. The example feature service is updated regularly and results may not match those shown below exactly.

Python
df = spark.read.format("feature-service") \
          .load(r"https://services9.arcgis.com/RHVPKKiFTONKtxq3/ArcGIS/rest/services/MODIS_Thermal_v1/FeatureServer/1")

# Create an optimal spatial reference and print the WKT string
sr = df.st.create_optimal_sr("COMPROMISE_WORLD")
print(sr.wkt)

# Project the data to the optimal spatial reference
df_transformed = df.select(ST.transform("shape", sr))
df_transformed.show(5, truncate=False)

Result
PROJCS["CustomProjection",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Natural_Earth"],PARAMETER["False_Easting",0.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-29.63],UNIT["Meter",1.0]]
+--------------------------------------------------+
|ST_Transform(shape)                               |
+--------------------------------------------------+
|{"x":1.686213355038753e7,"y":-2331626.9778772173} |
|{"x":1.5324947321214821e7,"y":-2649943.1473921738}|
|{"x":1.5735459338985184e7,"y":-3837413.4567152876}|
|{"x":1.5736402521247668e7,"y":-3837581.503795787} |
|{"x":1.5787735925841367e7,"y":-4104234.0465022367}|
+--------------------------------------------------+
only showing top 5 rows

If your data is at a local scale, consider using a coordinate system specific to the location you are analyzing. For example, suppose you have data in Palm Springs, California. Palm Springs is in Riverside County, and the current official PCS for the county is NAD 1983 (2011) State Plane California VI (SRID: 6425).

If your data is at a global scale, avoid using Web Mercator (SRID: 3857) when possible, especially to run analysis. This projection is known to drastically distort areas, lengths, and angles. There are other options to analyze global data that may be more applicable to your analysis. GCS are usually recommended for accurate analysis of global-scale data.
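The size of the distortion can be estimated with a short calculation. For a spherical Mercator projection, lengths are stretched by a factor of 1 / cos(latitude) and areas by the square of that factor. The sketch below applies this formula as an approximation for Web Mercator; the function names are hypothetical and the numbers are illustrative rather than exact values for any particular dataset.

```python
import math


def mercator_scale_factor(lat_deg):
    """Approximate factor by which Mercator stretches lengths at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


def mercator_area_inflation(lat_deg):
    """Approximate factor by which Mercator inflates areas at a latitude."""
    return mercator_scale_factor(lat_deg) ** 2


for lat in (0, 30, 45, 60, 75):
    print(f"latitude {lat:2d}: lengths x{mercator_scale_factor(lat):.2f}, "
          f"areas x{mercator_area_inflation(lat):.2f}")
```

At the equator there is no stretching, while at 60 degrees latitude lengths are roughly doubled and areas roughly quadrupled, which is why area or distance results computed directly in Web Mercator can be misleading.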

If your data crosses the antimeridian (similar to the international dateline in location), use a local PCS for the particular area. For example, if your data is in Fiji you may want to use either Fiji 1986 Fiji Map Grid (SRID: 3460) or Fiji 1986 UTM Zone 60S (SRID: 3141). Alternatively, use a GCS that supports wrapping around the antimeridian.
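A quick way to check whether a dataset is likely to cross the antimeridian is to compare its longitude extent before and after shifting longitudes into the 0 to 360 range. The following plain-Python sketch illustrates the idea with made-up longitudes near Fiji; the function names are hypothetical and the check is a heuristic, not a GeoAnalytics Engine feature.

```python
def extent_width(lons):
    """Width in degrees of the smallest interval [min, max] covering the longitudes."""
    return max(lons) - min(lons)


def crosses_antimeridian(lons):
    """Guess whether longitudes in [-180, 180] straddle the antimeridian.

    If shifting the values into [0, 360) produces a narrower extent, the
    data is probably clustered around +/-180 degrees.
    """
    shifted = [lon % 360.0 for lon in lons]
    return extent_width(shifted) < extent_width(lons)


# Illustrative longitudes on both sides of the antimeridian
fiji_lons = [177.4, 178.4, 179.9, -179.8, -178.6]

# Illustrative longitudes on both sides of the prime meridian
london_lons = [-0.5, 0.3]

print(crosses_antimeridian(fiji_lons))
print(crosses_antimeridian(london_lons))
```

For the Fiji values the unshifted extent spans almost the whole globe even though the points are only a few degrees apart, which is the symptom a local PCS or a wrapping GCS avoids.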

Planar vs. geodesic distance calculations

Some analysis tools enable you to choose between a geodesic or planar distance calculation. Planar calculations are recommended for accurate analysis of small, local areas only. Geodesic calculations are recommended for accurate analysis of larger, global areas. It is recommended you use geodesic distances in the following circumstances:

Tracks cross the antimeridian—When using the geodesic method, geometries that cross the antimeridian will have tracks that correctly cross the antimeridian. Your spatial reference must support wrapping around the antimeridian, for example, a global projection such as World Cylindrical Equal Area.
Your dataset is not in a local projection—If your spatial reference is set to a local projection, use the planar distance method. For example, use the planar method to examine data using a state plane spatial reference.
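The difference between the two methods can be illustrated with a rough comparison in plain Python. The sketch uses a great-circle (haversine) distance on a sphere as a stand-in for a true geodesic distance on the ellipsoid, and a simple flat-plane approximation as a stand-in for a planar distance. Both functions are simplified assumptions for illustration and are not the calculations GeoAnalytics Engine performs.

```python
import math

# Mean Earth radius in meters (spherical approximation)
MEAN_EARTH_RADIUS_M = 6371008.8


def great_circle_distance_m(lon1, lat1, lon2, lat2):
    """Haversine distance on a sphere, approximating a geodesic distance."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * MEAN_EARTH_RADIUS_M * math.asin(math.sqrt(a))


def planar_distance_m(lon1, lat1, lon2, lat2):
    """Straight-line distance on a flat plane scaled at the mean latitude."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = MEAN_EARTH_RADIUS_M * math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = MEAN_EARTH_RADIUS_M * math.radians(lat2 - lat1)
    return math.hypot(dx, dy)


def relative_difference(lon1, lat1, lon2, lat2):
    """Relative gap between the planar and great-circle distances."""
    geodesic = great_circle_distance_m(lon1, lat1, lon2, lat2)
    planar = planar_distance_m(lon1, lat1, lon2, lat2)
    return abs(planar - geodesic) / geodesic


# Two points about 12 km apart near Palm Springs, California
local = relative_difference(-116.545, 33.83, -116.45, 33.90)

# New York to London
long_range = relative_difference(-74.01, 40.71, -0.13, 51.51)

print(f"local difference: {local:.6%}")
print(f"long-range difference: {long_range:.2%}")
```

Over a small, local area the two results are nearly identical, while over an intercontinental distance the flat-plane result drifts by several percent, which is consistent with the guidance to use planar calculations locally and geodesic calculations for large areas.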

What's next?

Learn more about coordinate systems and transformations: