Coordinate systems are arbitrary designations for spatial data. Their purpose is to provide a common basis for communication about a particular place or area on the Earth's surface.
There are a few critical considerations when choosing the correct coordinate system for your data or analysis: the units your data is measured in, where on Earth the data is located, the data's extent, and the properties you need to preserve in your analysis (areas, distances, angles, etc.).
For the same dataset, the most appropriate coordinate system may vary based on whether you are plotting, analyzing, or sharing data.
This topic reviews the different types of coordinate systems, and best practices for setting and transforming spatial references for your spatial data.
Geographic vs. projected coordinate systems
This guide outlines two types of coordinate systems:
- A Geographic Coordinate System (GCS) specifies a datum, spheroid, and prime meridian. A GCS uses coordinates in angular units (e.g. degrees or grads) and is better imagined as a globe than as a flat map.
- A Projected Coordinate System (PCS) is the result of applying a map projection to data in a GCS to create a flat map. The PCS contains the original GCS definition and additional projection information. A PCS uses coordinates in linear units (e.g. meters or feet).
Because a PCS uses a map projection to convert GCS data to a flat map, if your data is stored in a GCS and you intend to draw or plot it on a map, you will need to project it first. To learn more about plotting your data on a map, see visualize results.
You won't arbitrarily choose a GCS, as your spatial data will have been collected and stored in one already. Some data sources, such as shapefiles or feature services, have a coordinate system set by default that is stored with the geometry field. Other data sources, such as delimited files or string definitions, do not have a spatial reference set by default.
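For instance, a delimited file read as a plain DataFrame carries no geometry or spatial reference until you create one. The sketch below is a minimal illustration; the file path and column names are hypothetical, and it assumes the ST alias used throughout this guide is imported from geoanalytics.sql.functions and that ST_Point accepts x and y column names.
# A minimal sketch: creating geometries from a delimited file
# (the file path and column names are hypothetical)
from geoanalytics.sql import functions as ST

df = spark.read.csv("/data/observations.csv", header=True, inferSchema=True)

# Build point geometries from the x/y columns; no spatial reference is set yet
df = df.withColumn("geometry", ST.point("longitude", "latitude"))

# Set the spatial reference to the GCS the coordinates were collected in
df = df.withColumn("geometry", ST.srid("geometry", 4326))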
Checking the spatial reference of geometries
To verify a spatial reference has been set for your geometry column, you can use the ST_SRID or the ST_SRText function.
The following code sample uses ST_SRID to get the spatial reference set on the geometry column:
# Check the spatial reference of your geometry column
df.select(ST.srid("geometry")).show(1)
+----------------+
|stsrid(geometry)|
+----------------+
| 4267|
+----------------+
only showing top 1 row
You can also call get_spatial_reference on a DataFrame to return a SpatialReference object representing the spatial reference of the primary geometry column in the DataFrame. A SpatialReference is a named tuple containing the following fields:
- srid—The spatial reference ID.
- is_projected—True if the spatial reference includes a PCS. False if it only includes a GCS.
- unit—The unit of the spatial reference.
- wkt—The Well-Known Text representation of the spatial reference.
The following code sample shows an example of using get_spatial_reference on a DataFrame:
# Check the spatial reference of your DataFrame
sr = df.st.get_spatial_reference()
print("SRID:", sr.srid)
print("Is Projected:", sr.is_projected)
print("Unit:", sr.unit)
print("WKT:", sr.wkt)
SRID: 3857
Is Projected: True
Unit: Meter
WKT: PROJCS["WGS_1984_Web_Mercator_Auxiliary_Sphere",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Mercator_Auxiliary_Sphere"],PARAMETER["False_Easting",0.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",0.0],PARAMETER["Standard_Parallel_1",0.0],PARAMETER["Auxiliary_Sphere_Type",0.0],UNIT["Meter",1.0]]
Setting the spatial reference of geometries
In cases where a spatial reference is not set on a geometry column, you need to set it to the spatial reference the data was collected in. To set a coordinate system for a geometry column, you can use the ST_SRID or ST_SRText function when defining the geometry of your DataFrame.
For example, if you collected location data using a GPS that was set to use a GCS of NAD 1983 (CSRS) (SRID: 4617), you need to set the spatial reference (SRID) of your geometry column to 4617 using the ST_SRID function. You can also set the spatial reference of the primary geometry column in your DataFrame using set_spatial_reference.
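The following sketch shows one way to do this with the DataFrame accessor; it assumes set_spatial_reference accepts an SRID and returns a new DataFrame with the spatial reference applied to the primary geometry column.
# Set the spatial reference of the primary geometry column to NAD 1983 (CSRS)
# (assumes set_spatial_reference accepts an SRID and returns a new DataFrame)
df = df.st.set_spatial_reference(4617)
print(df.st.get_spatial_reference().srid)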
If you are unsure of what spatial reference your data is in, here are a few hints to figure it out:
- See if there is metadata or information stored with the file or where you downloaded the data.
- Look for clues in the dataset:
- Look at the field names. If they are named "latitude" or "longitude", the data is most likely in a GCS. If you see field names that include "meters", "feet", or "utmzone", the data is most likely in a PCS.
- Look at the field values in the geometry columns. Values that are consistently between -180 and 180 (x) and -90 and 90 (y) are most likely in degrees, meaning that the data is likely in a GCS. Values outside these ranges fall outside the possible range of degrees on the globe and indicate that your data is most likely in a PCS (see the sketch after this list).
- Try visualizing your dataset alongside other datasets using different spatial references. Verify that geometries are where you expect.
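One quick way to apply the second hint is to inspect the bounding coordinates of your geometry column, as in the sketch below, which assumes the primary geometry column is already defined.
# Inspect the bounding coordinates of the primary geometry column
# x values within -180 to 180 and y values within -90 to 90 suggest degrees (a GCS)
extent = df.st.get_extent()
print(extent)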
The following code sample uses ST_SRID to set the spatial reference of a geometry column:
# Set the spatial reference for your geometry column and verify the result
df_sref = df.withColumn("geometry", ST.srid("geometry", 4269))
df_sref.select(ST.srid("geometry")).show(1)
+----------------+
|stsrid(geometry)|
+----------------+
| 4269|
+----------------+
only showing top 1 row
In addition to setting your coordinate system to match your input data, consider whether you should transform your data or use a different projection that is more appropriate for your analysis. For example, storing data in Web Mercator (SRID: 3857) is not recommended for analysis as it is known to distort spatial representations. For more information see Choosing a projected coordinate system below.
You can use the geoanalytics.system.spatial_references catalog in Spark SQL to list the well-known spatial references available in GeoAnalytics Engine. The catalog includes the following fields:
- Name—The full name of the spatial reference.
- Code—The numeric ID of the spatial reference (SRID).
- Type—The type of the spatial reference, either "Projected" or "Geographic".
- Authority—The organization that defined the spatial reference (typically EPSG or Esri).
- WKT—The well-known text representation of the spatial reference.
- Units—The units utilized by the spatial reference.
- Tolerance—The minimum distance between coordinates supported by the spatial reference.
- AreaOfUse—The minimum and maximum x and y coordinates of the geographic area for which the spatial reference is valid.
- AreaOfUsePolygon—A polygon representing the geographic area for which the spatial reference is valid.
The following code sample shows an example of listing available spatial references using Spark SQL.
# List the first 5 spatial references available in GeoAnalytics Engine
spark.sql("SELECT Name, Code, Type, Units, Tolerance, AreaOfUse " + \
"FROM geoanalytics.system.spatial_references").show(5, truncate=False)
+------------+----+----------+------+--------------------+------------------------------+
|Name |Code|Type |Units |Tolerance |AreaOfUse |
+------------+----+----------+------+--------------------+------------------------------+
|GCS_HD1909 |3819|Geographic|Degree|8.984194981201908E-9|{16.11, 45.74, 22.9, 48.58} |
|GCS_TWD_1967|3821|Geographic|Degree|8.983120447446023E-9|{119.25, 21.87, 122.06, 25.34}|
|GCS_TWD_1997|3824|Geographic|Degree|8.983152841195215E-9|{114.32, 17.36, 123.61, 26.96}|
|GCS_IGRS |3889|Geographic|Degree|8.983152841195215E-9|{38.79, 29.06, 48.75, 37.39} |
|GCS_MGI_1901|3906|Geographic|Degree|8.984194981201908E-9|{13.38, 40.85, 23.04, 46.88} |
+------------+----+----------+------+--------------------+------------------------------+
only showing top 5 rows
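You can also filter this catalog to look up a specific spatial reference by SRID, as in the following sketch, which reuses the columns listed above.
# Look up a single spatial reference by its SRID
spark.sql("SELECT Name, Type, Units FROM geoanalytics.system.spatial_references " + \
          "WHERE Code = 4326").show(truncate=False)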
Geographic transformations
Because Earth is a lumpy, squished sphere, there are many GCS tailored to specific locations. Geographic (or datum) transformations convert geographic data between different GCS.
Transformations can be useful in aligning locations between datasets. For example, if you have boundary polygons in one GCS and observation points in another, it may be best to transform your data to the same GCS to attain the most accurate relative positioning prior to analysis.
Transformations can alter the locations of your data significantly. If possible, plot your data with the new GCS and verify the new locations using records where ground-truth is known.
Some analysis tools that use two different geometry columns as input will automatically transform your data to the same spatial reference for analysis. To avoid automatic transformations, it is recommended you transform your data prior to analysis for the most accurate results.
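For example, before an analysis that uses two geometry columns, you could transform one DataFrame to match the other's spatial reference using ST_Transform (described below). The sketch assumes two hypothetical DataFrames, polygons_df and points_df, each with a primary geometry column named geometry.
# Transform the points to the spatial reference of the polygons before analysis
# (polygons_df and points_df are hypothetical DataFrames)
target_srid = polygons_df.st.get_spatial_reference().srid
points_df = points_df.withColumn("geometry", ST.transform("geometry", target_srid))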
GeoAnalytics Engine includes many transformations that can be applied when transforming your data from one GCS to another using ST_Transform. If you do not specify the transformation to use, one will be chosen automatically. Your geometry column must have a spatial reference set before transforming it. The following code sample shows using ST_Transform to transform a geometry column into North American Datum of 1983 (CSRS) version 6 (SRID: 8252) using the default transformation.
# Transform your spatial data
df.withColumn("geometry", ST.transform("geometry", 8252))
df.select(ST.srid("geometry")).show(1)
+----------------+
|stsrid(geometry)|
+----------------+
| 8252|
+----------------+
only showing top 1 row
You can check which transformation is being used by calling explain and looking at the datum_transform property in the physical plan. For example, the following code sample shows that WGS_1984_(ITRF00)_To_NAD_1983_2011 is the default transformation used between Web Mercator (SRID: 3857) and National Spatial Reference System 2011 (SRID: 6318).
df = spark.read.format("feature-service") \
.load("https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties/FeatureServer/0")
# Transform the GCS from 4326 to 6318
df_transformed = df.select(ST.transform("shape", 6318))
df_transformed.explain()
== Physical Plan ==
*(1) Project [ST_Transform(shape#5989, in=WGS_1984_Web_Mercator_Auxiliary_Sphere:3857, out=GCS_NAD_1983_2011:6318, datum_transform="WGS_1984_(ITRF00)_To_NAD_1983_2011") AS ST_Transform(shape)#6053]
+- BatchScan[shape#5989] FeatureServiceLayerScan[f=json, query="returnM=true&outFields=&where=1=1&returnZ=true", paging=oid-range(field=FID,size=2000,count=2)] RuntimeFilters: []
You can specify a non-default transformation using the datum_transform parameter in ST_Transform, as shown in the following code sample where WGS_1984_(ITRF08)_To_NAD_1983_2011 is used instead of the default WGS_1984_(ITRF00)_To_NAD_1983_2011.
df = spark.read.format("feature-service") \
.load("https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties/FeatureServer/0")
# Transform the GCS from 4326 to 6318 using WGS_1984_(ITRF08)_To_NAD_1983_2011
df_transformed = df.select(ST.transform("shape", 6318, datum_transform="WGS_1984_(ITRF08)_To_NAD_1983_2011"))
df_transformed.explain()
== Physical Plan ==
*(1) Project [ST_Transform(shape#6631, in=WGS_1984_Web_Mercator_Auxiliary_Sphere:3857, out=GCS_NAD_1983_2011:6318, datum_transform="WGS_1984_(ITRF08)_To_NAD_1983_2011") AS ST_Transform(shape)#6695]
+- BatchScan[shape#6631] FeatureServiceLayerScan[f=json, query="returnM=true&outFields=&where=1=1&returnZ=true", paging=oid-range(field=FID,size=2000,count=2)] RuntimeFilters: []
To change the default transformation used by ST_Transform, you can set the geoanalytics.sql.transforms.<from>.<to> property in the Spark configuration, where <from> is the SRID of the input data and <to> is the SRID to transform to. For example, the following code sample shows changing the default transformation between 4326 and 6318.
# Set the default transformation between 4326 and 6318 to WGS_1984_(ITRF08)_To_NAD_1983_2011
spark.conf.set("geoanalytics.sql.transforms.4326.6318", "WGS_1984_(ITRF08)_To_NAD_1983_2011")
df = spark.read.format("feature-service") \
.load("https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties/FeatureServer/0")
df_transformed = df.select(ST.transform("shape", 6318))
df_transformed.explain()
== Physical Plan ==
*(1) Project [ST_Transform(shape#7018, in=WGS_1984_Web_Mercator_Auxiliary_Sphere:3857, out=GCS_NAD_1983_2011:6318, datum_transform="WGS_1984_(ITRF08)_To_NAD_1983_2011") AS ST_Transform(shape)#7082]
+- BatchScan[shape#7018] FeatureServiceLayerScan[f=json, query="returnM=true&outFields=&where=1=1&returnZ=true", paging=oid-range(field=FID,size=2000,count=2)] RuntimeFilters: []
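To revert to the automatically chosen transformation, clear the property with Spark's configuration API, as shown in the sketch below.
# Remove the configured default so a transformation is again chosen automatically
spark.conf.unset("geoanalytics.sql.transforms.4326.6318")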
To list the transformations available in GeoAnalytics Engine, you can use the geoanalytics.system.transformations catalog in Spark SQL. Each transformation listed can be used both forwards and in reverse. The DataFrame returned includes the following fields:
- Name—The full name of the transformation.
- Code—The numeric ID of the transformation.
- Authority—The organization that defined the transformation (typically EPSG or Esri).
- Method—The methodology used to perform the transformation.
- DataFiles—Specifies which supplemental projection data files are used in the transformation. null if none are required.
- Usable—False if the transformation is unsupported for any reason, including if there are missing data files.
- FromCode—The SRID of the starting GCS.
- ToCode—The SRID of the GCS to transform to.
- AreaOfUse—The minimum and maximum x and y coordinates of the geographic area for which the transformation is valid.
- AreaOfUsePolygon—A polygon representing the geographic area for which the transformation is valid.
The following code sample shows an example of listing available transformations using Spark SQL.
# List the first 5 transformations available in GeoAnalytics Engine
spark.sql("SELECT Name, Code, FromCode, ToCode, AreaOfUse " + \
"FROM geoanalytics.system.transformations").show(5, truncate=False)
+------------------------+----+--------+------+----------------------------+
|Name |Code|FromCode|ToCode|AreaOfUse |
+------------------------+----+--------+------+----------------------------+
|MGI_To_ETRS_1989_4 |1024|4312 |4258 |{13.58, 46.64, 16.17, 47.84}|
|Ain_el_Abd_To_WGS_1984_3|1055|4204 |4326 |{46.54, 28.53, 48.48, 30.09}|
|Ain_El_Abd_To_WGS_1984_4|1056|4204 |4326 |{46.54, 28.53, 48.48, 30.09}|
|Ain_El_Abd_To_WGS_1984_5|1057|4204 |4326 |{46.54, 29.1, 48.42, 30.09} |
|Ain_El_Abd_To_WGS_1984_6|1058|4204 |4326 |{46.54, 28.53, 48.48, 29.45}|
+------------------------+----+--------+------+----------------------------+
only showing top 5 rows
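You can also filter the catalog to see which transformations are defined between a specific pair of GCS, as in the sketch below. Because each transformation can be used in reverse, it is worth checking both orderings of the SRIDs.
# List transformations defined between WGS 1984 (4326) and NAD 1983 2011 (6318)
spark.sql("SELECT Name, Code, Usable FROM geoanalytics.system.transformations " + \
          "WHERE (FromCode = 4326 AND ToCode = 6318) OR (FromCode = 6318 AND ToCode = 4326)") \
    .show(truncate=False)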
Use the list_transformations function to explore all transformation paths that could be used to transform between two GCS for a specified extent. The function returns a DataFrame with the following fields:
- Path—The transformation name, or names if multiple transformations are required. Names beginning with a ~ indicate that the transformation will be used in reverse.
- Percent—The percentage of the specified extent for which the transformation path is valid. If no extent is defined, the value represents the percentage of the world for which the path is valid.
- Accuracy—The maximum accuracy guaranteed by the transformation path, in meters.
Transformation paths are sorted by Percent and then Accuracy. For example, the code sample below shows listing the top 5 available paths for transforming from WGS 1984 (SRID: 4326) to NAD83 / UTM zone 15N (SRID: 26915) for the extent of a USA Parks dataset. The first transformation path is valid for 80.87% of the dataset extent and guarantees an accuracy of 0.1 meters. The other four paths are valid for a smaller percentage of the dataset extent and/or guarantee less accuracy when transforming.
df = spark.read.format("feature-service") \
.load("https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Parks/FeatureServer/0")
bb = df.st.get_extent()
# List available paths for transforming between 4326 and 26915 for the extent of a USA Parks dataset
geoanalytics.util.list_transformations(from_sr=4326, to_sr=26915, extent=bb).show(5, truncate=False)
+-----------------------------------------------------------+------------------+--------+
|Path |Percent |Accuracy|
+-----------------------------------------------------------+------------------+--------+
|WGS_1984_(ITRF00)_To_NAD_1983 |80.87444335193237 |0.1 |
|~NAD_1983_To_WGS_1984_5 |78.52542646454121 |1.0 |
|~NAD_1983_To_WGS_1984_1 |66.13025783747702 |4.0 |
|~NAD_1927_To_WGS_1984_33 + NAD_1927_To_NAD_1983_NTv2_Canada|30.344173869840425|2.5 |
|~NAD_1927_To_WGS_1984_3 + NAD_1927_To_NAD_1983_NTv2_Canada |30.344173869840425|21.5 |
+-----------------------------------------------------------+------------------+--------+
only showing top 5 rows
Some transformations are not included with the geoanalytics jar file and require you to install the supplementary Projection Engine jar
files. The Projection Engine jars offer additional geographic transformations required for transforming to or from certain
spatial references. For example, if you try to project from WGS 1984 (SRID:4326) to NAD27 / UTM zone 11N (SRID:26711),
you may get the error "Can't perform requested spatial reference transformation due to Missing grid file".
This error indicates that your transformation requires supplemental files and you'll need to install
the Projection Engine jar files to complete your workflow.
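One way to check ahead of time is to query the transformations catalog for entries that reference supplemental data files but are not currently usable; the sketch below assumes the DataFiles and Usable columns described earlier behave as documented.
# Find transformations that reference supplemental data files but are not usable,
# which typically means the Projection Engine jars are not installed
spark.sql("SELECT Name, DataFiles FROM geoanalytics.system.transformations " + \
          "WHERE Usable = false AND DataFiles IS NOT NULL").show(5, truncate=False)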
Choosing a projected coordinate system
When choosing a projected coordinate system (PCS), consider the information you want to maintain in the result. For example, let's say you are analyzing the area required for new wind turbines. Because area is important in this case, it is recommended that your data is either stored in or transformed to a PCS with an equal-area projection (and if possible, a planar calculation method should be used in analysis). Similarly, if you need to preserve distances or angles, then use a projection appropriate for the desired characteristic. To help determine which PCS you should use, refer to the USGS criteria guide.
In addition, you can use the create_optimal_sr utility to generate a spatial reference with specific properties for a specified extent, or the create_optimal_sr DataFrame accessor to generate a spatial reference with specific properties for a DataFrame. Both the utility and DataFrame accessor allow you to choose which of the following properties will be prioritized when creating a spatial reference:
- EQUAL_AREA—Preserves the relative area of regions everywhere on earth. Shapes and distances will be distorted.
- CONFORMAL—Preserves angles in small areas. Shapes, sizes, and distances will be distorted.
- EQUIDISTANT_ONE_POINT—Preserves distances when measured through the center of the projection. Areas, shapes, and other distances will be distorted.
- EQUIDISTANT_MERIDIANS—Preserves distances when measured along meridians. Areas, shapes, and other distances will be distorted.
- COMPROMISE_WORLD—Does not preserve areas, shapes, or distances specifically, but creates a balance between these geometric properties. Compromise projections are only suggested for very large areas.
Spatial references generated using create_optimal_sr may have an SRID of 0.
The following code sample shows an example of projecting geometries to a spatial reference that was generated using create_optimal_sr with the COMPROMISE_WORLD property. The example feature service is updated regularly and results may not match those shown below exactly.
df = spark.read.format("feature-service") \
.load(r"https://services9.arcgis.com/RHVPKKiFTONKtxq3/ArcGIS/rest/services/MODIS_Thermal_v1/FeatureServer/1")
# Create an optimal spatial reference and print the WKT string
sr = df.st.create_optimal_sr("COMPROMISE_WORLD")
print(sr.wkt)
# Project the data to the optimal spatial reference
df_transformed = df.select(ST.transform("shape", sr))
df_transformed.show(5, truncate=False)
PROJCS["CustomProjection",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Natural_Earth"],PARAMETER["False_Easting",0.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-29.63],UNIT["Meter",1.0]]
+--------------------------------------------------+
|ST_Transform(shape) |
+--------------------------------------------------+
|{"x":1.686213355038753e7,"y":-2331626.9778772173} |
|{"x":1.5324947321214821e7,"y":-2649943.1473921738}|
|{"x":1.5735459338985184e7,"y":-3837413.4567152876}|
|{"x":1.5736402521247668e7,"y":-3837581.503795787} |
|{"x":1.5787735925841367e7,"y":-4104234.0465022367}|
+--------------------------------------------------+
only showing top 5 rows
If your data is at a local scale, consider using a coordinate system specific to the location you are analyzing. For example, say you have data in Palm Springs, California. Palm Springs is in Riverside County, and the current official PCS for the county is NAD 1983 (2011) State Plane California VI (SRID: 6425).
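For instance, the sketch below projects a hypothetical DataFrame of Palm Springs data into that State Plane zone before analysis.
# Project hypothetical Palm Springs data to NAD 1983 (2011) State Plane California VI
df_local = df.withColumn("geometry", ST.transform("geometry", 6425))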
If your data is at a global scale, avoid using Web Mercator (SRID: 3857) when possible, especially to run analysis. This projection is known to drastically distort areas, lengths, and angles. There are other options to analyze global data that may be more applicable to your analysis. GCS are usually recommended for accurate analysis of global-scale data.
If your data crosses the antimeridian (similar to the international dateline in location), use a local PCS for the particular area. For example, if your data is in Fiji you may want to use either Fiji 1986 Fiji Map Grid (SRID: 3460) or Fiji 1986 UTM Zone 60S (SRID: 3141). Alternatively, use a GCS that supports wrapping around the antimeridian.
Planar vs. geodesic distance calculations
Some analysis tools enable you to choose between a geodesic or planar distance calculation. Planar calculations are recommended for accurate analysis of small, local areas only. Geodesic calculations are recommended for accurate analysis of larger, global areas. It is recommended you use geodesic distances in the following circumstances:
- Tracks cross the antimeridian—When using the geodesic method, geometries that cross the antimeridian will have tracks that correctly cross the antimeridian. Your spatial reference must support wrapping around the antimeridian, for example, a global projection such as World Cylindrical Equal Area.
- Your dataset is not in a local projection—If your spatial reference is not set to a local projection, use the geodesic method. If your data is in a local projection, such as a state plane spatial reference, use the planar method.
What's next?
Learn more about coordinate systems and transformations: