Coordinate systems and transformations
Coordinate systems are arbitrary designations for spatial data. Their purpose is to provide a common basis for communication about a particular place or area on the Earth's surface.
There are a few critical considerations when choosing the correct coordinate system for your data or analysis: the units your data is measured in, where the data is on Earth, the data's extent, and the phenomena you are trying to analyze (areas, distances, angles, etc.).
For the same dataset, the most appropriate coordinate system may vary based on whether you are plotting, analyzing, or sharing data.
This topic reviews the different types of coordinate systems, and best practices for setting and transforming spatial references for your spatial data.
This guide outlines two types of coordinate systems:
- A Geographic Coordinate System (GCS) specifies a datum, spheroid, and prime meridian. A GCS uses coordinates in angular units (e.g. degrees or grads) and is better imagined as a globe than as a flat map.
- A Projected Coordinate System (PCS) is the result of applying a map projection to data in a GCS to create a flat map. The PCS contains the original GCS definition and additional projection information. A PCS uses coordinates in linear units (e.g. meters or feet).
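To make the difference between angular and linear units concrete, here is a minimal sketch of one map projection, the spherical Mercator formulas, written in plain Python. It is an illustration only and not how GeoAnalytics Engine projects data. The function name `lonlat_to_spherical_mercator` and the sample coordinates are assumptions made for this example.

```python
import math

# Sphere radius in meters used by the spherical Mercator formulas
# (equal to the WGS 1984 semi-major axis).
EARTH_RADIUS_M = 6378137.0


def lonlat_to_spherical_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to x/y in meters.

    Hypothetical helper for illustration; valid for latitudes
    strictly between -90 and 90 degrees.
    """
    lon_rad = math.radians(lon_deg)
    lat_rad = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lon_rad
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat_rad / 2))
    return x, y


# Approximate location of Palm Springs, California (assumed values).
x, y = lonlat_to_spherical_mercator(-116.55, 33.83)
print(f"x = {x:.1f} m, y = {y:.1f} m")
```

The input is a pair of angles on a globe; the output is a pair of distances on a flat plane, which is the essential change a PCS introduces.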
A PCS uses a map projection to convert GCS data to a flat map. If your data is stored in a GCS and you intend to draw or plot it on a map, you will need to project your data. To learn more about plotting your data on a map, see visualize results.
You won't arbitrarily choose a GCS, as your spatial data will have been collected and stored in one already. Some data sources, such as shapefiles or feature services, have a coordinate system set by default that is stored with the geometry field. Other data sources, such as delimited files or string definitions, do not have a spatial reference set by default.
To verify a spatial reference ID (SRID) has been set for your geometry column, use the following code sample.
# Check the spatial reference of your geometry column
df.select(ST.srid("geometry")).show(1)
+----------------+
|stsrid(geometry)|
+----------------+
|            4267|
+----------------+
only showing top 1 rows
In cases where the spatial reference is not set on your DataFrame, you need to set it to the SRID the data was collected in. To set a coordinate system for your data, you need to use the ST_ function when defining the geometry of your DataFrame. For example, if you collected location data using a GPS that was set to use a GCS of NAD 1983 (CSRS) (SRID: 4617), you need to define the spatial reference of your DataFrame as 4617.
If you are unsure of what spatial reference your data is in, here are a few hints to figure it out:
- See if there is metadata or information stored with the file or where you downloaded the data.
- Look for clues in the dataset:
  - Look at the field names. If they are named "latitude" or "longitude", the data is most likely in a GCS. If you see field names that include "meters", "feet", or "utmzone", the data is most likely in a PCS.
  - Look at the field values in the geometry columns. Values that are consistently between -180 and 180 (x) and between -90 and 90 (y) are most likely in degrees, meaning that the data is likely in a GCS. Because they fall outside the range of degrees on the globe, values less than -180 or greater than 180 indicate that your data is most likely in a PCS.
- Try visualizing your dataset with different spatial references with other datasets. Verify that geometries are where you expect.
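The value-range hint above can be sketched as a small check in plain Python. The function name `guess_coordinate_type` is hypothetical and not part of GeoAnalytics Engine. It assumes x holds longitude-like values and y holds latitude-like values, with western and southern coordinates stored as negative numbers.

```python
def guess_coordinate_type(xs, ys):
    """Guess whether coordinates look like degrees (GCS) or linear units (PCS).

    Heuristic only: returns "geographic" when every x is within
    [-180, 180] and every y is within [-90, 90], otherwise "projected".
    """
    xs, ys = list(xs), list(ys)
    if not xs or not ys:
        raise ValueError("need at least one coordinate pair")
    in_lon_range = all(-180.0 <= x <= 180.0 for x in xs)
    in_lat_range = all(-90.0 <= y <= 90.0 for y in ys)
    return "geographic" if in_lon_range and in_lat_range else "projected"


# Degree-like values versus meter-like values (sample numbers are assumed).
print(guess_coordinate_type([-116.5, -117.0], [33.8, 34.0]))
print(guess_coordinate_type([542000.0, 543500.0], [3743000.0, 3744200.0]))
```

A result of "geographic" is only a hint. Projected data located close to the origin of its coordinate system can also fall inside the degree ranges, so confirm the guess against metadata or a map.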
In addition to setting your coordinate system to match your input data, consider whether it is the most appropriate for your analysis. For example, if your data is stored in Web Mercator (SRID: 3857), that coordinate system is not recommended for analysis because it is known to distort areas and distances.
To set the spatial reference for your geometry column and verify the result, use the following code sample.
# Set the spatial reference for your geometry column and verify the result
df_sref = df.withColumn("geometry", ST.srid("geometry", 4269))
df_sref.select(ST.srid("geometry")).show(1)
+----------------+
|stsrid(geometry)|
+----------------+
|            4269|
+----------------+
only showing top 1 row
Because Earth is a lumpy, squished sphere, there are many geographic coordinate systems tailored to specific locations. Geographic (or datum) transformations convert geographic data between different geographic coordinate systems.
Transformations can be useful in aligning locations between datasets. For example, if you have boundary polygons in one GCS and observation points in another, it may be best to transform your data to the same GCS to attain the most accurate relative positioning prior to analysis.
Transformations can alter the locations of your data significantly. If possible, plot your data with the new GCS and verify the new locations using records where ground-truth is known.
Some analysis tools that use two different geometry columns as input will automatically transform your data to the same spatial reference for analysis. To avoid automatic transformations, it is recommended you transform your data prior to analysis for the most accurate results.
GeoAnalytics Engine includes many transformations that are applied automatically when you transform your data from one spatial reference to another using ST_Transform. However, some transformations are not included with the geoanalytics jar file and require you to install the supplementary Projection Engine jar files. The Projection Engine jars offer additional geographic transformations required for transforming to or from certain spatial references. For example, if you try to project from SRID you will get the error
Can't perform requested spatial reference transformation due to Missing grid file.
If you get this error, your required transformation is not included in the geoanalytics jar, and you'll need to install the Projection Engine jar files to complete your workflow.
To learn how to install the Projection Engine jars, refer to the install and setup guide for your environment.
Transform your geometry column from one GCS to another and verify the result using the following code sample. Your geometry column must have a spatial reference set before transforming it.
# Transform your spatial data and verify the result
df_transformed = df.withColumn("geometry", ST.transform("geometry", 8252))
df_transformed.select(ST.srid("geometry")).show(1)
+----------------+
|stsrid(geometry)|
+----------------+
|            8252|
+----------------+
only showing top 1 rows
Some analysis tools enable you to choose between a geodesic or planar distance calculation. Note that planar calculations are recommended for accurate analysis of small, local areas only. Geodesic calculations are recommended for accurate analysis of larger, global areas. It is recommended you use geodesic distances in the following circumstances:
- Tracks cross the international date line: When you use the geodesic method, geometries that cross the international date line will have tracks that correctly cross it. Your spatial reference must support wrapping around the international date line, for example, a global projection such as World Cylindrical Equal Area.
- Your dataset is not in a local projection: If your spatial reference is set to a local projection, use the planar distance method instead. For example, use the planar method to examine data using a state plane spatial reference.
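To illustrate why planar calculations on unprojected data break down over larger areas, the sketch below compares a planar distance computed directly on degree values with a great-circle distance from the haversine formula. Haversine assumes a spherical Earth, so it only approximates a true geodesic on an ellipsoid and is not the calculation GeoAnalytics Engine performs. The function names are hypothetical.

```python
import math

# Mean Earth radius in meters (spherical approximation).
MEAN_EARTH_RADIUS_M = 6371008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * MEAN_EARTH_RADIUS_M * math.asin(math.sqrt(a))


def planar_degrees(lon1, lat1, lon2, lat2):
    """Planar (Euclidean) distance computed directly on degree values."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


# One degree of longitude at the equator and at 60 degrees north.
for lat in (0.0, 60.0):
    planar = planar_degrees(0.0, lat, 1.0, lat)
    great_circle = haversine_m(0.0, lat, 1.0, lat)
    print(f"lat {lat:4.1f}: planar = {planar:.3f} deg, great circle = {great_circle:,.0f} m")
```

The planar result is the same at both latitudes, while the great-circle distance at 60 degrees north is roughly half of the distance at the equator. This is the kind of error a planar method introduces when the coordinate system is not a suitable local projection.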
Consider the information you want to maintain in analysis results. For example, let's say you are analyzing the area required for new wind turbines. It is recommended that your data is either stored in or transformed to a projected coordinate system with an equal-area projection and that, if possible, a planar calculation method is used in analysis. Alternatively, use a dataset with a geographic coordinate system and a geodesic calculation method. Similarly, if you need to preserve distances or angles, use a coordinate system and calculation method appropriate for the desired characteristic. To best determine the projected coordinate system you should use, refer to the USGS criteria guide.
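As a small illustration of what "equal-area" means, the sketch below compares the area of a longitude/latitude cell on a sphere with the planar area of the same cell after a Lambert cylindrical equal-area projection (standard parallel at the equator). It assumes a spherical Earth and is not the projection code used by any particular SRID. All names are hypothetical.

```python
import math

# Sphere radius in meters (approximately the WGS 1984 authalic radius).
SPHERE_RADIUS_M = 6371007.2


def spherical_cell_area(lon_w, lat_s, lon_e, lat_n):
    """Area in square meters of a lon/lat rectangle on the sphere."""
    dlon = math.radians(lon_e - lon_w)
    dsin = math.sin(math.radians(lat_n)) - math.sin(math.radians(lat_s))
    return SPHERE_RADIUS_M ** 2 * dlon * dsin


def cylindrical_equal_area(lon_deg, lat_deg):
    """Project lon/lat in degrees to x/y in meters (Lambert cylindrical equal-area)."""
    x = SPHERE_RADIUS_M * math.radians(lon_deg)
    y = SPHERE_RADIUS_M * math.sin(math.radians(lat_deg))
    return x, y


def projected_cell_area(lon_w, lat_s, lon_e, lat_n):
    """Planar area in square meters of the projected rectangle."""
    x_w, y_s = cylindrical_equal_area(lon_w, lat_s)
    x_e, y_n = cylindrical_equal_area(lon_e, lat_n)
    return (x_e - x_w) * (y_n - y_s)


# A one-degree cell at high latitude: both areas agree.
cell = (10.0, 60.0, 11.0, 61.0)
print(f"sphere:    {spherical_cell_area(*cell):,.0f} m^2")
print(f"projected: {projected_cell_area(*cell):,.0f} m^2")
```

Because the projected rectangle has the same area as the cell on the sphere, a planar area calculation in this projection does not inflate or shrink the result, even though shapes are distorted away from the equator.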
If your data is at a local scale, consider using a coordinate system specific to the location you are analyzing. For example, suppose you have data in Palm Springs, California. Palm Springs is in Riverside County, and the current official PCS for the county is NAD 1983 (2011) State Plane California VI (SRID: 6425).
If your data is at a global scale, avoid using Web Mercator (SRID: 3857) when possible, especially to run analysis. This projection is known to drastically distort areas and lengths, particularly at high latitudes. There are other options to analyze global data that may be more applicable to your analysis. For example, a more appropriate coordinate system for geodesic analysis may be the WGS 1984 (SRID: 4326) geographic coordinate system. Note that planar analysis is not recommended for global datasets.
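The size of the Web Mercator distortion can be estimated with the scale factor of a spherical Mercator projection, which is 1 / cos(latitude) for lengths and the square of that for areas. The sketch below uses this spherical approximation, so the values are indicative rather than exact for SRID 3857. The function name is hypothetical.

```python
import math


def mercator_scale_factors(lat_deg):
    """Return (length_factor, area_factor) for spherical Mercator at a latitude.

    Valid for latitudes strictly between -90 and 90 degrees.
    """
    k = 1.0 / math.cos(math.radians(lat_deg))
    return k, k * k


# Print how much lengths and areas are exaggerated at a few latitudes.
for lat in (0, 30, 45, 60, 75):
    length_factor, area_factor = mercator_scale_factors(lat)
    print(f"lat {lat:2d}: lengths x{length_factor:.2f}, areas x{area_factor:.2f}")
```

At 60 degrees latitude, lengths are exaggerated by about a factor of two and areas by about a factor of four, which is why planar measurements in this projection are unreliable away from the equator.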
If your data crosses the antimeridian (similar to the international date line in location), use a geographic coordinate system and a geodesic calculation method when available. Alternatively, use a local projected coordinate system for the particular area. For example, if your data is in Fiji, you may want to use either Fiji 1986 Fiji Map Grid (SRID: 3460) or Fiji 1956 UTM Zone 60S (SRID: 3141).
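One way to check whether a dataset is affected is to test consecutive longitudes for a jump across the antimeridian. The sketch below is a plain-Python illustration with hypothetical function names. It assumes longitudes are stored in degrees within the range -180 to 180 and that consecutive points are less than half the globe apart.

```python
def crosses_antimeridian(lon_start, lon_end):
    """True if the shorter path between two longitudes crosses +/-180 degrees."""
    return abs(lon_end - lon_start) > 180.0


def shortest_lon_delta(lon_start, lon_end):
    """Signed longitude change in degrees along the shorter way around the globe."""
    return (lon_end - lon_start + 180.0) % 360.0 - 180.0


# A track segment near Fiji that steps from 179.5 E to 179.5 W (assumed values).
print(crosses_antimeridian(179.5, -179.5))
print(shortest_lon_delta(179.5, -179.5))
```

A naive planar difference for this segment is -359 degrees, while the actual movement is 1 degree eastward. Planar methods on geographic coordinates produce incorrect results here for that reason.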