GeoAnalytics Engine uses PySpark DataFrame columns to represent geographic datasets, where each row in the column is a different geographic feature. Engine includes five data types that represent different types of geometries. The five geometry types are:
- point
- multipoint
- linestring
- polygon
- geometry
Multipart linestring and multipart polygon geometries are represented by the linestring and polygon types, respectively.
The generic geometry type is used when a column contains more than one geometry type or when the types of the geometries in the column are unknown. Some tools and functions require a specific geometry type and do not support geometry.
All five of the types listed above implement the OpenGIS Simple Features Implementation Specification for SQL 1.2.1.
Creating geometry columns
Geometry columns are created using SQL functions or by loading data from a shapefile or feature service. Use a SQL function if you have geometry data in one of the following formats:
- Point coordinates
- Well-known text (WKT)
- Well-known binary (WKB)
- EsriJSON
- GeoJSON
- Shapefile
If you know the type of your geometry data (that is, point, multipoint, linestring, or polygon), use the SQL function for that type. For example, if you have linestring geometries in EsriJSON, use the ST_LineFromEsriJSON function. If you do not know the type of your geometry data, or if a column contains more than one geometry type, use the generic function for your format. For example, if you have a mix of geometries in EsriJSON, use the ST_GeomFromEsriJSON function.
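The selection rule above can be sketched in plain Python. This is only an illustration: `pick_esrijson_function` is a hypothetical helper, not part of GeoAnalytics Engine. Only ST_LineFromEsriJSON and ST_GeomFromEsriJSON are named in this article; the other function names in the lookup table are assumed to follow the same naming pattern and should be checked against the SQL function reference.

```python
# Hypothetical helper for illustration; not part of GeoAnalytics Engine.
# Only ST_LineFromEsriJSON and ST_GeomFromEsriJSON are named in the article;
# the other entries are ASSUMED to follow the same naming pattern.
ESRIJSON_FUNCTIONS = {
    "point": "ST_PointFromEsriJSON",
    "multipoint": "ST_MPointFromEsriJSON",
    "linestring": "ST_LineFromEsriJSON",
    "polygon": "ST_PolyFromEsriJSON",
}
GENERIC_ESRIJSON_FUNCTION = "ST_GeomFromEsriJSON"


def pick_esrijson_function(geometry_types):
    """Return the name of the SQL function to use for an EsriJSON column.

    geometry_types is the collection of geometry types known to be in the
    column. Pass None or an empty collection when the types are unknown.
    """
    known = {t.lower() for t in (geometry_types or [])}
    if len(known) == 1:
        (only_type,) = known
        # An unrecognized single type falls back to the generic function.
        return ESRIJSON_FUNCTIONS.get(only_type, GENERIC_ESRIJSON_FUNCTION)
    # Unknown or mixed types: use the generic function for the format.
    return GENERIC_ESRIJSON_FUNCTION


print(pick_esrijson_function(["linestring"]))
print(pick_esrijson_function(["point", "polygon"]))
print(pick_esrijson_function(None))
```

The same decision applies to the other input formats; only the lookup table would change.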
If you are loading data from a collection of shapefiles or a feature service, a geometry field called geometry will be created automatically.
Spatial references
If you do not specify a spatial reference when creating a geometry column, the spatial reference ID (SRID) of the geometry column will be 0 by default, indicating an unknown spatial reference. If you know the spatial reference of the geometry data, you can set it using ST_SRID or ST_SRText. Some functions and tools require the spatial reference to be set.
When loading data from a shapefile or feature service, the spatial reference of the geometry column will be set automatically. You can check the spatial reference of any geometry column using st.get.
To learn more about spatial references, see Coordinate systems and transformations.
Setting the primary geometry column
Most tools in geoanalytics.tools require that the input DataFrame has a geometry column. If there are multiple geometry columns in a DataFrame, you must call st.set on the DataFrame to specify the primary geometry column that will be used in analysis.
If there is only one geometry column in a DataFrame, it will be used automatically.
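The rule for choosing the primary geometry column can be modeled in a few lines of plain Python. This is a sketch of the behavior described above, not the library's implementation: `resolve_primary_geometry` and its parameters are hypothetical, and the column names are made up for the example.

```python
def resolve_primary_geometry(geometry_columns, explicit=None):
    """Model which geometry column a tool would use (hypothetical helper).

    geometry_columns: names of the geometry columns in the DataFrame.
    explicit: the column chosen by the user, or None if none was chosen.
    """
    columns = list(geometry_columns)
    if explicit is not None:
        if explicit not in columns:
            raise ValueError(f"{explicit!r} is not a geometry column")
        return explicit
    if len(columns) == 1:
        # A single geometry column is used automatically.
        return columns[0]
    if not columns:
        raise ValueError("the DataFrame has no geometry column")
    # Several candidates and no explicit choice: the caller must pick one.
    raise ValueError("multiple geometry columns; set the primary one explicitly")


print(resolve_primary_geometry(["geometry"]))
print(resolve_primary_geometry(["origin", "destination"], explicit="origin"))
```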
Empty and null geometries
Empty geometries are returned from some functions when the result of an operation is known to be an empty geometry of a certain type. For more information, see the documentation specific to the function you are using. You can also create empty geometries from the WKT representation of an empty geometry. Creating empty geometries from other representations is not supported.
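For reference, the standard WKT spellings of empty geometries for the four specific types are shown in the sketch below. The `empty_wkt` helper is hypothetical and only builds the text; passing that text to the appropriate WKT SQL function is left out because it depends on your environment.

```python
# Standard WKT representations of empty geometries, keyed by geometry type.
EMPTY_WKT = {
    "point": "POINT EMPTY",
    "multipoint": "MULTIPOINT EMPTY",
    "linestring": "LINESTRING EMPTY",
    "polygon": "POLYGON EMPTY",
}


def empty_wkt(geometry_type):
    """Return the WKT text for an empty geometry of the given type.

    Hypothetical helper for illustration; raises KeyError for other types.
    """
    return EMPTY_WKT[geometry_type.lower()]


for name in EMPTY_WKT:
    print(name, "->", empty_wkt(name))
```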
A null geometry record indicates that the geometry could not be created, or that the result of the operation is unknown or undefined.