Geometry

GeoAnalytics Engine uses PySpark DataFrame columns to represent geographic datasets, where each row in the column is a different geographic feature. GeoAnalytics Engine includes five geometry data types:

  • point
  • multipoint
  • linestring
  • polygon
  • geometry

Multipart linestring and multipart polygon geometries are represented by the linestring and polygon types respectively.

The generic geometry type is used when a column contains more than one geometry type or when the types of the geometries in the column are unknown. Some tools and functions require a specific geometry type and do not support the generic geometry type. All five of the types listed above implement the OpenGIS Simple Features Implementation Specification for SQL 1.2.1.

Creating geometry columns

Geometry columns are created using SQL functions or by loading data from a shapefile or feature service. Use a SQL function if you have geometry data in a supported format such as EsriJSON or WKT.

If you know the type of your geometry data (that is, point, multipoint, linestring, or polygon), use the SQL function for that type. For example, if you have linestring geometries in EsriJSON, use the ST_LineFromEsriJSON function. If you do not know the type of your geometry data, or if a column contains more than one geometry type, use the generic function for your format. For example, if you have a mix of geometries in EsriJSON, use the ST_GeomFromEsriJSON function.
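The selection rule above can be sketched in plain Python. This is only an illustration of the decision, not part of GeoAnalytics Engine: the helper choose_esrijson_function and its lookup table are hypothetical, and only ST_LineFromEsriJSON and ST_GeomFromEsriJSON are named in this topic. The other three function names are assumed to follow the same naming pattern and should be checked against the SQL function reference.

```python
# Typed EsriJSON functions keyed by geometry type.
# Only ST_LineFromEsriJSON and ST_GeomFromEsriJSON appear in this topic;
# the names marked "assumed" are guesses based on the same naming pattern.
TYPED_ESRIJSON_FUNCTIONS = {
    "point": "ST_PointFromEsriJSON",        # assumed name
    "multipoint": "ST_MPointFromEsriJSON",  # assumed name
    "linestring": "ST_LineFromEsriJSON",
    "polygon": "ST_PolyFromEsriJSON",       # assumed name
}
GENERIC_ESRIJSON_FUNCTION = "ST_GeomFromEsriJSON"


def choose_esrijson_function(observed_types):
    """Pick a SQL function name for a column of EsriJSON strings.

    observed_types is an iterable of the geometry type names found in the
    column. Pass an empty iterable when the types are unknown.
    """
    distinct = {t.lower() for t in observed_types}
    if len(distinct) == 1:
        (only_type,) = distinct
        # A single known type gets its typed function; an unrecognized
        # type name falls back to the generic function.
        return TYPED_ESRIJSON_FUNCTIONS.get(only_type, GENERIC_ESRIJSON_FUNCTION)
    # Unknown or mixed types: use the generic function for the format.
    return GENERIC_ESRIJSON_FUNCTION


print(choose_esrijson_function(["linestring", "linestring"]))
print(choose_esrijson_function(["point", "polygon"]))
print(choose_esrijson_function([]))
```

The first call has a single known type and selects the typed function. The other two calls cover the mixed and unknown cases, which both select the generic function.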

If you are loading data from a collection of shapefiles or a feature service, a geometry field called geometry will be created automatically.

Spatial references

If you do not specify a spatial reference when creating a geometry column, the spatial reference ID (SRID) of the geometry column will be 0 by default, indicating an unknown spatial reference. If you know the spatial reference of the geometry data, you can set it using ST_SRID or ST_SRText. Some functions and tools require the spatial reference to be set.

When loading data from a shapefile or feature service, the spatial reference of the geometry column will be set automatically. You can check the spatial reference of any geometry column using st.get_spatial_reference(). To learn more about spatial references, see Coordinate systems and transformations.

Setting the primary geometry column

Most tools in geoanalytics.tools require that the input DataFrame has a geometry column. If there are multiple geometry columns in a DataFrame, you must call st.set_geometry_field() on the DataFrame to specify the primary geometry column that will be used in analysis. If there is only one geometry column in a DataFrame, it will be used automatically.

Empty and null geometries

Empty geometries are returned from some functions when the result of an operation is known to be an empty geometry of a certain type. For more information, see the documentation specific to the function you are using. You can also create empty geometries from the WKT representation of an empty geometry. Creating empty geometries from other representations is not supported.

A null geometry record indicates that the geometry could not be created, or that the result of an operation is unknown or undefined.
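The difference between an empty geometry and a null geometry can be illustrated with WKT strings in plain Python. This is a minimal sketch that does not call GeoAnalytics Engine: the table EMPTY_WKT and the helper describe_geometry_value are hypothetical names. The strings follow the standard WKT convention of a type keyword followed by EMPTY, and None stands in for a null geometry record.

```python
# Standard WKT text for an empty geometry of each specific type.
EMPTY_WKT = {
    "point": "POINT EMPTY",
    "multipoint": "MULTIPOINT EMPTY",
    "linestring": "LINESTRING EMPTY",
    "polygon": "POLYGON EMPTY",
}


def describe_geometry_value(wkt):
    """Classify a WKT value as 'null', 'empty', or 'non-empty'.

    None models a null geometry record: the geometry could not be created
    or the result is unknown. A WKT string whose last token is EMPTY models
    a geometry of a known type that contains no coordinates.
    """
    if wkt is None:
        return "null"
    tokens = wkt.strip().upper().split()
    if tokens and tokens[-1] == "EMPTY":
        return "empty"
    return "non-empty"


print(describe_geometry_value(EMPTY_WKT["polygon"]))
print(describe_geometry_value("POINT (1 2)"))
print(describe_geometry_value(None))
```

An empty geometry still has a geometry type, whereas a null record has no geometry at all. Filters and joins may therefore need to handle the two cases separately.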
