geoanalytics.STDataFrameAccessor

create_optimal_sr

geoanalytics.extensions.STDataFrameAccessor.create_optimal_sr(self, property, custom_name=None, geometry=None)

Creates a spatial reference with a custom projected coordinate system optimal for the data extent and intended purpose of your analysis.

Supported Properties:

  • EQUAL_AREA - Preserves the relative area of regions everywhere on earth. Shapes and distances will be distorted.

  • CONFORMAL - Preserves angles in small areas. Shapes, sizes, and distances will be distorted.

  • EQUIDISTANT_ONE_POINT - Preserves distances when measured through the center of the projection. Areas, shapes, and other distances will be distorted.

  • EQUIDISTANT_MERIDIANS - Preserves distances when measured along meridians. Area, shape, and other distances will be distorted.

  • COMPROMISE_WORLD - Does not preserve areas, shapes, or distances specifically, but creates a balance between these geometric properties. Compromise projections are only suggested for very large areas.

Parameters
  • property (str) – A property that represents the purpose of the projection. Choose from EQUAL_AREA, CONFORMAL, EQUIDISTANT_ONE_POINT, EQUIDISTANT_MERIDIANS, COMPROMISE_WORLD.

  • custom_name (str, optional) – The name of the custom projected coordinate system. If unspecified, the name will be Custom_Projection.

  • geometry (str, optional) – Geometry field name. Required if there is more than one geometry field and the default is not set.

Returns

A spatial reference object

Return type

SpatialReference
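
The property names listed above are plain strings, so a typo would only surface once the call reaches the library. The sketch below validates the name first and then forwards to create_optimal_sr. It assumes the accessor is reachable as `df.st` on a Spark DataFrame, which this reference does not state; `make_optimal_sr` and `SUPPORTED_PROPERTIES` are hypothetical names introduced for illustration, and the strict upper-case check is a choice of this helper, not documented library behavior.

```python
# Property names copied from the "Supported Properties" list above.
SUPPORTED_PROPERTIES = frozenset({
    "EQUAL_AREA",
    "CONFORMAL",
    "EQUIDISTANT_ONE_POINT",
    "EQUIDISTANT_MERIDIANS",
    "COMPROMISE_WORLD",
})


def make_optimal_sr(df, purpose, name=None, geometry=None):
    """Validate `purpose`, then request an optimal spatial reference.

    Assumption: `df` exposes the STDataFrameAccessor as `df.st`.
    """
    if purpose not in SUPPORTED_PROPERTIES:
        choices = ", ".join(sorted(SUPPORTED_PROPERTIES))
        raise ValueError(f"Unknown property {purpose!r}; choose from {choices}")
    # Keyword names follow the documented signature.
    return df.st.create_optimal_sr(purpose, custom_name=name, geometry=geometry)
```

A call such as `make_optimal_sr(df, "EQUAL_AREA", name="Study_Area_Equal_Area")` would then fail fast on a misspelled property instead of inside a Spark job.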

get_extent

geoanalytics.extensions.STDataFrameAccessor.get_extent(self, geometry=None)

Computes the spatial extent of a geometry column in the dataframe and returns it as a BoundingBox.

Parameters

geometry (str, optional) – Geometry field name. Required if there is more than one geometry field and the default is not set.

Returns

A bounding box representing the extent.

Return type

BoundingBox
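
To make the idea of a spatial extent concrete, the following pure-Python sketch computes the smallest axis-aligned rectangle that contains a set of points. It is only an illustration of the concept: `Extent` and `extent_of` are hypothetical names, and `Extent` is a stand-in rather than the library's BoundingBox type, whose fields are not described in this reference.

```python
from typing import Iterable, NamedTuple, Tuple


class Extent(NamedTuple):
    """Hypothetical stand-in for a bounding box (not the library's class)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def extent_of(points: Iterable[Tuple[float, float]]) -> Extent:
    """Return the minimum bounding rectangle of (x, y) coordinate pairs."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot compute the extent of an empty point set")
    xs, ys = zip(*pts)  # split pairs into x values and y values
    return Extent(min(xs), min(ys), max(xs), max(ys))
```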

get_geometry_field

geoanalytics.extensions.STDataFrameAccessor.get_geometry_field(self, *, infer=True)

Returns the name of the geometry field that is currently set on the Spark DataFrame.

Parameters

infer (bool, optional, keyword-only) – If there is exactly one geometry column, then infer that it is the geometry field.

Returns

The geometry field name, if set.

Return type

str
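
The inference rule described for the infer parameter can be sketched in a few lines of plain Python. This illustrates the documented rule only and is not the library's implementation: `resolve_geometry_field` is a hypothetical name, and returning None when nothing is set and nothing can be inferred is an assumption. The same pattern would apply to get_time_fields with timestamp columns.

```python
from typing import Optional, Sequence


def resolve_geometry_field(explicit: Optional[str],
                           geometry_columns: Sequence[str],
                           infer: bool = True) -> Optional[str]:
    """Pick the geometry field name following the documented rule."""
    if explicit is not None:
        # A field that was set explicitly always wins.
        return explicit
    if infer and len(geometry_columns) == 1:
        # Exactly one geometry column: infer that it is the geometry field.
        return geometry_columns[0]
    # Zero or several candidates, or inference disabled (assumed result: None).
    return None
```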

get_spatial_reference

geoanalytics.extensions.STDataFrameAccessor.get_spatial_reference(self, geometry_field=None)

Returns the spatial reference for the geometry field.

Parameters

geometry_field (pyspark.sql.Column, optional) – Geometry type column.

Returns

NamedTuple containing the SRID, whether the spatial reference is projected (PCS), and the spatial reference unit.

Return type

geoanalytics.sql.SpatialReference

get_time_fields

geoanalytics.extensions.STDataFrameAccessor.get_time_fields(self, *, infer=True)

Returns the name(s) of the time field(s) currently set on the Spark DataFrame.

Parameters

infer (bool, optional, keyword-only) – If there is exactly one timestamp column, then infer that it is the start time field.

Returns

A list of time field names, if set.

Return type

list

plot

geoanalytics.extensions.STDataFrameAccessor.plot(self, geometry=None, cmap_values=None, is_categorical=None, vmin=None, vmax=None, ax=None, cmap=None, figsize=None, dpi=None, aspect='equal', max_geoms=1000000, legend=False, legend_kwds=None, classification_method=None, classification_kwds=None, basemap=None, xmargin=None, ymargin=None, sr=None, extent=None, **style_kwds)

Plot a geometry column from a PySpark DataFrame.

Parameters
  • geometry (str, optional) – Name of the geometry column to plot. Required if the DataFrame has more than one geometry column.

  • cmap_values (str, optional) – Name of the column to use for color mapping.

  • classification_method (str) – The name of the mapclassify classification method to use.

  • classification_kwds (dict) – Keyword arguments to pass to mapclassify.classify, such as ‘k’.

  • is_categorical (bool, optional) – Set to True when the cmap_values column is categorical. The default is False.

  • vmin (float, optional) – Cmap minimum value.

  • vmax (float, optional) – Cmap maximum value.

  • ax (matplotlib.axes.Axes, optional) – The axes on which to plot. By default new axes are created.

  • cmap (str, optional) – Name of the matplotlib colormap to use.

  • figsize ((float, float), optional) – Tuple representing the width and height of the resulting matplotlib.figure.Figure in inches. This parameter is ignored when the ax parameter is set.

  • dpi (float, optional) – The resolution of the figure in dots-per-inch.

  • aspect (str or float, optional) – Aspect of the axes. Choose from “equal” (default), “auto”, or set a number representing the ratio of the height to the width.

  • max_geoms (int, optional) – Maximum number of geometries to plot. The default is 1,000,000.

  • legend (bool, optional) – Adds a legend to the plot for the cmap_values values if set to True. The default is False.

  • legend_kwds (dict, optional) – A dictionary of legend keyword arguments. For categorical legends, any argument accepted by matplotlib.axes.Axes.legend is supported. For continuous legends, see the arguments for matplotlib.pyplot.colorbar.

  • basemap (str, optional) – Adds a basemap to the plot. Choose from “light” (Light Gray Canvas), “dark” (Dark Gray Canvas), “streets” (Esri Streets Basemap) or “osm” (OpenStreetMap Vector Basemap). Basemap labels are not supported.

  • xmargin (float, optional) – Sets padding of X data. For more information see matplotlib.axes.Axes.set_xmargin.

  • ymargin (float, optional) – Sets padding of Y data. For more information see matplotlib.axes.Axes.set_ymargin.

  • sr (SpatialReference, optional) – Spatial reference (SRID or WKT) to set or transform to on the resulting plot.

  • extent (BoundingBox, optional) – Sets the extent for plotting geometries. Only geometries that intersect the extent will be visible in the plot.

  • **style_kwds

Returns

Matplotlib axes

Return type

matplotlib.axes.Axes
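
Because plot takes many keyword arguments, a thin wrapper can make a common combination easier to reuse. The sketch below draws a classified map of one value column with a legend and a basemap. It assumes the accessor is reachable as `df.st`, and that the mapclassify scheme name "quantiles" is accepted for classification_method; neither is confirmed by this reference. `plot_classified` and `BASEMAPS` are hypothetical names.

```python
# Basemap names copied from the basemap parameter description above.
BASEMAPS = frozenset({"light", "dark", "streets", "osm"})


def plot_classified(df, value_column, k=5, basemap="light", **style_kwds):
    """Plot `value_column` in `k` classes with a legend.

    Assumption: `df` exposes the STDataFrameAccessor as `df.st`.
    """
    if basemap is not None and basemap not in BASEMAPS:
        raise ValueError(f"basemap must be one of {sorted(BASEMAPS)} or None")
    return df.st.plot(
        cmap_values=value_column,
        classification_method="quantiles",  # assumed scheme name
        classification_kwds={"k": k},       # forwarded to mapclassify.classify
        legend=True,
        basemap=basemap,
        **style_kwds,
    )
```

The return value is whatever plot returns (documented as matplotlib.axes.Axes), so the caller can continue with calls such as `ax.set_title(...)`.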

set_geometry_field

geoanalytics.extensions.STDataFrameAccessor.set_geometry_field(self, geometry_field)

Sets the geometry field and returns the resulting Spark DataFrame.

Parameters

geometry_field (pyspark.sql.Column) – Geometry type column.

Returns

Spark DataFrame with the set geometry field.

Return type

pyspark.sql.dataframe.DataFrame

set_spatial_reference

geoanalytics.extensions.STDataFrameAccessor.set_spatial_reference(self, srid, geometry_field=None)

Sets the spatial reference on the geometry field.

Parameters
  • srid (int) – Well-known ID (WKID) of the spatial reference.

  • geometry_field (pyspark.sql.Column, optional) – Geometry type column.

Returns

Spark DataFrame with the spatial reference set on the geometry field.

Return type

pyspark.sql.dataframe.DataFrame
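
Since set_spatial_reference returns a DataFrame rather than modifying one in place, the result has to be captured before it is queried. The sketch below sets an SRID and then reads the spatial reference back from the returned DataFrame. It assumes the accessor is reachable as `df.st`, which this reference does not state; `tag_with_srid` is a hypothetical helper. SRID 4326 is the identifier for geographic WGS 84.

```python
WGS84_SRID = 4326  # EPSG:4326, geographic WGS 84


def tag_with_srid(df, srid=WGS84_SRID):
    """Set `srid` on the geometry field and read it back.

    Assumption: `df` exposes the STDataFrameAccessor as `df.st`.
    Returns the new DataFrame together with its spatial reference.
    """
    # Keep the returned DataFrame; the input `df` is left as it was.
    tagged = df.st.set_spatial_reference(srid)
    return tagged, tagged.st.get_spatial_reference()
```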

set_time_fields

geoanalytics.extensions.STDataFrameAccessor.set_time_fields(self, start_time_field, end_time_field=None)

Sets the time field(s) and returns the resulting Spark DataFrame.

Parameters
  • start_time_field (pyspark.sql.Column) – TimestampType column or StringType column that will be cast to TimestampType.

  • end_time_field (pyspark.sql.Column, optional) – TimestampType column or StringType column that will be cast to TimestampType.

Returns

Spark DataFrame with the set time field(s).

Return type

pyspark.sql.dataframe.DataFrame

to_pandas_sdf

geoanalytics.extensions.STDataFrameAccessor.to_pandas_sdf(self, geometry=None)

Converts a Spark DataFrame to a Pandas Spatially Enabled DataFrame.

Note

The map viewer widget is only supported in Jupyter Notebooks.

Parameters

geometry (pyspark.sql.Column) – Geometry type column to use as the geometry column of the Pandas Spatially Enabled DataFrame. Defaults to None, in which case the first valid geometry type column is used.

Returns

A Pandas Spatially Enabled DataFrame.

Return type

pandas.core.frame.DataFrame