geoanalytics.raster.functions
add_band
- geoanalytics.raster.functions.add_band(pixels, raster_col, no_data_value=None)
Returns a raster column with an additional band, created from the provided pixel values, appended to the input raster.
Refer to the GeoAnalytics guide for examples and usage notes: RT_AddBand
- Parameters:
pixels (pyspark.sql.Column) – A list containing pixel values for the new band.
raster_col (pyspark.sql.Column) – Raster column.
no_data_value (pyspark.sql.Column/Int/Float, optional) – Optional NoData value.
- Returns:
Raster column with the additional band.
- Return type:
pyspark.sql.Column
apply
- geoanalytics.raster.functions.apply(raster_col, band, func)
Applies a user-defined function to each pixel value in the specified band of the raster column.
The function is a Spark SQL lambda expression that takes exactly one argument (px). The lambda is translated to a Spark SQL lambda of the form: px -> <SQL expression>. All Spark SQL scalar functions are available inside the lambda, including spatial (ST_*) functions.
The lambda argument px is a struct with the following fields:
px.value - value of the current pixel in the target band.
px.values - array of pixel values across all bands at this pixel. Indexing is 0-based, so px.values[0] is the value of the first band at this pixel.
px.row - row index of the pixel (0-based).
px.column - column index of the pixel (0-based).
px.center - geometry representing the pixel center.
Band IDs are 1-based. If band refers to an existing band, that band is overwritten. If band is exactly one greater than the current number of bands, a new band is appended. All other band IDs are invalid and result in a null raster.
Non-NULL results produce valid pixels. If the lambda returns NULL for a pixel, that pixel is NoData in the output band. NaN or ±Infinity results are treated as NULL and masked. Existing NoData pixels remain masked unless explicitly unmasked.
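The masking rules above can be modeled in plain Python. This is an illustrative sketch, not the library implementation: `None` stands in for a masked (NoData) pixel, and the `apply_band` helper is hypothetical.

```python
import math

NODATA = None  # stand-in for a masked pixel

def apply_band(values, func):
    """Apply func to each pixel value; mask invalid results as NoData."""
    out = []
    for v in values:
        if v is NODATA:            # existing NoData stays masked
            out.append(NODATA)
            continue
        r = func(v)
        if r is None or (isinstance(r, float) and (math.isnan(r) or math.isinf(r))):
            out.append(NODATA)     # NULL, NaN, and +/-Infinity are masked
        else:
            out.append(r)
    return out

band = [1.0, NODATA, 4.0, 0.0]
# px -> 1 / px : division by zero yields infinity, which is masked
result = apply_band(band, lambda px: 1.0 / px if px != 0 else float("inf"))
# result is [1.0, None, 0.25, None]
```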
Refer to the GeoAnalytics guide for examples and usage notes: RT_Apply
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
band (pyspark.sql.Column/Int) – The band ID to which the function will be applied.
func (function) – A user-defined function that takes a single pixel struct and returns a modified pixel value.
- Returns:
Raster column with the user-defined function applied to the specified band.
- Return type:
pyspark.sql.Column
band_mask
- geoanalytics.raster.functions.band_mask(raster_col, band_id)
Returns an array column containing the mask values for the specified band.
Refer to the GeoAnalytics guide for examples and usage notes: RT_BandMask
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
band_id (pyspark.sql.Column) – A numeric column that represents the band ID.
- Returns:
An array column containing the mask values for the specified band.
- Return type:
pyspark.sql.Column
band_statistics
- geoanalytics.raster.functions.band_statistics(raster_col, band_id)
Returns a struct column containing statistics from a specified band in the raster. The statistics include minimum, maximum, mean, and standard deviation.
Refer to the GeoAnalytics guide for examples and usage notes: RT_BandStatistics
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
band_id (pyspark.sql.Column/int) – Numeric value representing the band index.
- Returns:
StructType column containing statistics from a specified band in the raster.
- Return type:
pyspark.sql.Column
band_values
- geoanalytics.raster.functions.band_values(raster_col, band_id)
Returns an array column containing the cell values for the specified band.
Refer to the GeoAnalytics guide for examples and usage notes: RT_BandValues
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
band_id (pyspark.sql.Column) – A numeric column that represents the band ID.
- Returns:
An array column containing the cell values for the specified band.
- Return type:
pyspark.sql.Column
bbox_clip
- geoanalytics.raster.functions.bbox_clip(raster_col, xmin, ymin, xmax, ymax)
Returns a raster column representing the raster clipped using the area specified by the x,y-coordinates. The four numeric values represent the minimum and maximum x,y-coordinates of an axis-aligned rectangle, also known as an envelope.
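As a rough sketch of how an axis-aligned clip maps envelope coordinates onto a cell window, the following pure-Python helper is hypothetical: the origin and cell-size parameters, and the flooring behavior at envelope edges, are assumptions and not the documented implementation.

```python
def clip_window(xmin, ymin, xmax, ymax, origin_x, origin_y, cell, n_cols, n_rows):
    """Return (col_start, col_stop, row_start, row_stop) of cells that
    intersect the envelope, clamped to the raster bounds."""
    col_start = max(0, int((xmin - origin_x) // cell))
    col_stop = min(n_cols, -int(-(xmax - origin_x) // cell))  # ceiling
    row_start = max(0, int((origin_y - ymax) // cell))        # rows count down from the top
    row_stop = min(n_rows, -int(-(origin_y - ymin) // cell))
    return col_start, col_stop, row_start, row_stop

# 10x10 raster with its top-left corner at (0, 10) and cell size 1,
# clipped to x in [2.5, 5.5] and y in [4.5, 7.5]
window = clip_window(2.5, 4.5, 5.5, 7.5, 0.0, 10.0, 1.0, 10, 10)
# window is (2, 6, 2, 6): columns 2-5 and rows 2-5 survive the clip
```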
Refer to the GeoAnalytics guide for examples and usage notes: RT_BboxClip
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
xmin (int/float) – The minimum x-coordinate point for the Envelope.
ymin (int/float) – The minimum y-coordinate point for the Envelope.
xmax (int/float) – The maximum x-coordinate point for the Envelope.
ymax (int/float) – The maximum y-coordinate point for the Envelope.
- Returns:
A raster column representing the clipped rasters.
- Return type:
pyspark.sql.Column
calculator
- geoanalytics.raster.functions.calculator(expression, raster1_col, raster2_col=None, raster3_col=None, raster4_col=None)
Returns a raster column calculated from a raster map algebra expression applied to one or more input rasters. The supported expressions are addition, subtraction, multiplication and division.
The function requires the same number of bands on each input raster. If your inputs have mismatched band counts, use the RT_SelectBands function to select matching bands from each raster first.
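The band-wise evaluation can be sketched in plain Python. This conceptual model hard-codes addition to show the band-count requirement; in the library the expression is supplied as a column and evaluated per pixel, which this sketch does not reproduce.

```python
def calculator_add(raster1, raster2):
    """Add two rasters band by band; band counts must match.
    Each raster is modeled as a list of bands, each band a list of pixels."""
    if len(raster1) != len(raster2):
        raise ValueError("rasters must have the same number of bands")
    return [[a + b for a, b in zip(b1, b2)]
            for b1, b2 in zip(raster1, raster2)]

r1 = [[1, 2], [3, 4]]     # two bands of two pixels each
r2 = [[10, 10], [20, 20]]
summed = calculator_add(r1, r2)
# summed is [[11, 12], [23, 24]]
```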
Refer to the GeoAnalytics guide for examples and usage notes: RT_Calculator
- Parameters:
expression (pyspark.sql.Column) – The raster map algebra expression to apply to the input rasters.
raster1_col (pyspark.sql.Column) – The primary raster column to use in the expression.
raster2_col (pyspark.sql.Column, optional) – The second raster column to use in the expression. Defaults to None.
raster3_col (pyspark.sql.Column, optional) – The third raster column to use in the expression. Defaults to None.
raster4_col (pyspark.sql.Column, optional) – The fourth raster column to use in the expression. Defaults to None.
- Returns:
Raster column.
- Return type:
pyspark.sql.Column
cell_size_x
- geoanalytics.raster.functions.cell_size_x(raster_col)
Returns a numeric value that represents the dimension of a raster cell in the horizontal direction (x-coordinate).
Refer to the GeoAnalytics guide for examples and usage notes: RT_CellSizeX
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
- Returns:
A numeric value that represents the dimension of a raster cell in the horizontal direction (x-coordinate).
- Return type:
pyspark.sql.Column
cell_size_y
- geoanalytics.raster.functions.cell_size_y(raster_col)
Returns a numeric value that represents the dimension of a raster cell in the vertical direction (y-coordinate).
Refer to the GeoAnalytics guide for examples and usage notes: RT_CellSizeY
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
- Returns:
A numeric value that represents the dimension of a raster cell in the vertical direction (y-coordinate).
- Return type:
pyspark.sql.Column
convert_pixel_type
- geoanalytics.raster.functions.convert_pixel_type(raster_col, pixel_type)
Returns a raster column where the raster values have been cast to the specified pixel type.
The supported pixel types are uint1, uint2, uint4, uint8, uint16, uint32, int8, int16, int32, float32, and float64. The short version of pixel type is also supported. For example, u1 can be used instead of uint1 or f32 instead of float32. Also, the pixel type representation is bit-based. For example, u1 is one bit and not one byte, which means that it uses one bit per pixel.
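The bit-based sizing described above can be made concrete with a small table. The short-alias derivation (first letter plus the bit count) and the packed-size helper below are illustrative, not part of the library API.

```python
# Bits per pixel for each supported pixel type, derived from the type names.
BITS_PER_PIXEL = {
    "uint1": 1, "uint2": 2, "uint4": 4, "uint8": 8, "uint16": 16, "uint32": 32,
    "int8": 8, "int16": 16, "int32": 32, "float32": 32, "float64": 64,
}

def short_alias(pixel_type):
    """u1 for uint1, f32 for float32, and so on (first letter + bit count)."""
    return pixel_type[0] + "".join(c for c in pixel_type if c.isdigit())

def packed_size_bytes(pixel_type, n_pixels):
    """Minimum bytes needed to store n_pixels at this bit depth."""
    bits = BITS_PER_PIXEL[pixel_type] * n_pixels
    return -(-bits // 8)  # ceiling division

# uint1 uses one bit per pixel: 100 pixels pack into 13 bytes, not 100
size = packed_size_bytes("uint1", 100)
```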
Refer to the GeoAnalytics guide for examples and usage notes: RT_ConvertPixelType
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
pixel_type (pyspark.sql.Column/Str) – The data type for the pixels. Choose from uint1, uint2, uint4, uint8, uint16, uint32, int8, int16, int32, float32, and float64.
- Returns:
Raster column with updated pixel type.
- Return type:
pyspark.sql.Column
create_raster
- geoanalytics.raster.functions.create_raster(pixels, num_cols, num_rows, pixel_type, no_data_value=None)
Creates and returns a raster column from a list of pixel values. The function constructs a raster of the specified dimensions and pixel type using the provided pixel list. If no_data_value is specified, any pixel matching that value is treated as NoData in the resulting raster.
The supported pixel types are uint1, uint2, uint4, uint8, uint16, uint32, int8, int16, int32, float32, and float64. The short version of pixel type is also supported. For example, u1 can be used instead of uint1 or f32 instead of float32. Also, the pixel type representation is bit-based. For example, u1 is one bit and not one byte, which means that it uses one bit per pixel.
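The construction above can be modeled in plain Python. Row-major ordering of the flat pixel list and `None` as the masked-pixel marker are assumptions of this sketch, not the documented storage format.

```python
def create_raster_grid(pixels, num_cols, num_rows, no_data_value=None):
    """Build a row-major grid from a flat pixel list; pixels equal to
    no_data_value become masked (None)."""
    if len(pixels) != num_cols * num_rows:
        raise ValueError("pixel count must equal num_cols * num_rows")
    grid = []
    for r in range(num_rows):
        row = pixels[r * num_cols:(r + 1) * num_cols]
        grid.append([None if v == no_data_value else v for v in row])
    return grid

raster = create_raster_grid([1, 2, -9999, 4, 5, 6], num_cols=3, num_rows=2,
                            no_data_value=-9999)
# raster is [[1, 2, None], [4, 5, 6]]
```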
Refer to the GeoAnalytics guide for examples and usage notes: RT_CreateRaster
- Parameters:
pixels (pyspark.sql.Column) – A list containing raster pixel values.
num_cols (pyspark.sql.Column/Int) – The number of columns in the raster.
num_rows (pyspark.sql.Column/Int) – The number of rows in the raster.
pixel_type (pyspark.sql.Column/Str) – The data type for the pixels. Choose from uint1, uint2, uint4, uint8, uint16, uint32, int8, int16, int32, float32, and float64.
no_data_value (pyspark.sql.Column/Int/Float, optional) – Optional NoData value.
- Returns:
Raster column.
- Return type:
pyspark.sql.Column
extent
- geoanalytics.raster.functions.extent(raster_col, sr=None)
Returns a polygon column representing the axis-aligned bounding box of the input raster. You can optionally specify a spatial reference so that the raster is projected into that spatial reference before its extent is calculated. The resulting polygon’s coordinates are expressed in the defined spatial reference.
If a spatial reference is provided but the input raster has none, the function assumes that the raster’s extent is already in that spatial reference. If the raster was not created with an extent, the center of the upper left pixel is assumed to be at (0,0).
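The default placement described above (center of the upper-left pixel at (0, 0)) determines an extent once the cell size and dimensions are known. This sketch assumes a cell size of 1 for the example; the helper name is hypothetical.

```python
def default_extent(num_cols, num_rows, cell_x=1.0, cell_y=1.0):
    """(xmin, ymin, xmax, ymax) when the upper-left pixel is centered at (0, 0)."""
    xmin = -cell_x / 2          # left edge is half a cell left of the center
    ymax = cell_y / 2           # top edge is half a cell above the center
    xmax = xmin + num_cols * cell_x
    ymin = ymax - num_rows * cell_y
    return xmin, ymin, xmax, ymax

# A 4-column, 3-row raster with unit cells
ext = default_extent(4, 3)
# ext is (-0.5, -2.5, 3.5, 0.5)
```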
Refer to the GeoAnalytics guide for examples and usage notes: RT_Extent
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
sr (int/str, optional) – The spatial reference (SRID or WKT) that the raster will be projected into.
- Returns:
Polygon column representing the extent of the raster.
- Return type:
pyspark.sql.Column
from_binary
- geoanalytics.raster.functions.from_binary(binary_col)
Returns a raster column. It supports reading binary data from the following formats: TIFF (.tif), JPEG (.jpg), and PNG (.png). The maximum acceptable size for the input byte array is 2GB.
Refer to the GeoAnalytics guide for examples and usage notes: RT_FromBinary
- Parameters:
binary_col (pyspark.sql.Column) – Binary column.
- Returns:
Raster column representing the binary data.
- Return type:
pyspark.sql.Column
info
- geoanalytics.raster.functions.info(raster_col)
Returns a struct column that contains the properties of the input raster, including the number of rows and columns, cell size, cell type, number of bands, extent, SRID, and spatial reference.
Refer to the GeoAnalytics guide for examples and usage notes: RT_Info
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
- Returns:
StructType column containing the properties of the input raster.
- Return type:
pyspark.sql.Column
materialize
- geoanalytics.raster.functions.materialize(raster_col)
Returns a raster column where the rasters have been materialized. Raster columns returned by the datasource may be lazily evaluated reference rasters, only holding the source file path and a tile offset. This function will force evaluation of that reference and load its pixels into memory.
Refer to the GeoAnalytics guide for examples and usage notes: RT_Materialize
- Parameters:
raster_col (pyspark.sql.Column) – Raster column that may contain reference or value rasters.
- Returns:
Raster column in which all rasters are of value type.
- Return type:
pyspark.sql.Column
merge
- geoanalytics.raster.functions.merge(rasters_col)
Returns a raster column representing the merged raster. The input rasters must have the same number of bands. When rasters overlap, cell values are taken from the first raster in the array that contains valid data (non-NoData values) for that location.
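The documented precedence rule (first raster in the array with valid data wins) can be modeled cell by cell in plain Python. `None` marks NoData; the rasters are assumed to be aligned single-band lists for this sketch.

```python
def merge_cells(rasters):
    """Merge aligned single-band rasters cell by cell; a cell takes its value
    from the first raster in the list with non-NoData (non-None) data."""
    merged = []
    for cells in zip(*rasters):
        merged.append(next((v for v in cells if v is not None), None))
    return merged

a = [None, 2, None]
b = [10, 20, None]
c = [100, 200, 300]
merged = merge_cells([a, b, c])
# merged is [10, 2, 300]: each cell comes from the first raster with valid data
```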
Refer to the GeoAnalytics guide for examples and usage notes: RT_Merge
- Parameters:
rasters_col (pyspark.sql.Column) – An array column of rasters to merge.
- Returns:
Raster column representing the merged raster.
- Return type:
pyspark.sql.Column
num_bands
- geoanalytics.raster.functions.num_bands(raster_col)
Returns a numeric value that represents the number of bands in the raster.
Refer to the GeoAnalytics guide for examples and usage notes: RT_NumBands
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
- Returns:
A numeric value that represents the number of bands in the raster.
- Return type:
pyspark.sql.Column
num_columns
- geoanalytics.raster.functions.num_columns(raster_col)
Returns a numeric value that represents the number of columns in the raster.
Refer to the GeoAnalytics guide for examples and usage notes: RT_NumColumns
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
- Returns:
A numeric value that represents the number of columns in the raster.
- Return type:
pyspark.sql.Column
num_rows
- geoanalytics.raster.functions.num_rows(raster_col)
Returns a numeric value that represents the number of rows in the raster.
Refer to the GeoAnalytics guide for examples and usage notes: RT_NumRows
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
- Returns:
A numeric value that represents the number of rows in the raster.
- Return type:
pyspark.sql.Column
pixel_type
- geoanalytics.raster.functions.pixel_type(raster_col)
Returns a string column that represents the data type of the cells. Cell values can be positive or negative, and integer or floating point. Cells can also have a NoData value to represent the absence of data. Supported pixel types are: UInt1, UInt2, UInt4, UInt8, UInt16, UInt32, Int8, Int16, Int32, Float32, and Float64.
Refer to the GeoAnalytics guide for examples and usage notes: RT_PixelType
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
- Returns:
A string column that represents the data type of the cell.
- Return type:
pyspark.sql.Column
resample
- geoanalytics.raster.functions.resample(raster_col, cell_size_x, cell_size_y, method=None)
Returns a raster column representing the resampled raster with the updated spatial resolution.
You can optionally specify a resampling method that will set rules for aggregating or interpolating values across the new cell sizes. There are two options:
Nearest: Performs a nearest neighbor assignment and is the fastest of the interpolation methods. It is used primarily for discrete data, such as a land-use classification, since it will not change the values of the cells. The maximum spatial error will be one-half the cell size.
Bilinear: Performs a bilinear interpolation and determines the new value of a cell based on a weighted distance average of the four nearest input cell centers. It is useful for continuous data and will cause some smoothing of the data.
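The bilinear weighting described above can be written out directly. This is standard bilinear interpolation over a unit cell, shown as a sketch; the library's neighbor selection and edge handling are not reproduced here.

```python
def bilinear(q00, q10, q01, q11, fx, fy):
    """Interpolate at fractional offsets (fx, fy) within a unit cell whose
    corners hold q00 (top-left), q10 (top-right), q01 (bottom-left), q11
    (bottom-right)."""
    top = q00 * (1 - fx) + q10 * fx
    bottom = q01 * (1 - fx) + q11 * fx
    return top * (1 - fy) + bottom * fy

# The exact midpoint of four cell centers is their plain average
value = bilinear(0.0, 10.0, 20.0, 30.0, 0.5, 0.5)
# value is 15.0
```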
Refer to the GeoAnalytics guide for examples and usage notes: RT_Resample
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
cell_size_x (pyspark.sql.Column) – A numeric value that represents the cell size of the new raster in the x direction.
cell_size_y (pyspark.sql.Column) – A numeric value that represents the cell size of the new raster in the y direction.
method (pyspark.sql.Column, optional) – The resampling method. Choose from Nearest and Bilinear. Defaults to None.
- Returns:
Raster column representing the resampled raster.
- Return type:
pyspark.sql.Column
select_bands
- geoanalytics.raster.functions.select_bands(raster_col, band_ids)
Returns a raster column. The function selects or reorders the bands in the output raster using the specified band IDs.
Refer to the GeoAnalytics guide for examples and usage notes: RT_SelectBands
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
band_ids (ArrayType) – An array of numeric values that represent the band IDs.
- Returns:
Raster column representing the output raster with the selected bands.
- Return type:
pyspark.sql.Column
set_extent
- geoanalytics.raster.functions.set_extent(raster_col, xmin, ymin, xmax, ymax)
Returns a raster column with the updated extent based on the specified coordinates. The four numeric values represent the minimum and maximum x,y-coordinates of an axis-aligned rectangle, also known as an envelope.
Refer to the GeoAnalytics guide for examples and usage notes: RT_SetExtent
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
xmin (int/float) – The minimum x-coordinate point for the Envelope.
ymin (int/float) – The minimum y-coordinate point for the Envelope.
xmax (int/float) – The maximum x-coordinate point for the Envelope.
ymax (int/float) – The maximum y-coordinate point for the Envelope.
- Returns:
Raster column.
- Return type:
pyspark.sql.Column
sr_text
- geoanalytics.raster.functions.sr_text(raster_col, wkt=None)
Can work as a getter or a setter, depending on the inputs.
Getter: Takes a raster column and returns the spatial reference of the column as a string in Well-Known Text (WKT) format. If the spatial reference of the input raster column has not been set, the function returns an empty string.
Setter: Takes a raster column and a spatial reference string in Well-Known Text (WKT) format and returns the input raster column with its spatial reference set to the string value. This does not affect the raster data in the column. To transform your raster from one spatial reference to another, use RT_Transform. This also does not affect the extent of the output raster column; to set the extent of your raster column, use RT_SetExtent.
Refer to the GeoAnalytics guide for examples and usage notes: RT_SRText
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
wkt (str, optional) – Spatial reference (WKT) to set on the raster, defaults to None.
- Returns:
Getter: StringType column representing the spatial reference (WKT) for the raster.
Setter: Raster column representing the raster with the updated spatial reference.
- Return type:
pyspark.sql.Column
srid
- geoanalytics.raster.functions.srid(raster_col, srid=None)
Can work as a getter or a setter, depending on the inputs.
Getter: Takes a raster column and returns the spatial reference ID (SRID) of the column as an integer column. If the spatial reference of the input raster column has not been set, the function returns 0.
Setter: Takes a raster column and a numeric or string value which represents the spatial reference ID (SRID) and returns the input raster column with its SRID set to that value. This does not affect the raster data in the column. To transform your raster from one spatial reference to another, use RT_Transform. This also does not affect the extent of the output raster column; to set the extent of your raster column, use RT_SetExtent.
Refer to the GeoAnalytics guide for examples and usage notes: RT_Srid
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
srid (int, optional) – Spatial reference ID (SRID) to set on the raster, defaults to None.
- Returns:
Getter: IntegerType column representing the spatial reference (SRID) for the raster.
Setter: Raster column representing the raster with the updated spatial reference.
- Return type:
pyspark.sql.Column
statistics
- geoanalytics.raster.functions.statistics(raster_col)
Returns an array column containing statistics from all bands in the raster. The statistics include minimum, maximum, mean, and standard deviation.
Refer to the GeoAnalytics guide for examples and usage notes: RT_Statistics
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
- Returns:
Array column containing statistics about the raster.
- Return type:
pyspark.sql.Column
tiles
- geoanalytics.raster.functions.tiles(raster_col, width_in_pixels, height_in_pixels, overlap=0)
Returns an array of rasters column created by tiling the input raster into smaller sections. Each tile is defined by the specified number of columns and rows. Some tiles may contain fewer columns and rows than specified, depending on the remaining cells in the raster. By default, the function will not overlap tiles. You can optionally specify an overlap greater than 0 for the tiles to overlap. The function will start tiling from the top left side of the raster.
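The tiling arithmetic along one axis can be sketched as follows. The stride formula (tile size minus overlap) and the edge handling are assumptions consistent with the description above, not the documented algorithm.

```python
def tile_origins(raster_size, tile_size, overlap=0):
    """Starting offsets of tiles along one axis, laid out from offset 0
    with a stride of (tile_size - overlap); the last tile may be smaller."""
    stride = tile_size - overlap
    origins = []
    pos = 0
    while pos < raster_size:
        origins.append(pos)
        if pos + tile_size >= raster_size:
            break                 # this tile reaches the edge of the raster
        pos += stride
    return origins

# 10 columns, tiles 4 wide, overlapping by 1 cell -> tiles start at 0, 3, 6
cols = tile_origins(10, 4, overlap=1)
# cols is [0, 3, 6]
```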
Refer to the GeoAnalytics guide for examples and usage notes: RT_Tiles
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
width_in_pixels (int) – The number of columns in each tile.
height_in_pixels (int) – The number of rows in each tile.
overlap (int, optional) – The number of cells by which adjacent tiles overlap. Defaults to 0.
- Returns:
An array of rasters column representing the tiled rasters.
- Return type:
pyspark.sql.Column
to_binary
- geoanalytics.raster.functions.to_binary(raster_col, format='gtiff', policy=None, compression=None)
Returns a binary column representing the raster data in a specified file format. The function supports the following file formats: GTiff, JPEG, and PNG.
Internally, GeoAnalytics uses a pixel mask to represent excluded pixels. When writing to a file format that supports a NoData value (such as GeoTIFF), users have several options to translate this mask into a NoData value.
Promotion - Converts the raster to the next largest pixel type and sets the NoData value to the largest value for the new type. If the pixel type is Float64, no conversion will occur and the NoData value will be set to 1.7976931348623157E308, the maximum value for Float64.
<numeric value> - Sets the NoData value to the numeric value provided.
Maximum - Sets the NoData value to the maximum value for the pixel type.
Minimum - Sets the NoData value to the minimum value for the pixel type.
Best-Effort - No policy is applied. The existing NoData value will be used if one exists.
Users can also configure a global default by setting the Spark conf geoanalytics.sql.raster.noDataPolicy.
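The Promotion policy can be modeled as a lookup from pixel type to (promoted type, NoData value). The promotion chain below is an illustrative assumption for a few common types; only the Float64 case (no conversion, maximum Float64 value) is stated in the description above.

```python
FLOAT64_MAX = 1.7976931348623157e308

# Hypothetical promotion chain: each type maps to (next largest type,
# NoData value = maximum value for that new type).
PROMOTION = {
    "uint8":   ("uint16", 2**16 - 1),
    "uint16":  ("uint32", 2**32 - 1),
    "int16":   ("int32", 2**31 - 1),
    "float32": ("float64", FLOAT64_MAX),
    "float64": ("float64", FLOAT64_MAX),  # no conversion for Float64
}

def promote(pixel_type):
    """Return (new_pixel_type, no_data_value) under the Promotion policy."""
    return PROMOTION[pixel_type]

new_type, nodata = promote("uint8")
# new_type is "uint16", nodata is 65535
```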
Below are the supported pixel types when writing a raster column in the specified format:
GTiff (.tif, tiff): uint1, uint2, uint4, uint8, int8, uint16, int16, uint32, int32, float32, float64.
JPEG (.jpg, .jpeg): uint8, int8.
PNG (.png): uint8, int8, uint16, int16.
The short version of pixel type is also supported. For example, u1 can be used instead of uint1 or f32 instead of float32. Also, the pixel type representation is bit-based. For example, u1 is one bit and not one byte, which means that it uses one bit per pixel.
Refer to the GeoAnalytics guide for examples and usage notes: RT_ToBinary
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
format (Str, optional) – The binary data format. Choose from GTiff (default), JPEG, and PNG.
policy (Str, optional) – The NoData policy. Choose from Promotion, Maximum, Minimum, Best-Effort, or a numeric value. Defaults to None.
compression (Str, optional) – The compression type to use when writing the binary data. Only applicable when format is GTiff. Choose from “NONE”, “JPEG”, “LZW”, “PACKBITS”, “DEFLATE”.
- Returns:
Binary column representing the raster data.
- Return type:
pyspark.sql.Column
transform
- geoanalytics.raster.functions.transform(raster_col, sr)
Returns a raster column in the specified spatial reference. It will also set the spatial reference of the result column.
Refer to the GeoAnalytics guide for examples and usage notes: RT_Transform
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
sr (int/str) – The spatial reference (SRID or WKT) that the raster will be projected into.
- Returns:
Raster column representing the projected raster.
- Return type:
pyspark.sql.Column
zonal_statistics
- geoanalytics.raster.functions.zonal_statistics(raster_col, zone_col, band_id=1, cell_assignment='center', circular_wrap_low=None, circular_wrap_high=None)
Returns a struct column containing statistics of the raster’s pixel values within each polygon zone. Each polygon in the geometry column defines a distinct zone, and the function calculates statistics for pixels that fall inside each zone. Statistics include count, minimum, maximum, range, mean, standard deviation, sum, median, the 90th percentile, variety, majority, majority count, majority percent, minority, minority count, and minority percent.
You can optionally specify the band index to analyze from the input raster. By default, the first band of the raster is used for statistical calculations.
The cell_assignment parameter determines how pixels are included in a zone. It has two options:
Center: A pixel is included if its center point lies within the polygon zone. This is the default.
Extent: A pixel is included if any part of the pixel overlaps the polygon zone. Use this option when you want to ensure all partially overlapping pixels are counted, especially when the raster resolution is coarse relative to the polygon size.
If both circular_wrap_low and circular_wrap_high are provided, the function computes circular statistics. The calculation includes count, mean, standard deviation, variety, majority, majority count, majority percent, minority, minority count, and minority percent.
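To see why circular statistics differ from ordinary ones, consider a circular mean for cyclic data such as compass directions. This standard construction (map values onto the unit circle, average as vectors, map back) is shown as an illustration, not as the library's implementation.

```python
import math

def circular_mean(values, low, high):
    """Mean of cyclic values that wrap from high back to low."""
    span = high - low
    angles = [(v - low) / span * 2 * math.pi for v in values]
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    mean_angle = math.atan2(y, x) % (2 * math.pi)
    return low + mean_angle / (2 * math.pi) * span

# 350 and 10 degrees straddle the wrap point: the circular mean is at the
# wrap (0/360), whereas the ordinary mean would be 180
mean_dir = circular_mean([350.0, 10.0], 0.0, 360.0)
```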
Refer to the GeoAnalytics guide for examples and usage notes: RT_ZonalStatistics
- Parameters:
raster_col (pyspark.sql.Column) – Raster column.
zone_col (pyspark.sql.Column) – Polygon geometry column representing the zones.
band_id (pyspark.sql.Column/int, optional) – Numeric value representing the band index.
cell_assignment (pyspark.sql.Column/Str, optional) – Determines how pixels are included in a zone. Choose from Center (default) or Extent.
circular_wrap_low (pyspark.sql.Column/int/float, optional) – The lowest value of a cyclic dataset.
circular_wrap_high (pyspark.sql.Column/int/float, optional) – The highest value of a cyclic dataset.
- Returns:
StructType column containing statistics of the raster’s pixel values within polygon zones.
- Return type:
pyspark.sql.Column