This tutorial shows you how to access the properties and functions of the Python Raster class.
The Raster class provides a Python interface for working with raster data in a Spark DataFrame. It lets you extract raster properties and band values through a set of functions and properties, and create new raster datasets. The Raster class is also compatible with Python UDFs. To use the raster functions, a reference raster should be used.
Prerequisites
To complete the following steps, you will need:
- A running Spark session configured with ArcGIS GeoAnalytics Engine 2.0.0 or later.
- A notebook connected to your Spark session (for example, Jupyter, JupyterLab, Databricks, or EMR).
Steps
Import
- In your notebook, import geoanalytics and authorize the module using a username and password, an API key, or a license file. Also, import the modules required to run the examples below.
Python
    import geoanalytics
    geoanalytics.auth(username="user1", password="p@ssword")

    from geoanalytics.raster import functions as RT
    from geoanalytics.sql import Raster as Raster
    import numpy as np
    from pyspark.sql.functions import udf
    import matplotlib.pyplot as plt
Create a PySpark DataFrame and collect the raster object
- Create a DataFrame with a 3x3 raster.
Python
    data = [(list(range(9)), )]
    df = spark.createDataFrame(data, ["pixels"]) \
        .withColumn("raster", RT.srid(RT.create_raster("pixels", 3, 3, "float32"), 4326))
    df = df.withColumn("raster", RT.materialize("raster"))
- Collect the raster object.
Python
    raster = df.first().raster
    print(raster)
Result
    Raster(columns=3, rows=3, bands=1, pixel_type=Float32)
Extract raster properties
- In this example, the available raster properties are printed out.
Python
    print(f"Shape: {(raster.num_bands, raster.num_columns, raster.num_rows)}")
    print(f"Extent: {raster.extent}")
    print(f"Spatial reference: {raster.spatial_reference}")
    print(f"Is reference raster: {raster.is_reference}")
    print(f"Pixel type: {raster.pixel_type}")
    print(f"No data values: {raster.no_data_values}")
    print(f"Colormap values: {raster.colormap_values}")
    print(f"Colormap colors: {raster.colormap_colors}")
    print(f"Attribute table: {raster.attribute_table}")
Result
    Shape: (1, 3, 3)
    Extent: BoundingBox(min_x=-0.5, min_y=-2.5, max_x=2.5, max_y=0.5)
    Spatial reference: 4326
    Is reference raster: False
    Pixel type: PixelType.Float32
    No data values: [None]
    Colormap values: None
    Colormap colors: None
    Attribute table: None
Extract band values
- Extract the raster's band values as a list.
Python
    print(raster.band_values(1))
Result
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
- Extract the raster's band values as a numpy array.
Python
    band_values = raster.np_band_values(1)
    print(band_values.shape)
    print(band_values)
Result
    (3, 3)
    [[0. 1. 2.]
     [3. 4. 5.]
     [6. 7. 8.]]
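The list and the numpy array in the two results above hold the same nine pixels. The sketch below reproduces that relationship with plain NumPy, assuming the flat list is in row-major order (which is consistent with the two results shown). It does not call the Raster class, and the variable names are illustrative only.

```python
import numpy as np

# Flat band values, as printed by raster.band_values(1) above.
flat_values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
num_rows, num_columns = 3, 3

# Assumption: the flat list is row-major, so reshaping to (rows, columns)
# gives the same layout as the numpy result shown above.
grid = np.array(flat_values, dtype="float32").reshape(num_rows, num_columns)

print(grid.shape)
print(grid)
```

Under this assumption, the pixel in row r and column c (both zero-based) sits at position r * num_columns + c in the flat list.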
Draw
- In this example, the band values are drawn.
Python
    (_, ax) = plt.subplots(figsize=(5,5))
    ax.imshow(raster.np_band_values(1));
Using the Python Raster Class with User-Defined Functions (UDFs)
- Use a UDF that returns the max numpy band value in the raster object.
Python
    @udf(returnType="double")
    def max_band_value(raster):
        return float(np.max(raster.np_band_values(1)))

    df.select(max_band_value("raster")).show()
Result
    +----------------------+
    |max_band_value(raster)|
    +----------------------+
    |                   8.0|
    +----------------------+
- Use a UDF that creates a raster object using the Raster.create() function.
Python
    print(df.first().raster.np_values())

    @udf(returnType=Raster.__UDT__)
    def process_raster(raster):
        values = raster.np_values()
        # Add one to values, which is a numpy array
        values += 1
        # Use Raster.create to create a raster object
        return Raster.create(values, raster.extent, raster.spatial_reference)

    df = df.select(process_raster("raster").alias("raster"))
    print(df.first().raster.np_values())
Result
    [[[0. 1. 2.]
      [3. 4. 5.]
      [6. 7. 8.]]]
    [[[1. 2. 3.]
      [4. 5. 6.]
      [7. 8. 9.]]]
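The numeric part of both UDF bodies can be tried outside Spark on a plain NumPy array. The sketch below assumes the array returned by np_values() has the shape (bands, rows, columns), as the printed result above suggests, and that band numbers are 1-based, as the band_values(1) call on a one-band raster suggests. The helper names max_of_band and add_one are hypothetical, and no geoanalytics calls are made.

```python
import numpy as np

# Stand-in for raster.np_values(): one band of 3 rows x 3 columns.
values = np.arange(9, dtype="float32").reshape(1, 3, 3)

def max_of_band(all_values, band=1):
    # Assumption: bands are 1-based, so band 1 is index 0 here.
    # float() converts the NumPy scalar to a built-in Python float,
    # matching a UDF declared with returnType="double".
    return float(np.max(all_values[band - 1]))

def add_one(all_values):
    # Returns a new array; the caller's array is left unchanged.
    return all_values + 1

print(max_of_band(values))
print(add_one(values)[0].tolist())
```

Unlike the in-place `values += 1` in the UDF above, add_one returns a copy, which makes it easier to compare the input and output arrays while experimenting.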
Create a new raster object
- This example uses spark.createDataFrame along with Raster.create() to create a new raster from a numpy array.
Python
    data_ndarray = np.array([1,2,3,4,5,6,7,8,9], dtype="uint8").reshape(3,3)
    spark.createDataFrame([(Raster.create(data_ndarray, extent=(10, 10, 20, 20), sr=4326), )], ["raster"])\
        .select(RT.info("raster").alias("info")).select("info.*")\
        .show()
Result
    +----------+-------+--------+------------------+------------------+---------+----+--------------------+--------------------+
    |numColumns|numRows|numBands|         cellSizeX|         cellSizeY|pixelType|srid|              srText|              extent|
    +----------+-------+--------+------------------+------------------+---------+----+--------------------+--------------------+
    |         3|      3|       1|3.3333333333333335|3.3333333333333335|    UInt8|4326|GEOGCS["GCS_WGS_1...|[10.0, 10.0, 20.0...|
    +----------+-------+--------+------------------+------------------+---------+----+--------------------+--------------------+
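The cellSizeX and cellSizeY values in the result are consistent with dividing the extent's width and height by the number of columns and rows. A minimal sketch of that arithmetic follows, assuming the extent tuple is ordered (min_x, min_y, max_x, max_y). The cell_size helper is hypothetical and not part of geoanalytics.

```python
def cell_size(extent, num_columns, num_rows):
    # Assumption: extent is ordered (min_x, min_y, max_x, max_y).
    min_x, min_y, max_x, max_y = extent
    size_x = (max_x - min_x) / num_columns
    size_y = (max_y - min_y) / num_rows
    return size_x, size_y

size_x, size_y = cell_size((10, 10, 20, 20), num_columns=3, num_rows=3)
print(size_x, size_y)  # 3.3333333333333335 3.3333333333333335
```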
What's next?
Learn about how to analyze your data through raster functions and tools: