This tutorial demonstrates how to read and write raster datasets. GeoAnalytics Engine supports reading the following raster file formats:

  • GTiff
  • JPEG
  • PNG

You can write a raster column to the same formats for data storage or export to other systems.

Additional details and parameters that can be used to configure reading and writing raster data can be found in the Raster data source documentation.

Read TIF data

Prepare your input raster

  1. This tutorial uses a small multiband image from the United States Geological Survey's National Agriculture Imagery Program (NAIP). The image provided is a clipped piece of a larger NAIP image. When unzipped, the download contains both a TIF and a TFW file.

    The file can be downloaded here.

    
    # raster dataset location
    raster_data_loc = r"data\m_4209351_se_15_030_20230902_clip.tif"

Set up the workspace

  1. Import the required modules.
    
    # Import the required modules
    import geoanalytics
    from geoanalytics.sql import functions as ST
    from geoanalytics.raster import functions as RT
    
    geoanalytics.auth(username="user1", password="p@ssword")

Read from your raster file

  1. You can read directly with the raster format option.

    This will result in a DataFrame with all of the raster data stored in a raster column. Larger raster datasets will be broken into multiple tiles (DataFrame rows) to improve performance.

    
    # Read the raster into a DataFrame
    df_raster = spark.read.format("raster").load(raster_data_loc)
  2. Reading a raster creates a DataFrame with two columns:

    • Path to the input file/tile
    • Raster information
    
    # Show the schema of your DataFrame
    df_raster.printSchema()
    Result
    root
     |-- path: string (nullable = true)
     |-- raster: raster (nullable = true)
  3. Look at those details in the table.

    The first table below shows a truncated entry for each of the tiles that were created. The second shows the details for one tile in a more readable format.

    
    df_raster.show()
    Result
    +--------------------+--------------------+
    |                path|              raster|
    +--------------------+--------------------+
    |file:/C:/_data/ra...|SqlRaster(4x1024x...|
    |file:/C:/_data/ra...|SqlRaster(4x1024x...|
    |file:/C:/_data/ra...|SqlRaster(4x1024x...|
    |file:/C:/_data/ra...|SqlRaster(4x1024x...|
    |file:/C:/_data/ra...|SqlRaster(4x935x1...|
    |file:/C:/_data/ra...|SqlRaster(4x1024x...|
    |file:/C:/_data/ra...|SqlRaster(4x1024x...|
    |file:/C:/_data/ra...|SqlRaster(4x1024x...|
    |file:/C:/_data/ra...|SqlRaster(4x1024x...|
    |file:/C:/_data/ra...|SqlRaster(4x935x8...|
    +--------------------+--------------------+
    
    df_raster.show(1, vertical=True, truncate=False)
    Result
    -RECORD 0----------------------------------------------------------------------
     path   | file:/C:/_workbooks/raster/data/m_4209351_se_15_030_20230902_clip.tif
     raster | SqlRaster(4x1024x1024, UInt8)
    only showing top 1 row
  4. Look at one tile to see size and confirm that it is 1024x1024.

    
    df_raster.limit(1).select(
              RT.num_columns("raster"),
              RT.num_rows("raster")).show(truncate=False)
    Result
    +------------------+---------------+
    |NumColumns(raster)|NumRows(raster)|
    +------------------+---------------+
    |1024              |1024           |
    +------------------+---------------+
  5. Change the tile sizes of the raster.

    
    # read in the same raster file with larger tile sizes
    df_raster_large_tile = spark.read.format("raster")\
        .option("tileColumns", 2048)\
        .option("tileRows", 2048)\
        .load(raster_data_loc)
    
    # show the table
    df_raster_large_tile.show()
    
    # show the tile size
    # note: this input has fewer than 2048 rows, so each tile has only 1910 rows
    df_raster_large_tile.select(
              RT.num_columns("raster"),
              RT.num_rows("raster")).show(truncate=False)
    Result
    +--------------------+--------------------+
    |                path|              raster|
    +--------------------+--------------------+
    |file:/C:/_workboo...|SqlRaster(4x2048x...|
    |file:/C:/_workboo...|SqlRaster(4x2048x...|
    |file:/C:/_workboo...|SqlRaster(4x935x1...|
    +--------------------+--------------------+
    
    +------------------+---------------+
    |NumColumns(raster)|NumRows(raster)|
    +------------------+---------------+
    |2048              |1910           |
    |2048              |1910           |
    |935               |1910           |
    +------------------+---------------+
  6. By default, rasters are loaded as a reference to the raster's file path. With the materialize option, the pixels of the raster are loaded as byte data and stored in memory.

    
    df_materialize = spark.read.format("raster").option("materialize", True).load(raster_data_loc)
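
The tile counts and sizes seen in the steps above follow from simple grid arithmetic. The sketch below is plain Python, not GeoAnalytics Engine code: it assumes tiles are cut on a regular grid starting at the upper-left corner, and it takes the raster size (5031 columns by 1910 rows, 4 bands of UInt8) from the outputs shown in this tutorial. The function names `tile_shapes` and `uncompressed_bytes` are hypothetical helpers introduced for illustration.

```python
def tile_shapes(num_columns, num_rows, tile_columns=1024, tile_rows=1024):
    """Return the (columns, rows) shape of every tile in row-major order.

    Assumes a regular grid anchored at the upper-left corner, where the
    last tile in each direction keeps whatever pixels remain.
    """
    shapes = []
    for top in range(0, num_rows, tile_rows):
        rows = min(tile_rows, num_rows - top)
        for left in range(0, num_columns, tile_columns):
            columns = min(tile_columns, num_columns - left)
            shapes.append((columns, rows))
    return shapes


def uncompressed_bytes(num_bands, columns, rows, bytes_per_pixel=1):
    """Rough size of the raw pixel data (UInt8 means 1 byte per pixel)."""
    return num_bands * columns * rows * bytes_per_pixel


# Raster size taken from the tutorial outputs: 5031 columns x 1910 rows
default_tiles = tile_shapes(5031, 1910)             # 1024 x 1024 tiles
large_tiles = tile_shapes(5031, 1910, 2048, 2048)   # 2048 x 2048 tiles

print(len(default_tiles), "tiles at 1024 x 1024")
print(large_tiles)
print(uncompressed_bytes(4, 5031, 1910), "bytes of raw pixel data")
```

Under these assumptions the default tile size yields 10 tiles and the 2048 setting yields 3, which matches the row counts in the tables above. The byte estimate gives a rough idea of how much pixel data the materialize option would hold for this file.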

Read JPEG data

When reading raster data in JPG format, a world file (typically .jgw) is usually stored alongside the JPG file to georeference its pixel values. When the world file is not present, loading a raster from JPG requires extra steps to set the spatial reference and extent before performing any raster analysis.

Read from your raster file

  1. Read in the raster from a JPG file.

    
    import pyspark.sql.functions as F

    df = spark.read.format("raster") \
            .load(r"data\m_4209351_se_15_030_20230902_clip.jpg") \
            .groupBy().agg(RT.merge(F.collect_list("raster")).alias("raster")) # merge all tiles into one raster
  2. Check the spatial reference of the input raster read from JPEG. Note that the spatial reference ID is returned as NULL as expected.

    
    df.select(RT.info("raster").srid).show()
    Result
    +-----------------+
    |Info(raster).srid|
    +-----------------+
    |             NULL|
    +-----------------+
  3. Set the spatial reference and spatial extent of the raster.

    
    # set sr and spatial extent
    df_updated = df.select(RT.set_extent(RT.srid("raster", 26915),
                                         444822.9, 4666574.399999999, 446332.2, 4667147.399999999).alias("raster"))
    
    # get sr and spatial extent
    df_updated.select(RT.info("raster").srid).show()
    df_updated.select(RT.info("raster").extent).show(truncate=False)
    Result
    +-----------------+
    |Info(raster).srid|
    +-----------------+
    |            26915|
    +-----------------+
    
    +----------------------------------------------------------+
    |Info(raster).extent                                       |
    +----------------------------------------------------------+
    |[444822.9, 4666574.399999999, 446332.2, 4667147.399999999]|
    +----------------------------------------------------------+
  4. Plot the raster

    
    df_updated.rt.plot(cmap="cividis", alpha=0.8)
    Plotted Raster
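
The four extent values passed to RT.set_extent above have to come from somewhere. One possible source is the world file of a matching georeferenced copy of the image. The sketch below is plain Python and shows how an extent could be derived from the six world-file parameters plus the image dimensions. The world-file values used here are hypothetical (back-computed from the extent printed in this tutorial with a 0.3 cell size), the helper name `extent_from_world_file` is invented for illustration, and the (x_min, y_min, x_max, y_max) ordering is inferred from the printed extent rather than confirmed by the API reference.

```python
def extent_from_world_file(world_file_text, num_columns, num_rows):
    """Compute (x_min, y_min, x_max, y_max) from world-file contents.

    A world file holds six numbers, one per line:
      x pixel size, rotation term, rotation term, y pixel size (negative),
      x of the upper-left pixel CENTER, y of the upper-left pixel CENTER.
    This sketch only handles unrotated rasters.
    """
    values = [float(token) for token in world_file_text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    x_size, rot_a, rot_b, y_size, x_center, y_center = values
    if rot_a != 0 or rot_b != 0:
        raise ValueError("rotated rasters are not handled in this sketch")

    # Shift from the pixel center to the outer edge of the upper-left pixel
    x_min = x_center - x_size / 2
    y_max = y_center - y_size / 2  # y_size is negative, so this moves up
    x_max = x_min + x_size * num_columns
    y_min = y_max + y_size * num_rows
    return (x_min, y_min, x_max, y_max)


# Hypothetical world-file contents for the 5031 x 1910 image in this tutorial
example_world_file = "0.3\n0.0\n0.0\n-0.3\n444823.05\n4667147.25\n"
extent = extent_from_world_file(example_world_file, 5031, 1910)
print(extent)
```

The half-pixel shift matters because world files reference the center of the upper-left pixel, while an extent describes the outer edges of the raster.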

Read from image services

Image services hosted on ArcGIS Online can be loaded as raster datasets.

The URL for an image service layer hosted in ArcGIS Online typically resembles the following pattern: https://<host>/<uniqueID>/ArcGIS/rest/services/<serviceName>/ImageServer/

Read from your hosted raster

When loading a public image service layer, you can pass the URL without the gis or token options:

spark.read.format("image-service").load(URL)

When loading a secured image service layer, a GIS connection or token will be required. For example:


geoanalytics.register_gis("myGIS", username="username", password="password")
spark.read.format("image-service").option("gis", "myGIS").load(URL)
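
Before handing a URL to the reader, it can help to confirm that it follows the image service pattern described above. The following is a minimal sketch using only the Python standard library. The helper name `image_service_name` and the example URLs are hypothetical, and the check only inspects the shape of the URL. It does not contact the service or verify access.

```python
from urllib.parse import urlparse


def image_service_name(url):
    """Return the service name if the URL looks like
    .../rest/services/<serviceName>/ImageServer, otherwise None."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None

    parts = [part for part in parsed.path.split("/") if part]
    lowered = [part.lower() for part in parts]
    if not parts or lowered[-1] != "imageserver":
        return None

    # Locate the "rest/services" prefix that precedes the service name
    if "services" not in lowered:
        return None
    index = lowered.index("services")
    if index == 0 or lowered[index - 1] != "rest":
        return None

    name_parts = parts[index + 1:-1]
    return "/".join(name_parts) if name_parts else None


# Hypothetical URLs used only to exercise the check
good_url = "https://example.com/abc123/ArcGIS/rest/services/my_service/ImageServer/"
bad_url = "https://example.com/abc123/ArcGIS/rest/services/my_service/FeatureServer/0"
print(image_service_name(good_url))
print(image_service_name(bad_url))
```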

Write rasters

This section demonstrates writing raster data.

Output raster datasets can be written as:

  • GTiff
  • JPEG
  • PNG

Output files can be written to:

  • Any file system that your Spark environment can write to (e.g., local environment, Azure blob, S3, etc.)
  • ArcGIS Online

Write TIF data

Use a defined dataset to create a DataFrame and write it to TIF.

  1. Check that we have a spatial reference assigned.

    
    df_raster.limit(1).select(RT.srid("raster")).show()
  2. Write the DataFrame. By default, the output format is TIF.

    
    df_raster\
        .write.options(noDataPolicy = "best-effort").format("raster")\
        .save(r"raster\raster_output")

    We will end up with one .tif for each row in our DataFrame.

  3. Read back in the data.

    
    df_raster_readback = spark.read.format("raster").load(r"raster\raster_output")
    
    df_raster_readback.show()
    Result
    +--------------------+--------------------+
    |                path|              raster|
    +--------------------+--------------------+
    |file:/C:/_workboo...|SqlRaster(4x1024x...|
    |file:/C:/_workboo...|SqlRaster(4x1024x...|
    |file:/C:/_workboo...|SqlRaster(4x1024x...|
    |file:/C:/_workboo...|SqlRaster(4x1024x...|
    |file:/C:/_workboo...|SqlRaster(4x935x1...|
    |file:/C:/_workboo...|SqlRaster(4x1024x...|
    |file:/C:/_workboo...|SqlRaster(4x1024x...|
    |file:/C:/_workboo...|SqlRaster(4x1024x...|
    |file:/C:/_workboo...|SqlRaster(4x1024x...|
    |file:/C:/_workboo...|SqlRaster(4x935x8...|
    +--------------------+--------------------+
  4. Double-check the properties of the raster.

    
    df_raster_readback.limit(1).select(RT.extent("raster"),
              RT.cell_size_x("raster"),
              RT.cell_size_y("raster"),
              RT.num_bands("raster"),
              RT.num_columns("raster"),
              RT.num_rows("raster"),
              RT.pixel_type("raster"),
              RT.srid("raster"))\
        .show(vertical=True,truncate=False)
    Result
    -RECORD 0------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Extent(raster)     | {"rings":[[[444822.9,4666840.199999999],[444822.9,4667147.399999999],[445130.10000000003,4667147.399999999],[445130.10000000003,4666840.199999999],[444822.9,4666840.199999999]]]}
     CellSizeX(raster)  | 0.30000000000001137
     CellSizeY(raster)  | 0.3000000000001819
     NumBands(raster)   | 4
     NumColumns(raster) | 1024
     NumRows(raster)    | 1024
     PixelType(raster)  | UInt8
     Srid(raster)       | 26915
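
One way to sanity-check the properties read back above is to recompute a tile's extent from the raster origin, the cell size, and the tile's pixel offset. The sketch below is plain Python under stated assumptions: the origin (444822.9, 4667147.4) and the 0.3 cell size come from the outputs in this tutorial, tiles are assumed to be cut from the upper-left corner, and `tile_extent` is a hypothetical helper, not part of GeoAnalytics Engine.

```python
def tile_extent(origin_x, origin_y, cell_size, left, top, columns, rows):
    """Return (x_min, y_min, x_max, y_max) for one tile.

    origin_x, origin_y: upper-left corner of the whole raster
    left, top:          pixel offset of the tile from that corner
    columns, rows:      size of the tile in pixels
    """
    x_min = origin_x + left * cell_size
    y_max = origin_y - top * cell_size
    x_max = x_min + columns * cell_size
    y_min = y_max - rows * cell_size
    return (x_min, y_min, x_max, y_max)


# Upper-left tile: 1024 x 1024 pixels at offset (0, 0)
first_tile = tile_extent(444822.9, 4667147.4, 0.3, 0, 0, 1024, 1024)

# Lower-right tile: 935 x 886 pixels at offset (4096, 1024)
last_tile = tile_extent(444822.9, 4667147.4, 0.3, 4096, 1024, 935, 886)

print(first_tile)
print(last_tile)
```

Allowing for floating-point rounding, the first result corresponds to the corner coordinates in the Extent(raster) polygon shown above, and the lower-right corner of the last tile lines up with the full raster extent.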

Merge tiles to create a single raster output

Let's look at an example to merge all of our individual tiles into a single raster for output.

We will use RT_Merge and collect_list to combine the rows.

  1. Merge tiles to combine rows.

    
    # using RT_Merge to combine the tiles
    import pyspark.sql.functions as F
    
    df_raster_merge = df_raster.agg(RT.merge(F.collect_list("raster")).alias("raster"))
    
    # just one row now
    df_raster_merge.show()
    Result
    +--------------------+
    |              raster|
    +--------------------+
    |SqlRaster(4x5031x...|
    +--------------------+
  2. Write it out. The result is a single .tif (plus auxiliary files).

    
    df_raster_merge\
        .write.options(noDataPolicy = "best-effort").format("raster")\
        .save(r"raster\raster_output_merge")
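
Conceptually, merging tiles means stitching a grid of pixel blocks back into one array. The sketch below illustrates the idea in plain Python for a single band. It is not how RT_Merge is implemented (which presumably relies on each tile's georeferencing), and it assumes a complete grid of tiles keyed by hypothetical (tile_row, tile_column) indices.

```python
def merge_tiles(tiles):
    """Stitch a complete grid of single-band tiles into one 2D list.

    tiles maps (tile_row, tile_column) to a 2D list of pixel values.
    Tiles in the same grid row must have the same height, and tiles in
    the same grid column must have the same width.
    """
    grid_rows = sorted({row for row, _ in tiles})
    grid_columns = sorted({column for _, column in tiles})

    merged = []
    for grid_row in grid_rows:
        height = len(tiles[(grid_row, grid_columns[0])])
        for y in range(height):
            line = []
            # Concatenate the same pixel row from each tile, left to right
            for grid_column in grid_columns:
                line.extend(tiles[(grid_row, grid_column)][y])
            merged.append(line)
    return merged


# A tiny 3 x 3 image split into four uneven tiles
example_tiles = {
    (0, 0): [[1, 2], [3, 4]],
    (0, 1): [[5], [6]],
    (1, 0): [[7, 8]],
    (1, 1): [[9]],
}
merged = merge_tiles(example_tiles)
print(merged)
```

As with the NAIP tiles in this tutorial, the edge tiles are smaller than the rest, and the merged result has the combined width and height of the grid.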

Write to ArcGIS Online

Rasters can also be written to ArcGIS Online as Image Services.

  1. You will need to authenticate with ArcGIS Online first.

    
    # authenticate with AGOL
    geoanalytics.register_gis("myGIS", "https://arcgis.com", username="username", password="password")
    
    # write to image service
    df_raster_merge.write.format("image-service") \
                 .option("gis", "myGIS") \
                 .option("serviceName", "write_raster_test") \
                 .option("fieldName", "raster") \
                 .save()

We can see the result in the "My Content" tab in ArcGIS Online.

arcgis-one.png

And we can then map the result using the online map.

arcgis-two.png

What's next?

Learn about how to read in other data types or analyze your data through raster functions and analysis tools:
