Read and write to databases

GeoAnalytics for Microsoft Fabric can read from and write to any database supported by Apache Spark. This tutorial demonstrates how to read database tables into DataFrames and write DataFrames back out to tables for the following databases:

  • Microsoft SQL Server
  • Oracle
  • PostgreSQL

Steps

JDBC drivers

A JDBC driver is required to read and write data to and from a database with GeoAnalytics for Microsoft Fabric. Each database type (for example, PostgreSQL or Microsoft SQL Server) requires its own JDBC driver.

In this tutorial, the following JDBC drivers are used:

Database               JDBC Driver
Microsoft SQL Server   Spark MS SQL Connector
Oracle                 Ojdbc11
PostgreSQL             PostgreSQL JDBC Driver

The database-specific JDBC drivers can be included when starting Spark with GeoAnalytics for Microsoft Fabric by using the --packages flag (or the equivalent spark.jars.packages configuration) and specifying a comma-delimited list of Maven coordinates.

Examples:

  • --packages groupId:artifactId:version
  • --packages groupId:artifactId:version,groupId:artifactId:version
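
For example, the three drivers used in this tutorial could be attached with a single flag. The Maven coordinates below are real artifacts, but the versions are illustrative only; confirm the correct versions for your Spark and database releases.

  • --packages com.microsoft.azure:spark-mssql-connector_2.12:1.2.0,com.oracle.database.jdbc:ojdbc11:23.3.0.23.09,org.postgresql:postgresql:42.7.3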

Setup modules and data

  1. Import the GeoAnalytics for Microsoft Fabric Spatial Type functions and the Point geometry type. The Python uuid module is also imported to create a unique database table name.

    Python
    import uuid
    from geoanalytics_fabric.sql import functions as ST
    from geoanalytics_fabric.sql import Point
  2. Create a DataFrame with columns for a unique ID and Point type geometry.

    Python
    data = [(1, Point(10, 10)),
           (2, Point(20, 20)),
           (3, Point(30, 30))]
    
    df = spark.createDataFrame(data, ["unique_id", "geometry"])
    df.printSchema()
    df.show()
    Result
    root
     |-- unique_id: long (nullable = true)
     |-- geometry: point (nullable = true)
    
    +---------+---------------+
    |unique_id|       geometry|
    +---------+---------------+
    |        1|{"x":10,"y":10}|
    |        2|{"x":20,"y":20}|
    |        3|{"x":30,"y":30}|
    +---------+---------------+
  3. Create a new well-known binary column from the Point geometry column using ST_AsBinary, and then drop the Point geometry column.

    Python
    df = df.withColumn("geom_wkb", ST.as_binary("geometry")) \
           .drop("geometry")
    
    df.printSchema()
    df.show()
    Result
    root
     |-- unique_id: long (nullable = true)
     |-- geom_wkb: binary (nullable = true)
    
    +---------+--------------------+
    |unique_id|            geom_wkb|
    +---------+--------------------+
    |        1|[01 01 00 00 00 0...|
    |        2|[01 01 00 00 00 0...|
    |        3|[01 01 00 00 00 0...|
    +---------+--------------------+
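
Well-known binary is used as the interchange format in this tutorial because each database below can construct its native geometry type from it. If a human-readable column is preferred, well-known text is an alternative; the sketch below applies to the DataFrame from step 2 (before the geometry column is dropped) and assumes that ST_AsText is exposed as ST.as_text in geoanalytics_fabric; verify against the function reference before relying on it.

Python

    # Hedged alternative: store the geometry as well-known text (WKT) instead of WKB.
    # Assumes geoanalytics_fabric.sql.functions exposes ST_AsText as as_text.
    df_wkt = spark.createDataFrame(data, ["unique_id", "geometry"]) \
                  .withColumn("geom_wkt", ST.as_text("geometry")) \
                  .drop("geometry")
    df_wkt.show()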

Microsoft SQL Server

SQL Server Write

  1. Write the DataFrame created in the Setup modules and data section to a new table within the Microsoft SQL Server database.

    Python
    # Create a unique table name using the UUID module
    table_name = f"df_write_{uuid.uuid4().hex[:6]}"
    
    # Write the DataFrame to the SQL Server database
    df.write \
      .format("jdbc") \
      .option("url", f"jdbc:sqlserver://{server_name}\\{instance_name};databaseName={database_name}") \
      .option("dbtable", table_name) \
      .option("user", username) \
      .option("password", password) \
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
      .save()
  2. Use the Microsoft SQL Server STPointFromWKB geometry method to create a SQL Server point geometry column from the well-known binary column. The SQL query below adds an empty geometry spatial data type column to the database table created in step 1 and then populates it with the results of STPointFromWKB; table_name and SRID are placeholders for your table name and spatial reference ID.

    SQL
    -- Add a geometry column to the database table
    ALTER TABLE table_name
    ADD geom Geometry;
    
    -- Populate the geometry column with the results from STPointFromWKB and set the SRID
    UPDATE table_name
    SET geom=geometry::STPointFromWKB(geom_wkb, SRID);
    
    -- View the database table updates
    SELECT * FROM table_name;
    Result
     +---------+--------------------------------------------+----------------------------------------------+
     |unique_id|geom_wkb                                    |geom                                          |
     +---------+--------------------------------------------+----------------------------------------------+
     |3        |0x01010000000000000000003E400000000000003E40|0xE6100000010C0000000000003E400000000000003E40|
     |1        |0x010100000000000000000024400000000000002440|0xE6100000010C00000000000024400000000000002440|
     |2        |0x010100000000000000000034400000000000003440|0xE6100000010C00000000000034400000000000003440|
     +---------+--------------------------------------------+----------------------------------------------+
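
The write in step 1 uses Spark's generic jdbc data source with the SQLServerDriver class. If the Spark MS SQL Connector package listed in the JDBC drivers table is attached, its dedicated data source can be used instead. A minimal sketch, assuming the same connection variables as in step 1 and writing to a new, hypothetical table name:

Python

    # Hedged alternative: write through the Apache Spark connector for SQL Server
    # ("com.microsoft.sqlserver.jdbc.spark") instead of the generic jdbc data source.
    connector_table = f"df_write_{uuid.uuid4().hex[:6]}"  # hypothetical second table

    df.write \
      .format("com.microsoft.sqlserver.jdbc.spark") \
      .option("url", f"jdbc:sqlserver://{server_name}\\{instance_name};databaseName={database_name}") \
      .option("dbtable", connector_table) \
      .option("user", username) \
      .option("password", password) \
      .save()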

SQL Server Read

  1. When reading from a database table, you can read all records or a subset. Both options are demonstrated below.

    The Microsoft SQL Server STAsBinary method is used with the Spark query data source option to convert the SQL Server geometry spatial data type column to its well-known binary format.

    Create a DataFrame by reading all records in the database table that was created in the SQL Server write example above.

    Python
    df = spark.read \
              .format("jdbc") \
              .option("url", f"jdbc:sqlserver://{server_name}\\{instance_name};databaseName={database_name}") \
              .option("query", f"select unique_id, geom.STAsBinary() as wkb from {table_name}") \
              .option("user", username) \
              .option("password", password) \
              .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
              .load()
    
    df.printSchema()
    df.show()
    Result
     root
      |-- unique_id: long (nullable = true)
      |-- wkb: binary (nullable = true)
    
     +---------+--------------------+
     |unique_id|                 wkb|
     +---------+--------------------+
     |        3|[01 01 00 00 00 0...|
     |        1|[01 01 00 00 00 0...|
     |        2|[01 01 00 00 00 0...|
     +---------+--------------------+

    Create a DataFrame using a WHERE clause in the Spark query data source option to read only a subset of the database table created in the SQL Server write example above.

    Python
    df = spark.read \
              .format("jdbc") \
              .option("url", f"jdbc:sqlserver://{server_name}\\{instance_name};databaseName={database_name}") \
              .option("query", f"select unique_id, geom.STAsBinary() as wkb from {table_name} where unique_id = 1") \
              .option("user", username) \
              .option("password", password) \
              .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
              .load()
    
    df.printSchema()
    df.show()
    Result
     root
      |-- unique_id: long (nullable = true)
      |-- wkb: binary (nullable = true)
    
     +---------+--------------------+
     |unique_id|                 wkb|
     +---------+--------------------+
     |        1|[01 01 00 00 00 0...|
     +---------+--------------------+
  2. Create a new Point geometry column from the well-known binary column in the DataFrame using ST_PointFromBinary.

    Python
    df = df.withColumn("geometry", ST.point_from_binary("wkb"))
    df.printSchema()
    df.show()
    Result
     root
      |-- unique_id: long (nullable = true)
      |-- wkb: binary (nullable = true)
      |-- geometry: point (nullable = true)
    
     +---------+--------------------+---------------+
     |unique_id|                 wkb|       geometry|
     +---------+--------------------+---------------+
     |        3|[01 01 00 00 00 0...|{"x":30,"y":30}|
     |        1|[01 01 00 00 00 0...|{"x":10,"y":10}|
     |        2|[01 01 00 00 00 0...|{"x":20,"y":20}|
     +---------+--------------------+---------------+

Oracle

Oracle Write

  1. Write the DataFrame created in the Setup modules and data section to a new table within the Oracle database.

    Python
    # Create a unique table name using the UUID module
    table_name = f"df_write_{uuid.uuid4().hex[:6]}"
    
    # Write the DataFrame to the Oracle database
    df.write \
      .format("jdbc") \
      .option("url", f"jdbc:oracle:thin:@//{host}:{port}/{service_name}") \
      .option("dbtable", table_name) \
      .option("user", username) \
      .option("password", password) \
      .option("driver", "oracle.jdbc.driver.OracleDriver") \
      .save()
  2. Use the Oracle Spatial SDO_UTIL.FROM_WKBGEOMETRY function to create an SDO_GEOMETRY column from the well-known binary column. The SQL query below adds an empty SDO_GEOMETRY column to the database table created in step 1, populates it with the results of FROM_WKBGEOMETRY, and sets the SRID; table_name and SRID are placeholders for your table name and spatial reference ID.

    SQL
    -- Add a geometry column to the database table
    ALTER TABLE table_name
    ADD geom SDO_GEOMETRY;
    
    -- Populate the geometry column with the results from SDO_UTIL.FROM_WKBGEOMETRY
    UPDATE table_name
    SET geom=SDO_UTIL.FROM_WKBGEOMETRY("geom_wkb");
    
    -- Set the SRID for the geometry column in the database
    UPDATE table_name tbl
    SET tbl.GEOM.SDO_SRID = SRID;
    
    -- Commit the changes to the database table
    commit;
    
    -- View the database table updates
    SELECT * FROM table_name;
    Result
    +---------+--------------------------------------------+----------------------------------+
    |unique_id|geom_wkb                                    |GEOM                              |
    +---------+--------------------------------------------+----------------------------------+
    |2        |0x010100000000000000000034400000000000003440|{2001,4326,{20,20,null},null,null}|
    |1        |0x010100000000000000000024400000000000002440|{2001,4326,{10,10,null},null,null}|
    |3        |0x01010000000000000000003E400000000000003E40|{2001,4326,{30,30,null},null,null}|
    +---------+--------------------------------------------+----------------------------------+

Oracle Read

  1. When reading from a database table, you can read all records or a subset. Both options are demonstrated below.

    The Oracle Spatial SDO_UTIL.TO_WKBGEOMETRY function is used with the Spark query data source option to convert the SDO_GEOMETRY column to its well-known binary format.

    Create a DataFrame by reading all records in the database table that was created in the Oracle write example above.

    Python
    df = spark.read \
              .format("jdbc") \
              .option("url", f"jdbc:oracle:thin:@//{host}:{port}/{service_name}") \
              .option("query", f'select "unique_id", SDO_UTIL.TO_WKBGEOMETRY(GEOM) as wkb from {table_name}') \
              .option("user", username) \
              .option("password", password) \
              .option("driver", "oracle.jdbc.driver.OracleDriver") \
              .load()
    
    df.printSchema()
    df.show()
    Result
     root
      |-- unique_id: decimal(19,0) (nullable = true)
      |-- WKB: binary (nullable = true)
    
     +---------+--------------------+
     |unique_id|                 WKB|
     +---------+--------------------+
     |        2|[00 00 00 00 01 4...|
     |        3|[00 00 00 00 01 4...|
     |        1|[00 00 00 00 01 4...|
     +---------+--------------------+

    Create a DataFrame using a WHERE clause in the Spark query data source option to read only a subset of the database table created in the Oracle write example above.

    Python
    df = spark.read \
            .format("jdbc") \
            .option("url", f"jdbc:oracle:thin:@//{host}:{port}/{service_name}") \
            .option("query", f'select "unique_id", SDO_UTIL.TO_WKBGEOMETRY(GEOM) as wkb from {table_name} where rownum<3') \
            .option("user", username) \
            .option("password", password) \
            .option("driver", "oracle.jdbc.driver.OracleDriver") \
            .load()
    
    df.printSchema()
    df.show()
    Result
     root
      |-- unique_id: decimal(19,0) (nullable = true)
      |-- WKB: binary (nullable = true)
    
     +---------+--------------------+
     |unique_id|                 WKB|
     +---------+--------------------+
     |        2|[00 00 00 00 01 4...|
     |        1|[00 00 00 00 01 4...|
     +---------+--------------------+
  2. Create a new Point geometry column from the well-known binary column in the DataFrame using ST_PointFromBinary.

    Python
    df = df.withColumn("geometry", ST.point_from_binary("wkb"))
    df.printSchema()
    df.show()
    Result
     root
      |-- unique_id: decimal(19,0) (nullable = true)
      |-- WKB: binary (nullable = true)
      |-- geometry: point (nullable = true)
    
     +---------+--------------------+---------------+
     |unique_id|                 WKB|       geometry|
     +---------+--------------------+---------------+
     |        2|[00 00 00 00 01 4...|{"x":20,"y":20}|
     |        3|[00 00 00 00 01 4...|{"x":30,"y":30}|
     |        1|[00 00 00 00 01 4...|{"x":10,"y":10}|
     +---------+--------------------+---------------+
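
Note that Oracle surfaces unique_id as decimal(19,0) rather than long, because Spark maps Oracle NUMBER columns to DecimalType. If the schema should match the original DataFrame, a standard Spark cast restores the long type:

Python

    # Cast the NUMBER-backed column read from Oracle back to a Spark long (bigint).
    from pyspark.sql.functions import col

    df = df.withColumn("unique_id", col("unique_id").cast("long"))
    df.printSchema()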

PostgreSQL

PostgreSQL Write

  1. Write the DataFrame created in the Setup modules and data section to a new table within the PostgreSQL database.

    Python
    # Create a unique table name using the UUID module
    table_name = f"df_write_{uuid.uuid4().hex[:6]}"
    
    # Write the DataFrame to the PostgreSQL database
    df.write \
      .format("jdbc") \
      .option("url", f"jdbc:postgresql://{host}:{port}/{database}") \
      .option("dbtable", table_name) \
      .option("user", username) \
      .option("password", password) \
      .option("driver", "org.postgresql.Driver") \
      .save()
  2. Use the PostGIS ST_PointFromWKB function to create a point geometry column from the well-known binary column. The SQL query below creates a new database table that is a copy of the table created in step 1, with an additional geometry column produced by ST_PointFromWKB; table_name, new_table_name, and SRID are placeholders for your table names and spatial reference ID.

    SQL
    -- Create a new database table with a geometry column with the results from ST_PointFromWKB
    DROP TABLE IF EXISTS new_table_name;
    SELECT *, ST_PointFromWKB(geom_wkb, SRID) AS geom INTO new_table_name FROM table_name;
    
    -- View the database table updates
    SELECT * FROM new_table_name;
    Result
     +---------+--------------------------------------------+--------------------------------------------------------------------------------+
     |unique_id|geom_wkb                                    |geom                                                                            |
     +---------+--------------------------------------------+--------------------------------------------------------------------------------+
     |2        |0x010100000000000000000034400000000000003440|280000000100000001000400E61000000C0000000100000080A09B9FB91880A09B9FB91800000000|
     |3        |0x01010000000000000000003E400000000000003E40|280000000100000001000400E61000000C0000000100000080B0FADF831980B0FADF831900000000|
     |1        |0x010100000000000000000024400000000000002440|280000000100000001000400E61000000C000000010000008090BCDEEE178090BCDEEE1700000000|
     +---------+--------------------------------------------+--------------------------------------------------------------------------------+
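
The SQL in step 2 can be run with any PostgreSQL client. As one option, the sketch below uses the psycopg2 package, which is assumed to be available in the environment; new_table_name is a hypothetical name, and 4326 matches the SRID shown in the results above.

Python

    # Hedged sketch: execute the step 2 SQL from Python with psycopg2 (assumed installed).
    import psycopg2

    new_table_name = f"{table_name}_geom"  # hypothetical name for the copied table

    conn = psycopg2.connect(host=host, port=port, dbname=database,
                            user=username, password=password)
    with conn, conn.cursor() as cur:
        cur.execute(f"DROP TABLE IF EXISTS {new_table_name};")
        cur.execute(f"SELECT *, ST_PointFromWKB(geom_wkb, 4326) AS geom "
                    f"INTO {new_table_name} FROM {table_name};")
    conn.close()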

PostgreSQL Read

  1. When reading from a database table, you can read all records or a subset. Both options are demonstrated below.

    The PostGIS ST_AsBinary function is used with the Spark query data source option to convert the geometry column to its well-known binary format.

    Create a DataFrame by reading all records in the database table that was created in the PostgreSQL write example above.

    Python
    df = spark.read \
            .format("jdbc") \
            .option("url", f"jdbc:postgresql://{host}:{port}/{database}") \
            .option("query", f"select unique_id, ST_AsBinary(geom) as wkb from {table_name}") \
            .option("user", username) \
            .option("password", password) \
            .option("driver", "org.postgresql.Driver") \
            .load()
    
    df.printSchema()
    df.show()
    Result
     root
      |-- unique_id: long (nullable = true)
      |-- wkb: binary (nullable = true)
    
     +---------+--------------------+
     |unique_id|                 wkb|
     +---------+--------------------+
     |        3|[01 01 00 00 00 1...|
     |        1|[01 01 00 00 00 2...|
     |        2|[01 01 00 00 00 1...|
     +---------+--------------------+

    Create a DataFrame using a WHERE clause in the Spark query data source option to read only a subset of the database table created in the PostgreSQL write example above.

    Python
    df = spark.read \
            .format("jdbc") \
            .option("url", f"jdbc:postgresql://{host}:{port}/{database}") \
            .option("query", f"select unique_id, ST_AsBinary(geom) as wkb from {table_name}  where unique_id > 1") \
            .option("user", username) \
            .option("password", password) \
            .option("driver", "org.postgresql.Driver") \
            .load()
    
    df.printSchema()
    df.show()
    Result
     root
      |-- unique_id: long (nullable = true)
      |-- wkb: binary (nullable = true)
    
     +---------+--------------------+
     |unique_id|                 wkb|
     +---------+--------------------+
     |        3|[01 01 00 00 00 1...|
     |        2|[01 01 00 00 00 1...|
     +---------+--------------------+
  2. Create a new Point geometry column from the well-known binary column in the DataFrame using ST_PointFromBinary.

    Python
    df = df.withColumn("geometry", ST.point_from_binary("wkb"))
    df.printSchema()
    df.show()
    Result
     root
      |-- unique_id: long (nullable = true)
      |-- wkb: binary (nullable = true)
      |-- geometry: point (nullable = true)
    
     +---------+--------------------+--------------------+
     |unique_id|                 wkb|            geometry|
     +---------+--------------------+--------------------+
     |        3|[01 01 00 00 00 1...|{"x":30.000000000...|
     |        1|[01 01 00 00 00 2...|{"x":10.000000000...|
     |        2|[01 01 00 00 00 1...|{"x":20.000000000...|
     +---------+--------------------+--------------------+
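
The read examples above each load the table through a single JDBC connection. For larger tables, Spark's standard JDBC partitioning options can split the read across executors. A minimal sketch against the PostgreSQL table; the bounds are illustrative and should reflect the real range of unique_id:

Python

    # Hedged sketch: parallelize the JDBC read with Spark's partitioning options.
    # partitionColumn must be a numeric, date, or timestamp column and is used with
    # the dbtable option (not the query option).
    df = spark.read \
            .format("jdbc") \
            .option("url", f"jdbc:postgresql://{host}:{port}/{database}") \
            .option("dbtable", table_name) \
            .option("partitionColumn", "unique_id") \
            .option("lowerBound", "1") \
            .option("upperBound", "3") \
            .option("numPartitions", "2") \
            .option("user", username) \
            .option("password", password) \
            .option("driver", "org.postgresql.Driver") \
            .load()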

What's next?

Learn how to read in other data types, or analyze your data with SQL functions and analysis tools.
