GeoAnalytics for Microsoft Fabric can read and write data to and from databases that are supported by Apache Spark. This tutorial demonstrates how to read and write data to and from DataFrames for the following databases:
- Microsoft SQL Server
- Oracle
- PostgreSQL
Steps
JDBC drivers
A JDBC driver is required to read and write data to and from a database with GeoAnalytics for Microsoft Fabric. Each database type (e.g. PostgreSQL, Microsoft SQL Server, etc.) requires a specific JDBC driver.
In this tutorial, the following JDBC drivers are used:
Database | JDBC Driver |
---|---|
Microsoft SQL Server | Spark MS SQL Connector |
Oracle | Ojdbc11 |
PostgreSQL | PostgreSQL JDBC Driver |
The database specific JDBC drivers can be included when starting up Spark/GeoAnalytics for Microsoft Fabric by using the packages flag and specifying a comma-delimited list of Maven coordinates.
Examples:
--packages group
Id :artifact Id :version --packages group
Id :artifact Id :version,group Id :artifact Id :version
Setup modules and data
-
Import the GeoAnalytics for Microsoft Fabric Spatial Type functions and the
Point
geometry type. The Pythonuuid
module is also imported to create a unique database table name.Python Python Scala import uuid from geoanalytics_fabric.sql import functions as ST from geoanalytics_fabric.sql import Point
-
Create a DataFrame with columns for a unique ID and
Point
type geometry.Python Python Scala data = [(1, Point(10, 10)), (2, Point(20, 20)), (3, Point(30, 30))] df = spark.createDataFrame(data, ["unique_id", "geometry"]) df.printSchema() df.show()
Resultroot |-- unique_id: long (nullable = true) |-- geometry: point (nullable = true) +---------+---------------+ |unique_id| geometry| +---------+---------------+ | 1|{"x":10,"y":10}| | 2|{"x":20,"y":20}| | 3|{"x":30,"y":30}| +---------+---------------+
-
Create a new well-known binary column from the
Point
geometry column and then drop thePoint
geometry column.Python Python Scala df = df.withColumn("geom_wkb", ST.as_binary("geometry")) \ .drop("geometry") df.printSchema() df.show()
Resultroot |-- unique_id: long (nullable = true) |-- geom_wkb: binary (nullable = true) +---------+--------------------+ |unique_id| geom_wkb| +---------+--------------------+ | 1|[01 01 00 00 00 0...| | 2|[01 01 00 00 00 0...| | 3|[01 01 00 00 00 0...| +---------+--------------------+
Microsoft SQL Server
SQL Server Write
-
Write the DataFrame created in the Setup modules and data section to a new table within the Microsoft SQL Server database.
Python Python Scala # Create a unique table name using the UUID module table_name = f"df_write_{uuid.uuid4().hex[:6]}" # Write the DataFrame to the SQL Server database df.write \ .format("jdbc") \ .option("url", f"jdbc:sqlserver://{server_name}\\{instance_name};databaseName={database_name}") \ .option("dbtable", table_name) \ .option("user", username) \ .option("password", password) \ .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \ .save()
-
Use the Microsoft SQL Server STPointFromWKB geometry method to create a SQL Server point geometry type column from the well-known binary column. The SQL Query below will add an empty SQL Server geometry spatial data type column to the database table created in step 1 and then populate it with the results from
ST
.Point From WKB SQL-- Add a geometry column to the database table ALTER TABLE table_name ADD geom Geometry; -- Populate the geometry column with the results from STPointFromWKB and set the SRID UPDATE table_name SET geom=geometry::STPointFromWKB(geom_wkb, SRID); -- View the database table updates SELECT * FROM table_name;
Result+---------+--------------------------------------------+----------------------------------------------+ |unique_id|geom_wkb |geom | +---------+--------------------------------------------+----------------------------------------------+ |3 |0x01010000000000000000003E400000000000003E40|0xE6100000010C0000000000003E400000000000003E40| |1 |0x010100000000000000000024400000000000002440|0xE6100000010C00000000000024400000000000002440| |2 |0x010100000000000000000034400000000000003440|0xE6100000010C00000000000034400000000000003440| +---------+--------------------------------------------+----------------------------------------------+
SQL Server Read
-
When reading from a database table, you can read all records or a subset. Both options are demonstrated below.
The Microsoft SQL Server STAsBinary method is used with the Spark
query
data source option to convert the SQL Server geometry spatial data type column to it's well-known binary format.Create a DataFrame by reading all records in the database table that was created in the SQL Server write example above.
Python Python Scala df = spark.read \ .format("jdbc") \ .option("url", f"jdbc:sqlserver://{server_name}\\{instance_name};databaseName={database_name}") \ .option("query", f"select unique_id, geom.STAsBinary() as wkb from {table_name}") \ .option("user", username) \ .option("password", password) \ .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \ .load() df.printSchema() df.show()
Resultroot |-- unique_id: long (nullable = true) |-- wkb: binary (nullable = true) +---------+--------------------+ |unique_id| wkb| +---------+--------------------+ | 3|[01 01 00 00 00 0...| | 1|[01 01 00 00 00 0...| | 2|[01 01 00 00 00 0...| +---------+--------------------+
Create a DataFrame using a
where
SQL statement in the Sparkquery
data source option to read in only a subset of the database table that was created in the SQL Server write example above.Python Python Scala df = spark.read \ .format("jdbc") \ .option("url", f"jdbc:sqlserver://{server_name}\\{instance_name};databaseName={database_name}") \ .option("query", f"select unique_id, geom.STAsBinary() as wkb from {table_name} where unique_id = 1") \ .option("user", username) \ .option("password", password) \ .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \ .load() df.printSchema() df.show()
Resultroot |-- unique_id: long (nullable = true) |-- wkb: binary (nullable = true) +---------+--------------------+ |unique_id| wkb| +---------+--------------------+ | 1|[01 01 00 00 00 0...| +---------+--------------------+
-
Create a new
Point
geometry column from the well-known binary column in the DataFrame using ST_PointFromBinary.Python Python Scala df = df.withColumn("geometry", ST.point_from_binary("wkb")) df.printSchema() df.show()
Resultroot |-- unique_id: long (nullable = true) |-- wkb: binary (nullable = true) |-- geometry: point (nullable = true) +---------+--------------------+---------------+ |unique_id| wkb| geometry| +---------+--------------------+---------------+ | 3|[01 01 00 00 00 0...|{"x":30,"y":30}| | 1|[01 01 00 00 00 0...|{"x":10,"y":10}| | 2|[01 01 00 00 00 0...|{"x":20,"y":20}| +---------+--------------------+---------------+
Oracle
Oracle write
-
Write the DataFrame created in the Setup modules and data section to a new table within the Oracle database.
Python Python Scala # Create a unique table name using the UUID module table_name = f"df_write_{uuid.uuid4().hex[:6]}" # Write the DataFrame to the Oracle database df.write \ .format("jdbc") \ .option("url", f"jdbc:oracle:thin:@//{host}:{port}/{service_name}") \ .option("dbtable", table_name) \ .option("user", username) \ .option("password", password) \ .option("driver", "oracle.jdbc.driver.OracleDriver") \ .save()
-
Use the Oracle Spatial Data Object FROM_WKBGEOMETRY function to create an SDO_Geometry type column from the well-known binary column. The SQL Query below will add an empty Oracle SDO Geometry column to the database table created in step 1 and then populate it with the results from
FROM
._WKBGEOMETRY SQL-- Add a geometry column to the database table ALTER TABLE table_name ADD geom SDO_GEOMETRY; -- Populate the geometry column with the results from SDO_UTIL.FROM_WKBGEOMETRY UPDATE table_name SET geom=SDO_UTIL.FROM_WKBGEOMETRY("geom_wkb"); -- Set the SRID for the geometry column in the database UPDATE table_name tbl SET tbl.GEOM.SDO_SRID = SRID; -- Commit the changes to the database table commit; -- View the database table updates SELECT * FROM table_name;
Result+---------+--------------------------------------------+----------------------------------+ |unique_id|geom_wkb |GEOM | +---------+--------------------------------------------+----------------------------------+ |2 |0x010100000000000000000034400000000000003440|{2001,4326,{20,20,null},null,null}| |1 |0x010100000000000000000024400000000000002440|{2001,4326,{10,10,null},null,null}| |3 |0x01010000000000000000003E400000000000003E40|{2001,4326,{30,30,null},null,null}| +---------+--------------------------------------------+----------------------------------+
Oracle Read
-
When reading from a database table, you can read all records or a subset. Both options are demonstrated below.
The Oracle Spatial Data Object TO_WKBGEOMETRY function to convert the Oracle SDO geometry column to it's well-known binary format.
Create a DataFrame by reading all records in the database table that was created in the Oracle write example above.
Python Python Scala df = spark.read \ .format("jdbc") \ .option("url", f"jdbc:oracle:thin:@//{host}:{port}/{service_name}") \ .option("query", f'select "unique_id", SDO_UTIL.TO_WKBGEOMETRY(GEOM) as wkb from {table_name}') \ .option("user", username) \ .option("password", password) \ .option("driver", "oracle.jdbc.driver.OracleDriver") \ .load() df.printSchema() df.show()
Resultroot |-- unique_id: decimal(19,0) (nullable = true) |-- WKB: binary (nullable = true) +---------+--------------------+ |unique_id| WKB| +---------+--------------------+ | 2|[00 00 00 00 01 4...| | 3|[00 00 00 00 01 4...| | 1|[00 00 00 00 01 4...| +---------+--------------------+
Create a DataFrame using a
where
SQL statement in the Sparkquery
data source option to read in only a subset of the database table that was created in the Oracle write example above.Python Python Scala df = spark.read \ .format("jdbc") \ .option("url", f"jdbc:oracle:thin:@//{host}:{port}/{service_name}") \ .option("query", f'select "unique_id", SDO_UTIL.TO_WKBGEOMETRY(GEOM) as wkb from {table_name} where rownum<3') \ .option("user", username) \ .option("password", password) \ .option("driver", "oracle.jdbc.driver.OracleDriver") \ .load() df.printSchema() df.show()
Resultroot |-- unique_id: decimal(19,0) (nullable = true) |-- WKB: binary (nullable = true) +---------+--------------------+ |unique_id| WKB| +---------+--------------------+ | 2|[00 00 00 00 01 4...| | 1|[00 00 00 00 01 4...| +---------+--------------------+
-
Create a new
Point
geometry column from the well-known binary column in the DataFrame using ST_PointFromBinary.Python Python Scala df = df.withColumn("geometry", ST.point_from_binary("wkb")) df.printSchema() df.show()
Resultroot |-- unique_id: decimal(19,0) (nullable = true) |-- WKB: binary (nullable = true) |-- geometry: point (nullable = true) +---------+--------------------+---------------+ |unique_id| WKB| geometry| +---------+--------------------+---------------+ | 2|[00 00 00 00 01 4...|{"x":20,"y":20}| | 3|[00 00 00 00 01 4...|{"x":30,"y":30}| | 1|[00 00 00 00 01 4...|{"x":10,"y":10}| +---------+--------------------+---------------+
PostgreSQL
PostgreSQL Write
-
Write the DataFrame created in the Setup modules and data section to a new table within the PostgreSQL database.
Python Python Scala # Create a unique table name using the UUID module table_name = f"df_write_{uuid.uuid4().hex[:6]}" # Write the DataFrame to the PostgreSQL database df.write \ .format("jdbc") \ .option("url", f"jdbc:postgresql://{host}:{port}/{database}") \ .option("dbtable", table_name) \ .option("user", username) \ .option("password", password) \ .option("driver", "org.postgresql.Driver") \ .save()
-
Use the PostgreSQL ST_PointFromWKB function to create a POINT geometry type column from the well-known binary column. The SQL Query below will create a new database table that is a copy of the table created in step 1 but with an additional column for the PostgreSQL geometry type that was created using
ST
._Point From WKB SQL-- Create a new database table with a geometry column with the results from ST_PointFromWKB DROP TABLE IF EXISTS new_table_name; SELECT *, ST_PointFromWKB(geom_wkb, SRID) AS geom INTO new_table_name FROM table_name; -- View the database table updates SELECT * FROM new_table_name;
Result+---------+--------------------------------------------+--------------------------------------------------------------------------------+ |unique_id|geom_wkb |geom | +---------+--------------------------------------------+--------------------------------------------------------------------------------+ |2 |0x010100000000000000000034400000000000003440|280000000100000001000400E61000000C0000000100000080A09B9FB91880A09B9FB91800000000| |3 |0x01010000000000000000003E400000000000003E40|280000000100000001000400E61000000C0000000100000080B0FADF831980B0FADF831900000000| |1 |0x010100000000000000000024400000000000002440|280000000100000001000400E61000000C000000010000008090BCDEEE178090BCDEEE1700000000| +---------+--------------------------------------------+--------------------------------------------------------------------------------+
PostgreSQL Read
-
When reading from a database table, you can read all records or a subset. Both options are demonstrated below.
The PostgreSQL STAsBinary function is used with the Spark
query
data source option to convert the PostgreSQL geometry column to it's well-known binary format.Create a DataFrame by reading all records in the database table that was created in the PostgreSQL write example above.
Python Python Scala df = spark.read \ .format("jdbc") \ .option("url", f"jdbc:postgresql://{host}:{port}/{database}") \ .option("query", f"select unique_id, ST_AsBinary(geom) as wkb from {table_name}") \ .option("user", username) \ .option("password", password) \ .option("driver", "org.postgresql.Driver") \ .load() df.printSchema() df.show()
Resultroot |-- unique_id: long (nullable = true) |-- wkb: binary (nullable = true) +---------+--------------------+ |unique_id| wkb| +---------+--------------------+ | 3|[01 01 00 00 00 1...| | 1|[01 01 00 00 00 2...| | 2|[01 01 00 00 00 1...| +---------+--------------------+
Create a DataFrame using a
where
SQL statement in the Sparkquery
data source option to read in only a subset of the database table that was created in the PostgreSQL write example above.Python Python Scala df = spark.read \ .format("jdbc") \ .option("url", f"jdbc:postgresql://{host}:{port}/{database}") \ .option("query", f"select unique_id, ST_AsBinary(geom) as wkb from {table_name} where unique_id > 1") \ .option("user", username) \ .option("password", password) \ .option("driver", "org.postgresql.Driver") \ .load() df.printSchema() df.show()
Resultroot |-- unique_id: long (nullable = true) |-- wkb: binary (nullable = true) +---------+--------------------+ |unique_id| wkb| +---------+--------------------+ | 3|[01 01 00 00 00 1...| | 2|[01 01 00 00 00 1...| +---------+--------------------+
-
Create a new
Point
geometry column from the well-known binary column in the DataFrame using ST_PointFromBinary.Python Python Scala df = df.withColumn("geometry", ST.point_from_binary("wkb")) df.printSchema() df.show()
Resultroot |-- unique_id: long (nullable = true) |-- wkb: binary (nullable = true) |-- geometry: point (nullable = true) +---------+--------------------+--------------------+ |unique_id| wkb| geometry| +---------+--------------------+--------------------+ | 3|[01 01 00 00 00 1...|{"x":30.000000000...| | 1|[01 01 00 00 00 2...|{"x":10.000000000...| | 2|[01 01 00 00 00 1...|{"x":20.000000000...| +---------+--------------------+--------------------+
What's next?
Learn about how to read in other data types or analyze your data through SQL functions and analysis tools: