Java Database Connectivity (JDBC)

Apache Spark natively supports reading and writing data to and from several types of databases. This is accomplished using Java Database Connectivity, commonly referred to as JDBC. The Spark JDBC data source reads data from and writes data to databases using Spark DataFrames. By extension, GeoAnalytics Engine can read from and write to any of the databases supported by Apache Spark. A few of the supported database types are:

  • Microsoft SQL Server
  • Oracle
  • PostgreSQL

A JDBC driver is required to read and write data to and from a database with GeoAnalytics Engine. Each database type requires its own driver; for example, the JDBC driver for Microsoft SQL Server is different from the JDBC driver for PostgreSQL. The Maven repository hosts the JDBC drivers required to connect to each of the databases supported by Apache Spark.
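
A driver can be made available to Spark in several ways, such as placing the driver JAR on the classpath of each node or passing its Maven coordinates when the Spark session is created. The sketch below shows the latter approach; the PostgreSQL coordinates and version are illustrative, so substitute the coordinates for your database's driver.

Python
from pyspark.sql import SparkSession

# Pull the JDBC driver from the Maven repository at session startup.
# The coordinates below are an example for PostgreSQL; use the
# coordinates for your database's driver instead.
spark = SparkSession.builder \
                    .appName("jdbc-example") \
                    .config("spark.jars.packages", "org.postgresql:postgresql:42.6.0") \
                    .getOrCreate()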

The table below outlines a few of the common JDBC data source options used when reading from and writing to a database.

| Data source option | Example | Description |
|---|---|---|
| url | .option("url", "jdbc:sqlserver://serverName\\instanceName;databaseName=databaseName") | The JDBC connection URL for the database. |
| dbtable | .option("dbtable", "databaseTableName") | The name of the database table to read from or write to. Either query or dbtable must be specified, but not both. |
| query | .option("query", "SELECT * FROM tableName WHERE unique_id = 1") | A SQL query used to read in a subset of the database table. Either query or dbtable must be specified, but not both. |
| user | .option("user", "myDatabaseUsername") | The username used to connect to the database. |
| password | .option("password", "myDatabaseUserPassword1!") | The password used to connect to the database. |
| driver | .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") | The class name of the JDBC driver used to connect to the database. |
Database read

The Python code sample below creates a new DataFrame by reading in all of the database table records.

Python
df = spark.read \
          .format("jdbc") \
          .option("url", "jdbc:postgresql://localhost:5432/sample") \
          .option("dbtable", "locations") \
          .option("user", "user1") \
          .option("password", "test123") \
          .option("driver", "org.postgresql.Driver") \
          .load()
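
For large tables, the read can be split across executors using Spark's standard JDBC partitioning options. The sketch below assumes, for illustration, that the locations table has a numeric id column whose values fall within the stated bounds.

Python
# Issue 8 parallel queries, each covering a range of the id column.
df = spark.read \
          .format("jdbc") \
          .option("url", "jdbc:postgresql://localhost:5432/sample") \
          .option("dbtable", "locations") \
          .option("user", "user1") \
          .option("password", "test123") \
          .option("driver", "org.postgresql.Driver") \
          .option("partitionColumn", "id") \
          .option("lowerBound", "0") \
          .option("upperBound", "100000") \
          .option("numPartitions", "8") \
          .load()
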
Database write

The Python code sample below writes the data from a DataFrame to a database table.

Python
df.write \
  .format("jdbc") \
  .option("url", "jdbc:sqlserver://localhost\demoInstance;databaseName=sampleTwo") \
  .option("dbtable", "places") \
  .option("user", "username2") \
  .option("password", "p@ssw0rd") \
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
  .save()
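
By default, the write fails if the target table already exists. Spark's standard save modes change this behavior: for example, mode("append") adds the DataFrame rows to an existing table, while mode("overwrite") replaces the table contents. The sketch below reuses the illustrative connection details from the write example above.

Python
# Append rows to the existing places table instead of failing.
df.write \
  .format("jdbc") \
  .mode("append") \
  .option("url", "jdbc:sqlserver://localhost\\demoInstance;databaseName=sampleTwo") \
  .option("dbtable", "places") \
  .option("user", "username2") \
  .option("password", "p@ssw0rd") \
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
  .save()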

The read and write to databases tutorial includes detailed examples of how to connect to, read from, and write to Microsoft SQL Server, Oracle, and PostgreSQL.

Usage notes
