Query PostgreSQL with Azure Databricks
This example queries PostgreSQL using its JDBC driver. For more details on reading, writing, configuring parallelism, and query pushdown, see Query databases using JDBC.
Important
The configurations described in this article are Experimental. Experimental features are provided as-is and are not supported by Databricks through customer technical support. To get full query federation support, you should instead use Lakehouse Federation, which enables your Azure Databricks users to take advantage of Unity Catalog syntax and data governance tools.
Using JDBC
Python
driver = "org.postgresql.Driver"
database_host = "<database-host-url>"
database_port = "5432" # update if you use a non-default port
database_name = "<database-name>"
table = "<table-name>"
user = "<username>"
password = "<password>"
url = f"jdbc:postgresql://{database_host}:{database_port}/{database_name}"
remote_table = (spark.read
    .format("jdbc")
    .option("driver", driver)
    .option("url", url)
    .option("dbtable", table)
    .option("user", user)
    .option("password", password)
    .load()
)
Scala
val driver = "org.postgresql.Driver"
val database_host = "<database-host-url>"
val database_port = "5432" // update if you use a non-default port
val database_name = "<database-name>"
val table = "<table-name>"
val user = "<username>"
val password = "<password>"
val url = s"jdbc:postgresql://${database_host}:${database_port}/${database_name}"
val remote_table = spark.read
  .format("jdbc")
  .option("driver", driver)
  .option("url", url)
  .option("dbtable", table)
  .option("user", user)
  .option("password", password)
  .load()
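The JDBC options above can also be assembled in one place before they are handed to the reader. The following is a minimal sketch, not part of the Databricks API: `build_jdbc_options` is a hypothetical helper name, the host and credentials are placeholders, and `sslmode` is shown only as an example of a PostgreSQL JDBC driver connection parameter appended to the URL.

```python
from urllib.parse import urlencode


def build_jdbc_options(host, database, table, user, password, port=5432, params=None):
    """Return a dict of JDBC reader options (hypothetical helper, for illustration)."""
    # Same URL shape as the example above: jdbc:postgresql://host:port/database
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    if params:
        # Extra driver connection parameters go in the query string.
        url += "?" + urlencode(params)
    return {
        "driver": "org.postgresql.Driver",
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
    }


options = build_jdbc_options(
    host="db.example.com",           # placeholder host, not a real server
    database="sales",                # placeholder database name
    table="public.orders",           # placeholder table name
    user="reader",                   # placeholder credentials
    password="not-a-real-password",
    params={"sslmode": "require"},   # assumption: the server accepts SSL connections
)
print(options["url"])
```

On a cluster, the resulting dictionary could be passed to the reader with `spark.read.format("jdbc").options(**options).load()`.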
Using the PostgreSQL connector in Databricks Runtime
In Databricks Runtime 11.3 LTS and above, you can use the named connector to query PostgreSQL. See the following examples:
Python
remote_table = (spark.read
    .format("postgresql")
    .option("dbtable", "schema_name.table_name")  # If schema_name is omitted, it defaults to "public".
    .option("host", "database_hostname")
    .option("port", "5432")  # Optional. Defaults to 5432 if not specified.
    .option("database", "database_name")
    .option("user", "username")
    .option("password", "password")
    .load()
)
SQL
DROP TABLE IF EXISTS postgresql_table;
CREATE TABLE postgresql_table
USING postgresql
OPTIONS (
  dbtable '<schema-name>.<table-name>', /* If the schema name is omitted, it defaults to "public". */
  host '<database-host-url>',
  port '5432', /* Optional. Defaults to 5432 if not specified. */
  database '<database-name>',
  user '<username>',
  password '<password>'
);
Scala
val remote_table = spark.read
  .format("postgresql")
  .option("dbtable", "schema_name.table_name") // If schema_name is omitted, it defaults to "public".
  .option("host", "database_hostname")
  .option("port", "5432") // Optional. Defaults to 5432 if not specified.
  .option("database", "database_name")
  .option("user", "username")
  .option("password", "password")
  .load()
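The defaults noted in the comments above (schema `public`, port 5432) can be made explicit when the options are assembled in code. The following is a rough sketch only: `postgresql_connector_options` is a hypothetical helper, it does not reflect how the connector resolves defaults internally, and its simple `.` check does not handle quoted identifiers that contain dots.

```python
def postgresql_connector_options(table, host, database, user, password,
                                 port=None, default_schema="public"):
    """Return options for the "postgresql" format (hypothetical helper)."""
    # Spell out the schema instead of relying on the implicit "public" default.
    # Naive check: assumes unquoted names without embedded dots.
    dbtable = table if "." in table else f"{default_schema}.{table}"
    opts = {
        "dbtable": dbtable,
        "host": host,
        "database": database,
        "user": user,
        "password": password,
    }
    # The port is optional; leave it out so that the default (5432) applies.
    if port is not None:
        opts["port"] = str(port)
    return opts


opts = postgresql_connector_options(
    table="orders",                  # placeholder table name without a schema
    host="db.example.com",           # placeholder host, not a real server
    database="sales",                # placeholder database name
    user="reader",                   # placeholder credentials
    password="not-a-real-password",
)
print(opts["dbtable"])
```

On Databricks Runtime 11.3 LTS and above, the dictionary could then be passed as `spark.read.format("postgresql").options(**opts).load()`.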