Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
This article shows how you can connect Azure Databricks to Microsoft SQL server to read and write data.
Important
The configurations described in this article are Experimental. Experimental features are provided as-is and are not supported by Databricks through customer technical support. To get full query federation support, you should instead use Lakehouse Federation, which enables your Azure Databricks users to take advantage of Unity Catalog syntax and data governance tools.
In Databricks Runtime 11.3 LTS and above, you can use the sqlserver
keyword to use the included driver for connecting to SQL server. When working with DataFrames, use the following syntax:
remote_table = (spark.read
.format("sqlserver")
.option("host", "hostName")
.option("port", "port") # optional, can use default port 1433 if omitted
.option("user", "username")
.option("password", "password")
.option("database", "databaseName")
.option("dbtable", "schemaName.tableName") # (if schemaName not provided, default to "dbo")
.load()
)
val remote_table = spark.read
.format("sqlserver")
.option("host", "hostName")
.option("port", "port") // optional, can use default port 1433 if omitted
.option("user", "username")
.option("password", "password")
.option("database", "databaseName")
.option("dbtable", "schemaName.tableName") // (if schemaName not provided, default to "dbo")
.load()
When working with SQL, specify sqlserver
in the USING
clause and pass options while creating a table, as shown in the following example:
DROP TABLE IF EXISTS sqlserver_table;
CREATE TABLE sqlserver_table
USING sqlserver
OPTIONS (
dbtable '<schema-name.table-name>',
host '<host-name>',
port '1433',
database '<database-name>',
user '<username>',
password '<password>'
);
In Databricks Runtime 10.4 LTS and below, you must specify the driver and configurations using the JDBC settings. The following example queries SQL Server using its JDBC driver. For more details on reading, writing, configuring parallelism, and query pushdown, see Query databases using JDBC.
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
database_host = "<database-host-url>"
database_port = "1433" # update if you use a non-default port
database_name = "<database-name>"
table = "<table-name>"
user = "<username>"
password = "<password>"
url = f"jdbc:sqlserver://{database_host}:{database_port};database={database_name}"
remote_table = (spark.read
.format("jdbc")
.option("driver", driver)
.option("url", url)
.option("dbtable", table)
.option("user", user)
.option("password", password)
.load()
)
val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
val database_host = "<database-host-url>"
val database_port = "1433" // update if you use a non-default port
val database_name = "<database-name>"
val table = "<table-name>"
val user = "<username>"
val password = "<password>"
val url = s"jdbc:sqlserver://{database_host}:{database_port};database={database_name}"
val remote_table = spark.read
.format("jdbc")
.option("driver", driver)
.option("url", url)
.option("dbtable", table)
.option("user", user)
.option("password", password)
.load()