This article shows how you can connect Azure Databricks to SQL Server to read and write data.
Important
The legacy query federation documentation has been retired and might not be updated. The configurations mentioned in this content are not officially endorsed or tested by Databricks. If Lakehouse Federation supports your source database, Databricks recommends using that instead.
In Databricks Runtime 11.3 LTS and above, you can use the sqlserver keyword to connect to SQL Server with the included driver. When working with DataFrames, use the following syntax:
remote_table = (spark.read
  .format("sqlserver")
  .option("host", "hostName")
  .option("port", "port")  # optional; defaults to 1433 if omitted
  .option("user", "username")
  .option("password", "password")
  .option("database", "databaseName")
  .option("dbtable", "schemaName.tableName")  # if schemaName is omitted, "dbo" is used
  .load()
)
val remote_table = spark.read
  .format("sqlserver")
  .option("host", "hostName")
  .option("port", "port")  // optional; defaults to 1433 if omitted
  .option("user", "username")
  .option("password", "password")
  .option("database", "databaseName")
  .option("dbtable", "schemaName.tableName")  // if schemaName is omitted, "dbo" is used
  .load()
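The examples above embed credentials directly in the notebook. A safer pattern is to read them from Databricks secrets with dbutils.secrets.get. The following is a minimal Python sketch; the scope name jdbc and the key names username and password are assumptions, so substitute the secret scope and keys you have actually created:

# Minimal sketch: pull credentials from Databricks secrets instead of
# hardcoding them. The scope "jdbc" and the keys "username"/"password"
# are assumptions; replace them with your own secret scope and keys.
# dbutils is available by default in Databricks notebooks.
user = dbutils.secrets.get(scope="jdbc", key="username")
password = dbutils.secrets.get(scope="jdbc", key="password")

remote_table = (spark.read
  .format("sqlserver")
  .option("host", "hostName")
  .option("user", user)
  .option("password", password)
  .option("database", "databaseName")
  .option("dbtable", "schemaName.tableName")
  .load()
)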
When working with SQL, specify sqlserver in the USING clause and pass options while creating a table, as shown in the following example:
DROP TABLE IF EXISTS sqlserver_table;
CREATE TABLE sqlserver_table
USING sqlserver
OPTIONS (
  dbtable '<schema-name.table-name>',
  host '<host-name>',
  port '1433',
  database '<database-name>',
  user '<username>',
  password '<password>'
);
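A table created this way is registered in the metastore, so you can read it back like any other table. Here is a short Python sketch; the name sqlserver_table matches the CREATE TABLE example above:

# Read the federated table registered by the CREATE TABLE example above.
df = spark.table("sqlserver_table")
df.show(5)  # fetch and print the first five rows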
In Databricks Runtime 10.4 LTS and below, you must specify the driver and configurations using the JDBC settings. The following example queries SQL Server using its JDBC driver. For more details on reading, writing, configuring parallelism, and query pushdown, see Query databases using JDBC.
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
database_host = "<database-host-url>"
database_port = "1433" # update if you use a non-default port
database_name = "<database-name>"
table = "<table-name>"
user = "<username>"
password = "<password>"
url = f"jdbc:sqlserver://{database_host}:{database_port};database={database_name}"
remote_table = (spark.read
  .format("jdbc")
  .option("driver", driver)
  .option("url", url)
  .option("dbtable", table)
  .option("user", user)
  .option("password", password)
  .load()
)
val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
val database_host = "<database-host-url>"
val database_port = "1433" // update if you use a non-default port
val database_name = "<database-name>"
val table = "<table-name>"
val user = "<username>"
val password = "<password>"
val url = s"jdbc:sqlserver://{database_host}:{database_port};database={database_name}"
val remote_table = spark.read
.format("jdbc")
.option("driver", driver)
.option("url", url)
.option("dbtable", table)
.option("user", user)
.option("password", password)
.load()
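Writing follows the same pattern through the standard Spark JDBC writer. The following minimal Python sketch reuses the driver, url, table, user, and password variables from the Python example above; mode("append") adds rows, while "overwrite" replaces the table contents:

# Minimal sketch: write a DataFrame back to SQL Server over JDBC,
# reusing the connection variables from the read example above.
(remote_table.write
  .format("jdbc")
  .option("driver", driver)
  .option("url", url)
  .option("dbtable", table)
  .option("user", user)
  .option("password", password)
  .mode("append")  # use "overwrite" to replace existing rows instead
  .save()
)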