# SQL Databases using the Apache Spark connector
The Apache Spark connector for Azure SQL Database and SQL Server enables these databases to act as input data sources and output data sinks for Apache Spark jobs. It allows you to use real-time transactional data in big data analytics and to persist results for ad-hoc queries or reporting.
Compared to the built-in JDBC connector, this connector can bulk insert data into SQL databases, outperforming row-by-row insertion by 10x to 20x. The Spark connector for SQL Server and Azure SQL Database also supports Microsoft Entra ID authentication, so you can connect securely to your Azure SQL databases from Azure Databricks using your Microsoft Entra ID account. Because the connector exposes interfaces similar to the built-in JDBC connector, it is easy to migrate your existing Spark jobs to use it.
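For example, bulk-writing a DataFrame might look like the following Scala sketch. The format name `com.microsoft.sqlserver.jdbc.spark` is the connector's documented data source; the server, database, table, and credentials are placeholders you would replace with your own.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

// Example source data; any DataFrame works here (path is a placeholder).
val df = spark.read.parquet("/tmp/source-data")

// Placeholder connection details; substitute your own server and credentials.
val url = "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=mydb"

df.write
  .format("com.microsoft.sqlserver.jdbc.spark") // the connector's data source name
  .mode("append")                               // append rows using bulk insert
  .option("url", url)
  .option("dbtable", "dbo.MyTable")
  .option("user", "myUser")
  .option("password", "myPassword")
  .save()
```

Because the options mirror the built-in JDBC data source, switching an existing job over is typically just a change of the `format` string.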
## Requirements
There are two versions of the Spark connector for SQL Server: one for Spark 2.4 and one for Spark 3.x. The Spark 3.x connector requires Databricks Runtime 7.x or above. The connector is community-supported and does not include Azure SLA support. File any issues on GitHub to engage the community for help.
| Component | Versions Supported |
|---|---|
| Apache Spark | 2.4.x and 3.0.x |
| Databricks Runtime | Apache Spark 3.0 connector: Databricks Runtime 7.x and above |
| Scala | Apache Spark 3.0 connector: 2.12; Apache Spark 2.4 connector: 2.11 |
| Microsoft JDBC Driver for SQL Server | 8.2 |
| Microsoft SQL Server | SQL Server 2008 and above |
| Azure SQL Database | Supported |
## Use the Spark connector
For instructions on using the Spark connector, see Apache Spark connector: SQL Server & Azure SQL.
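As a minimal sketch, reading a SQL table back into a DataFrame follows the same pattern as the built-in JDBC source. The server, database, table, and credentials below are placeholders you would replace with your own.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

// Placeholder connection details; substitute your own server and credentials.
val url = "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=mydb"

val df = spark.read
  .format("com.microsoft.sqlserver.jdbc.spark") // the connector's data source name
  .option("url", url)
  .option("dbtable", "dbo.MyTable")
  .option("user", "myUser")
  .option("password", "myPassword")
  .load()

df.show(5) // preview the first rows of the loaded table
```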