SQL Databases using the Apache Spark connector

The Spark connector for Azure SQL Database and SQL Server enables these databases to act as input data sources and output data sinks for Apache Spark jobs. It lets you use real-time transactional data in big data analytics and persist results for ad hoc queries or reporting.

Unlike the built-in JDBC connector, this connector can bulk insert data into SQL databases. Bulk insert can be 10 to 20 times faster than row-by-row insertion. The Spark connector for SQL Server and Azure SQL Database also supports Azure Active Directory (Azure AD) authentication, so you can connect securely to your Azure SQL databases from Azure Databricks using your Azure AD account. Its interfaces are similar to those of the built-in JDBC connector, so existing Spark jobs are easy to migrate to this connector.

Note

A Spark connector that supports Databricks Runtime 7.x is not available. Databricks recommends that you use the JDBC connector or Databricks Runtime 6.x or below.
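
For clusters where the Spark connector is not available, the built-in JDBC data source recommended in the note can be used roughly as follows. This is a minimal PySpark sketch, not the connector's own API. The helper names (`build_jdbc_url`, `build_jdbc_options`, `read_table`, `append_rows`) and every server, database, table, and credential value are hypothetical placeholders, and the sketch assumes the Microsoft JDBC Driver for SQL Server is available on the cluster.

```python
def build_jdbc_url(server, database, port=1433):
    """Build a SQL Server JDBC URL (server and database are placeholders)."""
    return f"jdbc:sqlserver://{server}:{port};databaseName={database};encrypt=true"


def build_jdbc_options(server, database, table, user, password):
    """Collect the options understood by Spark's built-in JDBC data source."""
    return {
        "url": build_jdbc_url(server, database),
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def read_table(spark, options):
    """Load a table as a DataFrame; `spark` is an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


def append_rows(df, options, batch_size=10000):
    """Append a DataFrame to the table using batched JDBC inserts."""
    (
        df.write.format("jdbc")
        .options(**options)
        .option("batchsize", batch_size)  # rows sent per JDBC batch
        .mode("append")
        .save()
    )
```

In a notebook, where a `spark` session already exists, `read_table(spark, build_jdbc_options(...))` returns a DataFrame. In practice the password would come from a secret store rather than a literal in the notebook.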

Requirements

Component                             Versions Supported
Apache Spark                          2.0.2 and above
Scala                                 2.10 and above
Microsoft JDBC Driver for SQL Server  6.2 and above
Microsoft SQL Server                  SQL Server 2008 and above
Azure SQL Database                    Supported

Create and install Spark connector library

  1. Create an Azure Databricks library for the Spark connector as a Maven library. Use the coordinate: com.microsoft.azure:azure-sqldb-spark:1.0.2.
  2. Install the library on the cluster that will access the database.
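
If you would rather script the two steps above than use the workspace UI, the Databricks Libraries REST API (`POST /api/2.0/libraries/install`) accepts a Maven coordinate. The sketch below only builds the request. The function names, workspace URL, token, and cluster ID are hypothetical placeholders, and it assumes a personal access token with permission to manage the cluster.

```python
import json
import urllib.request

# Maven coordinate of the Spark connector, as given in step 1.
CONNECTOR_COORDINATES = "com.microsoft.azure:azure-sqldb-spark:1.0.2"


def build_install_payload(cluster_id, coordinates=CONNECTOR_COORDINATES):
    """Build the JSON body that asks a cluster to install a Maven library."""
    return {
        "cluster_id": cluster_id,
        "libraries": [{"maven": {"coordinates": coordinates}}],
    }


def build_install_request(workspace_url, token, cluster_id):
    """Prepare (but do not send) the Libraries API install request."""
    body = json.dumps(build_install_payload(cluster_id)).encode("utf-8")
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/libraries/install",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would submit it. Installation is asynchronous, so the library may take a short while to appear as installed on the cluster.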

Use the Spark connector

For instructions on using the Spark connector, see Accelerate real-time big data analytics with Spark connector for Azure SQL Database and SQL Server.