Azure Cosmos DB Apache Spark 3 OLTP Connector for Core (SQL) API (Preview): Release notes and resources

Applies to: SQL API

The Azure Cosmos DB Spark 3 OLTP connector (Preview) provides Apache Spark v3 support for Azure Cosmos DB using the SQL API. Azure Cosmos DB is a globally distributed database service that allows developers to work with data using a variety of standard APIs, such as SQL, MongoDB, Cassandra, Graph, and Table.

Note

This version of the Azure Cosmos DB Spark 3 OLTP connector is a preview build. It hasn't been load or performance tested, and it isn't recommended for use in production scenarios.

Documentation

Version compatibility

Connector | Spark | Minimum Java version | Supported Scala versions
4.0.0-beta.1 | 3.1.1 | 8 | 2.12

Download

You can use the Maven coordinate of the jar to auto-install the Spark connector to your Databricks Runtime 8 from Maven: com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.0.0-beta.1

You can also integrate against the Cosmos DB Spark connector in your SBT project:

libraryDependencies += "com.azure.cosmos.spark" % "azure-cosmos-spark_3-1_2-12" % "4.0.0-beta.1"

The Cosmos DB Spark connector is available on Maven Central Repository.
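For orientation, here is a minimal sketch (Scala) of writing to and reading from a container once the connector is on the classpath. The account endpoint, key, database, and container values are placeholders, and the cosmos.oltp data source name plus the spark.cosmos.* option keys follow the connector's configuration reference; confirm them against the preview build you are using.

// Minimal write/read sketch; all <...> values are placeholders.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cosmos-oltp-sample").getOrCreate()
import spark.implicits._

val cosmosConfig = Map(
  "spark.cosmos.accountEndpoint" -> "https://<account>.documents.azure.com:443/",
  "spark.cosmos.accountKey"      -> "<account-key>",
  "spark.cosmos.database"        -> "<database>",
  "spark.cosmos.container"       -> "<container>"
)

// Write a small DataFrame; each document needs an "id" property.
val df = Seq(("1", "Alice"), ("2", "Bob")).toDF("id", "name")
df.write.format("cosmos.oltp").options(cosmosConfig).mode("Append").save()

// Read the container back as a DataFrame.
val readDf = spark.read.format("cosmos.oltp").options(cosmosConfig).load()
readDf.show()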

General

If you encounter any bugs, please file an issue here.

To suggest a new feature or a change, file an issue the same way you would for a bug.

Release History

4.0.0-beta.3 (Unreleased)

4.0.0-beta.2 (2021-04-19)

  • Cosmos DB Spark 3.1.1 Connector Preview 4.0.0-beta.2 release.

New Features

  • The beta-2 release is now feature-complete.
  • Spark structured streaming (micro-batch) support for consuming the change feed (a sketch follows this list).
  • Spark structured streaming (micro-batch) support added for writes (TableCapability.STREAMING_WRITE).
  • Allowing configuration of "Cosmos views" in the Spark catalog to enable direct queries against the Spark catalog.
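The following is a hedged sketch of the change feed micro-batch streaming mentioned above, reusing the cosmosConfig map from the earlier download-section sketch. The cosmos.oltp.changeFeed source name, the spark.cosmos.changeFeed.* options, and the target container name are assumptions based on the connector's configuration reference; verify them against this preview build.

// Read the change feed as a micro-batch stream and write the changes
// to another container (TableCapability.STREAMING_WRITE).
val changeFeedDf = spark.readStream
  .format("cosmos.oltp.changeFeed")
  .options(cosmosConfig)
  .option("spark.cosmos.changeFeed.mode", "Incremental")
  .option("spark.cosmos.changeFeed.startFrom", "Beginning")
  .load()

val query = changeFeedDf.writeStream
  .format("cosmos.oltp")
  .options(cosmosConfig + ("spark.cosmos.container" -> "<target-container>"))
  .option("checkpointLocation", "/tmp/cosmos-changefeed-checkpoint")
  .outputMode("append")
  .start()

query.awaitTermination()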

Key Bug Fixes

  • Performance validation and optimizations (resulting in significantly better throughput for the read code path).
  • Row conversion: allow configuration of the behavior on schema mismatch - error vs. null.
  • Row conversion: support the InternalRow type to avoid failures when using a nested StructType of InternalRow (rather than Row).

Known limitations

  • No support for continuous processing (change feed) yet (will be added after GA).
  • No performance tests or optimizations have been done yet; we will iterate on performance in the next preview releases. Usage of this preview should therefore be limited to non-production environments.

4.0.0-beta.1 (2021-03-22)

  • Cosmos DB Spark 3.1.1 Connector Preview 4.0.0-beta.1 release.

Features

  • Supports Spark 3.1.1 and Scala 2.12.
  • Integrated against the Spark 3 DataSourceV2 API.
  • Developed from the ground up using the Cosmos DB Java V4 SDK.
  • Added support for Spark query, write, and streaming.
  • Added support for the Spark 3 catalog metadata APIs (a sketch follows this list).
  • Added support for Java V4 throughput control.
  • Added support for different partitioning strategies.
  • Integrated against the Cosmos DB TCP protocol.
  • Added support for the Databricks automated Maven resolver.
  • Added support for broadcasting CosmosClient caches to reduce bootstrap RU throttling.
  • Added support for a unified Jackson ObjectNode to Spark Row converter.
  • Added support for raw JSON format.
  • Added support for config validation.
  • Added support for Spark application configuration consolidation.
  • Integrated against the Cosmos DB FeedRange API to support partition-split proofing.
  • Automated CI testing on Databricks and a Cosmos DB live endpoint.
  • Automated CI testing on the Cosmos DB emulator.
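Below is a hedged sketch of the Spark 3 catalog metadata integration referenced in the feature list: registering the Cosmos catalog and creating a database and container through Spark SQL DDL. The CosmosCatalog class name, the spark.sql.catalog.* configuration keys, and the TBLPROPERTIES shown (partitionKeyPath, manualThroughput) are taken from the connector's documentation and should be verified against this preview build.

// Register the Cosmos catalog and point it at an account (placeholders in <...>).
spark.conf.set("spark.sql.catalog.cosmosCatalog", "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint",
  "https://<account>.documents.azure.com:443/")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey", "<account-key>")

// Create a database and a container (400 RU/s, partitioned on /id) via Spark SQL.
spark.sql("CREATE DATABASE IF NOT EXISTS cosmosCatalog.sampleDb")
spark.sql(
  """CREATE TABLE IF NOT EXISTS cosmosCatalog.sampleDb.sampleContainer
    |USING cosmos.oltp
    |TBLPROPERTIES (partitionKeyPath = '/id', manualThroughput = '400')""".stripMargin)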

Known limitations

  • Spark structured streaming (micro-batch) for consuming the change feed has been implemented, but it hasn't been fully tested end-to-end and is considered experimental at this point.
  • No support for continuous processing (change feed) yet.
  • No performance tests or optimizations have been done yet; we will iterate on performance in the next preview releases. Usage of this preview should therefore be limited to non-production environments.

4.0.0-alpha.1 (2021-03-17)

  • Cosmos DB Spark 3.1.1 Connector test release.

Next steps

Review our quickstart guide for working with the Azure Cosmos DB Spark 3 OLTP connector.