Azure Cosmos DB bulk executor library overview

Applies to: SQL API

Azure Cosmos DB is a fast, flexible, multi-region distributed database service designed to scale out elastically to support:

  • Large read and write throughput (millions of operations per second).
  • Storage of high volumes (hundreds of terabytes or more) of transactional and operational data with predictable millisecond latency.

The bulk executor library helps you take advantage of this massive throughput and storage. It lets you perform bulk operations in Azure Cosmos DB through its bulk import and bulk update APIs. The following sections describe the library's features in more detail.

Note

Currently, the bulk executor library supports import and update operations, and it can be used only with Azure Cosmos DB SQL API and Gremlin API accounts.

Important

The bulk executor library is not currently supported on serverless accounts. On .NET, we recommend using the bulk support available in the V3 version of the SDK.

Key features of the bulk executor library

  • It significantly reduces the client-side compute resources needed to saturate the throughput allocated to a container. A single-threaded application that writes data with the bulk import API achieves 10 times greater write throughput than a multithreaded application that writes data in parallel while saturating the client machine's CPU.

  • It handles request rate limiting, request timeouts, and other transient exceptions efficiently within the library. You therefore don't have to write tedious application logic to deal with them.

  • It gives applications that perform bulk operations a simplified way to scale out. A single bulk executor instance running on an Azure VM can consume more than 500K RU/s. You can reach a higher throughput rate by adding more instances on individual client VMs.

  • It can bulk import more than a terabyte of data within an hour by using a scale-out architecture.

  • It can bulk update existing data in Azure Cosmos containers as patches.
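The Java sketch below shows the kind of retry logic the library saves you from writing. When an operation is throttled, it waits for the interval the server suggests and then tries again. This is not the bulk executor library's API. `ThrottleRetry` and `ThrottledException` are hypothetical names. `ThrottledException` and its `retryAfterMillis` field stand in for a rate-limited (HTTP 429) response. The fixed attempt cap is also an assumption.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch, not the bulk executor library's API: retries an operation
// that fails because of rate limiting, waiting the server-suggested interval.
class ThrottleRetry {

    // Hypothetical stand-in for a rate-limited (HTTP 429) response carrying a retry-after hint.
    static class ThrottledException extends Exception {
        final long retryAfterMillis;

        ThrottledException(long retryAfterMillis) {
            super("request rate too large");
            this.retryAfterMillis = retryAfterMillis;
        }
    }

    // Calls op; on a ThrottledException, sleeps for retryAfterMillis and tries again,
    // rethrowing once maxAttempts calls have been made. Other exceptions propagate immediately.
    static <T> T callWithRetry(Callable<T> op, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (ThrottledException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(e.retryAfterMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation that is throttled twice before it succeeds.
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new ThrottledException(10);
            }
            return "written";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In this sketch, only throttling is retried. A timeout or another transient error would need its own catch clause and back-off policy.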

How does the bulk executor operate?

When a bulk operation to import or update documents is triggered with a batch of entities, the entities are first shuffled into buckets that correspond to their Azure Cosmos DB partition key ranges. Within each bucket, they are broken down into mini-batches. Each mini-batch acts as a payload that is committed on the server side. The bulk executor library has built-in optimizations for running these mini-batches concurrently, both within and across partition key ranges. The following image illustrates how the bulk executor batches data into different partition keys:

Bulk executor architecture
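The Java sketch below shows the bucketing and mini-batching steps described above. It is illustrative only. `MiniBatcher`, `Doc`, `rangeOf`, and `plan` are hypothetical names. The modulo-of-`hashCode()` lookup is a simplified stand-in for the service's hash-based mapping of partition keys to partition key ranges. Capping mini-batches by document count is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch, not the bulk executor library's implementation.
class MiniBatcher {

    // Hypothetical document type: only the partition key matters for bucketing.
    static final class Doc {
        final String id;
        final String partitionKey;

        Doc(String id, String partitionKey) {
            this.id = id;
            this.partitionKey = partitionKey;
        }
    }

    // Simplified stand-in for the service's hash-based partition key range lookup.
    static int rangeOf(String partitionKey, int rangeCount) {
        return Math.floorMod(partitionKey.hashCode(), rangeCount);
    }

    // Step 1: shuffle documents into one bucket per partition key range.
    // Step 2: split each bucket into mini-batches of at most maxBatchSize documents.
    static Map<Integer, List<List<Doc>>> plan(List<Doc> docs, int rangeCount, int maxBatchSize) {
        Map<Integer, List<Doc>> buckets = new TreeMap<>();
        for (Doc d : docs) {
            buckets.computeIfAbsent(rangeOf(d.partitionKey, rangeCount), k -> new ArrayList<>()).add(d);
        }
        Map<Integer, List<List<Doc>>> miniBatches = new TreeMap<>();
        for (Map.Entry<Integer, List<Doc>> bucket : buckets.entrySet()) {
            List<Doc> items = bucket.getValue();
            List<List<Doc>> batches = new ArrayList<>();
            for (int i = 0; i < items.size(); i += maxBatchSize) {
                batches.add(new ArrayList<>(items.subList(i, Math.min(i + maxBatchSize, items.size()))));
            }
            miniBatches.put(bucket.getKey(), batches);
        }
        return miniBatches;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add(new Doc("doc" + i, "customer" + (i % 4)));
        }
        // Each entry maps a partition key range to the mini-batches planned for it.
        plan(docs, 3, 2).forEach((range, batches) ->
                System.out.println("range " + range + ": " + batches.size() + " mini-batch(es)"));
    }
}
```

Grouping by range first means every mini-batch in this sketch targets a single partition key range. Batches for different ranges can then be dispatched in parallel.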

The bulk executor library is designed to make maximum use of the throughput allocated to a collection. It uses an AIMD-style (additive increase, multiplicative decrease) congestion control mechanism for each Azure Cosmos DB partition key range to handle rate limiting and timeouts efficiently.
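The Java sketch below illustrates the AIMD idea. It keeps a separate concurrency limit for each partition key range. The limit grows by one after each successful mini-batch and is halved when the range is rate limited or times out. The class name, the starting limit of 1, the +1 step, the halving factor, and the cap are all illustrative assumptions, not values taken from the bulk executor library.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative AIMD controller; names and constants are assumptions, not the library's.
class AimdController {
    private final int maxConcurrency;
    // Allowed number of in-flight mini-batches, tracked separately per partition key range.
    private final Map<Integer, Integer> limits = new HashMap<>();

    AimdController(int maxConcurrency) {
        this.maxConcurrency = maxConcurrency;
    }

    // Current limit for a range; every range starts at 1.
    int limit(int rangeId) {
        return limits.getOrDefault(rangeId, 1);
    }

    // Additive increase: the last mini-batch for this range completed without throttling.
    void onSuccess(int rangeId) {
        limits.put(rangeId, Math.min(maxConcurrency, limit(rangeId) + 1));
    }

    // Multiplicative decrease: the range was rate limited or timed out (never drops below 1).
    void onThrottledOrTimedOut(int rangeId) {
        limits.put(rangeId, Math.max(1, limit(rangeId) / 2));
    }

    public static void main(String[] args) {
        AimdController aimd = new AimdController(16);
        for (int i = 0; i < 7; i++) {
            aimd.onSuccess(0);
        }
        aimd.onThrottledOrTimedOut(0);
        // Range 0 backed off; range 1 is unaffected by range 0's throttling.
        System.out.println("range 0 limit: " + aimd.limit(0) + ", range 1 limit: " + aimd.limit(1));
    }
}
```

Because state is kept per range in this sketch, a hot range that is being throttled backs off on its own. The request rate sent to other ranges is not reduced.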

Next steps

  • Learn more by trying out the sample applications that use the bulk executor library in .NET and Java.
  • Check out the bulk executor SDK information and release notes for .NET and Java.
  • The bulk executor library is integrated into the Cosmos DB Spark connector. To learn more, see the Azure Cosmos DB Spark connector article.
  • The bulk executor library is also integrated into a new version of the Azure Cosmos DB connector for Azure Data Factory to copy data.