Reading Azure Cosmos DB change feed

You can work with the Azure Cosmos DB change feed using either a push model or a pull model. With a push model, a server (the change feed processor) pushes work to a client that has business logic for processing this work. However, the complexity of checking for work and storing state for the last processed work is handled on the server.

With a pull model, the client has to pull the work from the server. In this case, the client not only has the business logic for processing work but is also responsible for storing state for the last processed work, handling load balancing across multiple clients that process work in parallel, and handling errors.

When reading from the Azure Cosmos DB change feed, we usually recommend using a push model because you won't need to worry about:

  • Polling the change feed for future changes.
  • Storing state for the last processed change. When reading from the change feed, this state is automatically stored in a lease container.
  • Load balancing across multiple clients consuming changes. For example, one client might be unable to keep up with processing changes while another has available capacity.
  • Handling errors. For example, automatically retrying failed changes that weren't correctly processed after an unhandled exception in code or a transient network issue.

The majority of scenarios that use the Azure Cosmos DB change feed will use one of the push model options. However, there are some scenarios where you might want the additional low-level control of the pull model. These include:

  • Reading changes from a particular partition key
  • Controlling the pace at which your client receives changes for processing
  • Doing a one-time read of the existing data in the change feed (for example, to do a data migration)

Reading change feed with a push model

Using a push model is the easiest way to read from the change feed. There are two ways you can read from the change feed with a push model: Azure Functions Cosmos DB triggers and the change feed processor library. Azure Functions uses the change feed processor behind the scenes, so the two are very similar ways to read the change feed. Think of Azure Functions as simply a hosting platform for the change feed processor, not an entirely different way of reading the change feed.

Azure Functions

Azure Functions is the simplest option if you are just getting started with the change feed. Due to its simplicity, it is also the recommended option for most change feed use cases. When you create an Azure Functions trigger for Azure Cosmos DB, you select the container to connect to, and the Azure Function is triggered whenever there is a change in that container. Because Azure Functions uses the change feed processor behind the scenes, it automatically parallelizes change processing across your container's partitions.

Developing with Azure Functions is an easy experience and can be faster than deploying the change feed processor on your own. Triggers can be created using the Azure Functions portal or programmatically using SDKs. Visual Studio and VS Code provide support for writing Azure Functions, and you can even use the Azure Functions CLI for cross-platform development. You can write and debug the code on your desktop, and then deploy the function with one click. To learn more, see the articles Serverless database computing using Azure Functions and Using change feed with Azure Functions.

Change feed processor library

The change feed processor library gives you more control of the change feed and still hides most of the complexity. The library follows the observer pattern, where your processing function is called by the library. It automatically checks for changes and, if changes are found, "pushes" them to the client. If you have a high-throughput change feed, you can instantiate multiple clients to read the change feed. The library automatically divides the load among the different clients. You won't have to implement any logic for load balancing across multiple clients or any logic to maintain the lease state.
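The observer pattern described above can be illustrated with a small in-memory sketch. This is not the change feed processor library or its API: `ToyProcessor`, `poll_once`, and the list-backed feed are hypothetical names invented for this illustration. The only point is that the processor owns the polling loop and the checkpoint, while your code supplies a handler.

```python
from typing import Callable, Dict, List

Change = Dict[str, object]


class ToyProcessor:
    """Hypothetical stand-in for a push-model processor (not a real SDK class)."""

    def __init__(self, feed: List[Change], handler: Callable[[List[Change]], None]):
        self._feed = feed          # simulated change feed, ordered by arrival
        self._handler = handler    # your business logic
        self._checkpoint = 0       # index of the next unread change (the "lease" state)

    def poll_once(self, batch_size: int = 2) -> int:
        """Check for new changes and push one batch to the handler."""
        batch = self._feed[self._checkpoint:self._checkpoint + batch_size]
        if batch:
            self._handler(batch)             # "push" the work to the client code
            self._checkpoint += len(batch)   # advance only after the handler returns
        return len(batch)


seen: List[object] = []
feed: List[Change] = [{"id": i} for i in range(5)]
processor = ToyProcessor(feed, lambda batch: seen.extend(c["id"] for c in batch))

# The processor, not the handler, drives the polling loop.
while processor.poll_once():
    pass
```

Because the checkpoint moves only after the handler returns, a handler that raises leaves the same batch to be delivered again on the next poll, which is the intuition behind retry-until-success delivery.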

The change feed processor library guarantees an "at-least-once" delivery of all of the changes. In other words, if you use the change feed processor library, your processing function will be called successfully for every item in the change feed. If there is an unhandled exception in the business logic in your processing function, the failed changes will be retried until they are processed successfully. To prevent your change feed processor from getting "stuck" continuously retrying the same changes, add logic in your processing function to write documents, upon exception, to a dead-letter queue. Learn more about error handling.
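A minimal sketch of the dead-letter idea follows. It assumes a handler that receives a batch of changes as plain dictionaries; `make_safe_handler`, the `amount` field, and the in-memory `dead_letters` list are all hypothetical and stand in for your own handler and a durable queue or container.

```python
from typing import Callable, Dict, List

Change = Dict[str, object]


def make_safe_handler(
    process: Callable[[Change], None],
    dead_letters: List[Dict[str, object]],
) -> Callable[[List[Change]], None]:
    """Wrap per-item logic so a failing item is parked instead of raising."""

    def handle(batch: List[Change]) -> None:
        for change in batch:
            try:
                process(change)
            except Exception as exc:
                # Record the item and the reason; in practice this would be a durable store.
                dead_letters.append({"change": change, "error": str(exc)})

    return handle


def process(change: Change) -> None:
    # Example business logic that rejects one malformed document.
    if change.get("amount") is None:
        raise ValueError("missing amount")
    processed.append(change["id"])


processed: List[object] = []
dead_letters: List[Dict[str, object]] = []
handler = make_safe_handler(process, dead_letters)
handler([{"id": "a", "amount": 10}, {"id": "b"}, {"id": "c", "amount": 5}])
```

Catching the exception inside the handler means the batch completes, so the processor can move past the bad document while the failed item stays available for later inspection or replay.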

In Azure Functions, the recommendation for handling errors is the same: you should still add logic in your delegate code to write documents, upon exception, to a dead-letter queue. However, if there is an unhandled exception in your Azure Function, the change that generated the exception won't be automatically retried. Instead, the Azure Function moves on to processing the next change and does not retry the failed one.

Like Azure Functions, developing with the change feed processor library is easy. However, you are responsible for deploying one or more hosts for the change feed processor. A host is an application instance that uses the change feed processor to listen for changes. While Azure Functions has capabilities for automatic scaling, you are responsible for scaling your hosts. To learn more, see using the change feed processor. The change feed processor library is part of the Azure Cosmos DB SDK V3.

Reading change feed with a pull model

The change feed pull model allows you to consume the change feed at your own pace. Changes must be requested by the client, and there is no automatic polling for changes. If you want to permanently "bookmark" the last processed change (similar to the push model's lease container), you'll need to save a continuation token.
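The bookmark idea can be sketched without any SDK. In the example below, `read_changes` is a hypothetical stand-in for a server call, and the token is simply a JSON-encoded offset into an in-memory list; real continuation tokens are opaque strings issued by the service and should be stored and passed back unchanged.

```python
import json
from typing import Dict, List, Optional, Tuple

Change = Dict[str, object]


def read_changes(
    feed: List[Change], token: Optional[str], max_items: int
) -> Tuple[List[Change], str]:
    """Simulated server call: return changes after `token` plus a new token."""
    start = json.loads(token)["offset"] if token else 0
    page = feed[start:start + max_items]
    return page, json.dumps({"offset": start + len(page)})


feed: List[Change] = [{"id": i} for i in range(5)]
saved_token: Optional[str] = None   # would live in durable storage in practice
handled: List[object] = []

# First session: the client decides to read a single page of three, then stops.
page, saved_token = read_changes(feed, saved_token, max_items=3)
handled.extend(c["id"] for c in page)

# Later session (for example after a restart): resume from the saved token.
while True:
    page, next_token = read_changes(feed, saved_token, max_items=3)
    if not page:
        break                        # no new changes; the client chooses when to ask again
    handled.extend(c["id"] for c in page)
    saved_token = next_token         # persist only after the page is processed
```

Saving the token only after a page has been processed means a crash mid-page leads to that page being read again rather than skipped, so the processing logic should tolerate repeats.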

Using the change feed pull model, you get lower-level control of the change feed. When reading the change feed with the pull model, you have three options:

  • Read changes for an entire container
  • Read changes for a specific FeedRange
  • Read changes for a specific partition key value

You can parallelize the processing of changes across multiple clients, just as you can with the change feed processor. However, the pull model does not automatically handle load balancing across clients. When you use the pull model to parallelize processing of the change feed, you'll first obtain a list of FeedRanges. A FeedRange spans a range of partition key values. You'll need an orchestrator process that obtains the FeedRanges and distributes them among your machines. Each machine can then read the change feed for its assigned FeedRanges in parallel with the others.
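One simple way an orchestrator might distribute ranges is round-robin assignment, sketched below. The representation of a feed range as a half-open integer interval, the `assign_ranges` function, and the machine names are assumptions made for illustration; the SDK's FeedRange type is opaque, and a real orchestrator could use any assignment policy.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a feed range as a half-open [start, end) interval.
Range = Tuple[int, int]


def assign_ranges(ranges: List[Range], workers: List[str]) -> Dict[str, List[Range]]:
    """Round-robin assignment of feed ranges to workers."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: Dict[str, List[Range]] = {w: [] for w in workers}
    for i, r in enumerate(ranges):
        assignment[workers[i % len(workers)]].append(r)
    return assignment


feed_ranges: List[Range] = [(0, 64), (64, 128), (128, 192), (192, 256)]
plan = assign_ranges(feed_ranges, ["machine-1", "machine-2", "machine-3"])
```

Each machine would then read only the ranges in its own entry of `plan`, keeping a separate continuation token per range. Round-robin balances the number of ranges, not the volume of changes, so a skewed workload may call for a different policy.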

There is no built-in "at-least-once" delivery guarantee with the pull model. Instead, the pull model gives you low-level control to decide how you would like to handle errors.

Note

The change feed pull model is currently in preview in the Azure Cosmos DB .NET SDK only. The preview is not yet available for other SDK versions.

Change feed in APIs for Cassandra and MongoDB

Change feed functionality is surfaced as change streams in the MongoDB API and as a query with predicate in the Cassandra API. To learn more about the implementation details for the MongoDB API, see Change streams in the Azure Cosmos DB API for MongoDB.

Native Apache Cassandra provides change data capture (CDC), a mechanism that flags specific tables for archival and rejects writes to those tables once a configurable size-on-disk for the CDC log is reached. The change feed feature in Azure Cosmos DB API for Cassandra enhances the ability to query the changes with a predicate via CQL. To learn more about the implementation details, see Change feed in the Azure Cosmos DB API for Cassandra.

Next steps

You can now continue to learn more about change feed in the following articles: