Change feed design patterns in Azure Cosmos DB

Applies to: SQL API

The Azure Cosmos DB change feed enables efficient processing of large datasets with a high volume of writes. It also offers an alternative to querying an entire dataset to identify what has changed. This document focuses on common change feed design patterns, design tradeoffs, and change feed limitations.

Azure Cosmos DB is well suited for IoT, gaming, retail, and operational logging applications. A common design pattern in these applications is to use changes to the data to trigger additional actions. Examples of additional actions include:

  • Triggering a notification or a call to an API when an item is inserted or updated.
  • Real-time stream processing for IoT or real-time analytics processing on operational data.
  • Data movement, such as synchronizing with a cache, a search engine, a data warehouse, or cold storage.

The change feed in Azure Cosmos DB enables you to build efficient and scalable solutions for each of these patterns, as shown in the following image:

Figure: Using the Azure Cosmos DB change feed to power real-time analytics and event-driven computing scenarios

Event computing and notifications

The Azure Cosmos DB change feed can simplify scenarios that need to trigger a notification or call an API based on a certain event. You can use the change feed processor library to automatically poll your container for changes and call an external API each time there is a write or update.

You can also trigger a notification or call an API selectively, based on specific criteria. For example, if you are reading from the change feed using Azure Functions, you can put logic into the function to send a notification only if specific criteria have been met. The Azure Function code still executes on every write and update, but the notification is sent only when the criteria are met.
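
The selective-notification logic described above can be sketched as follows. This is a minimal Python illustration, not the Azure Functions binding itself: `handle_changes`, `should_notify`, the `temperature` field, and the threshold value are hypothetical names chosen for the example, and each batch of changed items is assumed to arrive as a list of dictionaries.

```python
from typing import Callable, Iterable

# Hypothetical threshold; real criteria are application-specific.
TEMPERATURE_ALERT_THRESHOLD = 75.0


def should_notify(item: dict) -> bool:
    """Return True only for changes that meet the notification criteria."""
    return item.get("temperature", 0.0) > TEMPERATURE_ALERT_THRESHOLD


def handle_changes(changes: Iterable[dict], notify: Callable[[dict], None]) -> int:
    """Run for every batch of changes, but notify only for matching items.

    Returns the number of notifications sent.
    """
    sent = 0
    for item in changes:
        if should_notify(item):
            notify(item)
            sent += 1
    return sent


if __name__ == "__main__":
    batch = [
        {"id": "sensor-1", "temperature": 71.2},
        {"id": "sensor-2", "temperature": 80.5},
    ]
    handle_changes(batch, notify=lambda item: print("alert:", item["id"]))
```

Keeping the criteria in a separate predicate means the handler still sees every change, while the decision to notify stays easy to test on its own.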

Real-time stream processing

The Azure Cosmos DB change feed can be used for real-time stream processing for IoT or real-time analytics processing on operational data. For example, you might receive and store event data from devices, sensors, infrastructure, and applications, and process these events in real time using Spark. The following image shows how you can implement a lambda architecture using Azure Cosmos DB via the change feed:

Figure: Azure Cosmos DB-based lambda pipeline for ingestion and query

In many cases, stream processing implementations first receive a high volume of incoming data into a temporary message queue such as Azure Event Hubs or Apache Kafka. The change feed is a great alternative because Azure Cosmos DB supports a sustained high rate of data ingestion with guaranteed low read and write latency. The advantages of the Azure Cosmos DB change feed over a message queue include:

Data persistence

Data written to Azure Cosmos DB shows up in the change feed and is retained until it is deleted. Message queues typically have a maximum retention period. For example, Azure Event Hubs offers a maximum data retention of 90 days.

Querying ability

In addition to reading from a Cosmos container's change feed, you can also run SQL queries on the data stored in Azure Cosmos DB. The change feed isn't a duplicate of the data already in the container; it is a different mechanism for reading the same data. Therefore, data that you read from the change feed is always consistent with queries against the same Azure Cosmos DB container.

High availability

Azure Cosmos DB offers up to 99.999% read and write availability. Unlike many message queues, Azure Cosmos DB data can easily be distributed across multiple regions and configured with an RTO (Recovery Time Objective) of zero.

After processing items in the change feed, you can build a materialized view and persist aggregated values back in Azure Cosmos DB. For example, if you're using Azure Cosmos DB to build a game, you can use the change feed to implement real-time leaderboards based on scores from completed games.
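
As an illustration of the leaderboard idea, the sketch below folds batches of completed-game documents into per-player totals. It is a plain in-memory Python example under stated assumptions: the `playerId` and `score` fields and both function names are hypothetical, and a real implementation would persist the totals back to Azure Cosmos DB instead of keeping them in a dictionary.

```python
from collections import defaultdict


def update_leaderboard(totals: dict, changes: list) -> None:
    """Fold a batch of completed-game documents into per-player totals."""
    for game in changes:
        totals[game["playerId"]] += game["score"]


def top_players(totals: dict, n: int = 3) -> list:
    """Return the n highest-scoring players, best first (ties broken by name)."""
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    totals = defaultdict(int)
    update_leaderboard(totals, [{"playerId": "ana", "score": 10},
                                {"playerId": "bo", "score": 25}])
    update_leaderboard(totals, [{"playerId": "ana", "score": 30}])
    print(top_players(totals))
```

Because this aggregate is additive, a production version would also need to guard against the same change being applied more than once.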

Data movement

You can also read from the change feed for real-time data movement.

For example, the change feed helps you perform the following tasks efficiently:

  • Update a cache, search index, or data warehouse with data stored in Azure Cosmos DB.

  • Perform zero-downtime migrations to another Azure Cosmos account or to another Azure Cosmos container with a different logical partition key.

  • Implement application-level data tiering and archival. For example, you can store "hot data" in Azure Cosmos DB and age out "cold data" to other storage systems such as Azure Blob Storage.
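
The migration task in the list above can be sketched as a copy loop that re-keys each changed item. This is an in-memory Python illustration only: `copy_changes` is a hypothetical helper, the target container is modeled as a dictionary keyed by (partition key value, id), and the field names in the usage example are assumptions.

```python
def copy_changes(changes, target, partition_key_field):
    """Upsert each changed item into `target` under its new partition key.

    `target` stands in for the destination container and is keyed by
    (partition key value, item id), so reprocessing a change overwrites
    the earlier copy instead of duplicating it.
    """
    for item in changes:
        key = (item[partition_key_field], item["id"])
        target[key] = dict(item)
    return target


if __name__ == "__main__":
    destination = {}
    # The source container might be partitioned by "city"; the destination
    # in this example is partitioned by "customerId" instead.
    copy_changes(
        [{"id": "1", "city": "Oslo", "customerId": "c9"}],
        destination,
        partition_key_field="customerId",
    )
    print(destination)
```

Writing with upsert semantics keeps the copy safe to re-run, which matters while both containers are live during a migration.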

When you have to denormalize data across partitions and containers, you can read from your container's change feed as a source for this data replication. Real-time data replication with the change feed can guarantee only eventual consistency. You can monitor how far the change feed processor lags behind in processing changes in your Cosmos container.

Event sourcing

The event sourcing pattern involves using an append-only store to record the full series of actions on that data. Azure Cosmos DB's change feed is a great choice as a central data store in event sourcing architectures where all data ingestion is modeled as writes (no updates or deletes). In this case, each write to Azure Cosmos DB is an "event", and you'll have a full record of past events in the change feed. Typical uses of the events published by the central event store are maintaining materialized views and integrating with external systems. Because there is no time limit for retention in the change feed, you can replay all past events by reading from the beginning of your Cosmos container's change feed.
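
Replaying events to rebuild a view can be thought of as a fold over the event log. The sketch below shows that idea in plain Python under stated assumptions: the event shape (`type`, `quantity`) and the function names are hypothetical, and the list of events stands in for a read of the change feed from the beginning.

```python
from functools import reduce


def replay(events, apply, initial):
    """Rebuild a view by applying every past event, in order, to `initial`."""
    return reduce(apply, events, initial)


def stock_level(state, event):
    """View 1: current stock, derived from 'received' and 'sold' events."""
    delta = event["quantity"] if event["type"] == "received" else -event["quantity"]
    return state + delta


def count_by_type(state, event):
    """View 2: number of events of each type, derived from the same log."""
    new_state = dict(state)
    new_state[event["type"]] = new_state.get(event["type"], 0) + 1
    return new_state


if __name__ == "__main__":
    log = [
        {"type": "received", "quantity": 10},
        {"type": "sold", "quantity": 3},
    ]
    print(replay(log, stock_level, 0))
    print(replay(log, count_by_type, {}))
```

Two different views are derived from the same log here, which is the property that makes an append-only event store useful as a single source of truth.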

You can have multiple change feed consumers subscribe to the same container's change feed. Aside from the lease container's provisioned throughput, there is no cost to use the change feed. The change feed is available in every container, whether or not it is used.

Azure Cosmos DB is a great central append-only persistent data store in the event sourcing pattern because of its strengths in horizontal scalability and high availability. In addition, the change feed processor library offers an "at least once" guarantee, ensuring that you won't miss processing any events.
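
An "at least once" guarantee means that a change can occasionally be delivered more than once, so handlers are commonly written to be idempotent. The following Python sketch shows one way to do that; it is not part of the change feed processor library. `IdempotentHandler` is a hypothetical class, it assumes each item carries a version marker in an `_etag` field, and it tracks processed versions in memory where a real system would need durable storage.

```python
class IdempotentHandler:
    """Wrap a handler so that redelivered changes are applied only once."""

    def __init__(self, apply):
        self._apply = apply
        self._seen = set()  # in-memory only; assumed durable in a real system

    def handle(self, item) -> bool:
        """Apply `item` unless this exact version was already processed.

        Returns True if the item was applied, False if it was a duplicate.
        """
        version = (item["id"], item["_etag"])
        if version in self._seen:
            return False
        self._apply(item)
        self._seen.add(version)
        return True


if __name__ == "__main__":
    applied = []
    handler = IdempotentHandler(applied.append)
    change = {"id": "1", "_etag": "v1"}
    handler.handle(change)
    handler.handle(change)  # redelivery of the same change is ignored
    print(len(applied))
```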

Current limitations

The change feed has important limitations that you should understand. While items in a Cosmos container will always remain in the change feed, the change feed is not a full operation log. There are important areas to consider when designing an application that uses the change feed.

Intermediate updates

Only the most recent change for a given item is included in the change feed. When processing changes, you read the latest available version of the item. If there are multiple updates to the same item in a short period of time, it is possible to miss processing intermediate updates. If you would like to track updates and be able to replay past updates to an item, we recommend modeling these updates as a series of writes instead.
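
Modeling updates as a series of writes means that each change becomes its own item with its own `id`, so no version overwrites another. The sketch below illustrates the idea in Python under stated assumptions: the order-status scenario, the field names (`orderId`, `sequence`, `status`), and both helper functions are hypothetical.

```python
import uuid


def new_status_event(order_id, status, sequence):
    """Create a new item for a status change instead of updating in place."""
    return {
        "id": str(uuid.uuid4()),  # unique id: every change is a separate item
        "orderId": order_id,      # assumed partition key for the example
        "sequence": sequence,     # assumed per-order ordering field
        "status": status,
    }


def current_status(events):
    """Derive the latest status from the full series of writes."""
    latest = max(events, key=lambda e: e["sequence"])
    return latest["status"]


if __name__ == "__main__":
    history = [
        new_status_event("order-1", status, seq)
        for seq, status in enumerate(["created", "paid", "shipped"])
    ]
    print(current_status(history))
```

Every intermediate state stays available as its own item, at the cost of deriving the current state from the series.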

Deletes

The change feed does not capture deletes. If you delete an item from your container, it is also removed from the change feed. The most common way to handle this is to add a soft marker to the items that are being deleted. You can add a property called "deleted" and set it to "true" at the time of deletion. This document update will show up in the change feed. You can set a TTL on the item so that it is automatically deleted later.
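
The soft-delete approach can be sketched as two small steps: the writer marks the item, and the change feed consumer treats a marked item as a delete. This Python example is an illustration only. `mark_deleted` and `apply_change` are hypothetical helpers, the downstream view is a plain dictionary, and the sketch assumes the item-level `ttl` property (in seconds), which takes effect only when TTL is enabled on the container.

```python
def mark_deleted(item, ttl_seconds=3600):
    """Return a copy of `item` flagged as deleted and set to expire later."""
    tombstone = dict(item)
    tombstone["deleted"] = True
    tombstone["ttl"] = ttl_seconds  # assumed item-level TTL, in seconds
    return tombstone


def apply_change(view, item):
    """Consumer side: remove soft-deleted items from a downstream view."""
    if item.get("deleted"):
        view.pop(item["id"], None)
    else:
        view[item["id"]] = item


if __name__ == "__main__":
    view = {}
    doc = {"id": "1", "name": "widget"}
    apply_change(view, doc)                # normal write reaches the view
    apply_change(view, mark_deleted(doc))  # soft delete removes it again
    print(view)
```

The TTL should be long enough for every consumer to read the marker before the item expires.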

Guaranteed order

Order in the change feed is guaranteed within a partition key value, but not across partition key values. You should select a partition key that gives you a meaningful order guarantee.

For example, consider a retail application that uses the event sourcing design pattern. In this application, each user action is an "event" that is modeled as a write to Azure Cosmos DB. Imagine that some example events occur in the following sequence:

  1. Customer adds Item A to their shopping cart
  2. Customer adds Item B to their shopping cart
  3. Customer removes Item A from their shopping cart
  4. Customer checks out and shopping cart contents are shipped

A materialized view of current shopping cart contents is maintained for each customer. This application must ensure that these events are processed in the order in which they occur. If, for example, the cart checkout were processed before Item A's removal, the customer would likely be shipped Item A along with Item B, rather than only the desired Item B. To guarantee that these four events are processed in order of their occurrence, they should fall within the same partition key value. If you select username (each customer has a unique username) as the partition key, you can guarantee that these events show up in the change feed in the same order in which they are written to Azure Cosmos DB.
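
The effect of processing order on the shopping cart view can be demonstrated with a small simulation. This is a Python sketch under stated assumptions: the event shape (`username`, `type`, `item`) and the `build_carts` function are hypothetical, and the list of events stands in for the changes read for a single partition key value.

```python
def build_carts(events):
    """Maintain a per-customer cart view by applying events in the given order."""
    carts = {}
    for event in events:
        cart = carts.setdefault(event["username"], {"items": set(), "shipped": None})
        if event["type"] == "add":
            cart["items"].add(event["item"])
        elif event["type"] == "remove":
            cart["items"].discard(event["item"])
        elif event["type"] == "checkout":
            # Ship whatever is in the cart at the moment checkout is processed.
            cart["shipped"] = sorted(cart["items"])
            cart["items"] = set()
    return carts


if __name__ == "__main__":
    in_order = [
        {"username": "kim", "type": "add", "item": "A"},
        {"username": "kim", "type": "add", "item": "B"},
        {"username": "kim", "type": "remove", "item": "A"},
        {"username": "kim", "type": "checkout"},
    ]
    # Same events, but checkout is processed before Item A's removal.
    out_of_order = in_order[:2] + [in_order[3], in_order[2]]
    print(build_carts(in_order)["kim"]["shipped"])
    print(build_carts(out_of_order)["kim"]["shipped"])
```

With the events in their original order only Item B is shipped; when checkout is processed early, Item A is shipped as well, which is the failure that a shared partition key value is meant to prevent.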

Examples

Here are some real-world change feed code examples that extend beyond the scope of the samples provided in Azure docs:

Next steps