Change feed in Azure Cosmos DB - overview

Change feed support in Azure Cosmos DB works by listening to an Azure Cosmos container for any changes. It then outputs a sorted list of the changed documents, in the order in which they were modified. The changes are persisted and can be processed asynchronously and incrementally, and the output can be distributed across one or more consumers for parallel processing.

Azure Cosmos DB is well-suited for IoT, gaming, retail, and operational logging applications. A common design pattern in these applications is to use changes to the data to trigger additional actions. Examples of additional actions include:

  • Triggering a notification or a call to an API when an item is inserted or updated.
  • Real-time stream processing for IoT, or real-time analytics processing on operational data.
  • Additional data movement, either by synchronizing with a cache, a search engine, or a data warehouse, or by archiving data to cold storage.

The change feed in Azure Cosmos DB enables you to build efficient and scalable solutions for each of these patterns, as shown in the following image:

Using Azure Cosmos DB change feed to power real-time analytics and event-driven computing scenarios

Supported APIs and client SDKs

This feature is currently supported by the following Azure Cosmos DB APIs and client SDKs.

| Client drivers | Azure CLI | SQL API | Azure Cosmos DB's API for Cassandra | Azure Cosmos DB's API for MongoDB | Gremlin API | Table API |
| .NET | NA | Yes | Yes | Yes | Yes | No |
| Java | NA | Yes | Yes | Yes | Yes | No |
| Python | NA | Yes | Yes | Yes | Yes | No |
| Node/JS | NA | Yes | Yes | Yes | Yes | No |

Change feed and different operations

Today, the change feed shows all inserts and updates. You can't yet restrict the change feed to specific operations, such as updates only and not inserts. As a workaround, you can add a "soft marker" on the item for updates and filter on that marker when processing items in the change feed.

Currently, the change feed doesn't log deletes. As with updates, you can add a soft marker to the items that are being deleted. For example, you can add an attribute called "deleted" to the item, set it to "true", and set a TTL on the item so that it is deleted automatically.

You can read the change feed for historic items, for example, items that were added five years ago. For such items you get the most recent change to the item, not the intermediate changes. If the item is not deleted, you can read the change feed as far back as the origin of your container.
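The soft-marker approach described above can be sketched in Python as follows. This is only an illustration under stated assumptions: items are plain dictionaries standing in for documents read from the change feed, the "deleted" attribute follows the example in the text, `ttl` is the per-item time-to-live in seconds, and the helper names `mark_deleted` and `split_changes` are hypothetical rather than part of any SDK.

```python
from typing import Any, Dict, Iterable, List, Tuple

Item = Dict[str, Any]


def mark_deleted(item: Item, ttl_seconds: int = 3600) -> Item:
    """Return a copy of the item flagged as soft-deleted.

    Writing this copy back is an update, so it shows up in the change feed;
    the ttl value lets the service remove the item afterwards.
    """
    flagged = dict(item)
    flagged["deleted"] = True
    flagged["ttl"] = ttl_seconds
    return flagged


def split_changes(changes: Iterable[Item]) -> Tuple[List[Item], List[Item]]:
    """Separate ordinary inserts/updates from soft-deleted items."""
    upserts: List[Item] = []
    deletes: List[Item] = []
    for change in changes:
        if change.get("deleted") is True:
            deletes.append(change)
        else:
            upserts.append(change)
    return upserts, deletes


# A batch as it might arrive from the change feed (sample data only).
batch = [
    {"id": "1", "name": "sensor-a"},
    mark_deleted({"id": "2", "name": "sensor-b"}),
]
upserts, deletes = split_changes(batch)
print([c["id"] for c in upserts], [c["id"] for c in deletes])
```

The consumer has to read the flagged item before its TTL expires, so the TTL should be comfortably longer than the interval at which changes are processed.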

Sort order of items in change feed

Change feed items come in the order of their modification time. This sort order is guaranteed per logical partition key.
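As a rough illustration of what a per-partition-key guarantee means for a consumer, the sketch below groups a batch of changes by partition key and checks the order inside each group. The `pk` property name, the sample data, and both helper functions are assumptions made for this example.

```python
from collections import defaultdict
from typing import Any, Dict, List

Change = Dict[str, Any]


def group_by_partition_key(changes: List[Change], key: str = "pk") -> Dict[str, List[Change]]:
    """Group changes by partition key, keeping arrival order inside each group."""
    grouped: Dict[str, List[Change]] = defaultdict(list)
    for change in changes:
        grouped[change[key]].append(change)
    return dict(grouped)


def is_time_ordered(changes: List[Change]) -> bool:
    """True when the _ts values never decrease from one change to the next."""
    return all(a["_ts"] <= b["_ts"] for a, b in zip(changes, changes[1:]))


# Changes for two partition keys, interleaved as a consumer might see them.
changes = [
    {"id": "a1", "pk": "a", "_ts": 10},
    {"id": "b1", "pk": "b", "_ts": 5},
    {"id": "a2", "pk": "a", "_ts": 12},
    {"id": "b2", "pk": "b", "_ts": 9},
]
grouped = group_by_partition_key(changes)
per_key_ordered = {pk: is_time_ordered(items) for pk, items in grouped.items()}
globally_ordered = is_time_ordered(changes)
print(per_key_ordered, globally_ordered)
```

In this sample each key's changes are in order while the combined stream is not, which is why logic that depends on ordering should be scoped to a single partition key.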

Change feed in multi-region Azure Cosmos accounts

In a multi-region Azure Cosmos account, if a write region fails over, the change feed keeps working across the manual failover operation and remains contiguous.

Change feed and Time to Live (TTL)

If the TTL (Time to Live) property on an item is set to -1, the change feed persists forever. As long as the data is not deleted, it remains in the change feed.

Change feed and _etag, _lsn or _ts

The _etag format is internal and you should not take a dependency on it, because it can change at any time. _ts is a modification or creation timestamp. You can use _ts for chronological comparison. _lsn is a batch ID that is added for the change feed only; it represents the transaction ID. Many items may have the same _lsn. The ETag on FeedResponse is different from the _etag you see on the item. _etag is an internal identifier used for concurrency control that indicates the version of the item, whereas ETag is used for sequencing the feed.
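A small Python sketch of how these properties might be used when processing items. It assumes that _ts holds seconds since the Unix epoch and uses made-up sample items; the helper names `modified_at` and `batches_by_lsn` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Dict, List

Item = Dict[str, Any]


def modified_at(item: Item) -> datetime:
    """Convert _ts (assumed to be epoch seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(item["_ts"], tz=timezone.utc)


def batches_by_lsn(items: List[Item]) -> Dict[int, List[str]]:
    """Collect item ids per _lsn; several items can share one batch ID."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for item in items:
        batches[item["_lsn"]].append(item["id"])
    return dict(batches)


items = [
    {"id": "a", "_ts": 1700000000, "_lsn": 7},
    {"id": "b", "_ts": 1700000000, "_lsn": 7},
    {"id": "c", "_ts": 1700000060, "_lsn": 8},
]

# Chronological comparison relies on _ts only, never on _etag.
elapsed = modified_at(items[2]) - modified_at(items[0])
batches = batches_by_lsn(items)
print(elapsed.total_seconds(), batches)
```

Nothing here parses _etag, in line with the advice not to depend on its format.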

Change feed use cases and scenarios

Change feed enables efficient processing of large datasets with a high volume of writes. It also offers an alternative to querying an entire dataset to identify what has changed.

Use cases

For example, with change feed you can perform the following tasks efficiently:

  • Update a cache, a search index, or a data warehouse with data stored in Azure Cosmos DB.

  • Implement application-level data tiering and archival, for example, store "hot data" in Azure Cosmos DB and age out "cold data" to other storage systems such as Azure Blob Storage.

  • Perform zero-downtime migrations to another Azure Cosmos account or to another Azure Cosmos container with a different logical partition key.

  • Implement lambda architecture using Azure Cosmos DB, where Azure Cosmos DB supports the real-time, batch, and query serving layers, thus enabling lambda architecture with low TCO.

  • Receive and store event data from devices, sensors, infrastructure, and applications, and process these events in real time, for example, using Spark.

The following image shows how you can implement lambda architecture using Azure Cosmos DB via change feed:

    Azure Cosmos DB-based lambda pipeline for ingestion and query


The following are some of the scenarios you can easily implement with change feed:

  • Within your serverless web or mobile apps, you can track events such as changes to your customer's profile, preferences, or location, and trigger certain actions, for example, sending push notifications to their devices using Azure Functions.

  • If you're using Azure Cosmos DB to build a game, you can, for example, use change feed to implement real-time leaderboards based on scores from completed games.

Working with change feed

You can work with change feed using the following options:

Change feed is available for each logical partition key within the container, and it can be distributed across one or more consumers for parallel processing, as shown in the image below.

Distributed processing of Azure Cosmos DB change feed
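To make the idea of distributing changes across consumers concrete, here is a minimal Python sketch. It is not how the service or its SDKs assign work; it only shows one way to route every change for a given partition key to the same consumer so that per-key order is preserved. The `pk` property and the functions `consumer_for` and `distribute` are assumptions made for this example.

```python
import hashlib
from collections import defaultdict
from typing import Any, Dict, List

Change = Dict[str, Any]


def consumer_for(partition_key: str, consumer_count: int) -> int:
    """Map a partition key to a consumer index using a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % consumer_count


def distribute(changes: List[Change], consumer_count: int, key: str = "pk") -> Dict[int, List[Change]]:
    """Split a stream of changes into per-consumer lists, keeping arrival order."""
    assignments: Dict[int, List[Change]] = defaultdict(list)
    for change in changes:
        assignments[consumer_for(change[key], consumer_count)].append(change)
    return dict(assignments)


changes = [
    {"id": "1", "pk": "device-1"},
    {"id": "2", "pk": "device-2"},
    {"id": "3", "pk": "device-1"},
    {"id": "4", "pk": "device-3"},
]
assignments = distribute(changes, consumer_count=2)
for consumer, items in sorted(assignments.items()):
    print(consumer, [c["id"] for c in items])
```

A stable hash is used instead of Python's built-in `hash()` because string hashes vary between interpreter runs, which would move keys between consumers after a restart.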

Features of change feed

  • Change feed is enabled by default for all Azure Cosmos accounts.

  • You can use your provisioned throughput to read from the change feed, just like any other Azure Cosmos DB operation, in any of the regions associated with your Azure Cosmos database.

  • The change feed includes insert and update operations made to items within the container. You can capture deletes by setting a "soft-delete" flag within your items (for example, documents) in place of deletes. Alternatively, you can set a finite expiration period for your items with the TTL capability, for example, 24 hours, and use the value of that property to capture deletes. With this solution, you have to process the changes within a shorter time interval than the TTL expiration period.

  • Each change to an item appears exactly once in the change feed, and the clients must manage the checkpointing logic. If you want to avoid the complexity of managing checkpoints, the change feed processor provides automatic checkpointing and "at least once" semantics. See using change feed with change feed processor.

  • Only the most recent change for a given item is included in the change log. Intermediate changes may not be available.

  • The change feed is sorted by the order of modification within each logical partition key value. There is no guaranteed order across the partition key values.

  • Changes can be synchronized from any point in time; that is, there is no fixed data retention period for which changes are available.

  • Changes are available in parallel for all logical partition keys of an Azure Cosmos container. This capability allows changes from large containers to be processed in parallel by multiple consumers.

  • Applications can request multiple change feeds on the same container simultaneously. ChangeFeedOptions.StartTime can be used to provide an initial starting point, for example, to find the continuation token corresponding to a given clock time. The ContinuationToken, if specified, takes precedence over the StartTime and StartFromBeginning values. The precision of ChangeFeedOptions.StartTime is about 5 seconds.
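The client-managed checkpointing mentioned in the list above can be sketched as follows. This is a simulation under stated assumptions, not SDK code: the feed is an in-memory list, the continuation is a simple integer offset, and `read_feed`, `process`, and `fail_on_id` are hypothetical names introduced for the example. The continuation is saved only after a whole page has been handled, so a crash in the middle of a page causes that page to be delivered again on restart ("at least once").

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Change = Dict[str, Any]


def read_feed(feed: List[Change], continuation: Optional[int],
              page_size: int = 2) -> Tuple[List[Change], int]:
    """Stand-in for a change feed read: one page plus the next continuation."""
    start = continuation or 0
    page = feed[start:start + page_size]
    return page, start + len(page)


def process(feed: List[Change], checkpoint_store: Dict[str, int],
            handle: Callable[[Change], None],
            fail_on_id: Optional[str] = None) -> None:
    """Handle pages in order and checkpoint after each fully handled page."""
    continuation = checkpoint_store.get("continuation")
    while True:
        page, next_continuation = read_feed(feed, continuation)
        if not page:
            return
        for change in page:
            if change["id"] == fail_on_id:
                raise RuntimeError("simulated crash")
            handle(change)
        # Persist progress only once the whole page has been handled.
        checkpoint_store["continuation"] = next_continuation
        continuation = next_continuation


feed = [{"id": str(n)} for n in range(5)]
store: Dict[str, int] = {}
seen: List[Change] = []

try:
    process(feed, store, seen.append, fail_on_id="3")  # crashes mid-page
except RuntimeError:
    pass
process(feed, store, seen.append)  # restart resumes from the saved checkpoint

print([c["id"] for c in seen], store)
```

Item "2" is handled twice here because the crash happened before its page was checkpointed, so handlers in such a design should be idempotent.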

Change feed in APIs for Cassandra and MongoDB

Change feed functionality is surfaced as change streams in the MongoDB API and as a query with a predicate in the Cassandra API. To learn more about the implementation details for the MongoDB API, see Change streams in the Azure Cosmos DB API for MongoDB.

Native Apache Cassandra provides change data capture (CDC), a mechanism to flag specific tables for archival and to reject writes to those tables once a configurable size-on-disk for the CDC log is reached. The change feed feature in the Azure Cosmos DB API for Cassandra enhances the ability to query the changes with a predicate via CQL. To learn more about the implementation details, see Change feed in the Azure Cosmos DB API for Cassandra.

Next steps

You can now proceed to learn more about change feed in the following articles: