Change feed in Azure Cosmos DB

Applies to: SQL API, Cassandra API, Gremlin API, Azure Cosmos DB API for MongoDB

Change feed in Azure Cosmos DB is a persistent record of changes to a container in the order they occur. Change feed support works by listening to an Azure Cosmos container for any changes. It then outputs the sorted list of changed documents in the order in which they were modified. The persisted changes can be processed asynchronously and incrementally, and the output can be distributed across one or more consumers for parallel processing.

Learn more about change feed design patterns.

Supported APIs and client SDKs

This feature is currently supported by the following Azure Cosmos DB APIs and client SDKs.

Client drivers | SQL API | Azure Cosmos DB's API for Cassandra | Azure Cosmos DB's API for MongoDB | Gremlin API | Table API
.NET | Yes | Yes | Yes | Yes | No
Java | Yes | Yes | Yes | Yes | No
Python | Yes | Yes | Yes | Yes | No
Node/JS | Yes | Yes | Yes | Yes | No

Change feed and different operations

Today, you see all inserts and updates in the change feed. You can't filter the change feed for a specific type of operation. One possible alternative is to add a "soft marker" on the item for updates and filter on that marker when processing items in the change feed.

Currently, change feed doesn't log deletes. As with updates, you can add a soft marker on the items that are being deleted. For example, you can add an attribute called "deleted" to the item, set it to "true", and set a TTL on the item so that it is deleted automatically. You can read the change feed for historic items, for example, items that were added five years ago. In that case you get the most recent change for each item, not the intermediate changes. You can read the change feed as far back as the origin of your container, but if an item is deleted, it is removed from the change feed.
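The soft-marker approach described above can be sketched in plain Python as follows. This is a minimal illustration, not SDK code: the helper names `mark_deleted` and `split_changes` are hypothetical, the items are ordinary dictionaries, and the sketch assumes the item-level `ttl` property is honored because TTL is enabled on the container.

```python
import copy


def mark_deleted(item, ttl_seconds=86400):
    """Return a copy of the item flagged as soft-deleted.

    Hypothetical helper: "deleted" is the marker attribute from the text,
    and "ttl" is the item-level time to live in seconds.
    """
    marked = copy.deepcopy(item)
    marked["deleted"] = True
    marked["ttl"] = ttl_seconds
    return marked


def split_changes(changes):
    """Split a batch of change feed items into (upserts, soft_deletes)."""
    upserts = [c for c in changes if c.get("deleted") is not True]
    soft_deletes = [c for c in changes if c.get("deleted") is True]
    return upserts, soft_deletes
```

Writing the marked copy back to the container produces an update, so the marker shows up in the change feed before the TTL removes the item. The consumer then has to read that change before the TTL expires.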

Sort order of items in change feed

Change feed items come in the order of their modification time. This sort order is guaranteed per logical partition key.

Consistency level

When you consume the change feed at the Eventual consistency level, there can be duplicate events between subsequent change feed read operations (the last event of one read operation appears as the first event of the next).
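One way a consumer might tolerate this boundary duplicate is to compare the last event of the previous read with the first event of the next one. The following is a minimal sketch under stated assumptions: `drop_boundary_duplicate` is a hypothetical helper, pages are lists of dictionaries, and two events are treated as the same when their `id` and `_lsn` match (if `id` values repeat across partition keys in your container, the partition key would need to be part of the comparison too).

```python
def drop_boundary_duplicate(previous_page, next_page):
    """Return next_page without its first event if it repeats the last
    event of previous_page.

    Assumption: (id, _lsn) is enough to recognize the same event.
    """
    if not previous_page or not next_page:
        return list(next_page)
    last, first = previous_page[-1], next_page[0]
    same_event = (last.get("id"), last.get("_lsn")) == (
        first.get("id"),
        first.get("_lsn"),
    )
    return list(next_page[1:]) if same_event else list(next_page)
```

Making the downstream processing idempotent is a more general safeguard. This check only handles the specific overlap described above.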

Change feed in multi-region Azure Cosmos accounts

In a multi-region Azure Cosmos account, if a write region fails over, change feed continues to work across the manual failover operation and remains contiguous.

Change feed and Time to Live (TTL)

If the TTL (Time to Live) property on an item is set to -1, change feed will persist forever. If the data is not deleted, it remains in the change feed.

Change feed and _etag, _lsn or _ts

The _etag format is internal and you should not take a dependency on it, because it can change at any time. _ts is a modification or creation timestamp. You can use _ts for chronological comparison. _lsn is a batch ID that is added for change feed only; it represents the transaction ID. Many items may have the same _lsn. The ETag on FeedResponse is different from the _etag you see on the item. _etag is an internal identifier used for concurrency control. The _etag property indicates the version of the item, whereas the ETag property is used for sequencing the feed.
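As a small illustration of how these system properties might be used on the consumer side, the sketch below groups items by `_lsn` and compares two items by `_ts`. The helper names are hypothetical, the items are plain dictionaries, and the sample values are made up for the example.

```python
from collections import defaultdict


def group_ids_by_lsn(items):
    """Map each _lsn (batch/transaction ID) to the ids of the items that share it."""
    groups = defaultdict(list)
    for item in items:
        groups[item["_lsn"]].append(item["id"])
    return dict(groups)


def modified_later(a, b):
    """True if item a has a later _ts (modification timestamp) than item b."""
    return a["_ts"] > b["_ts"]


# Made-up sample items for illustration only.
items = [
    {"id": "a", "_lsn": 10, "_ts": 1600000000},
    {"id": "b", "_lsn": 10, "_ts": 1600000000},
    {"id": "c", "_lsn": 11, "_ts": 1600000005},
]
groups = group_ids_by_lsn(items)
```

Note that the sketch never parses `_etag`, in line with the advice not to depend on its format.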

Working with change feed

You can work with change feed using the following options:

Change feed is available for each logical partition key within the container, and it can be distributed across one or more consumers for parallel processing as shown in the image below.

[Image: Distributed processing of Azure Cosmos DB change feed]
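To make the idea of parallel consumers concrete, here is a minimal sketch that routes changes to a fixed number of consumers by hashing the logical partition key. It is an illustration only and does not describe how Azure Cosmos DB or its SDKs assign work internally. The names `assign_consumer` and `fan_out`, and the `pk` field used as the partition key, are assumptions made for this example.

```python
import hashlib


def assign_consumer(partition_key, consumer_count):
    """Pick a consumer index for a partition key using a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % consumer_count


def fan_out(changes, consumer_count, key_field="pk"):
    """Distribute changes into one list per consumer.

    All changes for a given partition key go to the same consumer and keep
    their relative order, so per-key ordering survives the fan-out.
    """
    buckets = [[] for _ in range(consumer_count)]
    for change in changes:
        index = assign_consumer(change[key_field], consumer_count)
        buckets[index].append(change)
    return buckets
```

Keeping every change for one key on one consumer matters because ordering is only guaranteed within a logical partition key.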

Features of change feed

  • Change feed is enabled by default for all Azure Cosmos accounts.

  • You can use your provisioned throughput to read from the change feed, just like any other Azure Cosmos DB operation, in any of the regions associated with your Azure Cosmos database.

  • The change feed includes insert and update operations made to items within the container. You can capture deletes by setting a "soft-delete" flag within your items (for example, documents) in place of deletes. Alternatively, you can set a finite expiration period for your items with the TTL capability (for example, 24 hours) and use the value of that property to capture deletes. With this solution, you have to process the changes within a shorter time interval than the TTL expiration period.

  • Each change to an item appears exactly once in the change feed, and the clients must manage the checkpointing logic. If you want to avoid the complexity of managing checkpoints, the change feed processor provides automatic checkpointing and "at least once" semantics. See using change feed with change feed processor.

  • Only the most recent change for a given item is included in the change log. Intermediate changes may not be available.

  • The change feed is sorted by the order of modification within each logical partition key value. There is no guaranteed order across partition key values.

  • Changes can be synchronized from any point in time; that is, there is no fixed data retention period for which changes are available.

  • Changes are available in parallel for all logical partition keys of an Azure Cosmos container. This capability allows changes from large containers to be processed in parallel by multiple consumers.

  • Applications can request multiple change feeds on the same container simultaneously. ChangeFeedOptions.StartTime can be used to provide an initial starting point, for example, to find the continuation token corresponding to a given clock time. The ContinuationToken, if specified, takes precedence over the StartTime and StartFromBeginning values. The precision of ChangeFeedOptions.StartTime is about 5 seconds.
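The precedence rule in the last bullet can be sketched as a small resolution function. This is a plain-Python model, not the SDK's ChangeFeedOptions class: `StartOptions` and `resolve_start` are hypothetical names. The text only states that the continuation token wins over the other two values, so the relative order of start time and start-from-beginning, and the fallback of starting from the current time, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StartOptions:
    """Hypothetical stand-in for the start-related change feed options."""
    continuation_token: Optional[str] = None
    start_time: Optional[datetime] = None
    start_from_beginning: bool = False


def resolve_start(options):
    """Return a (kind, value) pair describing where reading should begin."""
    # Stated in the text: a continuation token takes precedence.
    if options.continuation_token is not None:
        return ("continuation", options.continuation_token)
    # Assumption: an explicit start time is checked before start-from-beginning.
    if options.start_time is not None:
        return ("time", options.start_time)
    if options.start_from_beginning:
        return ("beginning", None)
    # Assumption: with nothing set, reading starts from the current time.
    return ("now", None)
```

For example, `resolve_start(StartOptions(continuation_token="abc", start_from_beginning=True))` resolves to the continuation token and ignores the start-from-beginning flag.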

Change feed in APIs for Cassandra and MongoDB

Change feed functionality is surfaced as change streams in the MongoDB API and as a query with predicate in the Cassandra API. To learn more about the implementation details for the MongoDB API, see Change streams in the Azure Cosmos DB API for MongoDB.

Native Apache Cassandra provides change data capture (CDC), a mechanism to flag specific tables for archival and to reject writes to those tables once a configurable size-on-disk for the CDC log is reached. The change feed feature in Azure Cosmos DB API for Cassandra enhances the ability to query the changes with a predicate via CQL. To learn more about the implementation details, see Change feed in the Azure Cosmos DB API for Cassandra.

Next steps

You can now proceed to learn more about change feed in the following articles: