Guidelines and recommendations for Reliable Collections in Azure Service Fabric

This section provides guidelines for using the Reliable State Manager and Reliable Collections. The goal is to help users avoid common pitfalls.

The guidelines are organized as simple recommendations prefixed with the terms Do, Consider, Avoid, and Do not.

  • Do not modify an object of custom type returned by read operations (for example, TryPeekAsync or TryGetValueAsync). Reliable Collections, just like Concurrent Collections, return a reference to the objects and not a copy.
  • Do deep copy the returned object of a custom type before modifying it. Since structs and built-in types are pass-by-value, you do not need to deep copy them unless they contain reference-typed fields or properties that you intend to modify.
  • Do not use TimeSpan.MaxValue for time-outs. Time-outs should be used to detect deadlocks.
  • Do not use a transaction after it has been committed, aborted, or disposed.
  • Do not use an enumeration outside of the transaction scope it was created in.
  • Do not create a transaction within another transaction's using statement, because it can cause deadlocks.
  • Do not create reliable state with IReliableStateManager.GetOrAddAsync and use that reliable state in the same transaction. This results in an InvalidOperationException.
  • Do ensure that your IComparable<TKey> implementation is correct. The system depends on IComparable<TKey> for merging checkpoints and rows.
  • Do use an update lock when reading an item with the intention of updating it, to prevent a certain class of deadlocks.
  • Consider keeping the number of Reliable Collections per partition below 1000. Prefer fewer Reliable Collections with more items over more Reliable Collections with fewer items.
  • Consider keeping your items (for example, TKey + TValue for a Reliable Dictionary) below 80 KB: the smaller, the better. This reduces Large Object Heap usage as well as disk and network IO requirements. Often, it also reduces replication of duplicate data when only a small part of the value is updated. A common way to achieve this in a Reliable Dictionary is to break your rows into multiple rows.
  • Consider using backup and restore functionality for disaster recovery.
  • Avoid mixing single-entity operations and multi-entity operations (for example, GetCountAsync, CreateEnumerableAsync) in the same transaction, because they have different isolation levels.
  • Do handle InvalidOperationException. User transactions can be aborted by the system for a variety of reasons: for example, when the Reliable State Manager is changing its role out of Primary, or when a long-running transaction is blocking truncation of the transactional log. In such cases, you may receive an InvalidOperationException indicating that your transaction has already been terminated. Assuming you did not request the termination, the best way to handle this exception is to dispose the transaction, check whether the cancellation token has been signaled (or the role of the replica has changed), and if not, create a new transaction and retry.
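The first two recommendations (do not modify returned objects; deep copy them first) come down to reference semantics. The sketch below uses a hypothetical in-memory store, `ReferenceReturningStore`, as a stand-in that, like Reliable Collections, returns references rather than copies. It is not the Service Fabric API; in a real service the write-back would go through a transaction (for example with SetAsync).

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Order:
    # A "custom type" with a reference-typed (mutable) field.
    order_id: int
    items: list = field(default_factory=list)


class ReferenceReturningStore:
    """Hypothetical stand-in: reads return the stored object itself, not a copy."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def try_get_value(self, key):
        # Returns (found, value); value is a reference to the stored object.
        return key in self._data, self._data.get(key)


store = ReferenceReturningStore()
store.set(1, Order(1, ["apple"]))

# Wrong: mutating the returned reference changes stored state directly,
# bypassing any explicit write.
found, order = store.try_get_value(1)
order.items.append("pear")
print(store.try_get_value(1)[1].items)  # ['apple', 'pear']

# Right: deep copy, modify the copy, then write it back explicitly.
found, order = store.try_get_value(1)
updated = copy.deepcopy(order)
updated.items.append("plum")
print(store.try_get_value(1)[1].items)  # ['apple', 'pear'] (unchanged so far)
store.set(1, updated)
```

A shallow `copy.copy` would not be enough here, because the copy would still share the same `items` list with the stored object.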
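The time-out recommendation can be illustrated with ordinary locks: a finite wait turns a potential deadlock into an error the caller can handle, while an unbounded wait (the equivalent of TimeSpan.MaxValue) would hang. In this sketch, `LockTimeoutError` and `acquire_or_timeout` are hypothetical names, loosely analogous to the TimeoutException a Reliable Collection operation raises when its time-out expires.

```python
import threading


class LockTimeoutError(Exception):
    """Hypothetical stand-in for a time-out raised by a blocked operation."""


def acquire_or_timeout(lock: threading.Lock, timeout_s: float) -> None:
    # A finite timeout makes a blocked (possibly deadlocked) wait detectable.
    if not lock.acquire(timeout=timeout_s):
        raise LockTimeoutError(f"lock not acquired within {timeout_s}s")


lock_a = threading.Lock()
holder_ready = threading.Event()
release = threading.Event()


def holder():
    # Simulates another transaction that holds lock_a and is itself waiting.
    with lock_a:
        holder_ready.set()
        release.wait()


t = threading.Thread(target=holder)
t.start()
holder_ready.wait()

timed_out = False
try:
    acquire_or_timeout(lock_a, timeout_s=0.1)
except LockTimeoutError:
    timed_out = True  # the caller can now abort its transaction and retry
finally:
    release.set()
    t.join()

print(timed_out)  # True
```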
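Breaking one large row into multiple rows, as suggested for items above 80 KB, can be sketched as chunking a value under composite `(key, index)` keys. The chunk size and the helper names `split_into_rows` and `join_rows` are hypothetical; in a real Reliable Dictionary the composite key type would also have to meet the TKey requirements described in this article.

```python
from typing import Dict, Tuple

CHUNK_SIZE = 16 * 1024  # hypothetical chunk size, well below ~80 KB


def split_into_rows(key: str, value: bytes,
                    chunk_size: int = CHUNK_SIZE) -> Dict[Tuple[str, int], bytes]:
    """Break one large value into several (key, chunk_index) rows."""
    return {
        (key, i // chunk_size): value[i:i + chunk_size]
        for i in range(0, len(value), chunk_size)
    }


def join_rows(rows: Dict[Tuple[str, int], bytes], key: str) -> bytes:
    """Reassemble the chunks of one logical value in index order."""
    parts = sorted((idx, chunk) for (k, idx), chunk in rows.items() if k == key)
    return b"".join(chunk for _, chunk in parts)


rows = split_into_rows("doc", b"x" * 100_000)
print(len(rows))  # 7

# Updating part of the value rewrites (and replicates) only one small row.
rows[("doc", 3)] = b"y" * CHUNK_SIZE
```

The trade-off is that reading the whole logical value now touches several rows, so this pays off mainly when updates typically change only part of the value.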
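The retry guidance for InvalidOperationException follows a fixed pattern: dispose the aborted transaction, stop if cancellation was requested or the replica is no longer primary, and otherwise retry with a brand-new transaction. The sketch below is a hypothetical analogue. `TransactionAbortedError` stands in for InvalidOperationException, `FakeTransaction` for a real transaction, a `threading.Event` for the cancellation token, and the `max_attempts` cap is an added assumption rather than part of the guidance.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class TransactionAbortedError(Exception):
    """Hypothetical stand-in for the InvalidOperationException on a terminated transaction."""


class OperationCanceled(Exception):
    """Raised when retrying is pointless: cancellation requested or role changed."""


class FakeTransaction:
    """Hypothetical transaction; only records whether it was disposed."""

    def __init__(self):
        self.disposed = False

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.disposed = True  # dispose on every exit path, including exceptions
        return False          # do not swallow the exception


def run_with_retry(work: Callable[[FakeTransaction], T],
                   cancel: threading.Event,
                   is_primary: Callable[[], bool],
                   max_attempts: int = 5) -> T:
    attempt = 0
    while True:
        attempt += 1
        try:
            with FakeTransaction() as tx:  # a new transaction for every attempt
                return work(tx)
        except TransactionAbortedError:
            # The with-block has already disposed the aborted transaction.
            if cancel.is_set() or not is_primary():
                raise OperationCanceled("cancellation requested or role changed")
            if attempt >= max_attempts:
                raise


calls = []


def flaky_work(tx):
    calls.append(tx)
    if len(calls) < 3:
        raise TransactionAbortedError("transaction was terminated by the system")
    return "committed"


result = run_with_retry(flaky_work, threading.Event(), lambda: True)
print(result, len(calls))  # committed 3
```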

Here are some things to keep in mind:

  • The default time-out is four seconds for all Reliable Collection APIs. Most users should use the default time-out.
  • The default cancellation token is CancellationToken.None in all Reliable Collections APIs.
  • The key type parameter (TKey) for a Reliable Dictionary must correctly implement GetHashCode() and Equals(). Keys must be immutable.
  • To achieve high availability for Reliable Collections, each service should have a target and minimum replica set size of at least 3.
  • Read operations on a secondary may read versions that are not quorum committed. This means that a version of data read from a single secondary might be false progressed. Reads from the primary are always stable: they can never be false progressed.
  • Security and privacy of the data your application persists in a reliable collection are your decision and are subject to the protections provided by your storage management; for example, operating system disk encryption could be used to protect your data at rest.
  • ReliableDictionary enumeration uses a sorted data structure ordered by key. To make enumeration efficient, commits are added to a temporary hashtable and later moved into the main sorted data structure after a checkpoint. Adds, updates, and deletes have a best-case runtime of O(1) and a worst-case runtime of O(log n) when validation checks on the presence of the key are performed. Gets might be O(1) or O(log n), depending on whether you are reading from a recent commit or from an older commit.
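The TKey requirements (consistent equality and hashing, immutability, and a correct ordering for IComparable<TKey>) have a close Python analogue in a frozen, ordered dataclass. `CustomerKey` is a hypothetical example; the point is that if a key could change after insertion, its hash bucket and its position in a sorted structure would both become stale.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True, order=True)
class CustomerKey:
    """Hypothetical immutable composite key. Equality, hashing, and ordering
    are all derived from the same fields, so they always agree."""
    region: str
    customer_id: int


k1 = CustomerKey("eu", 42)
k2 = CustomerKey("eu", 42)
table = {k1: "Contoso"}
print(k1 == k2, hash(k1) == hash(k2), table[k2])  # True True Contoso

try:
    k1.customer_id = 7  # rejected: keys must not change after insertion
except FrozenInstanceError:
    print("keys cannot be mutated")
```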
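The last point describes two tiers: recent commits in a hashtable, older state in a sorted structure that a checkpoint consolidates. The following is a simplified, hypothetical sketch of that idea (`DifferentialSortedStore` is not the Service Fabric implementation). Gets on recent commits are hash lookups, gets on older commits are binary searches, and enumeration is in key order. The plain-list insert during checkpoint is O(n) and is only there to keep the sketch short.

```python
import bisect

_DELETED = object()  # tombstone: key removed since the last checkpoint


class DifferentialSortedStore:
    """Hypothetical two-tier store: a hash table for commits since the last
    checkpoint, plus sorted parallel lists for consolidated state."""

    def __init__(self):
        self._recent = {}  # commits since the last checkpoint
        self._keys = []    # consolidated keys, kept sorted
        self._vals = []    # values aligned with self._keys

    def commit_set(self, key, value):
        self._recent[key] = value  # O(1) average; no sorting on the write path

    def commit_delete(self, key):
        self._recent[key] = _DELETED

    def get(self, key):
        if key in self._recent:  # recent commit: hash lookup, O(1)
            value = self._recent[key]
            return None if value is _DELETED else value
        i = bisect.bisect_left(self._keys, key)  # older commit: O(log n)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None

    def checkpoint(self):
        """Move recent commits into the sorted structure."""
        for key, value in self._recent.items():
            i = bisect.bisect_left(self._keys, key)
            present = i < len(self._keys) and self._keys[i] == key
            if value is _DELETED:
                if present:
                    del self._keys[i]
                    del self._vals[i]
            elif present:
                self._vals[i] = value
            else:
                self._keys.insert(i, key)  # O(n) here; a real structure avoids this
                self._vals.insert(i, value)
        self._recent.clear()

    def items(self):
        """Key-ordered view: consolidated state overlaid with recent commits."""
        merged = dict(zip(self._keys, self._vals))
        merged.update(self._recent)
        return [(k, v) for k, v in sorted(merged.items(), key=lambda kv: kv[0])
                if v is not _DELETED]
```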

Volatile Reliable Collections

When deciding to use volatile reliable collections, consider the following:

  • ReliableDictionary does have volatile support.
  • ReliableQueue does have volatile support.
  • ReliableConcurrentQueue does NOT have volatile support.
  • Persisted services CANNOT be made volatile. Changing the HasPersistedState flag to false requires recreating the entire service from scratch.
  • Volatile services CANNOT be made persisted. Changing the HasPersistedState flag to true requires recreating the entire service from scratch.
  • HasPersistedState is a service-level config. This means that ALL collections will be either persisted or volatile. You cannot mix volatile and persisted collections.
  • Quorum loss of a volatile partition results in complete data loss.
  • Backup and restore is NOT available for volatile services.

Next steps