Configure stateful Reliable Services

There are two sets of configuration settings for Reliable Services. One set is global for all Reliable Services in the cluster, while the other set is specific to a particular Reliable Service.

Global Configuration

The global Reliable Service configuration is specified in the cluster manifest under the KtlLogger section. It configures the shared log location and size, plus the global memory limits used by the logger. The cluster manifest is a single XML file that holds settings and configurations that apply to all nodes and services in the cluster. The file is typically called ClusterManifest.xml. You can view the cluster manifest for your cluster by using the Get-ServiceFabricClusterManifest PowerShell command.

Configuration names

Name | Unit | Default value | Remarks
WriteBufferMemoryPoolMinimumInKB | Kilobytes | 8388608 | Minimum number of KB to allocate in kernel mode for the logger write buffer memory pool. This memory pool is used for caching state information before writing to disk.
WriteBufferMemoryPoolMaximumInKB | Kilobytes | No Limit | Maximum size to which the logger write buffer memory pool can grow.
SharedLogId | GUID | "" | Specifies a unique GUID that identifies the default shared log file used by all Reliable Services on all nodes in the cluster that do not specify SharedLogId in their service-specific configuration. If SharedLogId is specified, then SharedLogPath must also be specified.
SharedLogPath | Fully qualified path name | "" | Specifies the fully qualified path of the shared log file used by all Reliable Services on all nodes in the cluster that do not specify SharedLogPath in their service-specific configuration. If SharedLogPath is specified, then SharedLogId must also be specified.
SharedLogSizeInMB | Megabytes | 8192 | Specifies the number of MB of disk space to statically allocate for the shared log. The value must be 2048 or larger.

In an Azure ARM or on-premises JSON template, the example below shows how to change the size of the shared transaction log that is created to back any reliable collections for stateful services.

"fabricSettings": [{
    "name": "KtlLogger",
    "parameters": [{
        "name": "SharedLogSizeInMB",
        "value": "4096"
    }]
}]
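When templates are generated by a script, the same fragment can be produced programmatically. The following is a minimal Python sketch, not part of any Service Fabric tooling: the helper name `set_fabric_setting` is hypothetical, and the only assumption about the template is the `fabricSettings` shape shown above (a list of sections, each with `name` and `parameters`, with values stored as strings).

```python
import json


def set_fabric_setting(fabric_settings, section, name, value):
    """Hypothetical helper: set or add one parameter in a fabricSettings list."""
    # Find the section, or create it if it is missing.
    for sec in fabric_settings:
        if sec["name"] == section:
            break
    else:
        sec = {"name": section, "parameters": []}
        fabric_settings.append(sec)

    # Update the parameter in place, or append it if it is missing.
    for param in sec["parameters"]:
        if param["name"] == name:
            param["value"] = str(value)
            break
    else:
        sec["parameters"].append({"name": name, "value": str(value)})
    return fabric_settings


settings = set_fabric_setting([], "KtlLogger", "SharedLogSizeInMB", 4096)
print(json.dumps({"fabricSettings": settings}, indent=4))
```

Values are converted to strings because the template above stores `"4096"` as a string rather than a number.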

Sample local developer cluster manifest section

To change this setting in your local development environment, edit the local clustermanifest.xml file.

   <Section Name="KtlLogger">
     <Parameter Name="SharedLogSizeInMB" Value="4096"/>
     <Parameter Name="WriteBufferMemoryPoolMinimumInKB" Value="8192" />
     <Parameter Name="WriteBufferMemoryPoolMaximumInKB" Value="8192" />
     <Parameter Name="SharedLogId" Value="{7668BB54-FE9C-48ed-81AC-FF89E60ED2EF}"/>
     <Parameter Name="SharedLogPath" Value="f:\SharedLog.Log"/>
   </Section>

Remarks

The logger has a global pool of memory, allocated from nonpaged kernel memory, that is available to all Reliable Services on a node for caching state data before it is written to the dedicated log associated with the Reliable Service replica. The pool size is controlled by the WriteBufferMemoryPoolMinimumInKB and WriteBufferMemoryPoolMaximumInKB settings. WriteBufferMemoryPoolMinimumInKB specifies both the initial size of this memory pool and the lowest size to which the memory pool may shrink. WriteBufferMemoryPoolMaximumInKB is the highest size to which the memory pool may grow. Each Reliable Service replica that is opened may increase the size of the memory pool by a system-determined amount, up to WriteBufferMemoryPoolMaximumInKB. If demand for memory from the pool exceeds what is available, requests for memory are delayed until memory becomes available. Therefore, if the write buffer memory pool is too small for a particular configuration, performance may suffer.

The SharedLogId and SharedLogPath settings are always used together to define the GUID and location of the default shared log for all nodes in the cluster. The default shared log is used for all Reliable Services that do not specify these settings in the settings.xml for the specific service. For best performance, shared log files should be placed on disks that are used solely for the shared log file, to reduce contention.

SharedLogSizeInMB specifies the amount of disk space to preallocate for the default shared log on all nodes. SharedLogId and SharedLogPath do not need to be specified in order for SharedLogSizeInMB to be specified.
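The two rules stated for the KtlLogger section (SharedLogId and SharedLogPath must appear together, and SharedLogSizeInMB must be 2048 or larger) can be checked before a manifest is deployed. The following Python sketch is only an illustration: `check_ktllogger` is a hypothetical helper, and it assumes the section is supplied as a stand-alone XML fragment without a namespace, as in the sample above.

```python
import xml.etree.ElementTree as ET


def check_ktllogger(section_xml):
    """Hypothetical helper: list violations of the KtlLogger rules described above."""
    section = ET.fromstring(section_xml)
    params = {p.get("Name"): p.get("Value") for p in section.iter("Parameter")}
    problems = []

    # SharedLogId and SharedLogPath must be specified together.
    has_id = bool(params.get("SharedLogId"))
    has_path = bool(params.get("SharedLogPath"))
    if has_id != has_path:
        problems.append("SharedLogId and SharedLogPath must be specified together")

    # SharedLogSizeInMB, when present, must be 2048 or larger.
    size = params.get("SharedLogSizeInMB")
    if size is not None and int(size) < 2048:
        problems.append("SharedLogSizeInMB must be 2048 or larger")

    return problems


SAMPLE = r"""
<Section Name="KtlLogger">
  <Parameter Name="SharedLogSizeInMB" Value="4096"/>
  <Parameter Name="SharedLogId" Value="{7668BB54-FE9C-48ed-81AC-FF89E60ED2EF}"/>
  <Parameter Name="SharedLogPath" Value="f:\SharedLog.Log"/>
</Section>
"""

print(check_ktllogger(SAMPLE))  # prints [] because no rule is violated
```

The sketch deliberately does not compare the write buffer pool minimum and maximum, because this section does not say how "No Limit" is represented as a value.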

Service Specific Configuration

You can modify the default configuration of stateful Reliable Services by using the configuration package (Config) or the service implementation (code).

  • Config - Configuration via the config package is accomplished by changing the Settings.xml file that is generated in the Microsoft Visual Studio package root under the Config folder for each service in the application.
  • Code - Configuration via code is accomplished by creating a ReliableStateManager using a ReliableStateManagerConfiguration object with the appropriate options set.

By default, the Azure Service Fabric runtime looks for predefined section names in the Settings.xml file and consumes the configuration values while creating the underlying runtime components.

Note

Do not delete the section names of the following configurations in the Settings.xml file that is generated in the Visual Studio solution unless you plan to configure your service via code. Renaming the config package or section names requires a code change when configuring the ReliableStateManager.

Replicator security configuration

Replicator security configurations secure the communication channel that is used during replication. This means that services cannot see each other's replication traffic, so the data that is made highly available is also secure. By default, an empty security configuration section disables replication security.

Important

On Linux nodes, certificates must be PEM-formatted. To learn more about locating and configuring certificates for Linux, see Configure certificates on Linux.

Default section name

ReplicatorSecurityConfig

Note

To change this section name, override the replicatorSecuritySectionName parameter of the ReliableStateManagerConfiguration constructor when creating the ReliableStateManager for this service.

Replicator configuration

Replicator configurations configure the replicator that is responsible for making the stateful Reliable Service's state highly reliable by replicating and persisting the state locally. The default configuration is generated by the Visual Studio template and should suffice. This section describes additional configurations that are available to tune the replicator.

Default section name

ReplicatorConfig

Note

To change this section name, override the replicatorSettingsSectionName parameter of the ReliableStateManagerConfiguration constructor when creating the ReliableStateManager for this service.

Configuration names

Name | Unit | Default value | Remarks
BatchAcknowledgementInterval | Seconds | 0.015 | Time period for which the replicator at the secondary waits after receiving an operation before sending back an acknowledgement to the primary. Any other acknowledgements to be sent for operations processed within this interval are sent as one response.
ReplicatorEndpoint | N/A | No default (required parameter) | IP address and port that the primary/secondary replicator uses to communicate with other replicators in the replica set. This should reference a TCP resource endpoint in the service manifest. Refer to Service manifest resources to read more about defining endpoint resources in a service manifest.
MaxPrimaryReplicationQueueSize | Number of operations | 8192 | Maximum number of operations in the primary queue. An operation is freed up after the primary replicator receives an acknowledgement from all the secondary replicators. This value must be greater than 64 and a power of 2.
MaxSecondaryReplicationQueueSize | Number of operations | 16384 | Maximum number of operations in the secondary queue. An operation is freed up after its state has been made highly available through persistence. This value must be greater than 64 and a power of 2.
CheckpointThresholdInMB | MB | 50 | Amount of log file space after which the state is checkpointed.
MaxRecordSizeInKB | KB | 1024 | Largest record size that the replicator may write in the log. This value must be a multiple of 4 and greater than 16.
MinLogSizeInMB | MB | 0 (system determined) | Minimum size of the transactional log. The log is not allowed to truncate to a size below this setting. 0 indicates that the replicator determines the minimum log size. Increasing this value increases the possibility of doing partial copies and incremental backups, because relevant log records are less likely to be truncated.
TruncationThresholdFactor | Factor | 2 | Determines the log size at which truncation is triggered. The truncation threshold is MinLogSizeInMB multiplied by TruncationThresholdFactor. TruncationThresholdFactor must be greater than 1. MinLogSizeInMB * TruncationThresholdFactor must be less than MaxStreamSizeInMB.
ThrottlingThresholdFactor | Factor | 4 | Determines the log size at which the replica starts being throttled. The throttling threshold (in MB) is Max((MinLogSizeInMB * ThrottlingThresholdFactor),(CheckpointThresholdInMB * ThrottlingThresholdFactor)). The throttling threshold (in MB) must be greater than the truncation threshold (in MB). The truncation threshold (in MB) must be less than MaxStreamSizeInMB.
MaxAccumulatedBackupLogSizeInMB | MB | 800 | Maximum accumulated size (in MB) of backup logs in a given backup log chain. An incremental backup request fails if it would generate a backup log that causes the accumulated backup logs since the relevant full backup to exceed this size. In such cases, the user must take a full backup.
SharedLogId | GUID | "" | Specifies a unique GUID that identifies the shared log file used with this replica. Typically, services should not use this setting. However, if SharedLogId is specified, then SharedLogPath must also be specified.
SharedLogPath | Fully qualified path name | "" | Specifies the fully qualified path where the shared log file for this replica will be created. Typically, services should not use this setting. However, if SharedLogPath is specified, then SharedLogId must also be specified.
SlowApiMonitoringDuration | Seconds | 300 | Sets the monitoring interval for managed API calls. Example: a user-provided backup callback function. After the interval has passed, a warning health report is sent to the Health Manager.
LogTruncationIntervalSeconds | Seconds | 0 | Configurable interval at which log truncation is initiated on each replica. It ensures that the log is also truncated based on time, not just on log size. This setting also forces the purge of deleted entries in a reliable dictionary, so it can be used to ensure that deleted items are purged in a timely manner.
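The truncation and throttling thresholds are derived values, so it can help to compute them for a candidate configuration. The Python sketch below applies only the formulas and constraints given in the table. The function name `log_thresholds` is hypothetical. MaxStreamSizeInMB is referenced by the table but not described in this section, so it is treated here as an optional caller-supplied value. The sketch also assumes an explicit, nonzero MinLogSizeInMB, because the size the replicator chooses when the setting is 0 is not stated.

```python
def log_thresholds(min_log_mb, checkpoint_mb=50, truncation_factor=2,
                   throttling_factor=4, max_stream_mb=None):
    """Hypothetical helper: return (truncation_mb, throttling_mb) per the table's formulas."""
    if truncation_factor <= 1:
        raise ValueError("TruncationThresholdFactor must be greater than 1")

    # Truncation threshold = MinLogSizeInMB * TruncationThresholdFactor
    truncation_mb = min_log_mb * truncation_factor

    # Throttling threshold = Max(MinLogSizeInMB * factor, CheckpointThresholdInMB * factor)
    throttling_mb = max(min_log_mb * throttling_factor,
                        checkpoint_mb * throttling_factor)

    if throttling_mb <= truncation_mb:
        raise ValueError("throttling threshold must be greater than truncation threshold")
    if max_stream_mb is not None and truncation_mb >= max_stream_mb:
        raise ValueError("truncation threshold must be less than MaxStreamSizeInMB")
    return truncation_mb, throttling_mb


print(log_thresholds(100))  # (200, 400)
print(log_thresholds(10))   # (20, 200): the checkpoint term dominates
```

With the default factors, a small MinLogSizeInMB leaves the throttling threshold governed by CheckpointThresholdInMB, as the second call shows.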

Sample configuration via code

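using System;
using System.Fabric;
using Microsoft.ServiceFabric.Data;
using Microsoft.ServiceFabric.Services.Runtime;

class Program
{
    /// <summary>
    /// This is the entry point of the service host process.
    /// </summary>
    static void Main()
    {
        // Register the service type and hand it a state manager built from custom replicator settings.
        ServiceRuntime.RegisterServiceAsync("HelloWorldStatefulType",
            context => new HelloWorldStateful(context,
                new ReliableStateManager(context,
                    new ReliableStateManagerConfiguration(
                        new ReliableStateManagerReplicatorSettings()
                        {
                            RetryInterval = TimeSpan.FromSeconds(3)
                        }
            )))).GetAwaiter().GetResult();
    }
}

class HelloWorldStateful : StatefulService
{
    // The state manager configured in Main is passed through to the StatefulService base class.
    public HelloWorldStateful(StatefulServiceContext context, IReliableStateManagerReplica stateManager)
        : base(context, stateManager)
    { }
    ...
}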

Sample configuration file

<?xml version="1.0" encoding="utf-8"?>
<Settings xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/2011/01/fabric">
   <Section Name="ReplicatorConfig">
      <Parameter Name="ReplicatorEndpoint" Value="ReplicatorEndpoint" />
      <Parameter Name="BatchAcknowledgementInterval" Value="0.05"/>
      <Parameter Name="CheckpointThresholdInMB" Value="512" />
   </Section>
   <Section Name="ReplicatorSecurityConfig">
      <Parameter Name="CredentialType" Value="X509" />
      <Parameter Name="FindType" Value="FindByThumbprint" />
      <Parameter Name="FindValue" Value="9d c9 06 b1 69 dc 4f af fd 16 97 ac 78 1e 80 67 90 74 9d 2f" />
      <Parameter Name="StoreLocation" Value="LocalMachine" />
      <Parameter Name="StoreName" Value="My" />
      <Parameter Name="ProtectionLevel" Value="EncryptAndSign" />
      <Parameter Name="AllowedCommonNames" Value="My-Test-SAN1-Alice,My-Test-SAN1-Bob" />
   </Section>
</Settings>
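To inspect such a file outside the service (for example, in a build check), the section can be read with any XML parser. The following Python sketch is an illustration only: `read_section` is a hypothetical helper, and the embedded sample is a shortened copy of the file above. The one detail worth noting is that the Settings element declares a default namespace, so element lookups must be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <Settings> element in the sample file.
FABRIC_NS = {"sf": "http://schemas.microsoft.com/2011/01/fabric"}


def read_section(settings_xml, section_name):
    """Hypothetical helper: return {parameter name: value} for one section of Settings.xml."""
    root = ET.fromstring(settings_xml)
    for section in root.findall("sf:Section", FABRIC_NS):
        if section.get("Name") == section_name:
            return {p.get("Name"): p.get("Value")
                    for p in section.findall("sf:Parameter", FABRIC_NS)}
    return {}


SAMPLE = """
<Settings xmlns="http://schemas.microsoft.com/2011/01/fabric">
   <Section Name="ReplicatorConfig">
      <Parameter Name="ReplicatorEndpoint" Value="ReplicatorEndpoint" />
      <Parameter Name="BatchAcknowledgementInterval" Value="0.05"/>
      <Parameter Name="CheckpointThresholdInMB" Value="512" />
   </Section>
</Settings>
"""

config = read_section(SAMPLE, "ReplicatorConfig")
print(config["CheckpointThresholdInMB"])  # 512
```

All values come back as strings, exactly as written in the file, so numeric settings need an explicit conversion before any comparison.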

Remarks

BatchAcknowledgementInterval controls replication latency. A value of '0' results in the lowest possible latency, at the cost of throughput (as more acknowledgement messages must be sent and processed, each containing fewer acknowledgements). The larger the value for BatchAcknowledgementInterval, the higher the overall replication throughput, at the cost of higher operation latency. This directly translates to the latency of transaction commits.

The value of CheckpointThresholdInMB controls the amount of disk space that the replicator can use to store state information in the replica's dedicated log file. Increasing it above the default could result in faster reconfiguration when a new replica is added to the set, because more operation history is available in the log and a partial state transfer can take place. However, a larger value can also increase the recovery time of a replica after a crash.

The MaxRecordSizeInKB setting defines the maximum size of a record that the replicator can write into the log file. In most cases, the default 1024-KB record size is optimal. However, if the service causes larger data items to be part of the state information, this value might need to be increased. There is little benefit in making MaxRecordSizeInKB smaller than 1024, because smaller records use only the space they need. We expect that this value would need to be changed in only rare cases.
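If MaxRecordSizeInKB does need to change, the new value still has to satisfy the rule from the configuration table (a multiple of 4 and greater than 16). The replication queue sizes have a similar numeric rule (greater than 64 and a power of 2). A minimal Python sketch of both checks follows; the function names are hypothetical and only the rules stated in the table are applied.

```python
def is_valid_max_record_size_kb(value):
    """Hypothetical check: MaxRecordSizeInKB must be a multiple of 4 and greater than 16."""
    return value > 16 and value % 4 == 0


def is_valid_queue_size(value):
    """Hypothetical check: queue sizes must be greater than 64 and a power of 2."""
    # A positive integer is a power of 2 when it has exactly one bit set.
    return value > 64 and (value & (value - 1)) == 0


print(is_valid_max_record_size_kb(1024))  # True (the default)
print(is_valid_max_record_size_kb(18))    # False: not a multiple of 4
print(is_valid_queue_size(8192))          # True (the default primary queue size)
print(is_valid_queue_size(64))            # False: must be greater than 64
```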

The SharedLogId and SharedLogPath settings are always used together to make a service use a shared log that is separate from the default shared log for the node. For best efficiency, as many services as possible should specify the same shared log. Shared log files should be placed on disks that are used solely for the shared log file, to reduce head movement contention. We expect that these values would need to be changed in only rare cases.

Next steps