sfctl chaos

Start, stop, and report on the chaos test service.

Subgroups

Subgroup    Description
schedule    Get and set the Chaos Schedule.

Commands

Command    Description
events    Gets the next segment of the Chaos events based on the continuation token or the time range.
get    Gets the status of Chaos.
start    Starts Chaos in the cluster.
stop    Stops Chaos if it is running in the cluster and puts the Chaos Schedule in a stopped state.

sfctl chaos events

Gets the next segment of the Chaos events based on the continuation token or the time range.

To get the next segment of the Chaos events, you can specify the ContinuationToken. To get the start of a new segment of Chaos events, you can specify the time range through StartTimeUtc and EndTimeUtc. You cannot specify both the ContinuationToken and the time range in the same call. When there are more than 100 Chaos events, they are returned in multiple segments, each containing no more than 100 events; to get the next segment, call this API again with the continuation token.
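The paging rule above can be sketched in Python. This is a minimal sketch, not sfctl's own code: `fetch` is a hypothetical stand-in for one call to `sfctl chaos events` (for example through `subprocess`) that returns a list of events plus the continuation token, and `make_fake_fetch` is a hypothetical in-memory stub that serves segments of at most 100 events. The first call passes only the time range and every later call passes only the token, so the two are never combined.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical segment shape: (events, continuation_token). An empty or
# missing token means there are no further results.
Segment = Tuple[List[dict], Optional[str]]


def iter_chaos_events(
    fetch: Callable[..., Segment], start_time_utc: int, end_time_utc: int
) -> Iterator[dict]:
    """Yield every Chaos event in a time range by following continuation tokens."""
    # First call: time range only.
    events, token = fetch(start_time_utc=start_time_utc, end_time_utc=end_time_utc)
    yield from events
    # Later calls: continuation token only, never combined with the time range.
    while token:
        events, token = fetch(continuation_token=token)
        yield from events


def make_fake_fetch(all_events: List[dict], page_size: int = 100) -> Callable[..., Segment]:
    """Hypothetical in-memory stub serving segments of at most page_size events.

    Time-range filtering is not modeled; the stub only enforces paging and the
    rule that a token and a time range cannot be passed together.
    """

    def fetch(continuation_token=None, start_time_utc=None, end_time_utc=None) -> Segment:
        has_range = start_time_utc is not None or end_time_utc is not None
        if continuation_token and has_range:
            raise ValueError("cannot combine a continuation token with a time range")
        offset = int(continuation_token) if continuation_token else 0
        next_offset = offset + page_size
        token = str(next_offset) if next_offset < len(all_events) else ""
        return all_events[offset:next_offset], token

    return fetch
```

With 250 stub events, the loop makes three calls (100, 100, and 50 events) and stops when the returned token is empty.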

Arguments

Argument    Description
--continuation-token    The continuation token parameter is used to obtain the next set of results. A continuation token with a non-empty value is included in the response of the API when the results from the system do not fit in a single response. When this value is passed to the next API call, the API returns the next set of results. If there are no further results, the continuation token does not contain a value. The value of this parameter should not be URL encoded.
--end-time-utc    The Windows file time representing the end time of the time range for which a Chaos report is to be generated. See the DateTime.ToFileTimeUtc method for details.
--max-results    The maximum number of results to be returned as part of the paged queries. This parameter defines the upper bound on the number of results returned. Fewer results than the specified maximum can be returned if they do not fit in the message, as per the maximum message size restrictions defined in the configuration. If this parameter is zero or not specified, the paged query includes as many results as fit in the return message.
--start-time-utc    The Windows file time representing the start time of the time range for which a Chaos report is to be generated. See the DateTime.ToFileTimeUtc method for details.
--timeout -t    The server timeout for performing the operation, in seconds. This timeout specifies how long the client is willing to wait for the requested operation to complete. Default: 60.
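Both time arguments take a Windows file time, that is, the number of 100-nanosecond intervals since 1601-01-01 00:00 UTC (the value DateTime.ToFileTimeUtc returns). A minimal Python sketch of the conversion follows; `to_windows_file_time` and `events_time_range_args` are hypothetical helper names, and the command list only assembles the documented flags without running sfctl.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Windows file time: 100-nanosecond ticks since 1601-01-01 00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def to_windows_file_time(dt: datetime) -> int:
    """Convert a timezone-aware datetime to a Windows file time."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    delta = dt - FILETIME_EPOCH  # aware subtraction is performed in UTC
    # Integer arithmetic keeps full precision for large tick counts.
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def events_time_range_args(start: datetime, end: datetime) -> List[str]:
    """Assemble a hypothetical argv for `sfctl chaos events` over [start, end]."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return [
        "sfctl", "chaos", "events",
        "--start-time-utc", str(to_windows_file_time(start)),
        "--end-time-utc", str(to_windows_file_time(end)),
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Events from the last hour; pass this list to subprocess.run to execute it.
    print(events_time_range_args(now - timedelta(hours=1), now))
```

For reference, the Unix epoch (1970-01-01 00:00 UTC) corresponds to the file time 116444736000000000.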

Global Arguments

Argument    Description
--debug    Increase logging verbosity to show all debug logs.
--help -h    Show this help message and exit.
--output -o    Output format. Allowed values: json, jsonc, table, tsv. Default: json.
--query    JMESPath query string. See http://jmespath.org/ for more information and examples.
--verbose    Increase logging verbosity. Use --debug for full debug logs.

sfctl chaos get

Gets the status of Chaos.

Gets the status of Chaos, indicating whether Chaos is running, the Chaos parameters used for running Chaos, and the status of the Chaos Schedule.

Arguments

Argument    Description
--timeout -t    The server timeout for performing the operation, in seconds. This timeout specifies how long the client is willing to wait for the requested operation to complete. Default: 60.

Global Arguments

Argument    Description
--debug    Increase logging verbosity to show all debug logs.
--help -h    Show this help message and exit.
--output -o    Output format. Allowed values: json, jsonc, table, tsv. Default: json.
--query    JMESPath query string. See http://jmespath.org/ for more information and examples.
--verbose    Increase logging verbosity. Use --debug for full debug logs.

sfctl chaos start

Starts Chaos in the cluster.

If Chaos is not already running in the cluster, this command starts Chaos with the passed-in Chaos parameters. If Chaos is already running when this call is made, the call fails with the error code FABRIC_E_CHAOS_ALREADY_RUNNING. Refer to the article Induce controlled Chaos in Service Fabric clusters for more details.
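A minimal sketch of scripting this command from Python, assuming sfctl is on the PATH and already connected to a cluster (for example with `sfctl cluster select`). `chaos_start_args` and `start_chaos` are hypothetical helpers; the defaults mirror the documented defaults listed below, only the numeric options are shown, and detecting FABRIC_E_CHAOS_ALREADY_RUNNING by searching the command's output is an assumption about how sfctl reports the failure, not documented behavior.

```python
import subprocess
from typing import List

UINT32_MAX = 4_294_967_295  # documented maximum for --time-to-run


def chaos_start_args(
    time_to_run: int = UINT32_MAX,
    max_concurrent_faults: int = 1,
    wait_time_between_faults: int = 20,
    wait_time_between_iterations: int = 30,
    max_cluster_stabilization: int = 60,
) -> List[str]:
    """Build an argv for `sfctl chaos start`; defaults mirror the documented ones."""
    if not 0 <= time_to_run <= UINT32_MAX:
        raise ValueError("time_to_run must be between 0 and 4294967295 seconds")
    return [
        "sfctl", "chaos", "start",
        "--time-to-run", str(time_to_run),
        "--max-concurrent-faults", str(max_concurrent_faults),
        "--wait-time-between-faults", str(wait_time_between_faults),
        "--wait-time-between-iterations", str(wait_time_between_iterations),
        "--max-cluster-stabilization", str(max_cluster_stabilization),
    ]


def start_chaos(args: List[str]) -> bool:
    """Run sfctl; return False if Chaos was already running, raise on other errors.

    Assumption: the failure output contains the FABRIC_E_CHAOS_ALREADY_RUNNING code.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode == 0:
        return True
    if "FABRIC_E_CHAOS_ALREADY_RUNNING" in result.stderr + result.stdout:
        return False
    raise RuntimeError(result.stderr.strip() or "sfctl chaos start failed")
```

Passing an argv list (rather than a shell string) keeps each value a single argument, which matters once JSON-valued options such as --chaos-target-filter are added.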

Arguments

Argument    Description
--app-type-health-policy-map    JSON encoded array of dictionary (key/value) entries with the maximum percentage of unhealthy applications for specific application types. Each dictionary entry specifies the application type name as the key and, as the value, an integer that represents the MaxPercentUnhealthyApplications percentage used to evaluate the applications of the specified application type.

Defines a map with the maximum percentage of unhealthy applications for specific application types. The application type health policy map can be used during cluster health evaluation to describe individual application types. The application types included in the map are evaluated against the percentage specified in the map, not against the global MaxPercentUnhealthyApplications defined in the cluster health policy. The applications of application types specified in the map are not counted against the global pool of applications. For example, if some applications of a type are critical, the cluster administrator can add an entry to the map for that application type and assign it a value of 0% (do not tolerate any failures). All other applications can then be evaluated with MaxPercentUnhealthyApplications set to 20% to tolerate some failures out of the thousands of application instances. The application type health policy map is used only if the cluster manifest enables application type health evaluation through the HealthManager/EnableApplicationTypeHealthEvaluation configuration entry.

Example JSON encoded string: [{"key": "fabric:/Voting", "value": "0"}]
--chaos-target-filter    JSON encoded dictionary with two string-type keys, NodeTypeInclusionList and ApplicationInclusionList. The value of each key is a list of strings. chaos_target_filter defines all filters for targeted Chaos faults, for example, faulting only certain node types or faulting only certain applications.

If chaos_target_filter is not used, Chaos faults all cluster entities. If chaos_target_filter is used, Chaos faults only the entities that meet the chaos_target_filter specification. NodeTypeInclusionList and ApplicationInclusionList allow only union semantics; it is not possible to specify an intersection of the two. For example, it is not possible to specify "fault this application only when it is on that node type." Once an entity is included in either NodeTypeInclusionList or ApplicationInclusionList, that entity cannot be excluded using ChaosTargetFilter. Even if applicationX does not appear in ApplicationInclusionList, in some Chaos iteration applicationX can be faulted because it happens to be on a node of nodeTypeY that is included in NodeTypeInclusionList. If both NodeTypeInclusionList and ApplicationInclusionList are empty, an ArgumentException is thrown.

NodeTypeInclusionList: All types of faults (restart node, restart code package, remove replica, restart replica, move primary, and move secondary) are enabled for the nodes of the listed node types. If a node type (say NodeTypeX) does not appear in NodeTypeInclusionList, then node-level faults (like NodeRestart) will never be enabled for the nodes of NodeTypeX, but code package and replica faults can still be enabled for NodeTypeX if an application in ApplicationInclusionList happens to reside on a node of NodeTypeX. At most 100 node type names can be included in this list; to increase this number, a config upgrade is required for the MaxNumberOfNodeTypesInChaosEntityFilter configuration.

ApplicationInclusionList: All replicas belonging to services of the listed applications are amenable to replica faults (restart replica, remove replica, move primary, and move secondary) by Chaos. Chaos may restart a code package only if the code package hosts replicas of these applications only. If an application does not appear in this list, it can still be faulted in some Chaos iteration if the application ends up on a node of a node type that is included in NodeTypeInclusionList. However, if applicationX is tied to nodeTypeY through placement constraints, and applicationX is absent from ApplicationInclusionList and nodeTypeY is absent from NodeTypeInclusionList, then applicationX will never be faulted. At most 1000 application names can be included in this list; to increase this number, a config upgrade is required for the MaxNumberOfApplicationsInChaosEntityFilter configuration.
--context    JSON encoded map of (string, string) key-value pairs. The map can be used to record information about the Chaos run. There cannot be more than 100 such pairs, and each string (key or value) can be at most 4095 characters long. This map is set by the starter of the Chaos run to optionally store context about the specific run.
--disable-move-replica-faults    Disables the move primary and move secondary faults.
--max-cluster-stabilization    The maximum amount of time, in seconds, to wait for all cluster entities to become stable and healthy. Default: 60.

Chaos executes in iterations, and at the start of each iteration it validates the health of cluster entities. During validation, if a cluster entity does not become stable and healthy within MaxClusterStabilizationTimeoutInSeconds, Chaos generates a validation failed event.
--max-concurrent-faults    The maximum number of concurrent faults induced per iteration. Chaos executes in iterations, and two consecutive iterations are separated by a validation phase. The higher the concurrency, the more aggressive the injection of faults, which induces more complex series of states to uncover bugs. The recommendation is to start with a value of 2 or 3 and to exercise caution while moving up. Default: 1.
--max-percent-unhealthy-apps    When evaluating cluster health during Chaos, the maximum allowed percentage of unhealthy applications before reporting an error.

The maximum allowed percentage of unhealthy applications before reporting an error. For example, to allow 10% of applications to be unhealthy, this value would be 10. The percentage represents the maximum tolerated percentage of applications that can be unhealthy before the cluster is considered in error. If the percentage is respected but there is at least one unhealthy application, the health is evaluated as Warning. The percentage is calculated by dividing the number of unhealthy applications by the total number of application instances in the cluster, excluding applications of application types that are included in the ApplicationTypeHealthPolicyMap. The computation rounds up to tolerate one failure on small numbers of applications. The default percentage is zero.
--max-percent-unhealthy-nodes    When evaluating cluster health during Chaos, the maximum allowed percentage of unhealthy nodes before reporting an error.

The maximum allowed percentage of unhealthy nodes before reporting an error. For example, to allow 10% of nodes to be unhealthy, this value would be 10. The percentage represents the maximum tolerated percentage of nodes that can be unhealthy before the cluster is considered in error. If the percentage is respected but there is at least one unhealthy node, the health is evaluated as Warning. The percentage is calculated by dividing the number of unhealthy nodes by the total number of nodes in the cluster. The computation rounds up to tolerate one failure on small numbers of nodes. The default percentage is zero. In large clusters, some nodes will always be down or out for repairs, so this percentage should be configured to tolerate that.
--time-to-run    Total time, in seconds, for which Chaos will run before automatically stopping. The maximum allowed value is 4,294,967,295 (System.UInt32.MaxValue). Default: 4294967295.
--timeout -t    The server timeout for performing the operation, in seconds. Default: 60.
--wait-time-between-faults    Wait time, in seconds, between consecutive faults within a single iteration. Default: 20.

The larger the value, the lower the overlap between faults and the simpler the sequence of state transitions that the cluster goes through. The recommendation is to start with a value between 1 and 5 and to exercise caution while moving up.
--wait-time-between-iterations    Time separation, in seconds, between two consecutive iterations of Chaos. The larger the value, the lower the fault injection rate. Default: 30.
--warning-as-error    Indicates whether warnings are treated with the same severity as errors.
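The three JSON-valued arguments of this command are easy to get wrong by hand. The Python sketch below builds them with `json.dumps` and checks the limits stated in their descriptions. The helper names are hypothetical; the key/value array shape follows the documented --app-type-health-policy-map example (including its string-encoded value), and passing --context as a plain JSON object is an assumption based on the "JSON encoded map" wording.

```python
import json
from typing import Dict, Iterable

# Limits quoted from the argument descriptions.
MAX_NODE_TYPES = 100       # MaxNumberOfNodeTypesInChaosEntityFilter limit
MAX_APPLICATIONS = 1000    # MaxNumberOfApplicationsInChaosEntityFilter limit
MAX_CONTEXT_PAIRS = 100
MAX_CONTEXT_STRING_LEN = 4095


def app_type_health_policy_map(policies: Dict[str, int]) -> str:
    """Encode {application type name: max percent unhealthy} as a key/value array."""
    for type_name, percent in policies.items():
        if not 0 <= percent <= 100:
            raise ValueError(f"percentage for {type_name!r} must be between 0 and 100")
    # The documented example encodes the percentage as a string.
    return json.dumps([{"key": k, "value": str(v)} for k, v in policies.items()])


def chaos_target_filter(node_types: Iterable[str] = (), applications: Iterable[str] = ()) -> str:
    """Encode both inclusion lists; Chaos faults the union of what they match."""
    node_types, applications = list(node_types), list(applications)
    if not node_types and not applications:
        # Both lists empty is rejected (ArgumentException on the cluster side).
        raise ValueError("at least one inclusion list must be non-empty")
    if len(node_types) > MAX_NODE_TYPES:
        raise ValueError(f"at most {MAX_NODE_TYPES} node types without a config upgrade")
    if len(applications) > MAX_APPLICATIONS:
        raise ValueError(f"at most {MAX_APPLICATIONS} applications without a config upgrade")
    return json.dumps({
        "NodeTypeInclusionList": node_types,
        "ApplicationInclusionList": applications,
    })


def chaos_context(context: Dict[str, str]) -> str:
    """Encode run context as a JSON object (assumed shape), enforcing the limits."""
    if len(context) > MAX_CONTEXT_PAIRS:
        raise ValueError(f"at most {MAX_CONTEXT_PAIRS} key-value pairs are allowed")
    for key, value in context.items():
        if len(key) > MAX_CONTEXT_STRING_LEN or len(value) > MAX_CONTEXT_STRING_LEN:
            raise ValueError("keys and values can be at most 4095 characters long")
    return json.dumps(context)
```

Each returned string is meant to be passed as a single argv element after its flag (for example `"--chaos-target-filter", chaos_target_filter(node_types=["NodeType0"])`, where `NodeType0` is a placeholder), which avoids shell quoting of the embedded double quotes.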

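The percentage rules for --max-percent-unhealthy-apps and --max-percent-unhealthy-nodes can be illustrated with a small Python function. This is one reading of "the computation rounds up": the tolerated count is assumed to be the ceiling of total * percent / 100, which is not taken from the Service Fabric HealthManager source. For applications, `total` should already exclude applications whose types appear in the application type health policy map. `evaluate_unhealthy` is a hypothetical name.

```python
def evaluate_unhealthy(unhealthy: int, total: int, max_percent: int) -> str:
    """Return "Ok", "Warning", or "Error" for one entity kind (nodes or applications).

    Assumption: the tolerated count is ceil(total * max_percent / 100).
    """
    if not 0 <= max_percent <= 100:
        raise ValueError("max_percent must be between 0 and 100")
    if not 0 <= unhealthy <= total:
        raise ValueError("unhealthy must be between 0 and total")
    tolerated = -(-total * max_percent // 100)  # integer ceiling division
    if unhealthy > tolerated:
        return "Error"
    if unhealthy > 0:
        # Within the limit, but at least one unhealthy entity.
        return "Warning"
    return "Ok"
```

Under this reading, with 5 applications and a 10% limit, one unhealthy application gives Warning and two give Error; with the default of 0%, any unhealthy entity gives Error.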
Global Arguments

Argument    Description
--debug    Increase logging verbosity to show all debug logs.
--help -h    Show this help message and exit.
--output -o    Output format. Allowed values: json, jsonc, table, tsv. Default: json.
--query    JMESPath query string. See http://jmespath.org/ for more information and examples.
--verbose    Increase logging verbosity. Use --debug for full debug logs.

sfctl chaos stop

Stops Chaos if it is running in the cluster and puts the Chaos Schedule in a stopped state.

Stops Chaos from executing new faults. In-flight faults will continue to execute until they are complete. The current Chaos Schedule is put into a stopped state. Once a schedule is stopped, it stays in the stopped state and is not used to schedule new runs of Chaos. A new Chaos Schedule must be set in order to resume scheduling.

Arguments

Argument    Description
--timeout -t    The server timeout for performing the operation, in seconds. This timeout specifies how long the client is willing to wait for the requested operation to complete. Default: 60.

Global Arguments

Argument    Description
--debug    Increase logging verbosity to show all debug logs.
--help -h    Show this help message and exit.
--output -o    Output format. Allowed values: json, jsonc, table, tsv. Default: json.
--query    JMESPath query string. See http://jmespath.org/ for more information and examples.
--verbose    Increase logging verbosity. Use --debug for full debug logs.

Next steps