View Service Fabric health reports

Azure Service Fabric introduces a health model with health entities on which system components and watchdogs can report the local conditions that they monitor. The health store aggregates all health data to determine whether entities are healthy.

The cluster is automatically populated with health reports sent by the system components. Read more at Use system health reports to troubleshoot.

Service Fabric provides multiple ways to get the aggregated health of the entities:

  • Service Fabric Explorer or other visualization tools
  • Health queries (through PowerShell, API, or REST)
  • General queries that return a list of entities that have health as one of the properties (through PowerShell, API, or REST)

To demonstrate these options, let's use a local cluster with five nodes and the fabric:/WordCount application. The fabric:/WordCount application contains two default services: a stateful service of type WordCountServiceType and a stateless service of type WordCountWebServiceType. I changed the ApplicationManifest.xml to require seven target replicas for the stateful service and one partition. Because there are only five nodes in the cluster, the system components report a warning on the service partition because it is below the target count.

<Service Name="WordCountService">
  <StatefulService ServiceTypeName="WordCountServiceType" TargetReplicaSetSize="7" MinReplicaSetSize="2">
    <UniformInt64Partition PartitionCount="[WordCountService_PartitionCount]" LowKey="1" HighKey="26" />
  </StatefulService>
</Service>

Health in Service Fabric Explorer

Service Fabric Explorer provides a visual view of the cluster. In the image below, you can see that:

  • The application fabric:/WordCount is red (in error) because it has an error event reported by MyWatchdog for the property Availability.
  • One of its services, fabric:/WordCount/WordCountService, is yellow (in warning). The service is configured with seven replicas and the cluster has five nodes, so two replicas can't be placed. Although it's not shown here, the service partition is yellow because of a system report from System.FM saying that Partition is below target replica or instance count. The yellow partition triggers the yellow service.
  • The cluster is red because of the red application.

The evaluation uses default policies from the cluster manifest and application manifest. They are strict policies and do not tolerate any failure.

View of the cluster with Service Fabric Explorer:

[Image: view of the cluster with Service Fabric Explorer]

Note

Read more about Service Fabric Explorer.

Health queries

Service Fabric exposes health queries for each of the supported entity types. They can be accessed through the API (using methods on FabricClient.HealthManager), PowerShell cmdlets, and REST. These queries return complete health information about the entity: the aggregated health state, entity health events, child health states (when applicable), unhealthy evaluations (when the entity is not healthy), and children health statistics (when applicable).

Note

A health entity is returned only when it is fully populated in the health store. The entity must be active (not deleted) and have a system report. Its parent entities on the hierarchy chain must also have system reports. If any of these conditions is not satisfied, the health queries return a FabricException with FabricErrorCode FabricHealthEntityNotFound that shows why the entity is not returned.
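
A caller can treat this error as "the entity is not in the health store yet" rather than as a hard failure. The following is a minimal sketch under assumptions: the HealthQueryHelper class and the TryGetApplicationHealthAsync method are hypothetical names introduced for illustration, and returning null for a missing entity is a choice of this example, not something the SDK prescribes.

```csharp
using System;
using System.Fabric;
using System.Fabric.Health;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Service Fabric SDK.
public static class HealthQueryHelper
{
    // Returns the application health, or null when the health store
    // has no fully populated entity for this application.
    public static async Task<ApplicationHealth> TryGetApplicationHealthAsync(
        FabricClient fabricClient, Uri applicationName)
    {
        try
        {
            return await fabricClient.HealthManager.GetApplicationHealthAsync(applicationName);
        }
        catch (FabricException e) when (e.ErrorCode == FabricErrorCode.FabricHealthEntityNotFound)
        {
            // The exception message explains why the entity was not returned.
            Console.WriteLine($"Health entity not found: {e.Message}");
            return null;
        }
    }
}
```

Any other FabricException (for example, a timeout) is deliberately not caught here, so that only the "entity not found" case is mapped to null.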

The health queries must pass in the entity identifier, which depends on the entity type. The queries accept optional health policy parameters. If no health policies are specified, the health policies from the cluster or application manifest are used for evaluation. If the manifests don't contain a definition for health policies, the default health policies are used for evaluation. The default health policies do not tolerate any failures. The queries also accept filters that return only the children or events that match the specified filters. Another filter allows excluding the children statistics.

Note

The output filters are applied on the server side, so the message reply size is reduced. We recommend that you use the output filters to limit the data returned, rather than apply filters on the client side.

An entity's health contains:

  • The aggregated health state of the entity. Computed by the health store based on entity health reports, child health states (when applicable), and health policies. Read more about entity health evaluation.
  • The health events on the entity.
  • The collection of health states of all children for the entities that can have children. The health states contain entity identifiers and the aggregated health state. To get complete health for a child, call the query health for the child entity type and pass in the child identifier.
  • The unhealthy evaluations that point to the report that triggered the state of the entity, if the entity is not healthy. The evaluations are recursive, containing the children health evaluations that triggered the current health state. For example, a watchdog reported an error against a replica. The application health shows an unhealthy evaluation due to an unhealthy service; the service is unhealthy due to a partition in error; the partition is unhealthy due to a replica in error; the replica is unhealthy due to the watchdog error health report.
  • The health statistics for all children types of the entities that have children. For example, cluster health shows the total number of applications, services, partitions, replicas, and deployed entities in the cluster. Service health shows the total number of partitions and replicas under the specified service.
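
The way a percentage policy such as MaxPercentUnhealthyApplications feeds into the aggregated health state can be illustrated with a simplified sketch. It is not the health store's implementation: the ChildState enum and the ChildAggregationSketch class are hypothetical names, the rounding-up of the tolerated count is an assumption of this example, and health events on the entity itself are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local stand-in for System.Fabric.Health.HealthState so the sketch has no SDK dependency.
public enum ChildState { Ok, Warning, Error }

public static class ChildAggregationSketch
{
    // maxPercentUnhealthy plays the role of a setting such as MaxPercentUnhealthyApplications.
    public static ChildState Aggregate(IReadOnlyCollection<ChildState> children, int maxPercentUnhealthy)
    {
        int errors = children.Count(c => c == ChildState.Error);
        bool anyWarning = children.Any(c => c == ChildState.Warning);

        // Tolerated number of children in error, rounded up (assumption of this sketch).
        int tolerated = (children.Count * maxPercentUnhealthy + 99) / 100;

        if (errors > tolerated) return ChildState.Error;          // tolerance exceeded
        if (errors > 0 || anyWarning) return ChildState.Warning;  // unhealthy children, but tolerated
        return ChildState.Ok;
    }
}
```

With the default tolerance of 0%, a single child in error makes the parent evaluate to error, which is what "do not tolerate any failure" means in practice. A child in warning, such as fabric:/WordCount in the examples that follow, yields a warning on the parent.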

Get cluster health

Returns the health of the cluster entity and contains the health states of applications and nodes (children of the cluster). Input:

  • [Optional] The cluster health policy used to evaluate the nodes and the cluster events.
  • [Optional] The application health policy map, with the health policies used to override the application manifest policies.
  • [Optional] Filters for events, nodes, and applications that specify which entries are of interest and should be returned in the result (for example, errors only, or both warnings and errors). All events, nodes, and applications are used to evaluate the entity aggregated health, regardless of the filter.
  • [Optional] Filter to exclude health statistics.
  • [Optional] Filter to include fabric:/System health statistics in the health statistics. Only applicable when the health statistics are not excluded. By default, the health statistics include only statistics for user applications and not the System application.

API

To get cluster health, create a FabricClient and call the GetClusterHealthAsync method on its HealthManager.

The following call gets the cluster health:

ClusterHealth clusterHealth = await fabricClient.HealthManager.GetClusterHealthAsync();

The following code gets the cluster health by using a custom cluster health policy and filters for nodes and applications. It specifies that the health statistics include the fabric:/System statistics. It creates a ClusterHealthQueryDescription, which contains the input information.

var policy = new ClusterHealthPolicy()
{
    MaxPercentUnhealthyNodes = 20
};
var nodesFilter = new NodeHealthStatesFilter()
{
    HealthStateFilterValue = HealthStateFilter.Error | HealthStateFilter.Warning
};
var applicationsFilter = new ApplicationHealthStatesFilter()
{
    HealthStateFilterValue = HealthStateFilter.Error
};
var healthStatisticsFilter = new ClusterHealthStatisticsFilter()
{
    ExcludeHealthStatistics = false,
    IncludeSystemApplicationHealthStatistics = true
};
var queryDescription = new ClusterHealthQueryDescription()
{
    HealthPolicy = policy,
    ApplicationsFilter = applicationsFilter,
    NodesFilter = nodesFilter,
    HealthStatisticsFilter = healthStatisticsFilter
};

ClusterHealth clusterHealth = await fabricClient.HealthManager.GetClusterHealthAsync(queryDescription);

PowerShell

The cmdlet to get the cluster health is Get-ServiceFabricClusterHealth. First, connect to the cluster by using the Connect-ServiceFabricCluster cmdlet.

The state of the cluster is five nodes, the system application, and fabric:/WordCount configured as described.

The following cmdlet gets cluster health by using default health policies. The aggregated health state is warning, because the fabric:/WordCount application is in warning. Note how the unhealthy evaluations provide details on the conditions that triggered the aggregated health.

PS D:\ServiceFabric> Get-ServiceFabricClusterHealth

AggregatedHealthState   : Warning
UnhealthyEvaluations    : 
                          Unhealthy applications: 100% (1/1), MaxPercentUnhealthyApplications=0%.

                          Unhealthy application: ApplicationName='fabric:/WordCount', AggregatedHealthState='Warning'.

                            Unhealthy services: 100% (1/1), ServiceType='WordCountServiceType', MaxPercentUnhealthyServices=0%.

                            Unhealthy service: ServiceName='fabric:/WordCount/WordCountService', AggregatedHealthState='Warning'.

                                Unhealthy partitions: 100% (1/1), MaxPercentUnhealthyPartitionsPerService=0%.

                                Unhealthy partition: PartitionId='af2e3e44-a8f8-45ac-9f31-4093eb897600', AggregatedHealthState='Warning'.

                                    Unhealthy event: SourceId='System.FM', Property='State', HealthState='Warning', ConsiderWarningAsError=false.

NodeHealthStates        : 
                          NodeName              : _Node_4
                          AggregatedHealthState : Ok

                          NodeName              : _Node_3
                          AggregatedHealthState : Ok

                          NodeName              : _Node_2
                          AggregatedHealthState : Ok

                          NodeName              : _Node_1
                          AggregatedHealthState : Ok

                          NodeName              : _Node_0
                          AggregatedHealthState : Ok

ApplicationHealthStates : 
                          ApplicationName       : fabric:/System
                          AggregatedHealthState : Ok

                          ApplicationName       : fabric:/WordCount
                          AggregatedHealthState : Warning

HealthEvents            : None
HealthStatistics        : 
                          Node                  : 5 Ok, 0 Warning, 0 Error
                          Replica               : 6 Ok, 0 Warning, 0 Error
                          Partition             : 1 Ok, 1 Warning, 0 Error
                          Service               : 1 Ok, 1 Warning, 0 Error
                          DeployedServicePackage : 6 Ok, 0 Warning, 0 Error
                          DeployedApplication   : 5 Ok, 0 Warning, 0 Error
                          Application           : 0 Ok, 1 Warning, 0 Error

The following PowerShell cmdlet gets the health of the cluster by using a custom application policy. It filters results to get only applications and nodes in error or warning. As a result, no nodes are returned, because they are all healthy. Only the fabric:/WordCount application matches the applications filter. Because the custom policy specifies to consider warnings as errors for the fabric:/WordCount application, the application is evaluated as in error, and so is the cluster.

PS D:\ServiceFabric> $appHealthPolicy = New-Object -TypeName System.Fabric.Health.ApplicationHealthPolicy
$appHealthPolicy.ConsiderWarningAsError = $true
$appHealthPolicyMap = New-Object -TypeName System.Fabric.Health.ApplicationHealthPolicyMap
$appUri1 = New-Object -TypeName System.Uri -ArgumentList "fabric:/WordCount"
$appHealthPolicyMap.Add($appUri1, $appHealthPolicy)
Get-ServiceFabricClusterHealth -ApplicationHealthPolicyMap $appHealthPolicyMap -ApplicationsFilter "Warning,Error" -NodesFilter "Warning,Error" -ExcludeHealthStatistics

AggregatedHealthState   : Error
UnhealthyEvaluations    : 
                          Unhealthy applications: 100% (1/1), MaxPercentUnhealthyApplications=0%.

                          Unhealthy application: ApplicationName='fabric:/WordCount', AggregatedHealthState='Error'.

                            Unhealthy services: 100% (1/1), ServiceType='WordCountServiceType', MaxPercentUnhealthyServices=0%.

                            Unhealthy service: ServiceName='fabric:/WordCount/WordCountService', AggregatedHealthState='Error'.

                                Unhealthy partitions: 100% (1/1), MaxPercentUnhealthyPartitionsPerService=0%.

                                Unhealthy partition: PartitionId='af2e3e44-a8f8-45ac-9f31-4093eb897600', AggregatedHealthState='Error'.

                                    Unhealthy event: SourceId='System.FM', Property='State', HealthState='Warning', ConsiderWarningAsError=true.

NodeHealthStates        : None
ApplicationHealthStates : 
                          ApplicationName       : fabric:/WordCount
                          AggregatedHealthState : Error

HealthEvents            : None

REST

You can get cluster health with a GET request or a POST request that includes health policies described in the body.
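
As a rough illustration of the GET form, the sketch below builds the request URL with filters passed as query parameters. The ClusterHealthRestSketch class and BuildClusterHealthUrl method are hypothetical names. The gateway address (an unsecured local cluster on port 19080) and the api-version value 6.0 are assumptions; check the REST reference for the version your cluster supports.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any Service Fabric library.
public static class ClusterHealthRestSketch
{
    // Numeric values mirror the HealthStateFilter flags used by the managed API.
    public const int Ok = 2, Warning = 4, Error = 8;

    // Builds the URL for a GET cluster health request with server-side output filters.
    public static string BuildClusterHealthUrl(
        string gateway, int nodesFilter, int applicationsFilter, bool excludeStatistics)
    {
        var query = new List<string>
        {
            "api-version=6.0",                                   // assumed API version
            "NodesHealthStateFilter=" + nodesFilter,
            "ApplicationsHealthStateFilter=" + applicationsFilter,
            "ExcludeHealthStatistics=" + (excludeStatistics ? "true" : "false"),
        };
        return gateway.TrimEnd('/') + "/$/GetClusterHealth?" + string.Join("&", query);
    }
}
```

For example, BuildClusterHealthUrl("http://localhost:19080", Warning | Error, Error, true) asks for nodes in warning or error and applications in error, without statistics. The resulting URL can be fetched with HttpClient.GetStringAsync against an unsecured cluster; a secured cluster additionally needs client credentials, which this sketch does not cover.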

Get node health

Returns the health of a node entity and contains the health events reported on the node. Input:

  • [Required] The node name that identifies the node.
  • [Optional] The cluster health policy settings used to evaluate health.
  • [Optional] Filters for events that specify which entries are of interest and should be returned in the result (for example, errors only, or both warnings and errors). All events are used to evaluate the entity aggregated health, regardless of the filter.

API

To get node health through the API, create a FabricClient and call the GetNodeHealthAsync method on its HealthManager.

The following code gets the node health for the specified node name:

NodeHealth nodeHealth = await fabricClient.HealthManager.GetNodeHealthAsync(nodeName);

The following code gets the node health for the specified node name and passes in an events filter and a custom policy through NodeHealthQueryDescription:

var queryDescription = new NodeHealthQueryDescription(nodeName)
{
    HealthPolicy = new ClusterHealthPolicy() {  ConsiderWarningAsError = true },
    EventsFilter = new HealthEventsFilter() { HealthStateFilterValue = HealthStateFilter.Warning },
};

NodeHealth nodeHealth = await fabricClient.HealthManager.GetNodeHealthAsync(queryDescription);

PowerShell

The cmdlet to get the node health is Get-ServiceFabricNodeHealth. First, connect to the cluster by using the Connect-ServiceFabricCluster cmdlet. The following cmdlet gets the node health by using default health policies:

PS D:\ServiceFabric> Get-ServiceFabricNodeHealth _Node_1

NodeName              : _Node_1
AggregatedHealthState : Ok
HealthEvents          : 
                        SourceId              : System.FM
                        Property              : State
                        HealthState           : Ok
                        SequenceNumber        : 3
                        SentAt                : 7/13/2017 4:39:23 PM
                        ReceivedAt            : 7/13/2017 4:40:47 PM
                        TTL                   : Infinite
                        Description           : Fabric node is up.
                        RemoveWhenExpired     : False
                        IsExpired             : False
                        Transitions           : Error->Ok = 7/13/2017 4:40:47 PM, LastWarning = 1/1/0001 12:00:00 AM

The following cmdlet gets the health of all nodes in the cluster:

PS D:\ServiceFabric> Get-ServiceFabricNode | Get-ServiceFabricNodeHealth | select NodeName, AggregatedHealthState | ft -AutoSize

NodeName AggregatedHealthState
-------- ---------------------
_Node_4                     Ok
_Node_3                     Ok
_Node_2                     Ok
_Node_1                     Ok
_Node_0                     Ok

REST

You can get node health with a GET request or a POST request that includes health policies described in the body.

Get application health

Returns the health of an application entity. It contains the health states of the deployed application and service children. Input:

  • [Required] The application name (URI) that identifies the application.
  • [Optional] The application health policy used to override the application manifest policies.
  • [Optional] Filters for events, services, and deployed applications that specify which entries are of interest and should be returned in the result (for example, errors only, or both warnings and errors). All events, services, and deployed applications are used to evaluate the entity aggregated health, regardless of the filter.
  • [Optional] Filter to exclude the health statistics. If not specified, the health statistics include the ok, warning, and error count for all application children: services, partitions, replicas, deployed applications, and deployed service packages.

API

To get application health, create a FabricClient and call the GetApplicationHealthAsync method on its HealthManager.

The following code gets the application health for the specified application name (URI):

ApplicationHealth applicationHealth = await fabricClient.HealthManager.GetApplicationHealthAsync(applicationName);

The following code gets the application health for the specified application name (URI), with filters and custom policies specified via ApplicationHealthQueryDescription.

HealthStateFilter warningAndErrors = HealthStateFilter.Error | HealthStateFilter.Warning;
var serviceTypePolicy = new ServiceTypeHealthPolicy()
{
    MaxPercentUnhealthyPartitionsPerService = 0,
    MaxPercentUnhealthyReplicasPerPartition = 5,
    MaxPercentUnhealthyServices = 0,
};
var policy = new ApplicationHealthPolicy()
{
    ConsiderWarningAsError = false,
    DefaultServiceTypeHealthPolicy = serviceTypePolicy,
    MaxPercentUnhealthyDeployedApplications = 0,
};

var queryDescription = new ApplicationHealthQueryDescription(applicationName)
{
    HealthPolicy = policy,
    EventsFilter = new HealthEventsFilter() { HealthStateFilterValue = warningAndErrors },
    ServicesFilter = new ServiceHealthStatesFilter() { HealthStateFilterValue = warningAndErrors },
    DeployedApplicationsFilter = new DeployedApplicationHealthStatesFilter() { HealthStateFilterValue = warningAndErrors },
};

ApplicationHealth applicationHealth = await fabricClient.HealthManager.GetApplicationHealthAsync(queryDescription);

PowerShell

The cmdlet to get the application health is Get-ServiceFabricApplicationHealth. First, connect to the cluster by using the Connect-ServiceFabricCluster cmdlet.

The following cmdlet returns the health of the fabric:/WordCount application:

PS D:\ServiceFabric> Get-ServiceFabricApplicationHealth fabric:/WordCount

ApplicationName                 : fabric:/WordCount
AggregatedHealthState           : Warning
UnhealthyEvaluations            : 
                                  Unhealthy services: 100% (1/1), ServiceType='WordCountServiceType', MaxPercentUnhealthyServices=0%.

                                  Unhealthy service: ServiceName='fabric:/WordCount/WordCountService', AggregatedHealthState='Warning'.

                                    Unhealthy partitions: 100% (1/1), MaxPercentUnhealthyPartitionsPerService=0%.

                                    Unhealthy partition: PartitionId='af2e3e44-a8f8-45ac-9f31-4093eb897600', AggregatedHealthState='Warning'.

                                        Unhealthy event: SourceId='System.FM', Property='State', HealthState='Warning', ConsiderWarningAsError=false.

ServiceHealthStates             : 
                                  ServiceName           : fabric:/WordCount/WordCountWebService
                                  AggregatedHealthState : Ok

                                  ServiceName           : fabric:/WordCount/WordCountService
                                  AggregatedHealthState : Warning

DeployedApplicationHealthStates : 
                                  ApplicationName       : fabric:/WordCount
                                  NodeName              : _Node_4
                                  AggregatedHealthState : Ok

                                  ApplicationName       : fabric:/WordCount
                                  NodeName              : _Node_3
                                  AggregatedHealthState : Ok

                                  ApplicationName       : fabric:/WordCount
                                  NodeName              : _Node_0
                                  AggregatedHealthState : Ok

                                  ApplicationName       : fabric:/WordCount
                                  NodeName              : _Node_2
                                  AggregatedHealthState : Ok

                                  ApplicationName       : fabric:/WordCount
                                  NodeName              : _Node_1
                                  AggregatedHealthState : Ok

HealthEvents                    : 
                                  SourceId              : System.CM
                                  Property              : State
                                  HealthState           : Ok
                                  SequenceNumber        : 282
                                  SentAt                : 7/13/2017 5:57:05 PM
                                  ReceivedAt            : 7/13/2017 5:57:05 PM
                                  TTL                   : Infinite
                                  Description           : Application has been created.
                                  RemoveWhenExpired     : False
                                  IsExpired             : False
                                  Transitions           : Error->Ok = 7/13/2017 5:57:05 PM, LastWarning = 1/1/0001 12:00:00 AM

HealthStatistics                : 
                                  Replica               : 6 Ok, 0 Warning, 0 Error
                                  Partition             : 1 Ok, 1 Warning, 0 Error
                                  Service               : 1 Ok, 1 Warning, 0 Error
                                  DeployedServicePackage : 6 Ok, 0 Warning, 0 Error
                                  DeployedApplication   : 5 Ok, 0 Warning, 0 Error

The following PowerShell cmdlet passes in custom policies. It also filters children and events.

PS D:\ServiceFabric> Get-ServiceFabricApplicationHealth -ApplicationName fabric:/WordCount -ConsiderWarningAsError $true -ServicesFilter Error -EventsFilter Error -DeployedApplicationsFilter Error -ExcludeHealthStatistics

ApplicationName                 : fabric:/WordCount
AggregatedHealthState           : Error
UnhealthyEvaluations            : 
                                  Unhealthy services: 100% (1/1), ServiceType='WordCountServiceType', MaxPercentUnhealthyServices=0%.

                                  Unhealthy service: ServiceName='fabric:/WordCount/WordCountService', AggregatedHealthState='Error'.

                                    Unhealthy partitions: 100% (1/1), MaxPercentUnhealthyPartitionsPerService=0%.

                                    Unhealthy partition: PartitionId='af2e3e44-a8f8-45ac-9f31-4093eb897600', AggregatedHealthState='Error'.

                                        Unhealthy event: SourceId='System.FM', Property='State', HealthState='Warning', ConsiderWarningAsError=true.

ServiceHealthStates             : 
                                  ServiceName           : fabric:/WordCount/WordCountService
                                  AggregatedHealthState : Error

DeployedApplicationHealthStates : None
HealthEvents                    : None

REST

You can get application health with a GET request or a POST request that includes health policies described in the body.

Get service health

Returns the health of a service entity. It contains the partition health states. Input:

  • [Required] The service name (URI) that identifies the service.
  • [Optional] The application health policy used to override the application manifest policy.
  • [Optional] Filters for events and partitions that specify which entries are of interest and should be returned in the result (for example, errors only, or both warnings and errors). All events and partitions are used to evaluate the entity aggregated health, regardless of the filter.
  • [Optional] Filter to exclude health statistics. If not specified, the health statistics show the ok, warning, and error count for all partitions and replicas of the service.

API

To get service health through the API, create a FabricClient and call the GetServiceHealthAsync method on its HealthManager.

The following example gets the health of a service with the specified service name (URI):

ServiceHealth serviceHealth = await fabricClient.HealthManager.GetServiceHealthAsync(serviceName);

The following code gets the service health for the specified service name (URI), specifying filters for events and partitions via ServiceHealthQueryDescription:

var queryDescription = new ServiceHealthQueryDescription(serviceName)
{
    EventsFilter = new HealthEventsFilter() { HealthStateFilterValue = HealthStateFilter.All },
    PartitionsFilter = new PartitionHealthStatesFilter() { HealthStateFilterValue = HealthStateFilter.Error },
};

ServiceHealth serviceHealth = await fabricClient.HealthManager.GetServiceHealthAsync(queryDescription);
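
The returned ServiceHealth carries only the identifier and aggregated state of each partition. To get the complete health of a child, the partition health query is called with the child's identifier. The sketch below shows one way to drill down; the ServiceDrillDownSketch class and PrintUnhealthyPartitionsAsync method are hypothetical names, and printing to the console is only for illustration.

```csharp
using System;
using System.Fabric;
using System.Fabric.Health;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Service Fabric SDK.
public static class ServiceDrillDownSketch
{
    public static async Task PrintUnhealthyPartitionsAsync(FabricClient fabricClient, Uri serviceName)
    {
        ServiceHealth serviceHealth = await fabricClient.HealthManager.GetServiceHealthAsync(serviceName);

        foreach (PartitionHealthState partitionState in serviceHealth.PartitionHealthStates)
        {
            // Skip healthy partitions; only unhealthy children are queried in full.
            if (partitionState.AggregatedHealthState == HealthState.Ok) continue;

            PartitionHealth partitionHealth =
                await fabricClient.HealthManager.GetPartitionHealthAsync(partitionState.PartitionId);

            Console.WriteLine($"{partitionHealth.PartitionId}: {partitionHealth.AggregatedHealthState}");
            foreach (HealthEvent healthEvent in partitionHealth.HealthEvents)
            {
                HealthInformation info = healthEvent.HealthInformation;
                Console.WriteLine($"  {info.SourceId}/{info.Property}: {info.HealthState} - {info.Description}");
            }
        }
    }
}
```

Passing a PartitionsFilter for warnings and errors in the service query would move the "skip healthy partitions" check to the server side, in line with the recommendation to prefer output filters over client-side filtering.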

PowerShell

The cmdlet to get the service health is Get-ServiceFabricServiceHealth. First, connect to the cluster by using the Connect-ServiceFabricCluster cmdlet.

The following cmdlet gets the service health by using default health policies:

PS D:\ServiceFabric> Get-ServiceFabricServiceHealth -ServiceName fabric:/WordCount/WordCountService

ServiceName           : fabric:/WordCount/WordCountService
AggregatedHealthState : Warning
UnhealthyEvaluations  : 
                        Unhealthy partitions: 100% (1/1), MaxPercentUnhealthyPartitionsPerService=0%.

                        Unhealthy partition: PartitionId='af2e3e44-a8f8-45ac-9f31-4093eb897600', AggregatedHealthState='Warning'.

                            Unhealthy event: SourceId='System.FM', Property='State', HealthState='Warning', ConsiderWarningAsError=false.

PartitionHealthStates : 
                        PartitionId           : af2e3e44-a8f8-45ac-9f31-4093eb897600
                        AggregatedHealthState : Warning

HealthEvents          : 
                        SourceId              : System.FM
                        Property              : State
                        HealthState           : Ok
                        SequenceNumber        : 15
                        SentAt                : 7/13/2017 5:57:05 PM
                        ReceivedAt            : 7/13/2017 5:57:18 PM
                        TTL                   : Infinite
                        Description           : Service has been created.
                        RemoveWhenExpired     : False
                        IsExpired             : False
                        Transitions           : Error->Ok = 7/13/2017 5:57:18 PM, LastWarning = 1/1/0001 12:00:00 AM

HealthStatistics      : 
                        Replica               : 5 Ok, 0 Warning, 0 Error
                        Partition             : 0 Ok, 1 Warning, 0 Error

REST

You can get service health with a GET request, or with a POST request whose body describes the health policies.

Get partition health

Returns the health of a partition entity. It contains the replica health states. Input:

  • [Required] The partition ID (GUID) that identifies the partition.
  • [Optional] The application health policy used to override the application manifest policy.
  • [Optional] Filters for events and replicas that specify which entries are of interest and should be returned in the result (for example, errors only, or both warnings and errors). All events and replicas are used to evaluate the entity aggregated health, regardless of the filter.
  • [Optional] Filter to exclude health statistics. If not specified, the health statistics show how many replicas are in ok, warning, and error states.
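The rule that filters limit only what is returned, and never how health is evaluated, can be illustrated with a small Python model. This is a simplified sketch, not the health store's implementation: it assumes the aggregated state is simply the worst replica state (real evaluation also applies health policies), the filter bit values are assumed to mirror the System.Fabric.Health.HealthStateFilter flags, and the names `FILTER_BITS` and `query_partition` are hypothetical.

```python
OK, WARNING, ERROR = "Ok", "Warning", "Error"

# Severity order used for the simplified "worst state wins" aggregation.
SEVERITY = {OK: 0, WARNING: 1, ERROR: 2}

# Assumed flag values, modeled on System.Fabric.Health.HealthStateFilter.
FILTER_NONE = 1
FILTER_ALL = 65535
FILTER_BITS = {OK: 2, WARNING: 4, ERROR: 8}


def query_partition(replica_states, replicas_filter):
    """Return (aggregated_state, returned_replicas).

    replica_states: dict of replica id -> health state.
    replicas_filter: bitmask choosing which replicas appear in the result.
    """
    # Aggregation looks at every replica, regardless of the filter.
    aggregated = max(replica_states.values(), key=SEVERITY.__getitem__, default=OK)
    # The filter only trims the list of children that is handed back.
    returned = {
        replica_id: state
        for replica_id, state in replica_states.items()
        if FILTER_BITS[state] & replicas_filter
    }
    return aggregated, returned


replicas = {"r1": OK, "r2": ERROR, "r3": WARNING}
print(query_partition(replicas, FILTER_NONE))
print(query_partition(replicas, FILTER_BITS[WARNING] | FILTER_BITS[ERROR]))
```

With the `FILTER_NONE` value the result carries no replicas, yet the aggregated state still reflects the replica in error, which is the behavior the filter bullet describes.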

API

To get partition health through the API, create a FabricClient and call the GetPartitionHealthAsync method on its HealthManager. To specify optional parameters, create PartitionHealthQueryDescription.

PartitionHealth partitionHealth = await fabricClient.HealthManager.GetPartitionHealthAsync(partitionId);

PowerShell

The cmdlet to get the partition health is Get-ServiceFabricPartitionHealth. First, connect to the cluster by using the Connect-ServiceFabricCluster cmdlet.

The following cmdlet gets the health for all partitions of the fabric:/WordCount/WordCountService service and filters out replica health states:

PS D:\ServiceFabric> Get-ServiceFabricPartition fabric:/WordCount/WordCountService | Get-ServiceFabricPartitionHealth -ReplicasFilter None

PartitionId           : af2e3e44-a8f8-45ac-9f31-4093eb897600
AggregatedHealthState : Warning
UnhealthyEvaluations  : 
                        Unhealthy event: SourceId='System.FM', Property='State', HealthState='Warning', ConsiderWarningAsError=false.

ReplicaHealthStates   : None
HealthEvents          : 
                        SourceId              : System.FM
                        Property              : State
                        HealthState           : Warning
                        SequenceNumber        : 72
                        SentAt                : 7/13/2017 5:57:29 PM
                        ReceivedAt            : 7/13/2017 5:57:48 PM
                        TTL                   : Infinite
                        Description           : Partition is below target replica or instance count.
                        fabric:/WordCount/WordCountService 7 2 af2e3e44-a8f8-45ac-9f31-4093eb897600
                          N/P RD _Node_2 Up 131444422260002646
                          N/S RD _Node_4 Up 131444422293113678
                          N/S RD _Node_3 Up 131444422293113679
                          N/S RD _Node_1 Up 131444422293118720
                          N/S RD _Node_0 Up 131444422293118721
                          (Showing 5 out of 5 replicas. Total available replicas: 5.)

                        RemoveWhenExpired     : False
                        IsExpired             : False
                        Transitions           : Ok->Warning = 7/13/2017 5:57:48 PM, LastError = 1/1/0001 12:00:00 AM

                        SourceId              : System.PLB
                        Property              : ServiceReplicaUnplacedHealth_Secondary_af2e3e44-a8f8-45ac-9f31-4093eb897600
                        HealthState           : Warning
                        SequenceNumber        : 131444445174851664
                        SentAt                : 7/13/2017 6:35:17 PM
                        ReceivedAt            : 7/13/2017 6:35:18 PM
                        TTL                   : 00:01:05
                        Description           : The Load Balancer was unable to find a placement for one or more of the Service's Replicas:
                        Secondary replica could not be placed due to the following constraints and properties:  
                        TargetReplicaSetSize: 7
                        Placement Constraint: N/A
                        Parent Service: N/A

                        Constraint Elimination Sequence:
                        Existing Secondary Replicas eliminated 4 possible node(s) for placement -- 1/5 node(s) remain.
                        Existing Primary Replica eliminated 1 possible node(s) for placement -- 0/5 node(s) remain.

                        Nodes Eliminated By Constraints:

                        Existing Secondary Replicas -- Nodes with Partition's Existing Secondary Replicas/Instances:
                        --
                        FaultDomain:fd:/4 NodeName:_Node_4 NodeType:NodeType4 UpgradeDomain:4 UpgradeDomain: ud:/4 Deactivation Intent/Status: None/None
                        FaultDomain:fd:/3 NodeName:_Node_3 NodeType:NodeType3 UpgradeDomain:3 UpgradeDomain: ud:/3 Deactivation Intent/Status: None/None
                        FaultDomain:fd:/1 NodeName:_Node_1 NodeType:NodeType1 UpgradeDomain:1 UpgradeDomain: ud:/1 Deactivation Intent/Status: None/None
                        FaultDomain:fd:/0 NodeName:_Node_0 NodeType:NodeType0 UpgradeDomain:0 UpgradeDomain: ud:/0 Deactivation Intent/Status: None/None

                        Existing Primary Replica -- Nodes with Partition's Existing Primary Replica or Secondary Replicas:
                        --
                        FaultDomain:fd:/2 NodeName:_Node_2 NodeType:NodeType2 UpgradeDomain:2 UpgradeDomain: ud:/2 Deactivation Intent/Status: None/None

                        RemoveWhenExpired     : True
                        IsExpired             : False
                        Transitions           : Error->Warning = 7/13/2017 5:57:48 PM, LastOk = 1/1/0001 12:00:00 AM

HealthStatistics      : 
                        Replica               : 5 Ok, 0 Warning, 0 Error

REST

You can get partition health with a GET request, or with a POST request whose body describes the health policies.
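As a rough illustration of the GET form, the following Python sketch builds a partition health URL. The path (`/Partitions/{id}/$/GetHealth`), the query parameter names, the `api-version` value, and the local endpoint `http://localhost:19080` are assumptions that should be checked against the Service Fabric REST API reference; `partition_health_url` is a hypothetical helper, and no request is actually sent.

```python
from urllib.parse import urlencode


def partition_health_url(endpoint, partition_id, events_filter=None,
                         replicas_filter=None, exclude_statistics=False,
                         api_version="6.0"):
    """Build a GET URL for a partition health query (assumed URL layout)."""
    params = {"api-version": api_version}
    # Optional filters are sent as integer bitmasks (assumed parameter names).
    if events_filter is not None:
        params["EventsHealthStateFilter"] = events_filter
    if replicas_filter is not None:
        params["ReplicasHealthStateFilter"] = replicas_filter
    if exclude_statistics:
        params["ExcludeHealthStatistics"] = "true"
    return f"{endpoint}/Partitions/{partition_id}/$/GetHealth?{urlencode(params)}"


# Roughly the REST counterpart of `-ReplicasFilter None` (assumed value 1).
print(partition_health_url(
    "http://localhost:19080",
    "af2e3e44-a8f8-45ac-9f31-4093eb897600",
    replicas_filter=1,
))
```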

Get replica health

Returns the health of a stateful service replica or a stateless service instance. Input:

  • [Required] The partition ID (GUID) and replica ID that identify the replica.
  • [Optional] The application health policy parameters used to override the application manifest policies.
  • [Optional] Filters for events that specify which entries are of interest and should be returned in the result (for example, errors only, or both warnings and errors). All events are used to evaluate the entity aggregated health, regardless of the filter.

API

To get the replica health through the API, create a FabricClient and call the GetReplicaHealthAsync method on its HealthManager. To specify advanced parameters, use ReplicaHealthQueryDescription.

ReplicaHealth replicaHealth = await fabricClient.HealthManager.GetReplicaHealthAsync(partitionId, replicaId);

PowerShell

The cmdlet to get the replica health is Get-ServiceFabricReplicaHealth. First, connect to the cluster by using the Connect-ServiceFabricCluster cmdlet.

The following cmdlet gets the health of the primary replica for all partitions of the service:

PS D:\ServiceFabric> Get-ServiceFabricPartition fabric:/WordCount/WordCountService | Get-ServiceFabricReplica | where {$_.ReplicaRole -eq "Primary"} | Get-ServiceFabricReplicaHealth

PartitionId           : af2e3e44-a8f8-45ac-9f31-4093eb897600
ReplicaId             : 131444422260002646
AggregatedHealthState : Ok
HealthEvents          : 
                        SourceId              : System.RA
                        Property              : State
                        HealthState           : Ok
                        SequenceNumber        : 131444422263668344
                        SentAt                : 7/13/2017 5:57:06 PM
                        ReceivedAt            : 7/13/2017 5:57:18 PM
                        TTL                   : Infinite
                        Description           : Replica has been created._Node_2
                        RemoveWhenExpired     : False
                        IsExpired             : False
                        Transitions           : Error->Ok = 7/13/2017 5:57:18 PM, LastWarning = 1/1/0001 12:00:00 AM

REST

You can get replica health with a GET request, or with a POST request whose body describes the health policies.

Get deployed application health

Returns the health of an application deployed on a node entity. It contains the deployed service package health states. Input:

  • [Required] The application name (URI) and node name (string) that identify the deployed application.
  • [Optional] The application health policy used to override the application manifest policies.
  • [Optional] Filters for events and deployed service packages that specify which entries are of interest and should be returned in the result (for example, errors only, or both warnings and errors). All events and deployed service packages are used to evaluate the entity aggregated health, regardless of the filter.
  • [Optional] Filter to exclude health statistics. If not specified, the health statistics show the number of deployed service packages in ok, warning, and error health states.

API

To get the health of an application deployed on a node through the API, create a FabricClient and call the GetDeployedApplicationHealthAsync method on its HealthManager. To specify optional parameters, use DeployedApplicationHealthQueryDescription.

DeployedApplicationHealth health = await fabricClient.HealthManager.GetDeployedApplicationHealthAsync(
    new DeployedApplicationHealthQueryDescription(applicationName, nodeName));

PowerShell

The cmdlet to get the deployed application health is Get-ServiceFabricDeployedApplicationHealth. First, connect to the cluster by using the Connect-ServiceFabricCluster cmdlet. To find out where an application is deployed, run Get-ServiceFabricApplicationHealth and look at the deployed application children.

The following cmdlet gets the health of the fabric:/WordCount application deployed on _Node_0.

PS D:\ServiceFabric> Get-ServiceFabricDeployedApplicationHealth -ApplicationName fabric:/WordCount -NodeName _Node_0

ApplicationName                    : fabric:/WordCount
NodeName                           : _Node_0
AggregatedHealthState              : Ok
DeployedServicePackageHealthStates : 
                                     ServiceManifestName   : WordCountServicePkg
                                     ServicePackageActivationId : 
                                     NodeName              : _Node_0
                                     AggregatedHealthState : Ok

                                     ServiceManifestName   : WordCountWebServicePkg
                                     ServicePackageActivationId : 
                                     NodeName              : _Node_0
                                     AggregatedHealthState : Ok

HealthEvents                       : 
                                     SourceId              : System.Hosting
                                     Property              : Activation
                                     HealthState           : Ok
                                     SequenceNumber        : 131444422261848308
                                     SentAt                : 7/13/2017 5:57:06 PM
                                     ReceivedAt            : 7/13/2017 5:57:17 PM
                                     TTL                   : Infinite
                                     Description           : The application was activated successfully.
                                     RemoveWhenExpired     : False
                                     IsExpired             : False
                                     Transitions           : Error->Ok = 7/13/2017 5:57:17 PM, LastWarning = 1/1/0001 12:00:00 AM

HealthStatistics                   : 
                                     DeployedServicePackage : 2 Ok, 0 Warning, 0 Error

REST

You can get deployed application health with a GET request, or with a POST request whose body describes the health policies.

Get deployed service package health

Returns the health of a deployed service package entity. Input:

  • [Required] The application name (URI), node name (string), and service manifest name (string) that identify the deployed service package.
  • [Optional] The application health policy used to override the application manifest policy.
  • [Optional] Filters for events that specify which entries are of interest and should be returned in the result (for example, errors only, or both warnings and errors). All events are used to evaluate the entity aggregated health, regardless of the filter.

API

To get the health of a deployed service package through the API, create a FabricClient and call the GetDeployedServicePackageHealthAsync method on its HealthManager. To specify optional parameters, use DeployedServicePackageHealthQueryDescription.

DeployedServicePackageHealth health = await fabricClient.HealthManager.GetDeployedServicePackageHealthAsync(
    new DeployedServicePackageHealthQueryDescription(applicationName, nodeName, serviceManifestName));

PowerShell

The cmdlet to get the deployed service package health is Get-ServiceFabricDeployedServicePackageHealth. First, connect to the cluster by using the Connect-ServiceFabricCluster cmdlet. To see where an application is deployed, run Get-ServiceFabricApplicationHealth and look at the deployed applications. To see which service packages are in an application, look at the deployed service package children in the Get-ServiceFabricDeployedApplicationHealth output.

The following cmdlet gets the health of the WordCountServicePkg service package of the fabric:/WordCount application deployed on _Node_2. The entity has System.Hosting reports for successful service-package and entry-point activation, and successful service-type registration.

PS D:\ServiceFabric> Get-ServiceFabricDeployedApplication -ApplicationName fabric:/WordCount -NodeName _Node_2 | Get-ServiceFabricDeployedServicePackageHealth -ServiceManifestName WordCountServicePkg

ApplicationName            : fabric:/WordCount
ServiceManifestName        : WordCountServicePkg
ServicePackageActivationId : 
NodeName                   : _Node_2
AggregatedHealthState      : Ok
HealthEvents               : 
                             SourceId              : System.Hosting
                             Property              : Activation
                             HealthState           : Ok
                             SequenceNumber        : 131444422267693359
                             SentAt                : 7/13/2017 5:57:06 PM
                             ReceivedAt            : 7/13/2017 5:57:18 PM
                             TTL                   : Infinite
                             Description           : The ServicePackage was activated successfully.
                             RemoveWhenExpired     : False
                             IsExpired             : False
                             Transitions           : Error->Ok = 7/13/2017 5:57:18 PM, LastWarning = 1/1/0001 12:00:00 AM

                             SourceId              : System.Hosting
                             Property              : CodePackageActivation:Code:EntryPoint
                             HealthState           : Ok
                             SequenceNumber        : 131444422267903345
                             SentAt                : 7/13/2017 5:57:06 PM
                             ReceivedAt            : 7/13/2017 5:57:18 PM
                             TTL                   : Infinite
                             Description           : The CodePackage was activated successfully.
                             RemoveWhenExpired     : False
                             IsExpired             : False
                             Transitions           : Error->Ok = 7/13/2017 5:57:18 PM, LastWarning = 1/1/0001 12:00:00 AM

                             SourceId              : System.Hosting
                             Property              : ServiceTypeRegistration:WordCountServiceType
                             HealthState           : Ok
                             SequenceNumber        : 131444422272458374
                             SentAt                : 7/13/2017 5:57:07 PM
                             ReceivedAt            : 7/13/2017 5:57:18 PM
                             TTL                   : Infinite
                             Description           : The ServiceType was registered successfully.
                             RemoveWhenExpired     : False
                             IsExpired             : False
                             Transitions           : Error->Ok = 7/13/2017 5:57:18 PM, LastWarning = 1/1/0001 12:00:00 AM

REST

You can get deployed service package health with a GET request, or with a POST request whose body describes the health policies.

Health chunk queries

The health chunk queries can return multi-level cluster children (recursively), per input filters. They support advanced filters that allow a lot of flexibility in choosing the children to be returned. The filters can specify children by unique identifier, or by other group identifiers and/or health states. By default, no children are included, as opposed to health commands that always include first-level children.

The health queries return only first-level children of the specified entity per required filters. To get the children of the children, you must call additional health APIs for each entity of interest. Similarly, to get the health of specific entities, you must call one health API for each desired entity. The chunk query advanced filtering allows you to request multiple items of interest in one query, minimizing the message size and the number of messages.

The value of the chunk query is that you can get the health state for more cluster entities (potentially all cluster entities starting at the required root) in one call. You can express complex health queries such as:

  • Return only applications in error, and for those applications include all services in warning or error. For returned services, include all partitions.
  • Return only the health of four applications, specified by their names.
  • Return only the health of applications of a desired application type.
  • Return all deployed entities on a node. Returns all applications, all deployed applications on the specified node, and all the deployed service packages on that node.
  • Return all replicas in error. Returns all applications, services, partitions, and only replicas in error.
  • Return all applications. For a specified service, include all partitions.

Currently, the health chunk query is exposed only for the cluster entity. It returns a cluster health chunk, which contains:

  • The cluster aggregated health state.
  • The health state chunk list of nodes that respect input filters.
  • The health state chunk list of applications that respect input filters. Each application health state chunk contains a chunk list with all services that respect input filters and a chunk list with all deployed applications that respect the filters. The same applies to the children of services and deployed applications. This way, all entities in the cluster can potentially be returned, if requested, in a hierarchical fashion.
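Because the result is hierarchical, a client typically walks it recursively. The Python sketch below flattens a chunk into paths and picks out the entities in error. The nested-dictionary shape (each `...HealthStateChunks` entry holding `TotalCount` and `Items`) is an assumption about how a JSON rendering of the chunk might look, and the sample values are illustrative only; `iter_chunks` and `NAME_KEYS` are hypothetical names.

```python
# Keys that identify an entity at each level (assumed field names).
NAME_KEYS = ("NodeName", "ApplicationName", "ServiceName", "PartitionId",
             "ReplicaOrInstanceId", "ServiceManifestName")


def iter_chunks(chunk, path=()):
    """Yield (path, health_state) for every entity nested below `chunk`."""
    for key, value in chunk.items():
        if key.endswith("HealthStateChunks") and isinstance(value, dict):
            for item in value.get("Items", []):
                name = next(item[k] for k in NAME_KEYS if k in item)
                item_path = path + (str(name),)
                yield item_path, item["HealthState"]
                # Children of this entity are nested in the same way.
                yield from iter_chunks(item, item_path)


cluster_chunk = {
    "HealthState": "Error",
    "NodeHealthStateChunks": {"TotalCount": 0, "Items": []},
    "ApplicationHealthStateChunks": {"TotalCount": 1, "Items": [{
        "ApplicationName": "fabric:/WordCount",
        "HealthState": "Error",
        "ServiceHealthStateChunks": {"TotalCount": 1, "Items": [{
            "ServiceName": "fabric:/WordCount/WordCountService",
            "HealthState": "Error",
            "PartitionHealthStateChunks": {"TotalCount": 1, "Items": [{
                "PartitionId": "af2e3e44-a8f8-45ac-9f31-4093eb897600",
                "HealthState": "Error",
                "ReplicaHealthStateChunks": {"TotalCount": 2, "Items": [
                    {"ReplicaOrInstanceId": "131444422293118720", "HealthState": "Ok"},
                    {"ReplicaOrInstanceId": "131444422260002646", "HealthState": "Error"},
                ]},
            }]},
        }]},
    }]},
}

errors = [" > ".join(p) for p, state in iter_chunks(cluster_chunk) if state == "Error"]
for line in errors:
    print(line)
```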

Cluster health chunk query

Returns the health of the cluster entity and contains the hierarchical health state chunks of required children. Input:

  • [Optional] The cluster health policy used to evaluate the nodes and the cluster events.
  • [Optional] The application health policy map, with the health policies used to override the application manifest policies.
  • [Optional] Filters for nodes and applications that specify which entries are of interest and should be returned in the result. The filters are specific to an entity or group of entities, or are applicable to all entities at that level. The list of filters can contain one general filter and/or filters for specific identifiers to fine-tune the entities returned by the query. If the list is empty, no children are returned by default. Read more about the filters at NodeHealthStateFilter and ApplicationHealthStateFilter. The application filters can recursively specify advanced filters for children.
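The interplay between one general filter and filters for specific identifiers can be sketched in Python as follows. This is a simplified model under stated assumptions: a filter that names a node is assumed to take precedence over the general filter for that node, the bit values are assumed to mirror System.Fabric.Health.HealthStateFilter, and `select_nodes` is a hypothetical helper rather than anything the health store exposes.

```python
# Assumed flag values, modeled on System.Fabric.Health.HealthStateFilter.
FILTER_BITS = {"Ok": 2, "Warning": 4, "Error": 8}
FILTER_ALL = 65535


def select_nodes(nodes, node_filters):
    """Return the nodes a chunk query would include.

    nodes: dict of node name -> health state.
    node_filters: list of dicts with 'HealthStateFilter' and, optionally,
    'NodeNameFilter' for a filter that targets one node.
    """
    general = next((f for f in node_filters if "NodeNameFilter" not in f), None)
    specific = {f["NodeNameFilter"]: f for f in node_filters if "NodeNameFilter" in f}
    returned = {}
    for name, state in nodes.items():
        # Assumption: the filter naming this node wins over the general one.
        chosen = specific.get(name, general)
        if chosen is not None and FILTER_BITS[state] & chosen["HealthStateFilter"]:
            returned[name] = state
    return returned


nodes = {f"_Node_{i}": "Ok" for i in range(5)}
filters = [
    {"HealthStateFilter": FILTER_BITS["Error"]},                    # nodes in error
    {"NodeNameFilter": "_Node_1", "HealthStateFilter": FILTER_ALL},  # always _Node_1
]
print(select_nodes(nodes, filters))
print(select_nodes(nodes, []))  # empty filter list: no children returned
```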

The chunk result includes the children that respect the filters.

Currently, the chunk query does not return unhealthy evaluations or entity events. That extra information can be obtained using the existing cluster health query.

API

To get the cluster health chunk, create a FabricClient and call the GetClusterHealthChunkAsync method on its HealthManager. You can pass in ClusterHealthChunkQueryDescription to describe health policies and advanced filters.

The following code gets the cluster health chunk with advanced filters.

var queryDescription = new ClusterHealthChunkQueryDescription();
queryDescription.ApplicationFilters.Add(new ApplicationHealthStateFilter()
    {
        // Return applications only if they are in error
        HealthStateFilter = HealthStateFilter.Error
    });

// Return all replicas
var wordCountServiceReplicaFilter = new ReplicaHealthStateFilter()
    {
        HealthStateFilter = HealthStateFilter.All
    };

// Return all replicas and all partitions
var wordCountServicePartitionFilter = new PartitionHealthStateFilter()
    {
        HealthStateFilter = HealthStateFilter.All
    };
wordCountServicePartitionFilter.ReplicaFilters.Add(wordCountServiceReplicaFilter);

// For specific service, return all partitions and all replicas
var wordCountServiceFilter = new ServiceHealthStateFilter()
{
    ServiceNameFilter = new Uri("fabric:/WordCount/WordCountService"),
};
wordCountServiceFilter.PartitionFilters.Add(wordCountServicePartitionFilter);

// Application filter: for specific application, return no services except the ones of interest
var wordCountApplicationFilter = new ApplicationHealthStateFilter()
    {
        // Always return fabric:/WordCount application
        ApplicationNameFilter = new Uri("fabric:/WordCount"),
    };
wordCountApplicationFilter.ServiceFilters.Add(wordCountServiceFilter);

queryDescription.ApplicationFilters.Add(wordCountApplicationFilter);

var result = await fabricClient.HealthManager.GetClusterHealthChunkAsync(queryDescription);

PowerShell

The cmdlet to get the cluster health chunk is Get-ServiceFabricClusterHealthChunk. First, connect to the cluster by using the Connect-ServiceFabricCluster cmdlet.

The following code gets nodes only if they are in Error, except for a specific node, which should always be returned.

PS D:\ServiceFabric> $errorFilter = [System.Fabric.Health.HealthStateFilter]::Error;
$allFilter = [System.Fabric.Health.HealthStateFilter]::All;

$nodeFilter1 = New-Object System.Fabric.Health.NodeHealthStateFilter -Property @{HealthStateFilter=$errorFilter}
$nodeFilter2 = New-Object System.Fabric.Health.NodeHealthStateFilter -Property @{NodeNameFilter="_Node_1";HealthStateFilter=$allFilter}
# Create node filter list that will be passed in the cmdlet
$nodeFilters = New-Object System.Collections.Generic.List[System.Fabric.Health.NodeHealthStateFilter]
$nodeFilters.Add($nodeFilter1)
$nodeFilters.Add($nodeFilter2)

Get-ServiceFabricClusterHealthChunk -NodeFilters $nodeFilters

HealthState                  : Warning
NodeHealthStateChunks        : 
                               TotalCount            : 1

                               NodeName              : _Node_1
                               HealthState           : Ok

ApplicationHealthStateChunks : None

The following cmdlet gets the cluster chunk with application filters.

PS D:\ServiceFabric> $errorFilter = [System.Fabric.Health.HealthStateFilter]::Error;
$allFilter = [System.Fabric.Health.HealthStateFilter]::All;

# All replicas
$replicaFilter = New-Object System.Fabric.Health.ReplicaHealthStateFilter -Property @{HealthStateFilter=$allFilter}

# All partitions
$partitionFilter = New-Object System.Fabric.Health.PartitionHealthStateFilter -Property @{HealthStateFilter=$allFilter}
$partitionFilter.ReplicaFilters.Add($replicaFilter)

# For WordCountService, return all partitions and all replicas
$svcFilter1 = New-Object System.Fabric.Health.ServiceHealthStateFilter -Property @{ServiceNameFilter="fabric:/WordCount/WordCountService"}
$svcFilter1.PartitionFilters.Add($partitionFilter)

$svcFilter2 = New-Object System.Fabric.Health.ServiceHealthStateFilter -Property @{HealthStateFilter=$errorFilter}

$appFilter = New-Object System.Fabric.Health.ApplicationHealthStateFilter -Property @{ApplicationNameFilter="fabric:/WordCount"}
$appFilter.ServiceFilters.Add($svcFilter1)
$appFilter.ServiceFilters.Add($svcFilter2)

$appFilters = New-Object System.Collections.Generic.List[System.Fabric.Health.ApplicationHealthStateFilter]
$appFilters.Add($appFilter)

Get-ServiceFabricClusterHealthChunk -ApplicationFilters $appFilters

HealthState                  : Error
NodeHealthStateChunks        : None
ApplicationHealthStateChunks : 
                               TotalCount            : 1

                               ApplicationName       : fabric:/WordCount
                               ApplicationTypeName   : WordCount
                               HealthState           : Error
                               ServiceHealthStateChunks : 
                                TotalCount            : 1

                                ServiceName           : fabric:/WordCount/WordCountService
                                HealthState           : Error
                                PartitionHealthStateChunks : 
                                    TotalCount            : 1

                                    PartitionId           : af2e3e44-a8f8-45ac-9f31-4093eb897600
                                    HealthState           : Error
                                    ReplicaHealthStateChunks : 
                                        TotalCount            : 5

                                        ReplicaOrInstanceId   : 131444422293118720
                                        HealthState           : Ok

                                        ReplicaOrInstanceId   : 131444422293118721
                                        HealthState           : Ok

                                        ReplicaOrInstanceId   : 131444422293113678
                                        HealthState           : Ok

                                        ReplicaOrInstanceId   : 131444422293113679
                                        HealthState           : Ok

                                        ReplicaOrInstanceId   : 131444422260002646
                                        HealthState           : Error

The following cmdlet returns all deployed entities on a node.

PS D:\ServiceFabric> $errorFilter = [System.Fabric.Health.HealthStateFilter]::Error;
$allFilter = [System.Fabric.Health.HealthStateFilter]::All;

$dspFilter = New-Object System.Fabric.Health.DeployedServicePackageHealthStateFilter -Property @{HealthStateFilter=$allFilter}
$daFilter =  New-Object System.Fabric.Health.DeployedApplicationHealthStateFilter -Property @{HealthStateFilter=$allFilter;NodeNameFilter="_Node_2"}
$daFilter.DeployedServicePackageFilters.Add($dspFilter)

$appFilter = New-Object System.Fabric.Health.ApplicationHealthStateFilter -Property @{HealthStateFilter=$allFilter}
$appFilter.DeployedApplicationFilters.Add($daFilter)

$appFilters = New-Object System.Collections.Generic.List[System.Fabric.Health.ApplicationHealthStateFilter]
$appFilters.Add($appFilter)
Get-ServiceFabricClusterHealthChunk -ApplicationFilters $appFilters

HealthState                  : Error
NodeHealthStateChunks        : None
ApplicationHealthStateChunks : 
                               TotalCount            : 2

                               ApplicationName       : fabric:/System
                               HealthState           : Ok
                               DeployedApplicationHealthStateChunks : 
                                TotalCount            : 1

                                NodeName              : _Node_2
                                HealthState           : Ok
                                DeployedServicePackageHealthStateChunks :
                                    TotalCount            : 1

                                    ServiceManifestName   : FAS
                                    ServicePackageActivationId : 
                                    HealthState           : Ok

                               ApplicationName       : fabric:/WordCount
                               ApplicationTypeName   : WordCount
                               HealthState           : Error
                               DeployedApplicationHealthStateChunks : 
                                TotalCount            : 1

                                NodeName              : _Node_2
                                HealthState           : Ok
                                DeployedServicePackageHealthStateChunks :
                                    TotalCount            : 1

                                    ServiceManifestName   : WordCountServicePkg
                                    ServicePackageActivationId : 
                                    HealthState           : Ok

REST

You can get the cluster health chunk with a GET request or with a POST request that includes health policies and advanced filters described in the body.

General queries

General queries return a list of Service Fabric entities of a specified type. They are exposed through the API (via the methods on FabricClient.QueryManager), PowerShell cmdlets, and REST. These queries aggregate subqueries from multiple components. One of those components is the health store, which populates the aggregated health state for each query result.

Note

General queries return the aggregated health state of the entity and do not contain rich health data. If an entity is not healthy, you can follow up with health queries to get all its health information, including events, child health states, and unhealthy evaluations.

If general queries return an unknown health state for an entity, it's possible that the health store doesn't have complete data about the entity. It's also possible that a subquery to the health store wasn't successful (for example, there was a communication error, or the health store was throttled). Follow up with a health query for the entity. If the subquery encountered transient errors, such as network issues, this follow-up query may succeed. It may also give you more details from the health store about why the entity is not exposed.

Queries that contain HealthState for entities include the application and service list queries used in the examples that follow.

Note

Some of the queries return paged results. The return of these queries is a list derived from PagedList<T>. If the results do not fit in one message, only a page is returned, along with a ContinuationToken that tracks where enumeration stopped. Continue to call the same query, passing in the continuation token from the previous call, to get the next results.
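The continuation-token loop described in this note can be sketched independently of any Service Fabric client. The following Python sketch uses a hypothetical in-memory `query_page` function as a stand-in for a paged query; the page size, the `fabric:/App1` to `fabric:/App3` names, and the token format (the last name returned) are assumptions made for illustration and are not the Service Fabric API.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in data, sorted by name. Only fabric:/System and
# fabric:/WordCount come from this article; the rest are made up.
APPLICATIONS = [
    "fabric:/App1",
    "fabric:/App2",
    "fabric:/App3",
    "fabric:/System",
    "fabric:/WordCount",
]
PAGE_SIZE = 2  # assumed page size, chosen to force several pages


def query_page(continuation_token: Optional[str]) -> Tuple[List[str], Optional[str]]:
    """Return one page of names after the token, plus the next token.

    The next token is None when enumeration is complete.
    """
    remaining = [
        name
        for name in APPLICATIONS
        if continuation_token is None or name > continuation_token
    ]
    page = remaining[:PAGE_SIZE]
    # Only hand back a token when more results remain after this page.
    next_token = page[-1] if len(remaining) > PAGE_SIZE else None
    return page, next_token


def query_all() -> List[str]:
    """Call the same query repeatedly, passing the previous token each time."""
    results: List[str] = []
    token: Optional[str] = None
    while True:
        page, token = query_page(token)
        results.extend(page)
        if token is None:
            return results


if __name__ == "__main__":
    for name in query_all():
        print(name)
```

The caller never interprets the token; it only passes the value from the previous call back into the next one and stops when no token is returned.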

Examples

The following code gets the applications in the cluster whose aggregated health state is Error:

// Requires: using System.Linq; using System.Fabric.Health;
var applications = fabricClient.QueryManager.GetApplicationListAsync().Result.Where(
  app => app.HealthState == HealthState.Error);

The following cmdlet gets the application details for the fabric:/WordCount application. Notice that the health state is Warning.

PS C:\> Get-ServiceFabricApplication -ApplicationName fabric:/WordCount

ApplicationName        : fabric:/WordCount
ApplicationTypeName    : WordCount
ApplicationTypeVersion : 1.0.0
ApplicationStatus      : Ready
HealthState            : Warning
ApplicationParameters  : { "WordCountWebService_InstanceCount" = "1";
                         "_WFDebugParams_" = "[{"ServiceManifestName":"WordCountWebServicePkg","CodePackageName":"Code","EntryPointType":"Main","Debug
                         ExePath":"C:\\Program Files (x86)\\Microsoft Visual Studio
                         14.0\\Common7\\Packages\\Debugger\\VsDebugLaunchNotify.exe","DebugArguments":" {74f7e5d5-71a9-47e2-a8cd-1878ec4734f1} -p
                         [ProcessId] -tid [ThreadId]","EnvironmentBlock":"_NO_DEBUG_HEAP=1\u0000"},{"ServiceManifestName":"WordCountServicePkg","CodeP
                         ackageName":"Code","EntryPointType":"Main","DebugExePath":"C:\\Program Files (x86)\\Microsoft Visual Studio
                         14.0\\Common7\\Packages\\Debugger\\VsDebugLaunchNotify.exe","DebugArguments":" {2ab462e6-e0d1-4fda-a844-972f561fe751} -p
                         [ProcessId] -tid [ThreadId]","EnvironmentBlock":"_NO_DEBUG_HEAP=1\u0000"}]" }

The following cmdlet gets the services with a health state of Error:

PS D:\ServiceFabric> Get-ServiceFabricApplication | Get-ServiceFabricService | where {$_.HealthState -eq "Error"}

ServiceName            : fabric:/WordCount/WordCountService
ServiceKind            : Stateful
ServiceTypeName        : WordCountServiceType
IsServiceGroup         : False
ServiceManifestVersion : 1.0.0
HasPersistedState      : True
ServiceStatus          : Active
HealthState            : Error

Cluster and application upgrades

During a monitored upgrade of the cluster or an application, Service Fabric checks health to ensure that everything remains healthy. If an entity is unhealthy as evaluated by using configured health policies, the upgrade applies upgrade-specific policies to determine the next action. The upgrade may be paused to allow user interaction (such as fixing error conditions or changing policies), or it may automatically roll back to the previous good version.

During a cluster upgrade, you can get the cluster upgrade status. The upgrade status includes unhealthy evaluations, which point to what is unhealthy in the cluster. If the upgrade is rolled back due to health issues, the upgrade status remembers the last unhealthy reasons. This information can help administrators investigate what went wrong after the upgrade rolled back or stopped.

Similarly, during an application upgrade, any unhealthy evaluations are contained in the application upgrade status.

The following shows the application upgrade status for a modified fabric:/WordCount application. A watchdog reported an error on one of its replicas. The upgrade is rolling back because the health checks failed.

PS C:\> Get-ServiceFabricApplicationUpgrade fabric:/WordCount

ApplicationName               : fabric:/WordCount
ApplicationTypeName           : WordCount
TargetApplicationTypeVersion  : 1.0.0.0
ApplicationParameters         : {}
StartTimestampUtc             : 4/21/2017 5:23:26 PM
FailureTimestampUtc           : 4/21/2017 5:23:37 PM
FailureReason                 : HealthCheck
UpgradeState                  : RollingBackInProgress
UpgradeDuration               : 00:00:23
CurrentUpgradeDomainDuration  : 00:00:00
CurrentUpgradeDomainProgress  : UD1

                                NodeName            : _Node_1
                                UpgradePhase        : Upgrading

                                NodeName            : _Node_2
                                UpgradePhase        : Upgrading

                                NodeName            : _Node_3
                                UpgradePhase        : PreUpgradeSafetyCheck
                                PendingSafetyChecks :
                                EnsurePartitionQuorum - PartitionId: 30db5be6-4e20-4698-8185-4bd7ca744020
NextUpgradeDomain             : UD2
UpgradeDomainsStatus          : { "UD1" = "Completed";
                                "UD2" = "Pending";
                                "UD3" = "Pending";
                                "UD4" = "Pending" }
UnhealthyEvaluations          :
                                Unhealthy services: 100% (1/1), ServiceType='WordCountServiceType', MaxPercentUnhealthyServices=0%.

                                  Unhealthy service: ServiceName='fabric:/WordCount/WordCountService', AggregatedHealthState='Error'.

                                      Unhealthy partitions: 100% (1/1), MaxPercentUnhealthyPartitionsPerService=0%.

                                      Unhealthy partition: PartitionId='a1f83a35-d6bf-4d39-b90d-28d15f39599b', AggregatedHealthState='Error'.

                                          Unhealthy replicas: 20% (1/5), MaxPercentUnhealthyReplicasPerPartition=0%.

                                          Unhealthy replica: PartitionId='a1f83a35-d6bf-4d39-b90d-28d15f39599b',
                                  ReplicaOrInstanceId='131031502346844058', AggregatedHealthState='Error'.

                                              Error event: SourceId='DiskWatcher', Property='Disk'.

UpgradeKind                   : Rolling
RollingUpgradeMode            : UnmonitoredAuto
ForceRestart                  : False
UpgradeReplicaSetCheckTimeout : 00:15:00

Read more about the Service Fabric application upgrade.

Use health evaluations to troubleshoot

Whenever there is an issue with the cluster or an application, look at the cluster or application health to pinpoint what is wrong. The unhealthy evaluations provide details about what triggered the current unhealthy state. If you need to, you can drill down into unhealthy child entities to identify the root cause.

For example, consider an application that is unhealthy because there is an error report on one of its replicas. The following PowerShell cmdlet shows the unhealthy evaluations:

PS D:\ServiceFabric> Get-ServiceFabricApplicationHealth fabric:/WordCount -EventsFilter None -ServicesFilter None -DeployedApplicationsFilter None -ExcludeHealthStatistics

ApplicationName                 : fabric:/WordCount
AggregatedHealthState           : Error
UnhealthyEvaluations            : 
                                  Unhealthy services: 100% (1/1), ServiceType='WordCountServiceType', MaxPercentUnhealthyServices=0%.

                                  Unhealthy service: ServiceName='fabric:/WordCount/WordCountService', AggregatedHealthState='Error'.

                                    Unhealthy partitions: 100% (1/1), MaxPercentUnhealthyPartitionsPerService=0%.

                                    Unhealthy partition: PartitionId='af2e3e44-a8f8-45ac-9f31-4093eb897600', AggregatedHealthState='Error'.

                                        Unhealthy replicas: 20% (1/5), MaxPercentUnhealthyReplicasPerPartition=0%.

                                        Unhealthy replica: PartitionId='af2e3e44-a8f8-45ac-9f31-4093eb897600', ReplicaOrInstanceId='131444422260002646', AggregatedHealthState='Error'.

                                            Error event: SourceId='MyWatchdog', Property='Memory'.

ServiceHealthStates             : None
DeployedApplicationHealthStates : None
HealthEvents                    : None
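A line such as `Unhealthy replicas: 20% (1/5), MaxPercentUnhealthyReplicasPerPartition=0%` in the output above can be read as a tolerance check: the share of unhealthy children is compared with the maximum percentage the policy allows. The Python sketch below is a simplified model of that comparison, not the health store's implementation. The function names are hypothetical, any rounding the real evaluation applies is ignored, and the rule that a tolerated failure still yields Warning is an assumption about the policy semantics.

```python
def exceeds_tolerance(unhealthy: int, total: int, max_percent_unhealthy: int) -> bool:
    """True when the unhealthy share is strictly above the allowed percentage."""
    if total == 0:
        return False
    # Integer form of (unhealthy / total) * 100 > max_percent_unhealthy.
    return unhealthy * 100 > max_percent_unhealthy * total


def evaluate_children(unhealthy: int, total: int, max_percent_unhealthy: int) -> str:
    """Simplified aggregation of child health under a percentage policy."""
    if unhealthy == 0:
        return "Ok"
    if exceeds_tolerance(unhealthy, total, max_percent_unhealthy):
        return "Error"
    # Assumption: failures within tolerance still surface as Warning.
    return "Warning"


if __name__ == "__main__":
    # Values from the evaluation above: 1 of 5 replicas unhealthy, 0% allowed.
    print(evaluate_children(1, 5, 0))
```

With the values from the output (1 of 5 replicas, 0% tolerated), the check fails at the replica level, and the same reasoning repeats for partitions and services up to the application.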

You can look at the replica to get more information:

PS D:\ServiceFabric> Get-ServiceFabricReplicaHealth -ReplicaOrInstanceId 131444422260002646 -PartitionId af2e3e44-a8f8-45ac-9f31-4093eb897600

PartitionId           : af2e3e44-a8f8-45ac-9f31-4093eb897600
ReplicaId             : 131444422260002646
AggregatedHealthState : Error
UnhealthyEvaluations  : 
                        Error event: SourceId='MyWatchdog', Property='Memory'.

HealthEvents          : 
                        SourceId              : System.RA
                        Property              : State
                        HealthState           : Ok
                        SequenceNumber        : 131444422263668344
                        SentAt                : 7/13/2017 5:57:06 PM
                        ReceivedAt            : 7/13/2017 5:57:18 PM
                        TTL                   : Infinite
                        Description           : Replica has been created._Node_2
                        RemoveWhenExpired     : False
                        IsExpired             : False
                        Transitions           : Error->Ok = 7/13/2017 5:57:18 PM, LastWarning = 1/1/0001 12:00:00 AM

                        SourceId              : MyWatchdog
                        Property              : Memory
                        HealthState           : Error
                        SequenceNumber        : 131444451657749403
                        SentAt                : 7/13/2017 6:46:05 PM
                        ReceivedAt            : 7/13/2017 6:46:05 PM
                        TTL                   : Infinite
                        Description           : 
                        RemoveWhenExpired     : False
                        IsExpired             : False
                        Transitions           : Warning->Error = 7/13/2017 6:46:05 PM, LastOk = 1/1/0001 12:00:00 AM

Note

The unhealthy evaluations show the first reason the entity is evaluated to its current health state. There may be multiple other events that trigger this state, but they are not reflected in the evaluations. To get more information, drill down into the health entities to figure out all the unhealthy reports in the cluster.
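The difference between the first reason and all unhealthy reports can be illustrated with a small model of the entity hierarchy. The Python sketch below is an assumption-laden illustration rather than the health store's algorithm: entities are plain dictionaries, the aggregated state is simply the worst state found (health policies are ignored), and the Warning event on the partition, including its `System.FM` source name, is an assumed example of a second unhealthy report that the evaluations would not show.

```python
from typing import Dict, Iterator, List, Optional, Tuple

SEVERITY = {"Ok": 0, "Warning": 1, "Error": 2}


def entity(name: str, events: Optional[List[Dict]] = None,
           children: Optional[List[Dict]] = None) -> Dict:
    """Build a hypothetical health entity with its own events and child entities."""
    return {"name": name, "events": events or [], "children": children or []}


def aggregated_state(node: Dict) -> str:
    """Worst state among the entity's events and its children (policies ignored)."""
    states = [ev["state"] for ev in node["events"]]
    states += [aggregated_state(child) for child in node["children"]]
    return max(states, key=SEVERITY.__getitem__, default="Ok")


def unhealthy_reports(node: Dict, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (entity path, source, property, state) for every non-Ok event."""
    here = path + (node["name"],)
    for ev in node["events"]:
        if ev["state"] != "Ok":
            yield " > ".join(here), ev["source"], ev["property"], ev["state"]
    for child in node["children"]:
        yield from unhealthy_reports(child, here)


# Replica events mirror the Get-ServiceFabricReplicaHealth output above.
replica = entity("replica 131444422260002646", events=[
    {"source": "System.RA", "property": "State", "state": "Ok"},
    {"source": "MyWatchdog", "property": "Memory", "state": "Error"},
])
# Assumed extra report: a Warning on the partition itself.
partition = entity("partition af2e3e44-a8f8-45ac-9f31-4093eb897600", events=[
    {"source": "System.FM", "property": "State", "state": "Warning"},
], children=[replica])
service = entity("fabric:/WordCount/WordCountService", children=[partition])
application = entity("fabric:/WordCount", children=[service])

if __name__ == "__main__":
    print(aggregated_state(application))
    for report in unhealthy_reports(application):
        print(report)
```

In this model the application aggregates to Error because of the replica event, yet walking the hierarchy also surfaces the partition-level Warning, which is the kind of additional report the note refers to.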

Next steps

Use system health reports to troubleshoot

Add custom Service Fabric health reports

How to report and check service health

Monitor and diagnose services locally

Service Fabric application upgrade