教程:在 Azure 中监视 Service Fabric 群集Tutorial: Monitor a Service Fabric cluster in Azure

在任何云环境中开发、测试和部署工作负荷时,监视和诊断至关重要。Monitoring and diagnostics are critical to developing, testing, and deploying workloads in any cloud environment. 本教程是一个系列的第二部分,介绍如何使用事件、性能计数器和运行状况报告来监视和诊断 Service Fabric 群集。This tutorial is part two of a series, and shows you how to monitor and diagnose a Service Fabric cluster using events, performance counters, and health reports. 有关更多信息,请阅读有关群集监视基础结构监视的概述。For more information, read the overview about cluster monitoring and infrastructure monitoring.

在本教程中,你将了解如何执行以下操作:In this tutorial, you learn how to:

  • 查看 Service Fabric 事件View Service Fabric events
  • 查询群集事件的 EventStore APIQuery EventStore APIs for cluster events
  • 监视基础架构/收集性能计数器Monitor infrastructure/collect perf counters
  • 查看群集运行状况报告View cluster health reports

在此系列教程中,你会学习如何:In this tutorial series you learn how to:

备注

本文已经过更新,以便使用 Azure Az PowerShell 模块。This article has been updated to use the Azure Az PowerShell module. 若要与 Azure 交互,建议使用的 PowerShell 模块是 Az PowerShell 模块。The Az PowerShell module is the recommended PowerShell module for interacting with Azure. 若要开始使用 Az PowerShell 模块,请参阅安装 Azure PowerShellTo get started with the Az PowerShell module, see Install Azure PowerShell. 若要了解如何迁移到 Az PowerShell 模块,请参阅 将 Azure PowerShell 从 AzureRM 迁移到 AzTo learn how to migrate to the Az PowerShell module, see Migrate Azure PowerShell from AzureRM to Az.

先决条件Prerequisites

在开始学习本教程之前:Before you begin this tutorial:

使用 Azure Monitor 日志查看 Service Fabric 事件View Service Fabric events using Azure Monitor logs

Azure Monitor 日志收集并分析云中托管的应用程序和服务的遥测,并提供分析工具以帮助最大限度地提高其可用性和性能。Azure Monitor logs collects and analyzes telemetry from applications and services hosted in the cloud and provides analysis tools to help you maximize their availability and performance. 你可以在 Azure Monitor 日志中运行查询,以获取见解并解决群集中发生的问题。You can run queries in Azure Monitor logs to gain insights and troubleshoot what is happening in your cluster.

要访问 Service Fabric 分析解决方案,请转至 Azure 门户,然后选择你在其中创建 Service Fabric 分析解决方案的资源组。To access the Service Fabric Analytics solution, go to the Azure portal and select the resource group in which you created the Service Fabric Analytics solution.

选择资源“ServiceFabric(mysfomsworkspace)”。Select the resource ServiceFabric(mysfomsworkspace).

在“概述”中,将看到每个已启用解决方案的图形磁贴,包括 Service Fabric 的磁贴。In Overview you see tiles in the form of a graph for each of the solutions enabled, including one for Service Fabric. 单击 Service Fabric 图形以转到 Service Fabric 分析解决方案。Click the Service Fabric graph to continue to the Service Fabric Analytics solution.

显示 Service Fabric 图的屏幕截图。

下图是 Service Fabric 分析解决方案的主页。The following image shows the home page of the Service Fabric Analytics solution. 此主页提供了群集中正在发生的事件的快照视图。This home page provides a snapshot view of what's happening in your cluster.

Service Fabric 解决方案

如果创建群集时启用了诊断,则可以看到以下对象的事件:If you enabled diagnostics upon cluster creation, you can see events for

备注

除了现成的 Service Fabric 事件之外,可以通过更新诊断扩展的配置来收集更详细的系统事件。In addition to the Service Fabric events out of the box, more detailed system events can be collected by updating the config of your diagnostics extension.

查看 Service Fabric 事件,包括对节点执行的操作View Service Fabric Events, including actions on nodes

在“Service Fabric 分析”页上,单击“群集事件”对应的图形。On the Service Fabric Analytics page, click on the graph for Cluster Events. 将显示已收集的所有系统事件的日志。The logs for all the system events that have been collected appear. 以下内容摘自 Azure 存储帐户中的“WADServiceFabricSystemEventsTable”以供参考,类似地,接下来看到的 Reliable Services 和 Reliable Actors 事件也都摘自相应的表。For reference, these are from the WADServiceFabricSystemEventsTable in the Azure Storage account, and similarly the reliable services and actors events you see next are from those respective tables.

查询操作通道

该查询使用 Kusto 查询语言,你可以修改该语言以优化要查找的内容。The query uses the Kusto query language, which you can modify to refine what you're looking for. 例如,若要查找针对群集中的节点执行的所有操作,可以使用以下查询。For example, to find all actions taken on nodes in the cluster, you can use the following query. 操作通道事件参考中可以找到下面使用的事件 ID。The event IDs used below are found in the operational channel events reference.

ServiceFabricOperationalEvent
| where EventId < 25627 and EventId > 25619 

Kusto 查询语言非常强大。The Kusto query language is powerful. 以下是一些其他有用的查询。Here are some other useful queries.

通过将查询保存为具有别名 ServiceFabricEvent 的函数,将 ServiceFabricEvent 查找表创建为用户定义的函数:Create a ServiceFabricEvent lookup table as user-defined function by saving the query as a function with alias ServiceFabricEvent:

let ServiceFabricEvent = datatable(EventId: int, EventName: string)
[
    ...
    18603, 'NodeUpOperational',
    18604, 'NodeDownOperational',
    ...
];
ServiceFabricEvent

返回过去一小时内记录的操作事件:Return operational events recorded in the last hour:

ServiceFabricOperationalEvent
| where TimeGenerated > ago(1h)
| join kind=leftouter ServiceFabricEvent on EventId
| project EventId, EventName, TaskName, Computer, ApplicationName, EventMessage, TimeGenerated
| sort by TimeGenerated

使用 EventId == 18604 和 EventName ==‘NodeDownOperational’ 返回操作事件:Return operational events with EventId == 18604 and EventName == 'NodeDownOperational':

ServiceFabricOperationalEvent
| where EventId == 18604
| project EventId, EventName = 'NodeDownOperational', TaskName, Computer, EventMessage, TimeGenerated
| sort by TimeGenerated 

使用 EventId == 18604 和 EventName ==‘NodeUpOperational’ 返回操作事件:Return operational events with EventId == 18604 and EventName == 'NodeUpOperational':

ServiceFabricOperationalEvent
| where EventId == 18603
| project EventId, EventName = 'NodeUpOperational', TaskName, Computer, EventMessage, TimeGenerated
| sort by TimeGenerated 

使用 HealthState == 3(错误)返回运行状况报告,并从 EventMessage 字段中提取其他属性:Returns Health Reports with HealthState == 3 (Error) and extract additional properties from the EventMessage field:

ServiceFabricOperationalEvent
| join kind=leftouter ServiceFabricEvent on EventId
| extend HealthStateId = extract(@"HealthState=(\S+) ", 1, EventMessage, typeof(int))
| where TaskName == 'HM' and HealthStateId == 3
| extend SourceId = extract(@"SourceId=(\S+) ", 1, EventMessage, typeof(string)),
         Property = extract(@"Property=(\S+) ", 1, EventMessage, typeof(string)),
         HealthState = case(HealthStateId == 0, 'Invalid', HealthStateId == 1, 'Ok', HealthStateId == 2, 'Warning', HealthStateId == 3, 'Error', 'Unknown'),
         TTL = extract(@"TTL=(\S+) ", 1, EventMessage, typeof(string)),
         SequenceNumber = extract(@"SequenceNumber=(\S+) ", 1, EventMessage, typeof(string)),
         Description = extract(@"Description='([\S\s, ^']+)' ", 1, EventMessage, typeof(string)),
         RemoveWhenExpired = extract(@"RemoveWhenExpired=(\S+) ", 1, EventMessage, typeof(bool)),
         SourceUTCTimestamp = extract(@"SourceUTCTimestamp=(\S+)", 1, EventMessage, typeof(datetime)),
         ApplicationName = extract(@"ApplicationName=(\S+) ", 1, EventMessage, typeof(string)),
         ServiceManifest = extract(@"ServiceManifest=(\S+) ", 1, EventMessage, typeof(string)),
         InstanceId = extract(@"InstanceId=(\S+) ", 1, EventMessage, typeof(string)),
         ServicePackageActivationId = extract(@"ServicePackageActivationId=(\S+) ", 1, EventMessage, typeof(string)),
         NodeName = extract(@"NodeName=(\S+) ", 1, EventMessage, typeof(string)),
         Partition = extract(@"Partition=(\S+) ", 1, EventMessage, typeof(string)),
         StatelessInstance = extract(@"StatelessInstance=(\S+) ", 1, EventMessage, typeof(string)),
         StatefulReplica = extract(@"StatefulReplica=(\S+) ", 1, EventMessage, typeof(string))

使用 EventId != 17523 返回事件的时间表:Return a time chart of events with EventId != 17523:

ServiceFabricOperationalEvent
| join kind=leftouter ServiceFabricEvent on EventId
| where EventId != 17523
| summarize Count = count() by Timestamp = bin(TimeGenerated, 1h), strcat(tostring(EventId), " - ", case(EventName != "", EventName, "Unknown"))
| render timechart 

获取与特定服务和节点聚合的 Service Fabric 操作事件:Get Service Fabric operational events aggregated with the specific service and node:

ServiceFabricOperationalEvent
| where ApplicationName  != "" and ServiceName != ""
| summarize AggregatedValue = count() by ApplicationName, ServiceName, Computer 

使用跨资源查询通过 EventId/EventName 呈现 Service Fabric 事件的计数:Render the count of Service Fabric events by EventId / EventName using a cross-resource query:

app('PlunkoServiceFabricCluster').traces
| where customDimensions.ProviderName == 'Microsoft-ServiceFabric'
| extend EventId = toint(customDimensions.EventId), TaskName = tostring(customDimensions.TaskName)
| where EventId != 17523
| join kind=leftouter ServiceFabricEvent on EventId
| extend EventName = case(EventName != '', EventName, 'Undocumented')
| summarize ["Event Count"]= count() by bin(timestamp, 30m), EventName = strcat(tostring(EventId), " - ", EventName)
| render timechart

查看 Service Fabric 应用程序事件View Service Fabric application events

可以查看在群集上部署的 Reliable Services 和 Reliable Actors 应用程序的事件。You can view events for the reliable services and reliable actors applications deployed on the cluster. 在“Service Fabric 分析”页上,单击“应用程序事件”对应的图形。On the Service Fabric Analytics page, click the graph for Application Events.

运行以下查询以查看 Reliable Services 应用程序的事件:Run the following query to view events from your reliable services applications:

ServiceFabricReliableServiceEvent
| sort by TimeGenerated desc

可以看到服务 runasync 在启动和完成时(通常发生在部署和升级时)的不同事件。You can see different events for when the service runasync is started and completed which typically happens on deployments and upgrades.

Service Fabric 解决方案 Reliable Services

此外,还可以使用 ServiceName == "fabric:/Watchdog/WatchdogService" 查找 Reliable Services 的事件:You can also find events for the reliable service with ServiceName == "fabric:/Watchdog/WatchdogService":

ServiceFabricReliableServiceEvent
| where ServiceName == "fabric:/Watchdog/WatchdogService"
| project TimeGenerated, EventMessage
| order by TimeGenerated desc  

可以类似的方式查看 Reliable Actors 事件:Reliable actor events can be viewed in a similar fashion:

ServiceFabricReliableActorEvent
| sort by TimeGenerated desc

要为可靠的 Reliable Actor 配置更详细的事件,可以在群集模板中更改诊断扩展配置中的 scheduledTransferKeywordFilterTo configure more detailed events for reliable actors, you can change the scheduledTransferKeywordFilter in the config for the diagnostic extension in the cluster template. Reliable Actors 事件参考中提供了这些参数值的详细信息。Details on the values for these are in the reliable actors events reference.

"EtwEventSourceProviderConfiguration": [
                {
                    "provider": "Microsoft-ServiceFabric-Actors",
                    "scheduledTransferKeywordFilter": "1",
                    "scheduledTransferPeriod": "PT5M",
                    "DefaultEvents": {
                    "eventDestination": "ServiceFabricReliableActorEventTable"
                    }
                },

使用 Azure Monitor 日志查看性能计数器View performance counters with Azure Monitor logs

要查看性能计数器,请转至 Azure 门户以及你在其中创建 Service Fabric 分析解决方案的资源组。To view performance counters, go to the Azure portal and the resource group in which you created the Service Fabric Analytics solution.

选择资源“ServiceFabric (mysfomsworkspace)”,并选择“Log Analytics 工作区”,然后选择“高级设置” 。Select the resource ServiceFabric(mysfomsworkspace), then Log Analytics Workspace, and then Advanced Settings.

单击“数据”,然后单击“Windows 性能计数器”。Click Data, then click Windows Performance Counters. 此时会显示一个可以选择的默认计数器列表,此外还可以设置收集间隔。There is a list of default counters you can choose to enable and you can set the interval for collection too. 还可以添加要收集的其他性能计数器You can also add additional performance counters to collect. 参考文章中介绍了正确的格式。The proper format is referenced in this article. 单击“保存”,然后单击“确定” 。Click Save, then click OK.

关闭“高级设置”边栏选项卡,然后在“常规”标题下选择“工作区摘要” 。Close the Advanced Settings blade and select Workspace summary under the General heading. 对于启用的每个解决方案,都有一个图形磁贴,包括 Service Fabric 的磁贴。For each of the solutions enabled there is a graphical tile, including one for Service Fabric. 单击 Service Fabric 图形以转到 Service Fabric 分析解决方案。Click the Service Fabric graph to continue to the Service Fabric Analytics solution.

操作通道和 Reliable Services 事件也有图形磁贴。There are graphical tiles for operational channel and reliable services events. 已选择计数器的流入数据的图形表示形式将会显示在“节点指标”下。The graphical representation of the data flowing in for the counters you have selected will appear under Node Metrics.

选择“容器指标”图形可了解更多详细信息。Select the Container Metric graph to see additional details. 还可以使用 Kusto 查询语言,像查询群集事件一样查询性能计数器数据,以及基于节点、性能计数器名称和值进行筛选。You can also query on performance counter data similarly to cluster events and filter on the nodes, perf counter name, and values using the Kusto query language.

查询 EventStore 服务Query the EventStore service

EventStore 服务提供了在给定时间点了解群集或工作负载状态的方法。The EventStore service provides a way to understand the state of your cluster or workloads at a given point in time. EventStore 是有状态 Service Fabric 服务,它维护群集中的事件。The EventStore is a stateful Service Fabric service that maintains events from the cluster. 事件通过 Service Fabric Explorer、REST 和 API 公开。The events are exposed through the Service Fabric Explorer, REST, and APIs. EventStore 直接查询群集以获取群集中任何实体的诊断数据,要查看 EventStore 中可用事件的完整列表,请参阅 Service Fabric 事件EventStore queries the cluster directly to get diagnostics data on any entity in your cluster To see a full list of events available in the EventStore, see Service Fabric events.

可以使用 Service Fabric 客户端库以编程方式查询 EventStore API。The EventStore APIs can be queried programmatically using the Service Fabric client library.

以下是通过 GetClusterEventListAsync 函数查询 2018-04-03T18:00:00Z 和 2018-04-04T18:00:00Z 之间的所有群集事件的示例请求。Here is an example request for all cluster events between 2018-04-03T18:00:00Z and 2018-04-04T18:00:00Z, via the GetClusterEventListAsync function.

var sfhttpClient = ServiceFabricClientFactory.Create(clusterUrl, settings);

var clstrEvents = sfhttpClient.EventsStore.GetClusterEventListAsync(
    "2018-04-03T18:00:00Z",
    "2018-04-04T18:00:00Z")
    .GetAwaiter()
    .GetResult()
    .ToList();

以下是另一个查询 2018 年 9 月群集运行状况和所有节点事件并将其打印出来的示例。Here is another example that queries for the cluster health and all node events in September 2018 and prints them out.

const int timeoutSecs = 60;
var clusterUrl = new Uri(@"http://localhost:19080"); // This example is for a Local cluster
var sfhttpClient = ServiceFabricClientFactory.Create(clusterUrl);

var clusterHealth = sfhttpClient.Cluster.GetClusterHealthAsync().GetAwaiter().GetResult();
Console.WriteLine("Cluster Health: {0}", clusterHealth.AggregatedHealthState.Value.ToString());

Console.WriteLine("Querying for node events...");
var nodesEvents = sfhttpClient.EventsStore.GetNodesEventListAsync(
    "2018-09-01T00:00:00Z",
    "2018-09-30T23:59:59Z",
    timeoutSecs,
    "NodeDown,NodeUp")
    .GetAwaiter()
    .GetResult()
    .ToList();
Console.WriteLine("Result Count: {0}", nodesEvents.Count());

foreach (var nodeEvent in nodesEvents)
{
    Console.Write("Node event happened at {0}, Node name: {1} ", nodeEvent.TimeStamp, nodeEvent.NodeName);
    if (nodeEvent is NodeDownEvent)
    {
        var nodeDownEvent = nodeEvent as NodeDownEvent;
        Console.WriteLine("(Node is down, and it was last up at {0})", nodeDownEvent.LastNodeUpAt);
    }
    else if (nodeEvent is NodeUpEvent)
    {
        var nodeUpEvent = nodeEvent as NodeUpEvent;
        Console.WriteLine("(Node is up, and it was last down at {0})", nodeUpEvent.LastNodeDownAt);
    }
}

监视群集运行状况Monitor cluster health

Service Fabric 引入了一种具有运行状况实体的运行状况模型,系统组件和监视器可以在其上报告它们监视的本地状况。Service Fabric introduces a health model with health entities on which system components and watchdogs can report local conditions that they are monitoring. 运行状况存储聚合所有运行状况数据以确定实体是否正常运行。The health store aggregates all health data to determine whether entities are healthy.

群集会自动被系统组件发送的运行状况报告所填充。The cluster is automatically populated with health reports sent by the system components. 使用系统运行状况报告进行故障排除了解更多信息。Read more at Use system health reports to troubleshoot.

Service Fabric 为每个支持的实体类型提供运行状况查询。Service Fabric exposes health queries for each of the supported entity types. 可以通过 API(使用 FabricClient.HealthManager 上的方法)、PowerShell cmdlet 和 REST 访问它们。They can be accessed through the API, using methods on FabricClient.HealthManager, PowerShell cmdlets, and REST. 这些查询返回有关实体的完整运行状况信息:聚合运行状况、实体运行状况事件、子运行状况(在适用时)、不正常评估(实体不正常时)以及子集运行状况统计信息(在适用时)。These queries return complete health information about the entity: the aggregated health state, entity health events, child health states (when applicable), unhealthy evaluations (when the entity is not healthy), and children health statistics (when applicable).

获取群集运行状况Get cluster health

Get-ServiceFabricClusterHealth cmdlet 返回群集实体的运行状况,并包含应用程序和节点(群集的子项)的运行状况。The Get-ServiceFabricClusterHealth cmdlet returns the health of the cluster entity and contains the health states of applications and nodes (children of the cluster). 首先使用 Connect-ServiceFabricCluster cmdlet 连接到群集。First, connect to the cluster using the Connect-ServiceFabricCluster cmdlet.

群集的状态是 11 个节点、系统应用程序和 fabric:/Voting 按所述进行配置。The state of the cluster is 11 nodes, the system application, and fabric:/Voting configured as described.

以下示例使用默认运行状况策略获取群集运行状况。The following example gets cluster health by using default health policies. 11 个节点处于正常状态,但是群集的聚合运行状况是错误,因为 fabric:/Voting 应用程序处于错误状态。The 11 nodes are healthy but the cluster aggregated health state is Error because the fabric:/Voting application is in Error. 请注意不正常评估如何提供触发聚合运行状况的详细条件。Note how the unhealthy evaluations provide details on the conditions that triggered the aggregated health.

Get-ServiceFabricClusterHealth

AggregatedHealthState   : Error
UnhealthyEvaluations    : 
                          100% (1/1) applications are unhealthy. The evaluation tolerates 0% unhealthy applications.

                          Application 'fabric:/Voting' is in Error.

                            33% (1/3) deployed applications are unhealthy. The evaluation tolerates 0% unhealthy deployed applications.

                            Deployed application on node '_nt2vm_3' is in Error.

                                50% (1/2) deployed service packages are unhealthy.

                                Service package for manifest 'VotingWebPkg' and service package activation ID '8723eb73-9b83-406b-9de3-172142ba15f3' is in Error.

                                    'System.Hosting' reported Error for property 'CodePackageActivation:Code:SetupEntryPoint:131959376195593305'.
                                    There was an error during CodePackage activation.The service host terminated with exit code:1

NodeHealthStates        : 
                          NodeName              : _nt2vm_3
                          AggregatedHealthState : Ok

                          NodeName              : _nt1vm_4
                          AggregatedHealthState : Ok

                          NodeName              : _nt2vm_2
                          AggregatedHealthState : Ok

                          NodeName              : _nt1vm_3
                          AggregatedHealthState : Ok

                          NodeName              : _nt2vm_1
                          AggregatedHealthState : Ok

                          NodeName              : _nt1vm_2
                          AggregatedHealthState : Ok

                          NodeName              : _nt2vm_0
                          AggregatedHealthState : Ok

                          NodeName              : _nt1vm_1
                          AggregatedHealthState : Ok

                          NodeName              : _nt1vm_0
                          AggregatedHealthState : Ok

                          NodeName              : _nt3vm_0
                          AggregatedHealthState : Ok

                          NodeName              : _nt2vm_4
                          AggregatedHealthState : Ok

ApplicationHealthStates : 
                          ApplicationName       : fabric:/System
                          AggregatedHealthState : Ok

                          ApplicationName       : fabric:/Voting
                          AggregatedHealthState : Error

HealthEvents            : None
HealthStatistics        : 
                          Node                  : 11 Ok, 0 Warning, 0 Error
                          Replica               : 4 Ok, 0 Warning, 0 Error
                          Partition             : 2 Ok, 0 Warning, 0 Error
                          Service               : 2 Ok, 0 Warning, 0 Error
                          DeployedServicePackage : 3 Ok, 1 Warning, 1 Error
                          DeployedApplication   : 1 Ok, 1 Warning, 1 Error
                          Application           : 0 Ok, 0 Warning, 1 Error

以下示例使用自定义应用程序策略获取群集的运行状况。The following example gets the health of the cluster by using a custom application policy. 它筛选结果以只获取有错误或警告的应用程序和节点。It filters results to get only applications and nodes in error or warning. 此示例中不会返回任何节点,因为这些节点都是正常的。In this example no nodes are returned, as they are all healthy. 仅 fabric:/Voting 应用程序符合应用程序筛选器。Only the fabric:/Voting application respects the applications filter. 因为自定义策略指定对于 fabric:/Voting 应用程序将警告视为错误,应用程序被评估为错误,从而群集也被评估为错误。Because the custom policy specifies to consider warnings as errors for the fabric:/Voting application, the application is evaluated as in error and so is the cluster.

$appHealthPolicy = New-Object -TypeName System.Fabric.Health.ApplicationHealthPolicy
$appHealthPolicy.ConsiderWarningAsError = $true
$appHealthPolicyMap = New-Object -TypeName System.Fabric.Health.ApplicationHealthPolicyMap
$appUri1 = New-Object -TypeName System.Uri -ArgumentList "fabric:/Voting"
$appHealthPolicyMap.Add($appUri1, $appHealthPolicy)
Get-ServiceFabricClusterHealth -ApplicationHealthPolicyMap $appHealthPolicyMap -ApplicationsFilter "Warning,Error" -NodesFilter "Warning,Error" -ExcludeHealthStatistics

AggregatedHealthState   : Error
UnhealthyEvaluations    : 
                          100% (1/1) applications are unhealthy. The evaluation tolerates 0% unhealthy applications.

                          Application 'fabric:/Voting' is in Error.

                            100% (5/5) deployed applications are unhealthy. The evaluation tolerates 0% unhealthy deployed applications.

                            Deployed application on node '_nt2vm_3' is in Error.

                                50% (1/2) deployed service packages are unhealthy.

                                Service package for manifest 'VotingWebPkg' and service package activation ID '8723eb73-9b83-406b-9de3-172142ba15f3' is in Error.

                                    'System.Hosting' reported Error for property 'CodePackageActivation:Code:SetupEntryPoint:131959376195593305'.
                                    There was an error during CodePackage activation.The service host terminated with exit code:1

                            Deployed application on node '_nt2vm_2' is in Error.

                                50% (1/2) deployed service packages are unhealthy.

                                Service package for manifest 'VotingWebPkg' and service package activation ID '2466f2f9-d5fd-410c-a6a4-5b1e00630cca' is in Error.

                                    'System.Hosting' reported Error for property 'CodePackageActivation:Code:SetupEntryPoint:131959376486201388'.
                                    There was an error during CodePackage activation.The service host terminated with exit code:1

                            Deployed application on node '_nt2vm_4' is in Error.

                                100% (1/1) deployed service packages are unhealthy.

                                Service package for manifest 'VotingWebPkg' and service package activation ID '5faa5201-eede-400a-865f-07f7f886aa32' is in Error.

                                    'System.Hosting' reported Warning for property 'CodePackageActivation:Code:SetupEntryPoint:131959376207396204'. The evaluation treats 
                          Warning as Error.
                                    There was an error during CodePackage activation.The service host terminated with exit code:1

                            Deployed application on node '_nt2vm_0' is in Error.

                                100% (1/1) deployed service packages are unhealthy.

                                Service package for manifest 'VotingWebPkg' and service package activation ID '204f1783-f774-4f3a-b371-d9983afaf059' is in Error.

                                    'System.Hosting' reported Error for property 'CodePackageActivation:Code:SetupEntryPoint:131959375885791093'.
                                    There was an error during CodePackage activation.The service host terminated with exit code:1

                            Deployed application on node '_nt3vm_0' is in Error.

                                50% (1/2) deployed service packages are unhealthy.

                                Service package for manifest 'VotingWebPkg' and service package activation ID '2533ae95-2d2a-4f8b-beef-41e13e4c0081' is in Error.

                                    'System.Hosting' reported Error for property 'CodePackageActivation:Code:SetupEntryPoint:131959376108346272'.
                                    There was an error during CodePackage activation.The service host terminated with exit code:1                         

NodeHealthStates        : None
ApplicationHealthStates : 
                          ApplicationName       : fabric:/Voting
                          AggregatedHealthState : Error

HealthEvents            : None

获取节点运行状况Get node health

Get-ServiceFabricNodeHealth cmdlet 返回节点实体的运行状况,并包含针对该节点报告的运行状况事件。The Get-ServiceFabricNodeHealth cmdlet returns the health of a node entity and contains the health events reported on the node. 首先使用 Connect-ServiceFabricCluster cmdlet 连接到群集。First, connect to the cluster by using the Connect-ServiceFabricCluster cmdlet. 以下示例使用默认运行状况策略获取特定节点的运行状况:The following example gets the health of a specific node by using default health policies:

Get-ServiceFabricNodeHealth _nt1vm_3

以下示例获取群集中所有节点的运行状况:The following example gets the health of all nodes in the cluster:

Get-ServiceFabricNode | Get-ServiceFabricNodeHealth | select NodeName, AggregatedHealthState | ft -AutoSize

获取系统服务运行状况Get system service health

获取系统服务的聚合运行状况:Get the aggregated health of the system services:

Get-ServiceFabricService -ApplicationName fabric:/System | Get-ServiceFabricServiceHealth | select ServiceName, AggregatedHealthState | ft -AutoSize

后续步骤Next steps

在本教程中,你了解了如何执行以下操作:In this tutorial, you learned how to:

  • 查看 Service Fabric 事件View Service Fabric events
  • 查询群集事件的 EventStore APIQuery EventStore APIs for cluster events
  • 监视基础架构/收集性能计数器Monitor infrastructure/collect perf counters
  • 查看群集运行状况报告View cluster health reports

接下来,请转到下一个教程了解如何缩放群集。Next, advance to the following tutorial to learn how to scale a cluster.