Report and check service health

服务发生问题时,必须能够快速检测问题,才能响应并修复所有事件和中断。When your services encounter problems, your ability to respond to and fix incidents and outages depends on your ability to detect the issues quickly. 如果从服务代码向 Azure Service Fabric 运行状况管理器报告问题和失败,可使用 Service Fabric 提供的标准运行状况监视工具来检查运行状况。If you report problems and failures to the Azure Service Fabric health manager from your service code, you can use standard health monitoring tools that Service Fabric provides to check the health status.

There are three ways to report health from a service:

  • 使用 PartitionCodePackageActivationContext 对象。Use Partition or CodePackageActivationContext objects.
    可以使用 PartitionCodePackageActivationContext 对象在属于当前上下文一部分的项目中报告运行状况。You can use the Partition and CodePackageActivationContext objects to report the health of elements that are part of the current context. 例如,作为副本一部分运行的代码只能报告该副本、其所属的分区,以及其所属应用程序的运行状况。For example, code that runs as part of a replica can report health only on that replica, the partition that it belongs to, and the application that it is a part of.
  • 使用 FabricClientUse FabricClient.
    如果群集不安全或者使用管理员权限运行服务,则可以使用 FabricClient 从服务代码中报告运行状况。You can use FabricClient to report health from the service code if the cluster is not secure or if the service is running with admin privileges. 大多数实际情况下都要求使用安全群集,或提供管理员权限。Most real-world scenarios do not use unsecured clusters, or provide admin privileges. 可以使用 FabricClient报告任何属于群集一部分的实体的运行状况。With FabricClient, you can report health on any entity that is a part of the cluster. 但是,在理想情况下,服务代码应该只发送与其本身运行状况相关的报告。Ideally, however, service code should only send reports that are related to its own health.
  • 在群集、应用程序、部署的应用程序、服务、服务包、分区、副本或节点级别上使用 REST API。Use the REST APIs at the cluster, application, deployed application, service, service package, partition, replica, or node levels. 可以在容器中用来报告运行状况。This can be used to report health from within a container.
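As a rough illustration of the REST option, the following C# sketch posts a replica health report through the HTTP gateway. It assumes the gateway listens on http://localhost:19080 (the local dev cluster default), api-version 6.0, and the "Report Replica Health" route `/Partitions/{partitionId}/$/GetReplicas/{replicaId}/$/ReportHealth`; check the REST reference for your cluster version. The class name RestHealthReporter is hypothetical, and a secure cluster would additionally need a client certificate configured on the HttpClient.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper: reports replica health through the Service Fabric REST API.
// Assumes api-version 6.0 and an unsecured HTTP gateway (no client certificate).
public static class RestHealthReporter
{
    // Builds the "Report Replica Health" URI for a replica of a stateful service.
    public static Uri BuildReplicaReportUri(string gateway, Guid partitionId, long replicaId) =>
        new Uri($"{gateway}/Partitions/{partitionId}/$/GetReplicas/{replicaId}/$/ReportHealth" +
                "?api-version=6.0&ServiceKind=Stateful");

    // Serializes a HealthInformation-shaped JSON body. No TTL is set,
    // so the report never expires unless it is replaced.
    public static string BuildBody(string sourceId, string property, string healthState, string description) =>
        JsonSerializer.Serialize(new
        {
            SourceId = sourceId,
            Property = property,
            HealthState = healthState, // "Ok", "Warning", or "Error"
            Description = description
        });

    // Posts the report and throws if the gateway returns an error status code.
    public static async Task ReportAsync(HttpClient http, Uri uri, string body)
    {
        using var content = new StringContent(body, Encoding.UTF8, "application/json");
        HttpResponseMessage response = await http.PostAsync(uri, content);
        response.EnsureSuccessStatusCode();
    }
}
```

For example, code in a container could call `RestHealthReporter.ReportAsync(httpClient, RestHealthReporter.BuildReplicaReportUri("http://localhost:19080", partitionId, replicaId), RestHealthReporter.BuildBody("ServiceCode", "StateDictionary", "Error", "Counter value is missing."))`, where partitionId and replicaId identify the replica being reported on.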

This article walks you through an example that reports health from the service code. The example also shows how to use the tools that Service Fabric provides to check the health status. This article is intended as a quick introduction to the health monitoring capabilities of Service Fabric. For more detailed information, read the series of in-depth articles about health, starting with the link at the end of this article.

Prerequisites

You must have the following installed:

  • Visual Studio 2015 or Visual Studio 2017
  • Service Fabric SDK

To create a local secure dev cluster

  • Open PowerShell with admin privileges, and run the following commands:

Commands that show how to create a secure dev cluster

To deploy an application and check its health

  1. Open Visual Studio as an administrator.

  2. Create a project by using the Stateful Service template.

    Create a Service Fabric application with a stateful service

  3. Press F5 to run the application in debug mode. The application is deployed to the local cluster.

  4. After the application is running, right-click the Local Cluster Manager icon in the notification area and select Manage Local Cluster from the shortcut menu to open Service Fabric Explorer.

    Open Service Fabric Explorer from the notification area

  5. The application health should look like the following image. At this point, the application should be healthy, with no errors.

    Healthy application in Service Fabric Explorer

  6. You can also check the health by using PowerShell. Use Get-ServiceFabricApplicationHealth to check an application's health, and Get-ServiceFabricServiceHealth to check a service's health. The following image shows the health report for the same application in PowerShell.

    Healthy application in PowerShell
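If you prefer to check health from code, FabricClient.HealthManager exposes the same information that Get-ServiceFabricApplicationHealth shows. The following C# console program is a minimal sketch. It assumes it runs on a machine in the local cluster and that the application is named fabric:/Application1 (the Visual Studio default for a project called Application1; adjust as needed). On the secure dev cluster, you would also pass X509Credentials and the cluster endpoint to the FabricClient constructor, which this sketch omits. HealthChecker and NeedsAttention are hypothetical names.

```csharp
using System;
using System.Fabric;
using System.Fabric.Health;
using System.Threading.Tasks;

// Sketch of a programmatic counterpart to Get-ServiceFabricApplicationHealth.
public static class HealthChecker
{
    // Hypothetical filter: Warning and Error states deserve attention.
    public static bool NeedsAttention(HealthState state) =>
        state == HealthState.Warning || state == HealthState.Error;

    public static async Task Main()
    {
        // Assumption: an unsecured connection to the local cluster is possible.
        using var fabricClient = new FabricClient();

        // Assumption: the application name created by Visual Studio.
        ApplicationHealth health =
            await fabricClient.HealthManager.GetApplicationHealthAsync(new Uri("fabric:/Application1"));

        Console.WriteLine($"Application: {health.AggregatedHealthState}");

        // List only the services whose aggregated state is Warning or Error.
        foreach (ServiceHealthState service in health.ServiceHealthStates)
        {
            if (NeedsAttention(service.AggregatedHealthState))
            {
                Console.WriteLine($"  {service.ServiceName}: {service.AggregatedHealthState}");
            }
        }
    }
}
```

While the application is healthy, the program prints only the application line; after the custom error report described below is added, the stateful service should appear in the list as well.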

To add custom health events to your service code

The Service Fabric project templates in Visual Studio contain sample code. The following steps show how you can report custom health events from your service code. Such reports show up automatically in the standard health monitoring tools that Service Fabric provides, such as Service Fabric Explorer, the Azure portal health view, and PowerShell.

  1. Reopen the application that you created previously in Visual Studio, or create a new application by using the Stateful Service Visual Studio template.

  2. Open the Stateful1.cs file, and find the myDictionary.TryGetValueAsync call in the RunAsync method. This method returns a result that holds the current value of the counter, because the key logic of this application is to keep a running count. If this were a real application and a missing result represented a failure, you would want to flag that event.

  3. To report a health event when a missing result represents a failure, do the following.

    a.a. System.Fabric.Health 命名空间添加到 Stateful1.cs 文件。Add the System.Fabric.Health namespace to the Stateful1.cs file.

    using System.Fabric.Health;
    

    b.b. myDictionary.TryGetValueAsync 调用的后面添加以下代码Add the following code after the myDictionary.TryGetValueAsync call

    if (!result.HasValue)
    {
        HealthInformation healthInformation = new HealthInformation("ServiceCode", "StateDictionary", HealthState.Error);
        this.Partition.ReportReplicaHealth(healthInformation);
    }
    

    We report replica health because this code runs in a stateful service. The HealthInformation parameter stores information about the health issue that's being reported.

    If you created a stateless service instead, report instance health by using the following code:

    if (!result.HasValue)
    {
        HealthInformation healthInformation = new HealthInformation("ServiceCode", "StateDictionary", HealthState.Error);
        this.Partition.ReportInstanceHealth(healthInformation);
    }
    
  4. If your service is running with admin privileges or if the cluster is not secure, you can also use FabricClient to report health, as shown in the following steps.

    a.a. var myDictionary 声明后面创建 FabricClientCreate the FabricClient instance after the var myDictionary declaration.

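    // A zero send interval makes FabricClient send health reports as soon as possible instead of batching them.
    var fabricClient = new FabricClient(new FabricClientSettings() { HealthReportSendInterval = TimeSpan.FromSeconds(0) });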
    

    b.b. myDictionary.TryGetValueAsync 调用的后面添加以下代码。Add the following code after the myDictionary.TryGetValueAsync call.

    if (!result.HasValue)
    {
        var replicaHealthReport = new StatefulServiceReplicaHealthReport(
            this.Context.PartitionId,
            this.Context.ReplicaId,
            new HealthInformation("ServiceCode", "StateDictionary", HealthState.Error));
        fabricClient.HealthManager.ReportHealth(replicaHealthReport);
    }
    
  5. Let's simulate this failure and see it show up in the health monitoring tools. To simulate the failure, comment out the first line in the health reporting code that you added earlier. After you comment out the first line, the code looks like the following example.

    //if(!result.HasValue)
    {
        HealthInformation healthInformation = new HealthInformation("ServiceCode", "StateDictionary", HealthState.Error);
        this.Partition.ReportReplicaHealth(healthInformation);
    }
    

    This code fires the health report each time RunAsync executes. After you make the change, press F5 to run the application.

  6. After the application is running, open Service Fabric Explorer to check the health of the application. This time, Service Fabric Explorer shows that the application is unhealthy, because of the error reported by the code that we added previously.

    Unhealthy application in Service Fabric Explorer

  7. If you select the primary replica in the tree view of Service Fabric Explorer, you will see that Health State indicates an error, too. Service Fabric Explorer also displays the health report details that were added to the HealthInformation parameter in the code. You can see the same health reports in PowerShell and the Azure portal.

    Replica health in Service Fabric Explorer

This report remains in the health manager until it is replaced by another report or until this replica is deleted. Because we did not set TimeToLive for this health report in the HealthInformation object, the report never expires.
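To make a report expire instead of persisting indefinitely, set TimeToLive on the HealthInformation object. The following C# sketch shows one way to do this under stated assumptions: the helper class name StateDictionaryHealth is hypothetical, the five-minute TTL is an arbitrary example value, and the caller passes this.Partition from a stateful service. Because the Ok and Error reports share the same SourceId and Property, a later Ok report replaces an earlier Error report for the same replica.

```csharp
using System;
using System.Fabric;
using System.Fabric.Health;

// Hypothetical helper for the "StateDictionary" health property of a replica.
public static class StateDictionaryHealth
{
    private const string SourceId = "ServiceCode";
    private const string Property = "StateDictionary";

    public static HealthInformation Create(HealthState state, string description)
    {
        return new HealthInformation(SourceId, Property, state)
        {
            Description = description,
            // Example value: the report expires after 5 minutes unless it is sent again.
            TimeToLive = TimeSpan.FromMinutes(5),
            // Remove the report when it expires instead of treating the expiration as an error.
            RemoveWhenExpired = true
        };
    }

    // Reports Ok when the counter value exists and Error when it is missing.
    // Both reports use the same SourceId and Property, so each one replaces the previous one.
    public static void Report(IStatefulServicePartition partition, bool hasValue)
    {
        HealthInformation info = hasValue
            ? Create(HealthState.Ok, "The counter value was found.")
            : Create(HealthState.Error, "The counter value is missing.");
        partition.ReportReplicaHealth(info);
    }
}
```

Calling StateDictionaryHealth.Report(this.Partition, result.HasValue) inside the template's RunAsync loop resends the report about once per second, well within its TTL. If the service stops reporting, the report disappears after five minutes rather than staying in the health manager.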

We recommend reporting health at the most granular level, which in this case is the replica. You can also report health on the partition.

HealthInformation healthInformation = new HealthInformation("ServiceCode", "StateDictionary", HealthState.Error);
this.Partition.ReportPartitionHealth(healthInformation);

若要报告 ApplicationDeployedApplicationDeployedServicePackage 的运行状况,请使用 CodePackageActivationContextTo report health on Application, DeployedApplication, and DeployedServicePackage, use CodePackageActivationContext.

HealthInformation healthInformation = new HealthInformation("ServiceCode", "StateDictionary", HealthState.Error);
var activationContext = FabricRuntime.GetActivationContext();
activationContext.ReportApplicationHealth(healthInformation);
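The preceding example reports only on the application. A sketch that also reports on the deployed application and the deployed service package might look like the following. It assumes the caller passes this.Context.CodePackageActivationContext (typed as ICodePackageActivationContext) from inside the service; the class and method names are hypothetical, and Warning is used only as an example state.

```csharp
using System.Fabric;
using System.Fabric.Health;

// Hypothetical helper that reports on the three entities CodePackageActivationContext can reach.
public static class PackageHealthReporter
{
    public static HealthInformation Create(string property, HealthState state) =>
        new HealthInformation("ServiceCode", property, state);

    public static void ReportAll(ICodePackageActivationContext context)
    {
        // The application entity, which is not tied to a particular node.
        context.ReportApplicationHealth(Create("StateDictionary", HealthState.Warning));

        // The application as deployed on the node where this code runs.
        context.ReportDeployedApplicationHealth(Create("StateDictionary", HealthState.Warning));

        // This service package as deployed on the node where this code runs.
        context.ReportDeployedServicePackageHealth(Create("StateDictionary", HealthState.Warning));
    }
}
```

Passing this.Context.CodePackageActivationContext avoids a separate FabricRuntime.GetActivationContext() call, because the service context already holds the activation context of the running code package.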

Next steps