Performance monitoring with the Windows Azure Diagnostics extension

This document covers the steps required to set up collection of performance counters via the Windows Azure Diagnostics (WAD) extension for Windows clusters.

Note

The WAD extension must be deployed on your cluster for these steps to work. If it is not set up, see Event aggregation and collection using Windows Azure Diagnostics.

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Collect performance counters via the WadCfg

To collect performance counters via WAD, you need to modify the configuration in your cluster's Resource Manager template. Follow these steps to add the performance counters you want to collect to your template, and then run a Resource Manager resource upgrade.

  1. Find the WAD configuration in your cluster's template by searching for WadCfg. You will add the performance counters to collect under DiagnosticMonitorConfiguration.

  2. Set up your configuration to collect performance counters by adding the following section to your DiagnosticMonitorConfiguration.

    "PerformanceCounters": {
        "scheduledTransferPeriod": "PT1M",
        "PerformanceCounterConfiguration": []
    }
    

    The scheduledTransferPeriod defines how frequently the collected counter values are transferred to your Azure storage table and to any configured sink.

  3. Add the performance counters you would like to collect to the PerformanceCounterConfiguration that was declared in the previous step. Each counter is defined with a counterSpecifier, sampleRate, unit, annotation, and any relevant sinks.

Here is an example of a configuration with a counter for Total Processor Time (the amount of time the CPU was in use for processing operations) and a counter for Service Fabric Actor Method Invocations per Second, one of the Service Fabric custom performance counters. Refer to Reliable Actor Performance Counters and Reliable Service Performance Counters for a full list of Service Fabric custom performance counters.

"WadCfg": {
     "DiagnosticMonitorConfiguration": {
       "overallQuotaInMB": "50000",
       "EtwProviders": {
         "EtwEventSourceProviderConfiguration": [
           {
             "provider": "Microsoft-ServiceFabric-Actors",
             "scheduledTransferKeywordFilter": "1",
             "scheduledTransferPeriod": "PT5M",
             "DefaultEvents": {
               "eventDestination": "ServiceFabricReliableActorEventTable"
             }
           },
           {
             "provider": "Microsoft-ServiceFabric-Services",
             "scheduledTransferPeriod": "PT5M",
             "DefaultEvents": {
               "eventDestination": "ServiceFabricReliableServiceEventTable"
             }
           }
         ],
         "EtwManifestProviderConfiguration": [
           {
             "provider": "cbd93bc2-71e5-4566-b3a7-595d8eeca6e8",
             "scheduledTransferLogLevelFilter": "Information",
             "scheduledTransferKeywordFilter": "4611686018427387904",
             "scheduledTransferPeriod": "PT5M",
             "DefaultEvents": {
               "eventDestination": "ServiceFabricSystemEventTable"
             }
           }
         ]
       },
       "PerformanceCounters": {
             "scheduledTransferPeriod": "PT1M",
             "PerformanceCounterConfiguration": [
                 {
                     "counterSpecifier": "\\Processor(_Total)\\% Processor Time",
                     "sampleRate": "PT1M",
                     "unit": "Percent",
                     "annotation": [
                     ],
                     "sinks": ""
                 },
                 {
                     "counterSpecifier": "\\Service Fabric Actor Method(*)\\Invocations/Sec",
                     "sampleRate": "PT1M"
                 }
             ]
         }
     }
   },

The sample rate for a counter can be modified to suit your needs. Its format is PT<time><unit>, so if you want the counter collected every 15 seconds, set "sampleRate": "PT15S".
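As a quick sanity check on the PT<time><unit> format, the sketch below converts a few duration strings to seconds. It assumes the values follow the XML Schema duration format, which the PT prefix suggests, and it uses the .NET XmlConvert class rather than anything specific to WAD. The helper name Get-DurationSeconds and the sample values are illustrative only.

```powershell
# Illustrative helper (hypothetical name): converts a duration string such as PT15S to seconds.
# Assumes the string is a valid xsd:duration; XmlConvert throws a FormatException otherwise.
function Get-DurationSeconds {
    param([Parameter(Mandatory)][string]$Duration)
    [System.Xml.XmlConvert]::ToTimeSpan($Duration).TotalSeconds
}

# Sample values chosen for illustration; they are not taken from a real cluster template.
foreach ($rate in 'PT15S', 'PT1M', 'PT5M') {
    '{0} = {1} seconds' -f $rate, (Get-DurationSeconds -Duration $rate)
}
```

Running a candidate value through a check like this before editing the template can catch a typo such as a missing PT prefix early, instead of during a long cluster upgrade.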

You can also use variables in your ARM template to build an array of performance counters, which comes in handy when you collect performance counters per process. The example below uses variables to collect processor time and garbage collector time for each process, plus two performance counters on the nodes themselves.

"variables": {
  "copy": [
      {
        "name": "processorTimeCounters",
        "count": "[length(parameters('monitoredProcesses'))]",
        "input": {
          "counterSpecifier": "\\Process([parameters('monitoredProcesses')[copyIndex('processorTimeCounters')]])\\% Processor Time",
          "sampleRate": "PT1M",
          "unit": "Percent",
          "sinks": "applicationInsights",
          "annotation": [
            {
              "displayName": "[concat(parameters('monitoredProcesses')[copyIndex('processorTimeCounters')],' Processor Time')]",
              "locale": "en-us"
            }
          ]
        }
      },
      {
        "name": "gcTimeCounters",
        "count": "[length(parameters('monitoredProcesses'))]",
        "input": {
          "counterSpecifier": "\\.NET CLR Memory([parameters('monitoredProcesses')[copyIndex('gcTimeCounters')]])\\% Time in GC",
          "sampleRate": "PT1M",
          "unit": "Percent",
          "sinks": "applicationInsights",
          "annotation": [
            {
              "displayName": "[concat(parameters('monitoredProcesses')[copyIndex('gcTimeCounters')],' Time in GC')]",
              "locale": "en-us"
            }
          ]
        }
      }
    ],
    "machineCounters": [
      {
        "counterSpecifier": "\\Memory\\Available Bytes",
        "sampleRate": "PT1M",
        "unit": "KB",
        "sinks": "applicationInsights",
        "annotation": [
          {
            "displayName": "Memory Available Kb",
            "locale": "en-us"
          }
        ]
      },
      {
        "counterSpecifier": "\\Memory\\% Committed Bytes In Use",
        "sampleRate": "PT15S",
        "unit": "percent",
        "annotation": [
          {
            "displayName": "Memory usage",
            "locale": "en-us"
          }
        ]
      }
    ]
  }
....
"WadCfg": {
    "DiagnosticMonitorConfiguration": {
      "overallQuotaInMB": "50000",
      "Metrics": {
        "metricAggregation": [
          {
            "scheduledTransferPeriod": "PT1M"
          }
        ],
        "resourceId": "[resourceId('Microsoft.Compute/virtualMachineScaleSets', variables('vmNodeTypeApp2Name'))]"
      },
      "PerformanceCounters": {
        "scheduledTransferPeriod": "PT1M",
        "PerformanceCounterConfiguration": "[concat(variables('processorTimeCounters'), variables('gcTimeCounters'), variables('machineCounters'))]"
      },
....
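The template above reads its process names from parameters('monitoredProcesses'), so it appears to expect an array parameter with that name to be declared elsewhere in the template. To make the effect of the copy loop concrete, the following sketch builds the same kind of array in PowerShell and prints it as JSON. It only illustrates the expansion and is not how Resource Manager evaluates the template; the process names are hypothetical.

```powershell
# Hypothetical process names (assumption); in the template they come from
# parameters('monitoredProcesses').
$monitoredProcesses = @('Fabric', 'FabricHost')

# One counter definition per process, mirroring the "processorTimeCounters" copy loop.
$processorTimeCounters = foreach ($process in $monitoredProcesses) {
    [ordered]@{
        counterSpecifier = "\Process($process)\% Processor Time"
        sampleRate       = 'PT1M'
        unit             = 'Percent'
        sinks            = 'applicationInsights'
        annotation       = @(
            [ordered]@{ displayName = "$process Processor Time"; locale = 'en-us' }
        )
    }
}

# ConvertTo-Json escapes each backslash, so the output shows "\\Process(...)" as in the template.
ConvertTo-Json -InputObject @($processorTimeCounters) -Depth 5
```

The gcTimeCounters variable expands the same way, and concat then joins both arrays with machineCounters into the single list assigned to PerformanceCounterConfiguration.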
  4. Once you have added the performance counters that need to be collected, upgrade your cluster resource so that these changes are reflected in your running cluster. Save your modified template.json and open PowerShell. You can upgrade your cluster using New-AzResourceGroupDeployment. The call requires the name of the resource group, the updated template file, and the parameters file, and prompts Resource Manager to make the appropriate changes to the resources that you updated. Once you are signed in to your account and are in the right subscription, use the following command to run the upgrade:

    New-AzResourceGroupDeployment -ResourceGroupName <ResourceGroup> -TemplateFile <PathToTemplateFile> -TemplateParameterFile <PathToParametersFile> -Verbose
    
  5. Once the upgrade finishes rolling out (this takes 15 to 45 minutes, depending on whether it is the first deployment and on the size of your resource group), WAD should be collecting the performance counters and sending them to the table named WADPerformanceCountersTable in the storage account associated with your cluster. To see your performance counters in Application Insights, add the Application Insights (AI) sink to the Resource Manager template.
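The upgrade described in the steps above can be sketched end to end as follows. This is a minimal outline, not a definitive script: every value in angle brackets is a placeholder to replace, and the validation step with Test-AzResourceGroupDeployment is an optional addition that the steps above do not require.

```powershell
# Placeholder values (assumptions); replace them with your own before running.
$resourceGroup  = '<ResourceGroup>'
$templateFile   = '<PathToTemplateFile>'
$parametersFile = '<PathToParametersFile>'

# Sign in and select the subscription that contains the cluster.
Connect-AzAccount
Set-AzContext -Subscription '<SubscriptionNameOrId>'

# Optional: validate the template first. The cmdlet returns error objects
# when validation fails and nothing when the template is valid.
$problems = Test-AzResourceGroupDeployment -ResourceGroupName $resourceGroup `
    -TemplateFile $templateFile -TemplateParameterFile $parametersFile
if ($problems) {
    $problems | Format-List Code, Message
    return
}

# Run the upgrade with the same parameters as the command shown in the steps above.
New-AzResourceGroupDeployment -ResourceGroupName $resourceGroup `
    -TemplateFile $templateFile -TemplateParameterFile $parametersFile -Verbose

# Review the state of the deployments in the resource group.
Get-AzResourceGroupDeployment -ResourceGroupName $resourceGroup |
    Select-Object DeploymentName, ProvisioningState, Timestamp
```

Validating first is a cheap way to catch malformed JSON in the WadCfg section before starting an upgrade that can take up to 45 minutes.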

Next steps