Use monitoring and diagnostics with a Windows VM and Azure Resource Manager templates

The Azure Diagnostics extension provides monitoring and diagnostics capabilities on a Windows-based Azure virtual machine. You can enable these capabilities on the virtual machine by including the extension as part of the Azure Resource Manager template. See Authoring Azure Resource Manager Templates with VM Extensions for more information on including any extension as part of a virtual machine template. This article describes how to add the Azure Diagnostics extension to a Windows virtual machine template.

Add the Azure Diagnostics extension to the VM resource definition

To enable the diagnostics extension on a Windows virtual machine, add the extension as a VM resource in the Resource Manager template.

For a simple Resource Manager based virtual machine, add the extension configuration to the resources array for the virtual machine:

"resources": [
    {
        "name": "Microsoft.Insights.VMDiagnosticsSettings",
        "type": "extensions",
        "location": "[resourceGroup().location]",
        "apiVersion": "2015-06-15",
        "dependsOn": [
            "[concat('Microsoft.Compute/virtualMachines/', variables('vmName'))]"
        ],
        "tags": {
            "displayName": "AzureDiagnostics"
        },
        "properties": {
            "publisher": "Microsoft.Azure.Diagnostics",
            "type": "IaaSDiagnostics",
            "typeHandlerVersion": "1.5",
            "autoUpgradeMinorVersion": true,
            "settings": {
                "xmlCfg": "[base64(concat(variables('wadcfgxstart'), variables('wadmetricsresourceid'), variables('vmName'), variables('wadcfgxend')))]",
                "storageAccount": "[parameters('existingdiagnosticsStorageAccountName')]"
            },
            "protectedSettings": {
                "storageAccountName": "[parameters('existingdiagnosticsStorageAccountName')]",
                "storageAccountKey": "[listkeys(variables('accountid'), '2015-05-01-preview').key1]",
                "storageAccountEndPoint": "https://core.chinacloudapi.cn"
            }
        }
    }
]

Another common convention is to add the extension configuration at the root resources node of the template instead of defining it under the virtual machine's resources node. With this approach, you have to explicitly specify the hierarchical relation between the extension and the virtual machine with the name and type values. For example:

"name": "[concat(variables('vmName'), '/Microsoft.Insights.VMDiagnosticsSettings')]",
"type": "Microsoft.Compute/virtualMachines/extensions",

The extension is always associated with the virtual machine. You can either define it directly under the virtual machine's resources node, or define it at the base level and use the hierarchical naming convention to associate it with the virtual machine.

For Virtual Machine Scale Sets, the extension configuration is specified in the extensionProfile property of the VirtualMachineProfile.

The publisher property with the value Microsoft.Azure.Diagnostics and the type property with the value IaaSDiagnostics together uniquely identify the Azure Diagnostics extension.

The value of the name property can be used to refer to the extension in the resource group. Setting it specifically to Microsoft.Insights.VMDiagnosticsSettings lets the Azure portal identify the extension easily, which ensures that the monitoring charts show up correctly in the portal.

The typeHandlerVersion property specifies the version of the extension you would like to use. Setting autoUpgradeMinorVersion to true ensures that you get the latest available minor version of the extension. It is highly recommended that you always set autoUpgradeMinorVersion to true so that you use the latest available diagnostics extension with all the new features and bug fixes.

The settings element contains configuration properties for the extension that can be set and read back from the extension (sometimes referred to as public configuration). The xmlCfg property contains the XML-based configuration for the diagnostics logs, performance counters, and other data collected by the diagnostics agent. See Diagnostics Configuration Schema for more information about the XML schema itself. A common practice is to store the actual XML configuration as variables in the Azure Resource Manager template and then concatenate and base64-encode them to set the value of xmlCfg. See the section on diagnostics configuration variables to understand more about how to store the XML in variables. The storageAccount property specifies the name of the storage account to which diagnostics data is transferred.

The properties in protectedSettings (sometimes referred to as private configuration) can be set but cannot be read back after being set. The write-only nature of protectedSettings makes it useful for storing secrets such as the key of the storage account where the diagnostics data is written.

Specifying the diagnostics storage account as parameters

The diagnostics extension JSON snippet above assumes two parameters, existingdiagnosticsStorageAccountName and existingdiagnosticsStorageResourceGroup, that specify the storage account where diagnostics data is stored. Specifying the diagnostics storage account as a parameter makes it easy to change the account across environments. For example, you may want to use one diagnostics storage account for testing and a different one for your production deployment.

"existingdiagnosticsStorageAccountName": {
    "type": "string",
    "metadata": {
        "description": "The name of an existing storage account to which diagnostics data is transferred."
    }
},
"existingdiagnosticsStorageResourceGroup": {
    "type": "string",
    "metadata": {
        "description": "The resource group for the storage account specified in existingdiagnosticsStorageAccountName"
    }
}

It is best practice to specify a diagnostics storage account in a different resource group than the resource group for the virtual machine. A resource group can be considered a deployment unit with its own lifetime. A virtual machine can be deployed and redeployed as new configuration updates are made to it, but you may want to continue storing the diagnostics data in the same storage account across those virtual machine deployments. Having the storage account in a different resource group enables the storage account to accept data from various virtual machine deployments, which makes it easier to troubleshoot issues across the various versions.

Note

If you create a Windows virtual machine template from Visual Studio, the default storage account might be set to the same storage account where the virtual machine VHD is uploaded. This is to simplify initial setup of the VM. Refactor the template to use a different storage account that can be passed in as a parameter.

Diagnostics configuration variables

The preceding diagnostics extension JSON snippet defines an accountid variable to simplify getting the storage account key for the diagnostics storage:

"accountid": "[concat('/subscriptions/', subscription().subscriptionId, '/resourceGroups/',parameters('existingdiagnosticsStorageResourceGroup'), '/providers/','Microsoft.Storage/storageAccounts/', parameters('existingdiagnosticsStorageAccountName'))]"

The xmlCfg property for the diagnostics extension is defined using multiple variables that are concatenated together. The values of these variables are XML, so they need to be escaped correctly when setting the JSON variables.
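As a hedged illustration of the escaping step, the Python sketch below uses the standard json module to turn a raw XML fragment into a JSON string literal that could be pasted as a variable value. The fragment and the names raw_xml and to_json_string_literal are illustrative and not part of the original template.

```python
import json

# Illustrative XML fragment (an assumption, not taken verbatim from the template).
# It contains double quotes and backslashes, both of which JSON requires escaping.
raw_xml = (
    '<PerformanceCounterConfiguration '
    'counterSpecifier="\\Memory\\Available Bytes" '
    'sampleRate="PT15S" unit="Bytes"/>'
)


def to_json_string_literal(xml: str) -> str:
    """Return xml as a JSON string literal, including the surrounding quotes."""
    return json.dumps(xml)


escaped = to_json_string_literal(raw_xml)
print(escaped)

# Decoding the literal gives back the original XML unchanged.
assert json.loads(escaped) == raw_xml
```

Generating the escaped value this way avoids hand-editing quotes and backslashes, which is where most escaping mistakes come from.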

The following example shows diagnostics configuration XML that collects standard system-level performance counters along with some Windows event logs and diagnostics infrastructure logs. It has been escaped and formatted correctly so that the configuration can be pasted directly into the variables section of your template. See the Diagnostics Configuration Schema for a more human-readable example of the configuration XML.

"wadlogs": "<WadCfg> <DiagnosticMonitorConfiguration overallQuotaInMB=\"4096\" xmlns=\"http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration\"> <DiagnosticInfrastructureLogs scheduledTransferLogLevelFilter=\"Error\"/> <WindowsEventLog scheduledTransferPeriod=\"PT1M\" > <DataSource name=\"Application!*[System[(Level = 1 or Level = 2)]]\" /> <DataSource name=\"Security!*[System[(Level = 1 or Level = 2)]]\" /> <DataSource name=\"System!*[System[(Level = 1 or Level = 2)]]\" /></WindowsEventLog>",
"wadperfcounters1": "<PerformanceCounters scheduledTransferPeriod=\"PT1M\"><PerformanceCounterConfiguration counterSpecifier=\"\\Processor(_Total)\\% Processor Time\" sampleRate=\"PT15S\" unit=\"Percent\"><annotation displayName=\"CPU utilization\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\Processor(_Total)\\% Privileged Time\" sampleRate=\"PT15S\" unit=\"Percent\"><annotation displayName=\"CPU privileged time\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\Processor(_Total)\\% User Time\" sampleRate=\"PT15S\" unit=\"Percent\"><annotation displayName=\"CPU user time\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\Processor Information(_Total)\\Processor Frequency\" sampleRate=\"PT15S\" unit=\"Count\"><annotation displayName=\"CPU frequency\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\System\\Processes\" sampleRate=\"PT15S\" unit=\"Count\"><annotation displayName=\"Processes\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\Process(_Total)\\Thread Count\" sampleRate=\"PT15S\" unit=\"Count\"><annotation displayName=\"Threads\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\Process(_Total)\\Handle Count\" sampleRate=\"PT15S\" unit=\"Count\"><annotation displayName=\"Handles\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\Memory\\% Committed Bytes In Use\" sampleRate=\"PT15S\" unit=\"Percent\"><annotation displayName=\"Memory usage\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\Memory\\Available Bytes\" sampleRate=\"PT15S\" unit=\"Bytes\"><annotation displayName=\"Memory available\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\Memory\\Committed Bytes\" sampleRate=\"PT15S\" unit=\"Bytes\"><annotation displayName=\"Memory committed\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\Memory\\Commit Limit\" sampleRate=\"PT15S\" unit=\"Bytes\"><annotation displayName=\"Memory commit limit\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\PhysicalDisk(_Total)\\% Disk Time\" sampleRate=\"PT15S\" unit=\"Percent\"><annotation displayName=\"Disk active time\" locale=\"en-us\"/></PerformanceCounterConfiguration>",
"wadperfcounters2": "<PerformanceCounterConfiguration counterSpecifier=\"\\PhysicalDisk(_Total)\\% Disk Read Time\" sampleRate=\"PT15S\" unit=\"Percent\"><annotation displayName=\"Disk active read time\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\PhysicalDisk(_Total)\\% Disk Write Time\" sampleRate=\"PT15S\" unit=\"Percent\"><annotation displayName=\"Disk active write time\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\PhysicalDisk(_Total)\\Disk Transfers/sec\" sampleRate=\"PT15S\" unit=\"CountPerSecond\"><annotation displayName=\"Disk operations\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\PhysicalDisk(_Total)\\Disk Reads/sec\" sampleRate=\"PT15S\" unit=\"CountPerSecond\"><annotation displayName=\"Disk read operations\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\PhysicalDisk(_Total)\\Disk Writes/sec\" sampleRate=\"PT15S\" unit=\"CountPerSecond\"><annotation displayName=\"Disk write operations\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\PhysicalDisk(_Total)\\Disk Bytes/sec\" sampleRate=\"PT15S\" unit=\"BytesPerSecond\"><annotation displayName=\"Disk speed\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\PhysicalDisk(_Total)\\Disk Read Bytes/sec\" sampleRate=\"PT15S\" unit=\"BytesPerSecond\"><annotation displayName=\"Disk read speed\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\PhysicalDisk(_Total)\\Disk Write Bytes/sec\" sampleRate=\"PT15S\" unit=\"BytesPerSecond\"><annotation displayName=\"Disk write speed\" locale=\"en-us\"/></PerformanceCounterConfiguration><PerformanceCounterConfiguration counterSpecifier=\"\\LogicalDisk(_Total)\\% Free Space\" sampleRate=\"PT15S\" unit=\"Percent\"><annotation displayName=\"Disk free space (percentage)\" locale=\"en-us\"/></PerformanceCounterConfiguration></PerformanceCounters>",
"wadcfgxstart": "[concat(variables('wadlogs'), variables('wadperfcounters1'), variables('wadperfcounters2'), '<Metrics resourceId=\"')]",
"wadmetricsresourceid": "[concat('/subscriptions/', subscription().subscriptionId, '/resourceGroups/', resourceGroup().name , '/providers/', 'Microsoft.Compute/virtualMachines/')]",
"wadcfgxend": "\"><MetricAggregation scheduledTransferPeriod=\"PT1H\"/><MetricAggregation scheduledTransferPeriod=\"PT1M\"/></Metrics></DiagnosticMonitorConfiguration></WadCfg>"

The Metrics definition XML node in the preceding configuration is an important configuration element because it defines how the performance counters defined earlier in the PerformanceCounters node are aggregated and stored.

Important

These metrics drive the monitoring charts and alerts in the Azure portal. The Metrics node, with its resourceId attribute and MetricAggregation elements, must be included in the diagnostics configuration for your VM if you want to see the VM monitoring data in the Azure portal.

The following example shows the XML for metrics definitions:

<Metrics resourceId="/subscriptions/subscription().subscriptionId/resourceGroups/resourceGroup().name/providers/Microsoft.Compute/virtualMachines/vmName">
    <MetricAggregation scheduledTransferPeriod="PT1H"/>
    <MetricAggregation scheduledTransferPeriod="PT1M"/>
</Metrics>

The resourceId attribute uniquely identifies the virtual machine in your subscription. Make sure to use the subscription() and resourceGroup() functions so that the template automatically updates those values based on the subscription and resource group you are deploying to.

If you are creating multiple virtual machines in a loop, you have to populate the resourceId value with a copyIndex() function to correctly differentiate each individual VM. The xmlCfg value can be updated to support this as follows:

"xmlCfg": "[base64(concat(variables('wadcfgxstart'), variables('wadmetricsresourceid'), concat(parameters('vmNamePrefix'), copyindex()), variables('wadcfgxend')))]", 

The MetricAggregation values PT1M and PT1H signify an aggregation over a minute and an aggregation over an hour, respectively.
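To make the looped xmlCfg expression more concrete, the following Python sketch mimics what the template's concat and base64 functions produce for each copy index. It is only an approximation under stated assumptions: the XML pieces are shortened placeholders, the subscription and resource group segments are left as placeholders, vm_name_prefix is a hypothetical value, and UTF-8 is assumed as the encoding before base64.

```python
import base64


def build_xml_cfg(cfg_start: str, resource_id_prefix: str, vm_name: str, cfg_end: str) -> str:
    """Concatenate the XML pieces and base64-encode the result (UTF-8 assumed)."""
    xml = cfg_start + resource_id_prefix + vm_name + cfg_end
    return base64.b64encode(xml.encode("utf-8")).decode("ascii")


# Shortened stand-ins for the wadcfgxstart, wadmetricsresourceid and wadcfgxend variables.
cfg_start = '<WadCfg><DiagnosticMonitorConfiguration><Metrics resourceId="'
resource_id_prefix = (
    "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>"
    "/providers/Microsoft.Compute/virtualMachines/"
)
cfg_end = (
    '"><MetricAggregation scheduledTransferPeriod="PT1H"/>'
    '<MetricAggregation scheduledTransferPeriod="PT1M"/>'
    "</Metrics></DiagnosticMonitorConfiguration></WadCfg>"
)

vm_name_prefix = "myvm"  # hypothetical value for the vmNamePrefix parameter

# copyIndex() is zero-based by default, so three copies yield myvm0, myvm1 and myvm2.
configs = [
    build_xml_cfg(cfg_start, resource_id_prefix, f"{vm_name_prefix}{index}", cfg_end)
    for index in range(3)
]

for index, cfg in enumerate(configs):
    decoded = base64.b64decode(cfg).decode("utf-8")
    print(index, decoded[: decoded.index(">", len(cfg_start)) + 1])
```

Each VM ends up with its own resourceId inside the Metrics node, which is what allows the portal to attribute the metrics to the right machine.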

WADMetrics tables in storage

The Metrics configuration above generates tables in your diagnostics storage account with the following naming conventions:

  • WADMetrics: Standard prefix for all WADMetrics tables
  • PT1H or PT1M: Signifies that the table contains aggregate data over 1 hour or 1 minute
  • P10D: Signifies that the table contains data for 10 days from when the table started collecting data
  • V2S: String constant
  • yyyymmdd: The date on which the table started collecting data

Example: WADMetricsPT1HP10DV2S20151108 contains metrics data aggregated over an hour for 10 days starting on 08-Nov-2015
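The naming convention can be sketched in Python as a pair of helpers that build and parse table names. The function names and the regular expression are illustrative assumptions derived only from the components listed above, not an official API.

```python
import re
from datetime import date, datetime
from typing import Tuple

# Prefix, aggregation period, fixed P10D retention marker, V2S constant, start date.
TABLE_NAME = re.compile(r"^WADMetrics(PT1H|PT1M)P10DV2S(\d{8})$")


def build_table_name(aggregation: str, start: date) -> str:
    """Compose a WADMetrics table name from its documented components."""
    if aggregation not in ("PT1H", "PT1M"):
        raise ValueError(f"unsupported aggregation period: {aggregation}")
    return f"WADMetrics{aggregation}P10DV2S{start:%Y%m%d}"


def parse_table_name(name: str) -> Tuple[str, date]:
    """Split a WADMetrics table name into (aggregation period, start date)."""
    match = TABLE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a WADMetrics table name: {name}")
    return match.group(1), datetime.strptime(match.group(2), "%Y%m%d").date()


print(build_table_name("PT1H", date(2015, 11, 8)))
print(parse_table_name("WADMetricsPT1HP10DV2S20151108"))
```

A parser like this is handy when enumerating the tables in a storage account to find the one that covers a given date.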

Each WADMetrics table contains the following columns:

  • PartitionKey: The partition key is constructed based on the resourceId value to uniquely identify the VM resource. For example: 002Fsubscriptions:<subscriptionID>:002FresourceGroups:002F<ResourceGroupName>:002Fproviders:002FMicrosoft:002ECompute:002FvirtualMachines:002F<vmName>
  • RowKey: Follows the format <Descending time tick>:<Performance Counter Name>. The descending time tick is the maximum time tick value minus the tick value of the beginning of the aggregation period. For example, if the sample period started on 10-Nov-2015 at 00:00 UTC, the calculation would be: DateTime.MaxValue.Ticks - (new DateTime(2015,11,10,0,0,0,DateTimeKind.Utc).Ticks). For the memory available bytes performance counter, the row key looks like: 2519551871999999999__:005CMemory:005CAvailable:0020Bytes
  • CounterName: The name of the performance counter. This matches the counterSpecifier defined in the XML configuration.
  • Maximum: The maximum value of the performance counter over the aggregation period.
  • Minimum: The minimum value of the performance counter over the aggregation period.
  • Total: The sum of all values of the performance counter reported over the aggregation period.
  • Count: The total number of values reported for the performance counter.
  • Average: The average (total/count) value of the performance counter over the aggregation period.
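The RowKey calculation can be reproduced outside .NET. The Python sketch below recomputes the descending tick value and assembles the row key for the example above. Two points are assumptions rather than documented behavior: the key encoding (each character that is not an ASCII letter or digit becomes a colon followed by its four-digit hexadecimal code) is inferred from the sample keys in the list, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

DOTNET_MAX_TICKS = 3155378975999999999  # value of DateTime.MaxValue.Ticks in .NET
TICKS_PER_SECOND = 10_000_000           # one tick is 100 nanoseconds


def dotnet_ticks(moment: datetime) -> int:
    """Ticks elapsed since 0001-01-01T00:00:00 UTC; moment must be timezone-aware."""
    epoch = datetime(1, 1, 1, tzinfo=timezone.utc)
    delta = moment - epoch
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * TICKS_PER_SECOND + delta.microseconds * 10


def encode_key(text: str) -> str:
    """Encode non-alphanumeric characters as ':' plus a 4-digit hex code (inferred rule)."""
    return "".join(
        ch if ch.isascii() and ch.isalnum() else f":{ord(ch):04X}" for ch in text
    )


def row_key(period_start: datetime, counter_name: str) -> str:
    """Build '<descending ticks>__<encoded counter name>' for an aggregation period."""
    descending = DOTNET_MAX_TICKS - dotnet_ticks(period_start)
    return f"{descending}__{encode_key(counter_name)}"


start = datetime(2015, 11, 10, tzinfo=timezone.utc)
print(row_key(start, "\\Memory\\Available Bytes"))
# prints 2519551871999999999__:005CMemory:005CAvailable:0020Bytes
```

Because the tick value descends over time, newer aggregation periods sort first within a partition, so a query for the most recent data can read from the top of the partition.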

Next Steps