Configure Resource Health alerts using Resource Manager templates

This article shows you how to create Resource Health activity log alerts programmatically using Azure Resource Manager templates and Azure PowerShell.

Azure Resource Health keeps you informed about the current and historical health status of your Azure resources. Azure Resource Health alerts can notify you in near real time when the health status of these resources changes. Creating Resource Health alerts programmatically lets you create and customize alerts in bulk.

Note

Resource Health alerts are currently in preview.

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Prerequisites

To follow the instructions on this page, you'll need to set up a few things in advance:

  1. You need to install the Azure PowerShell module
  2. You need to create or reuse an Action Group configured to notify you

Instructions

  1. Using PowerShell, log in to Azure with your account and select the subscription you want to interact with

    Connect-AzAccount -Environment AzureChinaCloud
    Select-AzSubscription -Subscription <subscriptionId>
    

    You can use Get-AzSubscription to list the subscriptions you have access to.

  2. Find and save the full Azure Resource Manager ID for your Action Group

    (Get-AzActionGroup -ResourceGroupName <resourceGroup> -Name <actionGroup>).Id
    
  3. Create a Resource Manager template for Resource Health alerts and save it as resourcehealthalert.json (see details below)

  4. Create a new Azure Resource Manager deployment using this template

    New-AzResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName <resourceGroup> -TemplateFile <path\to\resourcehealthalert.json>
    
  5. You'll be prompted to type in the Alert Name and the Action Group Resource ID you copied earlier:

    Supply values for the following parameters:
    (Type !? for Help.)
    activityLogAlertName: <Alert Name>
    actionGroupResourceId: /subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/microsoft.insights/actionGroups/<actionGroup>
    
  6. If everything worked, you'll get a confirmation in PowerShell

    DeploymentName          : ExampleDeployment
    ResourceGroupName       : <resourceGroup>
    ProvisioningState       : Succeeded
    Timestamp               : 11/8/2017 2:32:00 AM
    Mode                    : Incremental
    TemplateLink            :
    Parameters              :
                            Name                     Type       Value
                            ===============          =========  ==========
                            activityLogAlertName     String     <Alert Name>
                            activityLogAlertEnabled  Bool       True
                            actionGroupResourceId    String     /...
    
    Outputs                 :
    DeploymentDebugLogLevel :
    

If you plan to fully automate this process, edit the Resource Manager template so that it does not prompt for the values in Step 5.
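One way to avoid the prompts without hard-coding values in the template is a deployment parameter file. The following is a minimal Python sketch that builds such a file for the two parameters used in this article. The alert name and the helper name `build_parameter_file` are hypothetical placeholders, and the placeholder segments in the Action Group ID must be replaced with your own values.

```python
import json

# Hypothetical values; replace with your own alert name and Action Group ID.
ALERT_NAME = "resource-health-alert"
ACTION_GROUP_ID = (
    "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>"
    "/providers/microsoft.insights/actionGroups/<actionGroup>"
)


def build_parameter_file(alert_name, action_group_id):
    """Return the content of a deployment parameter file as a dict.

    The parameter names match the parameters declared in the templates
    in this article.
    """
    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "activityLogAlertName": {"value": alert_name},
            "actionGroupResourceId": {"value": action_group_id},
        },
    }


# Print the file content; redirect it to a .json file of your choice.
print(json.dumps(build_parameter_file(ALERT_NAME, ACTION_GROUP_ID), indent=2))
```

Saving the printed JSON to a file (for example resourcehealthalert.parameters.json, a name chosen here only for illustration) and passing that file to New-AzResourceGroupDeployment through its -TemplateParameterFile parameter should let the deployment run without prompting.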

Resource Manager template options for Resource Health alerts

You can use this base template as a starting point for creating Resource Health alerts. This template works as written and signs you up to receive alerts for all newly activated resource health events across all resources in a subscription.

At the bottom of this article we have also included a more complex alert template, which should increase the signal-to-noise ratio for Resource Health alerts compared to this template.

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "activityLogAlertName": {
      "type": "string",
      "metadata": {
        "description": "Unique name (within the Resource Group) for the Activity log alert."
      }
    },
    "actionGroupResourceId": {
      "type": "string",
      "metadata": {
        "description": "Resource Id for the Action group."
      }
    }
  },
  "resources": [   
    {
      "type": "Microsoft.Insights/activityLogAlerts",
      "apiVersion": "2017-04-01",
      "name": "[parameters('activityLogAlertName')]",      
      "location": "Global",
      "properties": {
        "enabled": true,
        "scopes": [
            "[subscription().id]"
        ],        
        "condition": {
          "allOf": [
            {
              "field": "category",
              "equals": "ResourceHealth"
            },
            {
              "field": "status",
              "equals": "Active"
            }
          ]
        },
        "actions": {
          "actionGroups":
          [
            {
              "actionGroupId": "[parameters('actionGroupResourceId')]"
            }
          ]
        }
      }
    }
  ]
}

However, a broad alert like this one is generally not recommended. The sections below show how to scope down this alert to focus on the events you care about.

Adjusting the alert scope

Resource Health alerts can be configured to monitor events at three different scopes:

  • Subscription level
  • Resource group level
  • Resource level

The alert template is configured at the subscription level, but if you would like your alert to notify you only about certain resources, or resources within a certain resource group, you need to modify the scopes section in the above template.

For a resource group level scope, the scopes section should look like:

"scopes": [
    "/subscriptions/<subscription id>/resourcegroups/<resource group>"
],

And for a resource level scope, the scopes section should look like:

"scopes": [
    "/subscriptions/<subscription id>/resourcegroups/<resource group>/providers/<resource>"
],

For example: "/subscriptions/d37urb3e-ed41-4670-9c19-02a1d2808ff9/resourcegroups/myRG/providers/microsoft.compute/virtualmachines/myVm"

To get this string, you can go to the Azure portal and look at the URL while viewing your Azure resource.
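If you generate templates for many resources, the three scope formats above can be assembled by a small helper. This is a sketch only: the function name `build_scope` and its arguments are hypothetical, and it does nothing beyond string concatenation in the formats shown in this section.

```python
def build_scope(subscription_id, resource_group=None, resource=None):
    """Build a scope string at subscription, resource group, or resource level.

    `resource` is the part that follows "/providers/", for example
    "microsoft.compute/virtualmachines/myVm".
    """
    if resource and not resource_group:
        raise ValueError("a resource-level scope also needs a resource group")
    scope = f"/subscriptions/{subscription_id}"
    if resource_group:
        scope += f"/resourcegroups/{resource_group}"
    if resource:
        scope += f"/providers/{resource.strip('/')}"
    return scope


# Reproduces the example string from this section.
print(build_scope(
    "d37urb3e-ed41-4670-9c19-02a1d2808ff9",
    "myRG",
    "microsoft.compute/virtualmachines/myVm",
))
```

The returned string is what goes inside the "scopes" array of the template.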

Adjusting the resource types that alert you

Alerts at the subscription or resource group level may cover different kinds of resources. If you want to limit alerts to only come from a certain subset of resource types, you can define that in the condition section of the template like so:

"condition": {
    "allOf": [
        ...,
        {
            "anyOf": [
                {
                    "field": "resourceType",
                    "equals": "MICROSOFT.COMPUTE/VIRTUALMACHINES",
                    "containsAny": null
                },
                {
                    "field": "resourceType",
                    "equals": "MICROSOFT.STORAGE/STORAGEACCOUNTS",
                    "containsAny": null
                },
                ...
            ]
        }
    ]
},

Here we use the anyOf wrapper to allow the Resource Health alert to match any of the conditions we specify, allowing for alerts that target specific resource types.
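When the list of resource types is long, the anyOf block can be generated instead of written by hand. The sketch below assumes a hypothetical helper named `resource_type_clause`. It upper-cases the type names only to mirror the sample above, not because this article states that upper case is required.

```python
import json


def resource_type_clause(resource_types):
    """Return an anyOf clause with one resourceType condition per type."""
    if not resource_types:
        raise ValueError("at least one resource type is required")
    return {
        "anyOf": [
            # None is serialized as JSON null, matching "containsAny": null.
            {"field": "resourceType", "equals": rt.upper(), "containsAny": None}
            for rt in resource_types
        ]
    }


clause = resource_type_clause([
    "Microsoft.Compute/virtualMachines",
    "Microsoft.Storage/storageAccounts",
])
print(json.dumps(clause, indent=4))
```

The printed object can be placed inside the allOf array of the condition section, next to the category condition.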

Adjusting the Resource Health events that alert you

When resources undergo a health event, they can go through a series of stages that represent the state of the health event: Active, In Progress, Updated, and Resolved.

You may only want to be notified when a resource becomes unhealthy, in which case you want to configure your alert to only notify when the status is Active. However, if you also want to be notified on the other stages, you can add those details like so:

"condition": {
    "allOf": [
        ...,
        {
            "anyOf": [
                {
                    "field": "status",
                    "equals": "Active"
                },
                {
                    "field": "status",
                    "equals": "In Progress"
                },
                {
                    "field": "status",
                    "equals": "Resolved"
                },
                {
                    "field": "status",
                    "equals": "Updated"
                }
            ]
        }
    ]
}

If you want to be notified for all four stages of health events, you can remove this condition altogether, and the alert will notify you irrespective of the status property.

Note

Each "anyOf" section should contain conditions on only one field type.
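The rule in this note can be checked mechanically before deploying. The following sketch assumes the condition section has been loaded into a Python dict (for example with json.load) and that every entry of an anyOf group has a "field" key, as in the samples in this article. The function name `mixed_field_groups` is hypothetical.

```python
def mixed_field_groups(condition):
    """Return the field names of every anyOf group that mixes several fields."""
    offenders = []
    for clause in condition.get("allOf", []):
        group = clause.get("anyOf")
        if group is None:
            continue  # a plain condition, not an anyOf group
        fields = {leaf["field"] for leaf in group}
        if len(fields) > 1:
            offenders.append(sorted(fields))
    return offenders


condition = {
    "allOf": [
        {"field": "category", "equals": "ResourceHealth"},
        {"anyOf": [
            {"field": "status", "equals": "Active"},
            {"field": "status", "equals": "Resolved"},
        ]},
        {"anyOf": [
            {"field": "status", "equals": "Updated"},
            {"field": "resourceType", "equals": "MICROSOFT.COMPUTE/VIRTUALMACHINES"},
        ]},
    ]
}
print(mixed_field_groups(condition))
```

An empty list means every anyOf group targets a single field. Here the second group is reported because it mixes status and resourceType.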

Adjusting the Resource Health alerts to avoid "Unknown" events

Azure Resource Health can report the latest health of your resources by constantly monitoring them using test runners. The relevant reported health statuses are: "Available", "Unavailable", and "Degraded". However, in situations where the runner and the Azure resource are unable to communicate, an "Unknown" health status is reported for the resource, and that is considered an "Active" health event.

However, when a resource reports "Unknown", it's likely that its health status has not changed since the last accurate report. If you would like to eliminate alerts on "Unknown" events, you can specify that logic in the template:

"condition": {
    "allOf": [
        ...,
        {
            "anyOf": [
                {
                    "field": "properties.currentHealthStatus",
                    "equals": "Available",
                    "containsAny": null
                },
                {
                    "field": "properties.currentHealthStatus",
                    "equals": "Unavailable",
                    "containsAny": null
                },
                {
                    "field": "properties.currentHealthStatus",
                    "equals": "Degraded",
                    "containsAny": null
                }
            ]
        },
        {
            "anyOf": [
                {
                    "field": "properties.previousHealthStatus",
                    "equals": "Available",
                    "containsAny": null
                },
                {
                    "field": "properties.previousHealthStatus",
                    "equals": "Unavailable",
                    "containsAny": null
                },
                {
                    "field": "properties.previousHealthStatus",
                    "equals": "Degraded",
                    "containsAny": null
                }
            ]
        }
    ]
},

In this example, we only notify on events where neither the current nor the previous health status is "Unknown". This change may be a useful addition if your alerts are sent directly to your mobile phone or email.

Note that the currentHealthStatus and previousHealthStatus properties can be null in some events. For example, when an Updated event occurs, it's likely that the health status of the resource has not changed since the last report and only additional event information (for example, the cause) is available. Therefore, using the clause above may result in some alerts not being triggered, because the properties.currentHealthStatus and properties.previousHealthStatus values will be set to null.
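The effect of null values can be illustrated with a small local model of the condition logic. This is not how Azure evaluates alerts internally. It is a simplified sketch that assumes exact string equality and represents each event as a flat dict keyed by the field names used in the template. The function `matches` and the sample events are hypothetical.

```python
def matches(condition, event):
    """Evaluate a simplified allOf / anyOf / equals condition against an event."""
    if "allOf" in condition:
        return all(matches(c, event) for c in condition["allOf"])
    if "anyOf" in condition:
        return any(matches(c, event) for c in condition["anyOf"])
    # A missing or null field never equals a health status string.
    return event.get(condition["field"]) == condition["equals"]


def health_clause(field):
    """anyOf clause that accepts every health status except "Unknown"."""
    statuses = ["Available", "Unavailable", "Degraded"]
    return {"anyOf": [{"field": field, "equals": s} for s in statuses]}


condition = {
    "allOf": [
        {"field": "category", "equals": "ResourceHealth"},
        health_clause("properties.currentHealthStatus"),
        health_clause("properties.previousHealthStatus"),
    ]
}

transition = {
    "category": "ResourceHealth",
    "properties.currentHealthStatus": "Unavailable",
    "properties.previousHealthStatus": "Available",
}
updated = {
    "category": "ResourceHealth",
    "properties.currentHealthStatus": None,
    "properties.previousHealthStatus": None,
}

print(matches(condition, transition))  # True: both statuses are known
print(matches(condition, updated))     # False: null statuses match nothing
```

In this model the event with null health statuses is filtered out together with the "Unknown" events, which is the trade-off described above.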

Adjusting the alert to avoid user-initiated events

Resource Health events can be triggered by platform-initiated and user-initiated events. It may make sense to only send a notification when the health event is caused by the Azure platform.

It's easy to configure your alert to filter for only these kinds of events:

"condition": {
    "allOf": [
        ...,
        {
            "field": "properties.cause",
            "equals": "PlatformInitiated",
            "containsAny": null
        }
    ]
}

Note that the cause field can be null in some events. That is, a health transition takes place (for example, available to unavailable) and the event is logged immediately to prevent notification delays. Therefore, using the clause above may result in an alert not being triggered, because the properties.cause property value will be set to null.

Complete Resource Health alert template

Using the different adjustments described in the previous sections, here is a sample template that is configured to maximize the signal-to-noise ratio. Bear in mind the caveats noted above: the currentHealthStatus, previousHealthStatus, and cause property values may be null in some events.

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "activityLogAlertName": {
            "type": "string",
            "metadata": {
                "description": "Unique name (within the Resource Group) for the Activity log alert."
            }
        },
        "actionGroupResourceId": {
            "type": "string",
            "metadata": {
                "description": "Resource Id for the Action group."
            }
        }
    },
    "resources": [
        {
            "type": "Microsoft.Insights/activityLogAlerts",
            "apiVersion": "2017-04-01",
            "name": "[parameters('activityLogAlertName')]",
            "location": "Global",
            "properties": {
                "enabled": true,
                "scopes": [
                    "[subscription().id]"
                ],
                "condition": {
                    "allOf": [
                        {
                            "field": "category",
                            "equals": "ResourceHealth",
                            "containsAny": null
                        },
                        {
                            "anyOf": [
                                {
                                    "field": "properties.currentHealthStatus",
                                    "equals": "Available",
                                    "containsAny": null
                                },
                                {
                                    "field": "properties.currentHealthStatus",
                                    "equals": "Unavailable",
                                    "containsAny": null
                                },
                                {
                                    "field": "properties.currentHealthStatus",
                                    "equals": "Degraded",
                                    "containsAny": null
                                }
                            ]
                        },
                        {
                            "anyOf": [
                                {
                                    "field": "properties.previousHealthStatus",
                                    "equals": "Available",
                                    "containsAny": null
                                },
                                {
                                    "field": "properties.previousHealthStatus",
                                    "equals": "Unavailable",
                                    "containsAny": null
                                },
                                {
                                    "field": "properties.previousHealthStatus",
                                    "equals": "Degraded",
                                    "containsAny": null
                                }
                            ]
                        },
                        {
                            "anyOf": [
                                {
                                    "field": "properties.cause",
                                    "equals": "PlatformInitiated",
                                    "containsAny": null
                                }
                            ]
                        },
                        {
                            "anyOf": [
                                {
                                    "field": "status",
                                    "equals": "Active",
                                    "containsAny": null
                                },
                                {
                                    "field": "status",
                                    "equals": "Resolved",
                                    "containsAny": null
                                },
                                {
                                    "field": "status",
                                    "equals": "In Progress",
                                    "containsAny": null
                                },
                                {
                                    "field": "status",
                                    "equals": "Updated",
                                    "containsAny": null
                                }
                            ]
                        }
                    ]
                },
                "actions": {
                    "actionGroups": [
                        {
                            "actionGroupId": "[parameters('actionGroupResourceId')]"
                        }
                    ]
                }
            }
        }
    ]
}

However, you'll know best which configurations are effective for you, so use the tools described in this documentation to make your own customizations.

Next steps

Learn more about Resource Health:

Create Service Health alerts: