Overview of alerts in 21Vianet Azure

This article describes what alerts are, their benefits, and how to get started using them.

What are alerts in 21Vianet Azure?

Alerts proactively notify you when important conditions are found in your monitoring data. They allow you to identify and address issues before the users of your system notice them.

This article discusses the unified alert experience in Azure Monitor, which now includes alerts that were previously managed by Log Analytics and Application Insights. The previous alert experience and alert types are called classic alerts. To view this older experience and the older alert types, select View classic alerts at the top of the alert page.

Overview

The diagram below represents the flow of alerts.

Diagram of the alert flow

Alert rules are separate from alerts and from the actions taken when an alert fires. The alert rule captures the target and criteria for alerting. An alert rule can be in an enabled or a disabled state. Alerts fire only when the rule is enabled.

The key attributes of an alert rule are:

Target Resource - Defines the scope and signals available for alerting. A target can be any Azure resource. Example targets: a virtual machine, a storage account, a virtual machine scale set, a Log Analytics workspace, or an Application Insights resource. For certain resources (like virtual machines), you can specify multiple resources as the target of the alert rule.

Signal - Signals are emitted by the target resource and can be of several types: Metric, Activity log, Application Insights, and Log.

Criteria - Criteria are a combination of signal and logic applied to a target resource. Examples:

  • Percentage CPU > 70%
  • Server Response Time > 4 ms
  • Result count of a log query > 100

Alert Name - A specific name for the alert rule, configured by the user.

Alert Description - A description of the alert rule, configured by the user.

Severity - The severity of the alert once the criteria specified in the alert rule are met. Severity can range from 0 to 4.

  • Sev 0 = Critical
  • Sev 1 = Error
  • Sev 2 = Warning
  • Sev 3 = Informational
  • Sev 4 = Verbose

Action - A specific action taken when the alert is fired. For more information, see Action Groups.
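The attributes of an alert rule can be pictured as a small data structure. The following is a minimal sketch for illustration only, not the Azure Monitor object model: the class names `AlertRule` and `Criteria`, their fields, and the `should_fire` method are hypothetical, and the example values (`High CPU`, `ops-team`) are made up.

```python
from dataclasses import dataclass, field
from enum import IntEnum
import operator
from typing import Callable, List


class Severity(IntEnum):
    """Severity levels 0 to 4; 0 is the most severe."""
    CRITICAL = 0
    ERROR = 1
    WARNING = 2
    INFORMATIONAL = 3
    VERBOSE = 4


@dataclass
class Criteria:
    """A signal name plus the logic applied to it (hypothetical shape)."""
    signal: str
    compare: Callable[[float, float], bool]
    threshold: float

    def is_met(self, value: float) -> bool:
        return self.compare(value, self.threshold)


@dataclass
class AlertRule:
    """Illustrative container for the key attributes of an alert rule."""
    name: str
    description: str
    target_resource: str
    criteria: Criteria
    severity: Severity
    action_groups: List[str] = field(default_factory=list)
    enabled: bool = True

    def should_fire(self, signal_value: float) -> bool:
        # A disabled rule never fires, even if its criteria are met.
        return self.enabled and self.criteria.is_met(signal_value)


rule = AlertRule(
    name="High CPU",
    description="CPU above 70% on a virtual machine",
    target_resource="ContosoVM1",
    criteria=Criteria("Percentage CPU", operator.gt, 70.0),
    severity=Severity.WARNING,
    action_groups=["ops-team"],
)

print(rule.should_fire(85.0))  # True
print(rule.should_fire(50.0))  # False
```

Keeping the criteria separate from the rule mirrors the description above: the rule holds the target, severity, and actions, while the criteria hold only the signal and its logic.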

What you can alert on

You can alert on metrics and logs, as described in monitoring data sources. These include, but are not limited to:

  • Metric values
  • Log search queries
  • Activity log events
  • Health of the underlying Azure platform
  • Tests for website availability

Previously, Azure Monitor metrics, Application Insights, Log Analytics, and Service Health had separate alerting capabilities. Over time, Azure improved and combined both the user interface and the different methods of alerting. This consolidation is still in progress. As a result, some alerting capabilities are not yet available in the new alerts system.

Monitor source | Signal type | Description
Service health | Activity log | Not supported. See Create activity log alerts on service notifications.
Application Insights | Web availability tests | Not supported. See Web test alerts. Available to any website that's instrumented to send data to Application Insights. Receive a notification when the availability or responsiveness of a website is below expectations.

Manage alerts

You can set the state of an alert to specify where it is in the resolution process. When the criteria specified in the alert rule are met, an alert is created (fired) with a state of New. You can change the state when you acknowledge an alert and when you close it. All state changes are stored in the history of the alert.

The following alert states are supported.

State | Description
New | The issue has just been detected and has not yet been reviewed.
Acknowledged | An administrator has reviewed the alert and started working on it.
Closed | The issue has been resolved. After an alert has been closed, you can reopen it by changing it to another state.

Alert state is different from, and independent of, the monitor condition. Alert state is set by the user. Monitor condition is set by the system. When an alert fires, the alert's monitor condition is set to fired. When the underlying condition that caused the alert to fire clears, the monitor condition is set to resolved. The alert state isn't changed until the user changes it. Learn how to change the state of your alerts and smart groups.
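The independence of the two fields can be sketched as follows. This is an illustrative model only, not how Azure Monitor stores alerts: the `Alert` class, its method names, and the `history` format are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

ALERT_STATES = ("New", "Acknowledged", "Closed")


@dataclass
class Alert:
    """Hypothetical alert with a user-set state and a system-set condition."""
    name: str
    state: str = "New"                # changed only by the user
    monitor_condition: str = "Fired"  # changed only by the system
    history: List[Tuple[str, str]] = field(default_factory=list)

    def set_state(self, new_state: str) -> None:
        """User action: move the alert to another state and record the change."""
        if new_state not in ALERT_STATES:
            raise ValueError(f"unsupported alert state: {new_state}")
        self.history.append((self.state, new_state))
        self.state = new_state

    def resolve_condition(self) -> None:
        """System action: the underlying condition cleared."""
        # The alert state is deliberately left untouched here.
        self.monitor_condition = "Resolved"


alert = Alert("High CPU on ContosoVM1")
alert.resolve_condition()
print(alert.state, alert.monitor_condition)  # New Resolved

alert.set_state("Acknowledged")
alert.set_state("Closed")
print(alert.history)
```

In this sketch an alert can be resolved by the system while still New for the user, which is the situation the paragraph above describes.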

Smart groups

Smart groups are in preview.

Learn more about smart groups and how to manage your smart groups.

Alerts experience

The default Alerts page provides a summary of alerts that were created within a particular time window. It displays the total alerts for each severity, with columns that identify the total number of alerts in each state for each severity. Select any of the severities to open the All Alerts page filtered by that severity.

Alternatively, you can programmatically enumerate the alert instances generated on your subscriptions by using REST APIs.

Note

You can only access alerts generated in the last 30 days.

The Alerts page doesn't show or track classic alerts. You can change the subscriptions or filter parameters to update the page.
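The summary that the Alerts page shows, a count per severity and state within a time window, can be approximated in a few lines. This is a sketch under stated assumptions: the dictionary keys `severity`, `state`, and `fired_at` are hypothetical field names, the sample alerts are invented, and the 30-day cap simply mirrors the retention limit mentioned in the note.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Alerts older than 30 days are not accessible, so the window is capped.
RETENTION = timedelta(days=30)


def summarize(alerts, now, window=timedelta(days=1)):
    """Count alerts per (severity, state) fired within the time window."""
    window = min(window, RETENTION)
    cutoff = now - window
    counts = Counter()
    for alert in alerts:
        if cutoff <= alert["fired_at"] <= now:
            counts[(alert["severity"], alert["state"])] += 1
    return counts


now = datetime(2020, 1, 31, tzinfo=timezone.utc)
alerts = [
    {"severity": 0, "state": "New", "fired_at": now - timedelta(hours=2)},
    {"severity": 0, "state": "Closed", "fired_at": now - timedelta(days=3)},
    {"severity": 2, "state": "New", "fired_at": now - timedelta(days=45)},
]

summary = summarize(alerts, now, window=timedelta(days=7))
for (severity, state), total in sorted(summary.items()):
    print(f"Sev {severity} / {state}: {total}")
```

The third sample alert is 45 days old, so it falls outside any window this function accepts.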

Screenshot of the Alerts page

You can filter this view by selecting values in the dropdown menus at the top of the page.

Column | Description
Subscription | Select up to five Azure subscriptions. Only alerts in the selected subscriptions are included in the view.
Resource group | Select a single resource group. Only alerts with targets in the selected resource group are included in the view.
Time range | Only alerts fired within the selected time window are included in the view. Supported values are the past hour, the past 24 hours, the past 7 days, and the past 30 days.

Select the following values at the top of the Alerts page to open another page.

Value | Description
Total alerts | The total number of alerts that match the selected criteria. Select this value to open the All Alerts view with no filter.
Smart groups | The total number of smart groups that were created from the alerts that match the selected criteria. Select this value to open the smart groups list in the All Alerts view.
Total alert rules | The total number of alert rules in the selected subscription and resource group. Select this value to open the Rules view filtered on the selected subscription and resource group.

Manage alert rules

To show the Rules page, select Manage alert rules. The Rules page is a single place for managing all alert rules across your Azure subscriptions. It lists all alert rules, which can be sorted by target resource, resource group, rule name, or status. You can also edit, enable, or disable alert rules from this page.

Screenshot of the Rules page

Create an alert rule

Alerts can be authored in a consistent manner regardless of the monitoring service or signal type. All fired alerts and related details are available in a single page.

Here's how to create a new alert rule:

  1. Pick the target for the alert.
  2. Select the signal from the available signals for the target.
  3. Specify the logic to be applied to data from the signal.

With this simplified authoring process, you no longer need to know which monitoring sources or signals are supported before you select an Azure resource. The list of available signals is automatically filtered based on the target resource that you select. Also based on that target, you are automatically guided through defining the logic of the alert rule.

You can learn more about how to create alert rules in Create, view, and manage alerts using Azure Monitor.

Alerts are available across several Azure monitoring services. For information about how and when to use each of these services, see Monitoring Azure applications and resources.

All Alerts page

To see the All Alerts page, select Total alerts. Here you can view a list of alerts created within the selected time window. You can view either a list of the individual alerts or a list of the smart groups that contain the alerts. Select the banner at the top of the page to toggle between views.

Screenshot of the All Alerts page

You can filter the view by selecting the following values in the dropdown menus at the top of the page.

Column | Description
Subscription | Select up to five Azure subscriptions. Only alerts in the selected subscriptions are included in the view.
Resource group | Select a single resource group. Only alerts with targets in the selected resource group are included in the view.
Resource type | Select one or more resource types. Only alerts with targets of the selected type are included in the view. This column is only available after a resource group has been specified.
Resource | Select a resource. Only alerts with that resource as a target are included in the view. This column is only available after a resource type has been specified.
Severity | Select an alert severity, or select All to include alerts of all severities.
Monitor condition | Select a monitor condition, or select All to include alerts of all conditions.
Alert state | Select an alert state, or select All to include alerts of all states.
Monitor service | Select a service, or select All to include all services. Only alerts created by rules that use that service as a target are included.
Time range | Only alerts fired within the selected time window are included in the view. Supported values are the past hour, the past 24 hours, the past 7 days, and the past 30 days.

Select Columns at the top of the page to select which columns to show.

Alert details page

When you select an alert, this page provides details of the alert and enables you to change its state.

Screenshot of the Alert details page

The Alert details page includes the following sections:

Section | Description
Summary | Displays the properties and other significant information about the alert.
History | Lists each action taken by the alert and any changes made to the alert. Currently limited to state changes.
Diagnostics | Information about the smart group in which the alert is included. The alert count refers to the number of alerts that are included in the smart group. It includes other alerts in the same smart group that were created in the past 30 days, regardless of the time filter in the alerts list page. Select an alert to view its details.

Role-based access control (RBAC) for your alert instances

The consumption and management of alert instances requires the user to have one of two built-in RBAC roles: monitoring contributor or monitoring reader. These roles are supported at any Azure Resource Manager scope, from the subscription level to granular assignments at a resource level. For example, if a user only has monitoring contributor access for virtual machine ContosoVM1, that user can consume and manage only alerts generated on ContosoVM1.
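Because Azure Resource Manager scopes are hierarchical paths, the effect of a scoped role assignment can be illustrated with a simple prefix check. This is only a sketch of the idea, not how Azure evaluates access: the function `scope_covers` is hypothetical, and the subscription ID, the resource group name `ContosoRG`, and the second virtual machine are placeholders invented for the example.

```python
def scope_covers(assignment_scope: str, resource_id: str) -> bool:
    """Return True if resource_id is the scope itself or nested under it.

    The comparison is case-insensitive and matches whole path segments,
    so a scope ending in ContosoVM1 does not cover ContosoVM10.
    """
    scope = assignment_scope.rstrip("/").lower()
    target = resource_id.rstrip("/").lower()
    return target == scope or target.startswith(scope + "/")


# Placeholder identifiers, for illustration only.
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
VM1 = SUB + "/resourceGroups/ContosoRG/providers/Microsoft.Compute/virtualMachines/ContosoVM1"
VM2 = SUB + "/resourceGroups/ContosoRG/providers/Microsoft.Compute/virtualMachines/ContosoVM2"

# A role assigned on ContosoVM1 covers alerts on ContosoVM1 only.
print(scope_covers(VM1, VM1))  # True
print(scope_covers(VM1, VM2))  # False

# A role assigned on the subscription covers both virtual machines.
print(scope_covers(SUB, VM2))  # True
```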

Manage your alert instances programmatically

You might want to query programmatically for alerts generated against your subscription, for example to create custom views outside of the Azure portal or to analyze your alerts to identify patterns and trends.

You can query for alerts generated against your subscriptions either by using the Alert Management REST API or by using the Azure Resource Graph REST API for Alerts.

The Azure Resource Graph REST API for Alerts allows you to query for alert instances at scale. This is recommended when you have to manage alerts generated across many subscriptions.

The following sample request to the API returns the count of alerts within one subscription:

{
  "subscriptions": [
    "<subscriptionId>"
  ],
  "query": "where type =~ 'Microsoft.AlertsManagement/alerts' | summarize count()",
  "options": {
    "dataset": "alerts"
  }
}
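A request like this could be assembled in Python as shown below. This is a sketch under assumptions rather than a reference client: the endpoint host, path, and `api-version` value are assumptions to verify against the Azure Resource Graph REST API reference (the management host in particular differs between Azure clouds), the helper name `build_alert_count_request` is hypothetical, and obtaining the bearer token is out of scope. The example only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed endpoint and API version; confirm both for your Azure cloud.
ARG_URL = (
    "https://management.azure.com/providers/"
    "Microsoft.ResourceGraph/resources?api-version=2019-04-01"
)


def build_alert_count_request(subscription_id, token, url=ARG_URL):
    """Build (but do not send) a POST request that counts alerts."""
    body = {
        "subscriptions": [subscription_id],
        "query": (
            "where type =~ 'Microsoft.AlertsManagement/alerts' "
            "| summarize count()"
        ),
        "options": {"dataset": "alerts"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder subscription ID and token, for illustration only.
request = build_alert_count_request(
    "00000000-0000-0000-0000-000000000000", "<access-token>"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To send the request, pass it to `urllib.request.urlopen` with a valid token and parse the JSON response.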

You can query the alerts for their essential fields.

Use the Alert Management REST API to get more information about specific alerts, including their alert context fields.

Next steps