Respond to events with Azure Monitor Alerts

Alerts in Azure Monitor can identify important information in your Log Analytics repository. They are created by alert rules that automatically run log searches at regular intervals. If the results of a log search match particular criteria, an alert record is created, and the rule can be configured to perform an automated response. This tutorial is a continuation of the Create and share dashboards of Log Analytics data tutorial.

In this tutorial, you learn how to:

  • Create an alert rule
  • Configure an action group to send an email notification

To complete the example in this tutorial, you must have an existing virtual machine connected to the Log Analytics workspace.

Sign in to the Azure portal

Sign in to the Azure portal at https://portal.azure.cn.

Create alerts

Alerts are created by alert rules in Azure Monitor, which can automatically run saved queries or custom log searches at regular intervals. You can create alerts based on specific performance metrics, on the creation of certain events, on the absence of an event, or on a number of events being created within a particular time window. For example, alerts can notify you when average CPU usage exceeds a certain threshold, when a missing update is detected, or when an event is generated upon detecting that a specific Windows service or Linux daemon is not running. If the results of the log search match particular criteria, an alert is created. The rule can then automatically run one or more actions, such as notifying you of the alert or invoking another process.

In the following example, you create a metric measurement alert rule based on the Azure VMs - Processor Utilization query saved in the Visualize data tutorial. An alert is created for each virtual machine that exceeds a threshold of 90%.
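To make the per-machine behavior concrete, the following is a minimal Python sketch of the idea, not the way Azure Monitor evaluates rules. It assumes the query returns (computer, CPU percent) pairs; the function name `machines_over_threshold` and the sample machine names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

THRESHOLD = 90.0  # percent, the threshold used in this tutorial


def machines_over_threshold(samples, threshold=THRESHOLD):
    """Return {computer: average CPU} for computers whose average exceeds the threshold."""
    by_computer = defaultdict(list)
    for computer, cpu in samples:
        by_computer[computer].append(cpu)
    return {
        computer: mean(values)
        for computer, values in by_computer.items()
        if mean(values) > threshold
    }


# Hypothetical query results: (computer name, CPU percent)
samples = [("vm-a", 95.0), ("vm-a", 97.0), ("vm-b", 40.0), ("vm-b", 60.0)]
print(machines_over_threshold(samples))  # {'vm-a': 96.0}
```

Each key in the returned dictionary corresponds to one virtual machine that would receive its own alert. The comparison is strict, so an average of exactly 90 does not count as exceeding the threshold in this sketch.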

  1. In the Azure portal, click All services. In the list of resources, type Log Analytics. As you begin typing, the list filters based on your input. Select Log Analytics.

  2. In the left-hand pane, select Alerts, and then click New Alert Rule at the top of the page to create a new alert.

    Create new alert rule

  3. For the first step, under the Create Alert section, select your Log Analytics workspace as the resource, since this is a log-based alert signal. If you have more than one subscription, filter the results by choosing from the Subscription drop-down list the one that contains the VM and Log Analytics workspace created earlier. Filter the Resource Type by selecting Log Analytics from the drop-down list. Finally, select the resource DefaultLAWorkspace and then click Done.

    Create alert step 1 task

  4. Under the Alert Criteria section, click Add Criteria to select the saved query and then specify the logic that the alert rule follows. From the Configure signal logic pane, select Azure VMs - Processor Utilization from the list. The pane updates to present the configuration settings for the alert. At the top, it shows the results of the selected signal for the last 30 minutes, along with the search query itself.

  5. Configure the alert with the following information:
    a. From the Based on drop-down list, select Metric measurement. A metric measurement creates an alert for each object in the query with a value that exceeds the specified threshold.
    b. For the Condition, select Greater than and enter 90 for Threshold.
    c. Under the Trigger Alert Based On section, select Consecutive breaches, then select Greater than from the drop-down list and enter a value of 3.
    d. Under the Evaluation based on section, change the Period value to 30 minutes. The rule will run every five minutes and return records that were created within the last 30 minutes from the current time. Setting the period to a wider window accounts for potential data latency and ensures that the query returns data, avoiding a false negative in which the alert never fires.

  6. Click Done to complete the alert rule.

    Configure alert signal

  7. Moving on to the second step, provide a name for your alert in the Alert rule name field, such as Percentage CPU greater than 90 percent. Specify a Description detailing specifics for the alert, and select Critical(Sev 0) for the Severity value from the options provided.

    Configure alert details

  8. To activate the alert rule immediately on creation, accept the default value for Enable rule upon creation.

  9. For the third and final step, specify an Action Group. An action group ensures that the same actions are taken each time an alert is triggered, and it can be reused for each rule you define. Configure a new action group with the following information:
    a. Select New action group. The Add action group pane appears.
    b. For Action group name, specify a name such as IT Operations - Notify, and for Short name, a value such as itops-n.
    c. Verify that the default values for Subscription and Resource group are correct. If not, select the correct ones from the drop-down lists.
    d. Under the Actions section, specify a name for the action, such as Send Email, and under Action Type select Email/SMS/Push/Voice from the drop-down list. The Email/SMS/Push/Voice properties pane opens to the right so that you can provide additional information.
    e. On the Email/SMS/Push/Voice pane, enable Email and provide a valid SMTP email address to deliver the message to.
    f. Click OK to save your changes.

    Create new action group

  10. Click OK to complete the action group.

  11. Click Create alert rule to complete the alert rule. It starts running immediately.

    Complete creating new alert rule
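The signal logic configured in step 5 (threshold of 90, consecutive breaches greater than 3) can be modeled roughly as follows. This is a simplified sketch under stated assumptions, not Azure Monitor's actual evaluation engine: it assumes the query yields one CPU value per interval for a single virtual machine within the 30-minute period, and the function names `longest_breach_run` and `should_alert` are hypothetical.

```python
THRESHOLD = 90.0      # percent, from step 5b
MIN_CONSECUTIVE = 3   # "Greater than 3", from step 5c


def longest_breach_run(values, threshold=THRESHOLD):
    """Length of the longest run of consecutive values above the threshold."""
    longest = current = 0
    for value in values:
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest


def should_alert(values, threshold=THRESHOLD, min_consecutive=MIN_CONSECUTIVE):
    """True when more than min_consecutive values in a row breach the threshold."""
    return longest_breach_run(values, threshold) > min_consecutive


# Hypothetical per-interval CPU averages for one VM inside the 30-minute period.
sustained = [95.0, 96.0, 97.0, 98.0, 91.0, 93.0]
spiky = [95.0, 96.0, 97.0, 50.0, 98.0, 99.0]

print(should_alert(sustained))  # True: six breaches in a row
print(should_alert(spiky))      # False: the longest run is three
```

Requiring consecutive breaches means that a short spike does not fire the alert, while sustained high utilization does.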

View your alerts in the Azure portal

Now that you have created an alert, you can view Azure alerts in a single pane and manage all alert rules across your Azure subscriptions. The page lists all alert rules (enabled or disabled), which can be sorted by target resource, resource group, rule name, or status. It also includes an aggregated summary of all fired alerts and of the total configured and enabled alert rules.

Azure alerts status page

When the alert triggers, the table reflects the condition and how many times it occurred within the selected time range (the default is the last six hours). There should be a corresponding email in your inbox, similar to the following example, showing the offending virtual machine and the top results that matched the search query.

Example alert email action
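The occurrence count shown on the status page can be thought of as a filter over alert timestamps. The sketch below illustrates that idea only. The function name `count_fired`, the sample timestamps, and the inclusive window boundaries are assumptions, not documented portal behavior.

```python
from datetime import datetime, timedelta

DEFAULT_RANGE = timedelta(hours=6)  # default time range mentioned above


def count_fired(fired_at, now, time_range=DEFAULT_RANGE):
    """Count alert timestamps that fall within the time range ending at `now`."""
    start = now - time_range
    return sum(1 for fired in fired_at if start <= fired <= now)


# Hypothetical times at which the alert fired.
now = datetime(2024, 1, 1, 12, 0)
fired_at = [
    now - timedelta(hours=1),
    now - timedelta(hours=5),
    now - timedelta(hours=7),  # outside the six-hour default
]
print(count_fired(fired_at, now))  # 2
```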

Next steps

In this tutorial, you learned how alert rules can proactively identify and respond to an issue when they run log searches at scheduled intervals and the results match particular criteria.

Follow this link to see pre-built Log Analytics script samples.