Monitoring Scheduled Events

Updates are applied to different parts of Azure every day to keep the services running on them secure and up to date. In addition to planned updates, unplanned events can also occur. For example, if hardware degradation or a fault is detected, Azure may need to perform unplanned maintenance. Because Azure uses live migration and memory-preserving updates, and keeps a strict limit on the impact of updates, most of these events are nearly transparent to customers: they have no impact or, at most, freeze the virtual machine for a few seconds. For some applications, however, even a few seconds of freeze can cause problems. Knowing about upcoming Azure maintenance in advance helps ensure the best experience for those applications. The Scheduled Events service provides a programmatic interface that notifies you of upcoming maintenance so that you can handle it gracefully.

In this article, we show how to use Scheduled Events to get notified about maintenance events that could affect your VMs, and how to build some basic automation that helps with monitoring and analysis.

Routing scheduled events to Log Analytics

Scheduled Events is available as part of the Azure Instance Metadata Service, which is available on every Azure virtual machine. Customers can write automation that queries this endpoint from their virtual machines to find scheduled maintenance notifications and perform mitigations, such as saving state and taking the virtual machine out of rotation. We recommend building automation that records Scheduled Events so that you have an audit log of Azure maintenance events.
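A minimal Python sketch of such automation is shown below. It assumes the IMDS Scheduled Events endpoint at `http://169.254.169.254/metadata/scheduledevents` with `api-version=2020-07-01` and the required `Metadata: true` request header; the function names are hypothetical, and the request only succeeds from inside an Azure VM. It fetches the events document and keeps the events whose EventStatus is still Scheduled.

```python
import json
import urllib.request

# Scheduled Events endpoint of the Azure Instance Metadata Service (IMDS).
# The api-version value is an assumption; the address only answers inside an Azure VM.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def fetch_scheduled_events(timeout: float = 5.0) -> dict:
    """Return the Scheduled Events document parsed from JSON."""
    # IMDS rejects requests that do not carry the Metadata: true header.
    request = urllib.request.Request(SCHEDULED_EVENTS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def pending_events(document: dict) -> list:
    """Keep only events that have not started yet (EventStatus == "Scheduled")."""
    return [
        event
        for event in document.get("Events", [])
        if event.get("EventStatus") == "Scheduled"
    ]
```

On a VM, `pending_events(fetch_scheduled_events())` returns the events that have not started yet. The document's DocumentIncarnation value increases whenever the events array changes, so a poller can skip documents it has already processed.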

In this article, we walk you through capturing maintenance Scheduled Events in Log Analytics. We then trigger some basic notification actions, such as sending an email to your team, and get a historical view of all events that have affected your virtual machines. We use Log Analytics for event aggregation and automation, but you can use any monitoring solution to collect these logs and trigger automation.

Figure: Diagram showing the event lifecycle.

Prerequisites

For this example, you need Windows virtual machines in an availability set; the steps below assume that two such VMs already exist. Scheduled Events provides notifications about changes that can affect any of the virtual machines in your availability set, cloud service, or virtual machine scale set, as well as standalone VMs. We will run a service on one of the VMs, which acts as a collector, that polls for scheduled events to collect events for all the VMs in the availability set.

If you create these VMs by following a tutorial, don't delete its resource group at the end of the tutorial; the steps below reuse it.

You also need a Log Analytics workspace, which we use to aggregate information from the VMs in the availability set.
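The collector design works because every event lists the VMs it affects in its Resources array, so one poller can attribute events to each machine in the availability set. The Python sketch below groups events by VM name; it assumes the Scheduled Events JSON document has already been parsed into a dict, and `events_by_vm` is a hypothetical helper, not part of the service.

```python
from collections import defaultdict


def events_by_vm(document: dict) -> dict:
    """Group events by the VM names listed in each event's Resources array."""
    grouped = defaultdict(list)
    for event in document.get("Events", []):
        # One event can affect several VMs, so it is added under each name.
        for vm_name in event.get("Resources", []):
            grouped[vm_name].append(event)
    return dict(grouped)
```

A collector could then write one audit entry per VM, so that each logged record names the machine it affects.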

Set up the environment

You should now have two initial VMs in an availability set. Next, create a third VM, named myCollectorVM, in the same availability set.

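# Prompt for the VM administrator credentials (skip this line if $cred is already set in this session).
$cred = Get-Credential -Message "Enter a username and password for the virtual machine."
New-AzVm `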
   -ResourceGroupName "myResourceGroupAvailability" `
   -Name "myCollectorVM" `
   -Location "China East" `
   -VirtualNetworkName "myVnet" `
   -SubnetName "mySubnet" `
   -SecurityGroupName "myNetworkSecurityGroup" `
   -OpenPorts 3389 `
   -PublicIpAddressName "myPublicIpAddress3" `
   -AvailabilitySetName "myAvailabilitySet" `
   -Credential $cred

Download the installation .zip file of the AzureScheduledEventsService project from GitHub.

Connect to myCollectorVM, copy the .zip file to the virtual machine, and extract all of the files. On the VM, open a PowerShell prompt. Change to the folder that contains SchService.ps1, for example PS C:\Users\azureuser\AzureScheduledEventsService-master\AzureScheduledEventsService-master\Powershell>, and set up the service:

.\SchService.ps1 -Setup

Start the service:

.\SchService.ps1 -Start

The service now polls for scheduled events every 10 seconds and approves them to expedite the maintenance. Scheduled Events captures Freeze, Reboot, Redeploy, and Preempt events. You can extend the script to trigger mitigations before it approves an event.
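To illustrate the approval step, the Python sketch below shows how a poller could approve (expedite) events. It assumes that IMDS accepts a POST to the same endpoint, with the `Metadata: true` header and a body of the form `{"StartRequests": [{"EventId": ...}]}`. The names `poll_and_approve`, its `fetch` argument, and the `mitigate` hook are hypothetical stand-ins for the logic in SchService.ps1, not a copy of it.

```python
import json
import time
import urllib.request

# Assumed IMDS endpoint and api-version (the same URL serves GET and POST).
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)
POLL_INTERVAL_SECONDS = 10  # the polling interval described for SchService.ps1


def build_start_requests(events: list) -> bytes:
    """Build the JSON body that asks Azure to start (approve) the given events now."""
    payload = {"StartRequests": [{"EventId": event["EventId"]} for event in events]}
    return json.dumps(payload).encode("utf-8")


def approve_events(events: list, timeout: float = 5.0) -> int:
    """POST the approval to IMDS and return the HTTP status code."""
    request = urllib.request.Request(
        SCHEDULED_EVENTS_URL,
        data=build_start_requests(events),
        headers={"Metadata": "true"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def poll_and_approve(fetch, mitigate=lambda event: None):
    """Hypothetical loop: fetch events, run a mitigation hook, then approve them."""
    while True:
        document = fetch()  # e.g. a function that GETs SCHEDULED_EVENTS_URL
        scheduled = [
            event
            for event in document.get("Events", [])
            if event.get("EventStatus") == "Scheduled"
        ]
        for event in scheduled:
            mitigate(event)  # save state, drain traffic, etc. before approving
        if scheduled:
            approve_events(scheduled)
        time.sleep(POLL_INTERVAL_SECONDS)
```

Running mitigations before the approval keeps the design the same as the one described above: approving an event only tells Azure it may start the maintenance early, so any work that must finish first belongs in the hook.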

Verify the service status to make sure it is running:

.\SchService.ps1 -status  

This should return Running.


When the Scheduled Events service captures any of these events, it logs the event to the Application event log with its Event Status, Event Type, Resources (virtual machine names), and NotBefore (minimum notice period). You can find these entries under event ID 1234 in the Application event log.
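NotBefore is the earliest time Azure may start the event, so the time remaining until then is the notice you have for mitigations. The Python sketch below computes it; it assumes NotBefore uses the RFC 1123 format shown in the Scheduled Events documentation (for example `Mon, 19 Sep 2016 18:29:47 GMT`) and is blank once an event has started. The function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def seconds_until_not_before(event: dict, now: Optional[datetime] = None) -> Optional[float]:
    """Return the notice left before the event may start, or None once it has started."""
    not_before = event.get("NotBefore", "")
    if not not_before:
        # NotBefore is blank for events that have already started.
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    # RFC 1123 date such as "Mon, 19 Sep 2016 18:29:47 GMT"; parsed as an aware UTC datetime.
    start = parsedate_to_datetime(not_before)
    return (start - now).total_seconds()
```

If you pass `now` explicitly, make it timezone-aware (for example `datetime.now(timezone.utc)`); subtracting a naive datetime from the aware parsed value raises TypeError.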

Once the service is set up and started, it logs events to the Windows Application log. To verify that this works, restart one of the virtual machines in the availability set. In Event Viewer, under Windows Logs > Application, you should see a logged event showing that the VM restarted.

Figure: Screenshot of Event Viewer.


Note

In this example, the virtual machines are in an availability set, which let us designate a single virtual machine as the collector that listens for scheduled events and routes them to our Log Analytics workspace. If you have standalone virtual machines, run the service on every virtual machine and connect each one to your Log Analytics workspace.

For our setup we chose Windows, but you can design a similar solution on Linux.

At any point, you can stop or remove the Scheduled Events service by using the -stop and -remove switches.

Connect to the workspace

We now want to connect a Log Analytics workspace to the collector VM. The workspace acts as a repository, and we will configure event log collection to capture the Application logs from the collector VM.

Because the service saves Scheduled Events to the Windows Application event log, you need to connect the virtual machine to your Log Analytics workspace to route those events to Log Analytics.

  1. Open the page for the workspace you created.

  2. Under Connect to a data source, select Azure virtual machines (VMs).

    Figure: Connecting to a VM as a data source.

  3. Search for and select myCollectorVM.

  4. On the new page for myCollectorVM, select Connect.

This installs the Microsoft Monitoring Agent on your virtual machine. Connecting the VM to the workspace and installing the extension takes a few minutes.

Configure the workspace

  1. Open the page for your workspace and select Advanced settings.

  2. Select Data from the left menu, and then select Windows Event Logs.

  3. In Collect from the following event logs, start typing application, and then select Application from the list.

    Figure: Selecting Advanced settings.

  4. Leave ERROR, WARNING, and INFORMATION selected, and then select Save to save the settings.

    Note

    There will be some delay; it can take up to 10 minutes before the log is available.

Creating an alert rule with Azure Monitor

Once the events are pushed to Log Analytics, you can run the following query to look for the scheduled events.

  1. At the top of the page, select Logs and paste the following into the text box:

    Event
    | where EventLog == "Application" and Source contains "AzureScheduledEvents" and RenderedDescription contains "Scheduled" and RenderedDescription contains "EventStatus" 
    | project TimeGenerated, RenderedDescription
    | extend ReqJson= parse_json(RenderedDescription)
    | extend EventId = ReqJson["EventId"]
    ,EventStatus = ReqJson["EventStatus"]
    ,EventType = ReqJson["EventType"]
    ,NotBefore = ReqJson["NotBefore"]
    ,ResourceType = ReqJson["ResourceType"]
    ,Resources = ReqJson["Resources"]
    | project-away RenderedDescription,ReqJson
    
  2. Select Save, type logQuery for the name, leave Query as the type, type VMLogs as the Category, and then select Save.

    Figure: Saving the query.

  3. Select New alert rule.

  4. On the Create rule page, leave collectorworkspace as the Resource.

  5. Under Condition, select the entry that begins with Whenever the custom log search is. The Configure signal logic page opens.

  6. Under Threshold value, enter 0, and then select Done.

  7. Under Actions, select Create action group. The Add action group page opens.

  8. In Action group name, type myActionGroup.

  9. In Short name, type myActionGrp (short names can be at most 12 characters).

  10. In Resource group, select myResourceGroupAvailability.

  11. Under Actions, type Email in ACTION NAME, and then select Email/SMS. The Email/SMS page opens.

  12. Select Email, type your email address, and then select OK.

  13. On the Add action group page, select OK.

  14. On the Create rule page, under ALERT DETAILS, type myAlert for the Alert rule name, and then type Email alert rule for the Description.

  15. When you're finished, select Create alert rule.

  16. Restart one of the VMs in the availability set. Within a few minutes, you should receive an email saying that the alert has been triggered.
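To make the query's logic concrete, the Python sketch below mirrors its RenderedDescription handling on plain strings: it keeps descriptions that contain both Scheduled and EventStatus (case-insensitively, like KQL's contains), parses them as JSON, and projects the same six columns. It does not reproduce the EventLog and Source filters. It also assumes the alert rule compares the number of results with the threshold using a Greater than operator, so a threshold of 0 fires on any matching event. All names are hypothetical.

```python
import json

# Columns projected by the logQuery query.
FIELDS = ("EventId", "EventStatus", "EventType", "NotBefore", "ResourceType", "Resources")


def parse_scheduled_event_rows(rendered_descriptions: list) -> list:
    """Mirror the query's RenderedDescription filter, parse_json, and projection."""
    rows = []
    for text in rendered_descriptions:
        lowered = text.lower()
        # KQL's contains is case-insensitive, so compare lowercase strings.
        if "scheduled" not in lowered or "eventstatus" not in lowered:
            continue
        try:
            record = json.loads(text)
        except ValueError:
            record = None  # parse_json leaves non-JSON text as a plain string
        if not isinstance(record, dict):
            record = {}  # indexing a non-object yields empty (null) columns
        rows.append({name: record.get(name) for name in FIELDS})
    return rows


def alert_fires(rows: list, threshold: int = 0) -> bool:
    """Assumed Greater than operator: fire when the result count exceeds the threshold."""
    return len(rows) > threshold
```

Feeding the function a few sample descriptions is a quick way to check what the alert will match before waiting for a real maintenance event.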

To manage your alert rules, go to the resource group, select Alerts from the left menu, and then select Manage alert rules at the top of the page.

Next steps

To learn more, see the Scheduled Events service page on GitHub.