Handling planned maintenance notifications

Azure periodically performs updates to improve the reliability, performance, and security of the host infrastructure for virtual machines. Updates include changes such as patching the hosting environment or upgrading and decommissioning hardware. Most of these updates complete without any impact on the hosted virtual machines. However, there are cases where updates do have an impact:

  • If the maintenance does not require a reboot, Azure uses in-place migration to pause the VM while the host is updated. These types of maintenance operations are applied fault domain by fault domain. Progress is stopped if any warning health signals are received.

  • If maintenance requires a reboot, you get a notice of when the maintenance is planned. You are given a time window of about 35 days during which you can start the maintenance yourself, when it works for you.

Planned maintenance that requires a reboot is scheduled in waves. Each wave has a different scope (regions).

  • A wave starts with a notification to customers. By default, the notification is sent to the subscription admin and co-admins. You can add more recipients and messaging options, such as email, SMS, and webhooks, using Activity Log Alerts.
  • Once a notification goes out, a self-service window is made available. During this window, you can query which of your virtual machines are affected and start maintenance based on your own scheduling needs. The self-service window is typically about 35 days.
  • After the self-service window, a scheduled maintenance window begins. At some point during this window, Azure schedules and applies the required maintenance to your virtual machine.

The goal of having two windows is to give you enough time to start maintenance and reboot your virtual machine yourself, while still knowing when Azure will automatically start maintenance.

You can use the Azure portal, PowerShell, REST API, and CLI to query for the maintenance windows for your VMs and start self-service maintenance.
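As a rough illustration of the query step, the sketch below classifies where a VM sits in the maintenance timeline, given an instance-view payload. It assumes the payload carries a maintenanceRedeployStatus object with the field names used here, as the Compute REST API instance view is understood to do. The sample values and the helper names maintenance_phase and can_start_now are hypothetical and exist only for this example.

```python
from datetime import datetime, timezone

# Hypothetical sample payload; field names are assumed to match the
# maintenanceRedeployStatus object of a VM instance view.
SAMPLE_INSTANCE_VIEW = {
    "maintenanceRedeployStatus": {
        "isCustomerInitiatedMaintenanceAllowed": True,
        "preMaintenanceWindowStartTime": "2024-05-01T00:00:00+00:00",
        "preMaintenanceWindowEndTime": "2024-06-05T00:00:00+00:00",
        "maintenanceWindowStartTime": "2024-06-06T00:00:00+00:00",
        "maintenanceWindowEndTime": "2024-06-20T00:00:00+00:00",
    }
}


def can_start_now(instance_view):
    """True when the payload says customer-initiated maintenance is allowed."""
    status = instance_view.get("maintenanceRedeployStatus") or {}
    return bool(status.get("isCustomerInitiatedMaintenanceAllowed"))


def maintenance_phase(instance_view, now):
    """Classify 'now' relative to the self-service and scheduled windows."""
    status = instance_view.get("maintenanceRedeployStatus")
    if not status:
        # No maintenance data: the wave has not started, has finished,
        # or the VM is not affected.
        return "none"
    pre_start = datetime.fromisoformat(status["preMaintenanceWindowStartTime"])
    pre_end = datetime.fromisoformat(status["preMaintenanceWindowEndTime"])
    start = datetime.fromisoformat(status["maintenanceWindowStartTime"])
    end = datetime.fromisoformat(status["maintenanceWindowEndTime"])
    if now < pre_start:
        return "upcoming"
    if now <= pre_end:
        return "self-service"
    if now < start:
        return "between-windows"
    if now <= end:
        return "scheduled"
    return "past"


if __name__ == "__main__":
    today = datetime.now(timezone.utc)
    print(maintenance_phase(SAMPLE_INSTANCE_VIEW, today))
    print(can_start_now(SAMPLE_INSTANCE_VIEW))
```

In practice the payload would come from whichever tool you already use to read the VM instance view; the classification logic stays the same.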

Should you start maintenance during the self-service window?

The following guidelines should help you decide whether to use this capability and start maintenance at a time of your choosing.

Note

Self-service maintenance might not be available for all of your VMs. To determine if proactive redeploy is available for your VM, look for Start now in the maintenance status. Self-service maintenance is currently not available for Cloud Services (Web/Worker Role) and Service Fabric.

Self-service maintenance is not recommended for deployments using availability sets. Availability sets are already updated only one update domain at a time.

  • Let Azure trigger the maintenance. For maintenance that requires a reboot, maintenance is done update domain by update domain. The update domains do not necessarily receive the maintenance sequentially, and there is a 30-minute pause between update domains.
  • If a temporary loss of some capacity (1 update domain) is a concern, you can add instances during the maintenance period.
  • For maintenance that does not require a reboot, updates are applied at the fault domain level.

Don't use self-service maintenance in the following scenarios:

  • If you shut down your VMs frequently, whether manually, using DevTest Labs, using auto-shutdown, or following a schedule, because a shutdown could revert the maintenance status and therefore cause additional downtime.
  • For short-lived VMs that you know will be deleted before the end of the maintenance wave.
  • For workloads with a large state stored on the local (ephemeral) disk that you want to preserve across the update.
  • For cases where you resize your VM often, because a resize could revert the maintenance status.
  • If you have adopted scheduled events that enable proactive failover or graceful shutdown of your workload 15 minutes before the start of the maintenance shutdown.

Use self-service maintenance if you are planning to run your VM uninterrupted during the scheduled maintenance phase and none of the counter-indications mentioned above apply.

It is best to use self-service maintenance in the following cases:

  • You need to communicate an exact maintenance window to your management or end customer.
  • You need to complete the maintenance by a given date.
  • You need to control the sequence of maintenance, for example, in a multi-tier application, to guarantee safe recovery.
  • You need more than 30 minutes of VM recovery time between two update domains (UDs). To control the time between update domains, you must trigger maintenance on your VMs one update domain at a time.
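The last case above, triggering maintenance one update domain at a time, can be sketched as follows. This is a minimal illustration, not an Azure API: the VM names and UD numbers are made up, and start_maintenance and wait_until_recovered are hypothetical callables that you would supply, for example by wrapping your CLI or PowerShell tooling and your own health checks.

```python
from collections import defaultdict


def batches_by_update_domain(vms):
    """Group (vm_name, update_domain) pairs into batches ordered by UD number."""
    groups = defaultdict(list)
    for name, update_domain in vms:
        groups[update_domain].append(name)
    return [(ud, sorted(groups[ud])) for ud in sorted(groups)]


def run_one_ud_at_a_time(vms, start_maintenance, wait_until_recovered):
    """Start maintenance for one UD, wait for recovery, then move to the next.

    start_maintenance(vm_name) and wait_until_recovered(ud, vm_names) are
    hypothetical hooks supplied by the caller.
    """
    for update_domain, names in batches_by_update_domain(vms):
        for name in names:
            start_maintenance(name)
        # Block here for as long as the workload needs, which may be
        # longer than the 30-minute pause Azure applies on its own.
        wait_until_recovered(update_domain, names)


if __name__ == "__main__":
    inventory = [("web-2", 1), ("web-1", 0), ("web-3", 0)]
    run_one_ud_at_a_time(
        inventory,
        start_maintenance=lambda name: print("start", name),
        wait_until_recovered=lambda ud, names: print("recovered UD", ud),
    )
```

Keeping the recovery check as a caller-supplied hook lets each tier of a multi-tier application define its own notion of "recovered" before the next update domain is touched.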

FAQ

Q: Why do you need to reboot my virtual machines now?

A: While the majority of updates and upgrades to the Azure platform do not impact the availability of virtual machines, there are cases where we can't avoid rebooting virtual machines hosted in Azure. We have accumulated several changes that require us to restart our servers, which will result in virtual machine reboots.

Q: If I follow your recommendations for high availability by using an availability set, am I safe?

A: Virtual machines deployed in an availability set or virtual machine scale set have the notion of update domains (UDs). When performing maintenance, Azure honors the UD constraint and will not reboot virtual machines from different UDs (within the same availability set) at the same time. Azure also waits at least 30 minutes before moving to the next group of virtual machines.

For more information about high availability, see Availability for virtual machines in Azure.

Q: How do I get notified about planned maintenance?

A: A planned maintenance wave starts by setting a schedule for one or more Azure regions. Soon after, an email notification is sent to the subscription admin and co-admins (one email per subscription). Additional channels and recipients for this notification can be configured using Activity Log Alerts. If you deploy a virtual machine to a region where planned maintenance is already scheduled, you will not receive the notification and instead need to check the maintenance state of the VM.

Q: I don't see any indication of planned maintenance in the portal, PowerShell, or CLI. What is wrong?

A: Information related to planned maintenance is available during a planned maintenance wave only for the VMs that are going to be impacted by it. In other words, if you see no data, it could be that the maintenance wave has already completed (or not started) or that your virtual machine is already hosted on an updated server.

Q: Is there a way to know exactly when my virtual machine will be impacted?

A: When setting the schedule, we define a time window of several days. However, the exact sequencing of servers (and VMs) within this window is unknown. Customers who would like to know the exact time for their VMs can use scheduled events, queried from within the virtual machine, to receive a 15-minute notification before a VM reboot.
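A minimal sketch of querying scheduled events from inside a VM is shown below. It assumes the Scheduled Events metadata endpoint at 169.254.169.254 with API version 2020-07-01 and a response document with an Events list whose entries carry EventId, EventType, Resources, and NotBefore. The sample document, the VM name, and the helper pending_reboots are hypothetical and only illustrate the filtering step.

```python
import json
import urllib.request

# Assumed endpoint and API version for the Scheduled Events metadata service.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)

# Hypothetical sample of the response shape, used when not running in Azure.
SAMPLE_DOCUMENT = {
    "DocumentIncarnation": 1,
    "Events": [
        {
            "EventId": "example-event-1",
            "EventType": "Reboot",
            "ResourceType": "VirtualMachine",
            "Resources": ["example-vm"],
            "EventStatus": "Scheduled",
            "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT",
        }
    ],
}


def fetch_scheduled_events(url=SCHEDULED_EVENTS_URL, timeout=5):
    """Query the metadata service; this only works from inside an Azure VM."""
    request = urllib.request.Request(url, headers={"Metadata": "true"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def pending_reboots(document, vm_name):
    """Return (event id, not-before time) pairs for reboots affecting vm_name."""
    return [
        (event["EventId"], event.get("NotBefore"))
        for event in document.get("Events", [])
        if event.get("EventType") == "Reboot"
        and vm_name in event.get("Resources", [])
    ]


if __name__ == "__main__":
    # Swap SAMPLE_DOCUMENT for fetch_scheduled_events() when running on a VM.
    for event_id, not_before in pending_reboots(SAMPLE_DOCUMENT, "example-vm"):
        print(event_id, not_before)
```

A workload would typically poll this endpoint on a short interval and begin failover or graceful shutdown as soon as a reboot event for its own VM appears.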

Q: How long will it take you to reboot my virtual machine?

A: Depending on the size of your VM, a reboot may take up to several minutes during the self-service maintenance window. During Azure-initiated reboots in the scheduled maintenance window, the reboot typically takes about 25 minutes. Note that if you use Cloud Services (Web/Worker Role), virtual machine scale sets, or availability sets, you are given 30 minutes between each group of VMs (UD) during the scheduled maintenance window.
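For rough capacity planning, the figures above can be combined into a back-of-the-envelope estimate. The sketch below assumes that update domains are processed strictly one after another, that each takes the typical 25 minutes, and that exactly 30 minutes pass between groups. Actual durations vary, so treat the result as an approximation rather than a guarantee. The function name is hypothetical.

```python
def estimated_scheduled_maintenance_minutes(
    update_domains, reboot_minutes=25, pause_minutes=30
):
    """Estimate elapsed minutes for Azure-initiated maintenance across UDs.

    Assumes sequential processing: one reboot period per update domain,
    plus one pause between each pair of consecutive update domains.
    """
    if update_domains < 1:
        raise ValueError("update_domains must be at least 1")
    reboot_total = update_domains * reboot_minutes
    pause_total = (update_domains - 1) * pause_minutes
    return reboot_total + pause_total


if __name__ == "__main__":
    # Five update domains: 5 * 25 + 4 * 30 minutes.
    print(estimated_scheduled_maintenance_minutes(5))
```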

Q: What is the experience in the case of virtual machine scale sets?

A: Planned maintenance is now available for virtual machine scale sets. For instructions on how to initiate self-service maintenance, refer to the planned maintenance for virtual machine scale sets document.

Q: What is the experience in the case of Cloud Services (Web/Worker Role) and Service Fabric?

A: While these platforms are impacted by planned maintenance, customers using them are considered safe because only VMs in a single upgrade domain (UD) are impacted at any given time. Self-service maintenance is currently not available for Cloud Services (Web/Worker Role) and Service Fabric.

Q: I don't see any maintenance information on my VMs. What went wrong?

A: There are several reasons why you might not see any maintenance information on your VMs:

  1. You are using a subscription marked as Azure internal.
  2. Your VMs are not scheduled for maintenance. It could be that the maintenance wave has ended, been canceled, or been modified so that your VMs are no longer impacted by it.
  3. You deallocated the VM and then started it. This can cause the VM to move to a location that does not have a planned maintenance wave scheduled, so the VM no longer shows maintenance information.
  4. You don't have the Maintenance column added to your VM list view. While we have added this column to the default view, customers who configured their view to show non-default columns must manually add the Maintenance column to their VM list view.

Q: My VM is scheduled for maintenance for the second time. Why?

A: There are several cases in which you will see your VM scheduled for maintenance after you have already completed your maintenance redeploy:

  1. We canceled the maintenance wave and restarted it with a different payload. It could be that we detected a faulty payload and simply need to deploy an additional payload.
  2. Your VM was service healed to another node due to a hardware fault.
  3. You chose to stop (deallocate) and restart the VM.
  4. You have auto-shutdown turned on for the VM.

Next steps

You can handle planned maintenance using the Azure CLI, Azure PowerShell, or the portal.