Maintenance for virtual machines in Azure

Azure periodically updates its platform to improve the reliability, performance, and security of the host infrastructure for virtual machines. The purpose of these updates ranges from patching software components in the hosting environment to upgrading networking components or decommissioning hardware.

Updates rarely affect the hosted VMs. When updates do have an effect, Azure chooses the least impactful update method:

  • If the update doesn't require a reboot, the VM is paused while the host is updated, or the VM is live-migrated to an already updated host.
  • If maintenance requires a reboot, you're notified of the planned maintenance. Azure also provides a time window in which you can start the maintenance yourself, at a time that works for you. The self-maintenance window is typically 30 days unless the maintenance is urgent. Azure is investing in technologies to reduce the number of cases in which planned platform maintenance requires the VMs to be rebooted. For instructions on managing planned maintenance, see Handling planned maintenance notifications using the Azure CLI, PowerShell or portal.

This page describes how Azure performs both types of maintenance. For more information about unplanned events (outages), see Manage the availability of VMs for Windows or the corresponding article for Linux.

Within a VM, you can get notifications about upcoming maintenance by using Scheduled Events for Windows or for Linux.
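
As a rough illustration of consuming these notifications, the sketch below filters a Scheduled Events document for maintenance-related entries. It assumes the JSON has already been fetched from inside the VM (the Instance Metadata Service exposes it over HTTP), and the field names and event types shown follow the Scheduled Events schema as commonly documented, so verify them against the current reference. The function name and the sample payload are hypothetical.

```python
import json

# Event types tied to platform maintenance (assumption: names as used in the
# Scheduled Events schema; check the current documentation before relying on them).
MAINTENANCE_TYPES = {"Freeze", "Reboot", "Redeploy"}


def upcoming_maintenance(payload):
    """Return the maintenance-related events in a Scheduled Events JSON string."""
    document = json.loads(payload)
    return [
        {"id": e["EventId"], "type": e["EventType"], "not_before": e["NotBefore"]}
        for e in document.get("Events", [])
        if e.get("EventType") in MAINTENANCE_TYPES
    ]


# Hypothetical sample document, used here instead of a live metadata call.
SAMPLE = json.dumps({
    "DocumentIncarnation": 1,
    "Events": [
        {"EventId": "A1", "EventType": "Freeze", "ResourceType": "VirtualMachine",
         "Resources": ["vm0"], "EventStatus": "Scheduled",
         "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT"},
        {"EventId": "B2", "EventType": "Terminate", "ResourceType": "VirtualMachine",
         "Resources": ["vm1"], "EventStatus": "Scheduled",
         "NotBefore": "Mon, 19 Sep 2016 18:40:00 GMT"},
    ],
})

events = upcoming_maintenance(SAMPLE)
```

An application could poll for such events and drain connections or checkpoint state before the time given in not_before.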

Maintenance that doesn't require a reboot

Most platform updates don't affect customer VMs. When a no-impact update isn't possible, Azure chooses the update mechanism that's least impactful to customer VMs.

Most nonzero-impact maintenance pauses the VM for less than 10 seconds. In certain cases, Azure uses memory-preserving maintenance mechanisms. These mechanisms pause the VM for up to 30 seconds and preserve the memory in RAM. The VM is then resumed, and its clock is automatically synchronized.

Memory-preserving maintenance works for more than 90 percent of Azure VMs. It doesn't work for M, N, and H series. Azure increasingly uses live-migration technologies and improves memory-preserving maintenance mechanisms to reduce the pause durations.

Maintenance operations that don't require a reboot are applied one fault domain at a time. They stop if they receive any warning health signals.
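
The rollout rule just described can be modeled with a short sketch. This is only an illustrative model of "one fault domain at a time, stop on a warning", not Azure's actual orchestration; the function name, the host names, and the two callbacks are hypothetical.

```python
from typing import Callable, Dict, List


def roll_out(fault_domains: Dict[int, List[str]],
             update_host: Callable[[str], None],
             is_healthy: Callable[[int], bool]) -> List[int]:
    """Update one fault domain at a time and stop at the first warning signal.

    Returns the fault domains that were updated and reported healthy.
    """
    completed = []
    for fd in sorted(fault_domains):
        for host in fault_domains[fd]:
            update_host(host)
        if not is_healthy(fd):
            break  # a warning health signal halts the remaining domains
        completed.append(fd)
    return completed


# Hypothetical layout: three fault domains, with domain 1 reporting a warning.
updated_hosts = []
result = roll_out(
    {0: ["h0", "h1"], 1: ["h2"], 2: ["h3"]},
    update_host=updated_hosts.append,
    is_healthy=lambda fd: fd != 1,
)
```

In this run, the hosts in fault domain 2 are never touched because the warning from domain 1 stops the rollout first.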

These types of updates can affect some applications. When the VM is live-migrated to a different host, some sensitive workloads might show a slight performance degradation in the few minutes leading up to the VM pause. To prepare for VM maintenance and reduce impact during Azure maintenance, try using Scheduled Events for Windows or Linux for such applications.

A maintenance control feature, currently in public preview, can also help manage maintenance that doesn't require a reboot. To use it, you must be using Azure Dedicated Hosts. Maintenance control gives you the option to skip platform updates and apply the updates at a time of your choice within a 35-day rolling window. For more information, see Control updates with Maintenance Control and the Azure CLI.
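
To make the 35-day window concrete, here is a minimal date-arithmetic sketch. It assumes the window is counted from the day the update becomes available, which the text above does not state explicitly; the function names are hypothetical and nothing here calls an Azure API.

```python
from datetime import date, timedelta

ROLLING_WINDOW_DAYS = 35  # window length taken from the text above


def latest_apply_date(update_available: date) -> date:
    """Last day a deferred update can be applied (assumed to count from availability)."""
    return update_available + timedelta(days=ROLLING_WINDOW_DAYS)


def can_defer_to(update_available: date, chosen: date) -> bool:
    """True if the chosen day falls inside the rolling window."""
    return update_available <= chosen <= latest_apply_date(update_available)


deadline = latest_apply_date(date(2024, 1, 1))
```

Under this assumption, an update that becomes available on 1 January 2024 would have to be applied by 5 February 2024.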

Live migration

Live migration is an operation that doesn't require a reboot and that preserves memory for the VM. It causes a pause or freeze, typically lasting no more than 5 seconds. Except for M, N, and H series, all infrastructure as a service (IaaS) VMs are eligible for live migration. Eligible VMs represent more than 90 percent of the IaaS VMs that are deployed to the Azure fleet.

The Azure platform starts live migration in the following scenarios:

  • Planned maintenance
  • Hardware failure
  • Allocation optimizations

Some planned-maintenance scenarios use live migration, and you can use Scheduled Events to know in advance when live migration operations will start.

Live migration can also be used to move VMs when Azure Machine Learning algorithms predict an impending hardware failure or when you want to optimize VM allocations. For more information about predictive modeling that detects instances of degraded hardware, see Improving Azure VM resiliency with predictive machine learning and live migration. Live-migration notifications appear in the Azure portal in the Monitor and Service Health logs, as well as in Scheduled Events if you use these services.

Maintenance that requires a reboot

In the rare case where VMs need to be rebooted for planned maintenance, you'll be notified in advance. Planned maintenance has two phases: the self-service phase and the scheduled maintenance phase.

During the self-service phase, which typically lasts four weeks, you start the maintenance on your VMs. As part of the self-service phase, you can query each VM to see its status and the result of your last maintenance request.

When you start self-service maintenance, your VM is redeployed to an already updated node. Because the VM reboots, the temporary disk is lost and dynamic IP addresses associated with the virtual network interface are updated.

If an error arises during self-service maintenance, the operation stops, the VM isn't updated, and you get the option to retry the self-service maintenance.

When the self-service phase ends, the scheduled maintenance phase begins. During this phase, you can still query for the maintenance phase, but you can't start the maintenance yourself.
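
The two phases can be summarized as a small state check. The sketch below is a simplified model of the rules above, assuming you already know the start and end of the self-service window for a VM; the names are hypothetical and it does not query Azure.

```python
from datetime import datetime
from enum import Enum


class Phase(Enum):
    NOT_STARTED = "not started"
    SELF_SERVICE = "self-service"
    SCHEDULED = "scheduled maintenance"


def current_phase(now: datetime, window_start: datetime, window_end: datetime) -> Phase:
    """Classify a moment relative to the self-service window."""
    if now < window_start:
        return Phase.NOT_STARTED
    if now < window_end:
        return Phase.SELF_SERVICE
    return Phase.SCHEDULED  # begins as soon as the self-service phase ends


def can_start_maintenance_yourself(phase: Phase) -> bool:
    """Customer-initiated maintenance is only possible during self-service."""
    return phase is Phase.SELF_SERVICE


# Hypothetical four-week self-service window.
start = datetime(2024, 3, 1)
end = datetime(2024, 3, 29)
phase_mid_window = current_phase(datetime(2024, 3, 15), start, end)
phase_after_window = current_phase(datetime(2024, 3, 29), start, end)
```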

For more information on managing maintenance that requires a reboot, see Handling planned maintenance notifications using the Azure CLI, PowerShell or portal.

Availability considerations during scheduled maintenance

If you decide to wait until the scheduled maintenance phase, there are a few things you should consider to maintain the highest availability of your VMs.

Paired regions

Each Azure region is paired with another region within the same geographical vicinity. Together, they make a region pair. During the scheduled maintenance phase, Azure updates only the VMs in a single region of a region pair. For example, while updating VMs in China North, Azure doesn't update any VMs in China East at the same time.

Availability sets and scale sets

When deploying a workload on Azure VMs, you can create the VMs within an availability set to provide high availability to your application. Using availability sets, you can ensure that during either an outage or maintenance events that require a reboot, at least one VM is available.

Within an availability set, individual VMs are spread across up to 20 update domains. During scheduled maintenance, only one update domain is updated at any given time. Update domains aren't necessarily updated sequentially.
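
The effect of update domains on availability can be illustrated with a small sketch. It assumes VMs are placed across update domains round-robin, which is a simplification for illustration rather than a documented placement rule; the function names and VM names are hypothetical.

```python
MAX_UPDATE_DOMAINS = 20  # upper limit stated in the text above


def assign_update_domains(vm_names, update_domain_count):
    """Spread VMs across update domains round-robin (illustrative assumption)."""
    if not 1 <= update_domain_count <= MAX_UPDATE_DOMAINS:
        raise ValueError("update_domain_count must be between 1 and 20")
    return {name: i % update_domain_count for i, name in enumerate(vm_names)}


def available_during(assignments, updating_domain):
    """VMs that stay up while a single update domain is being updated."""
    return [vm for vm, ud in assignments.items() if ud != updating_domain]


assignments = assign_update_domains([f"vm{i}" for i in range(6)], 3)
still_up = available_during(assignments, updating_domain=1)
```

With six VMs over three update domains, four VMs remain available while any one domain is updated, which is why placing at least two VMs in an availability set keeps at least one of them running.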

Virtual machine scale sets are an Azure compute resource that you can use to deploy and manage a set of identical VMs as a single resource. The scale set is automatically deployed across update domains (UDs), like VMs in an availability set. As with availability sets, when you use scale sets, only one UD is updated at any given time during scheduled maintenance.

For more information about setting up your VMs for high availability, see Manage the availability of your VMs for Windows or the corresponding article for Linux.

Next steps

You can use the Azure CLI, Azure PowerShell, or the portal to manage planned maintenance.