Troubleshoot allocation failures when you create, restart, or resize VMs in Azure

When you create a virtual machine (VM), restart a stopped (deallocated) VM, or resize a VM, Azure allocates compute resources to your subscription. We are continually investing in additional infrastructure and features to make sure that all VM types are available to support customer demand. However, you may occasionally experience resource allocation failures because of unprecedented growth in demand for Azure services in specific regions. This problem can occur when you try to create or start VMs in a region, and the VMs display the following error code and message:

Error code: AllocationFailed or ZonalAllocationFailed

Error message: "Allocation failed. We do not have sufficient capacity for the requested VM size in this region. Read more about improving likelihood of allocation success at https://aka.ms/allocation-guidance"

This article explains the causes of some common allocation failures and suggests possible remedies.

If your Azure issue is not addressed in this article, visit Azure support. You can also file an Azure support request on the Azure support site.

Until your preferred VM type is available in your preferred region, we advise customers who encounter deployment issues to consider the guidance in the following sections as a temporary workaround.

Identify the scenario that best matches your case, and then retry the allocation request by using the corresponding suggested workaround to increase the likelihood of allocation success. Alternatively, you can retry later, because enough resources may have been freed in the cluster, region, or zone by then to accommodate your request.
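The "retry later" advice can be sketched as a small helper. This is a minimal illustration, not an Azure API: `DeploymentError`, `retry_allocation`, and the default delays are hypothetical names and values chosen for the example. Only the two error codes come from this article.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Error codes quoted in this article.
ALLOCATION_ERROR_CODES = {"AllocationFailed", "ZonalAllocationFailed"}


class DeploymentError(Exception):
    """Hypothetical error type that carries the code returned by a deployment."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def retry_allocation(
    attempt: Callable[[], T],
    delays_seconds: Sequence[float] = (60, 300, 900),  # assumed values
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call attempt(); on an allocation error, wait and try again.

    Errors with any other code are re-raised immediately, because waiting
    for capacity to free up would not help with them.
    """
    for delay in delays_seconds:
        try:
            return attempt()
        except DeploymentError as err:
            if err.code not in ALLOCATION_ERROR_CODES:
                raise
            sleep(delay)
    # Final attempt: let any error propagate to the caller.
    return attempt()
```

Here `attempt` would wrap whatever call you use to create or start the VM and translate its failure into a `DeploymentError`. The `sleep` parameter is injectable so that the helper can be exercised without real waiting.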

Resize a VM or add VMs to an existing availability set

Cause

A request to resize a VM or add a VM to an existing availability set must be tried at the original cluster that hosts the existing availability set. The requested VM size is supported by the cluster, but the cluster may not currently have sufficient capacity.

Workaround

If the VM can be part of a different availability set, create a VM in a different availability set (in the same region). This new VM can then be added to the same virtual network.

Stop (deallocate) all VMs in the same availability set, then restart each one. To stop: Click Resource groups > [your resource group] > Resources > [your availability set] > Virtual Machines > [your virtual machine] > Stop. After all VMs stop, select the first VM, and then click Start. This step makes sure that a new allocation attempt is run and that a new cluster with sufficient capacity can be selected.
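For scripted environments, the order of operations in this workaround can be sketched as follows. `VmClient` and its methods are a hypothetical interface written for this example, not a real SDK class. In practice each method would be backed by whichever SDK or CLI you use. The point of the sketch is the ordering: no VM is started until every VM in the set is deallocated.

```python
from typing import Iterable, List, Protocol


class VmClient(Protocol):
    """Hypothetical client interface; not a real SDK class."""

    def deallocate(self, vm_name: str) -> None: ...
    def is_deallocated(self, vm_name: str) -> bool: ...
    def start(self, vm_name: str) -> None: ...


def reallocate_availability_set(client: VmClient, vm_names: Iterable[str]) -> List[str]:
    """Deallocate every VM in the set, then start each one again."""
    names = list(vm_names)

    # Step 1: stop (deallocate) every VM in the availability set.
    for name in names:
        client.deallocate(name)

    # Step 2: confirm that all VMs are deallocated before starting any of them.
    # Starting one while another is still allocated would be a partial
    # deallocation, which keeps the request at the original cluster.
    still_allocated = [n for n in names if not client.is_deallocated(n)]
    if still_allocated:
        raise RuntimeError(f"Not yet deallocated: {still_allocated}")

    # Step 3: start the VMs one by one.
    for name in names:
        client.start(name)
    return names
```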

Restart partially stopped (deallocated) VMs

Cause

Partial deallocation means that you stopped (deallocated) one or more, but not all, VMs in an availability set. When you deallocate a VM, the associated resources are released. Restarting VMs in a partially deallocated availability set is the same as adding VMs to an existing availability set. Therefore, the allocation request must be tried at the original cluster that hosts the existing availability set, and that cluster may not have sufficient capacity.

Workaround

Stop (deallocate) all VMs in the same availability set, then restart each one. To stop: Click Resource groups > [your resource group] > Resources > [your availability set] > Virtual Machines > [your virtual machine] > Stop. After all VMs stop, select the first VM, and then click Start. This step makes sure that a new allocation attempt is run and that a new cluster with sufficient capacity can be selected.

Restart fully stopped (deallocated) VMs

Cause

Full deallocation means that you stopped (deallocated) all VMs in an availability set. The allocation request to restart these VMs will target all clusters that support the desired size within the region or zone. Change your allocation request per the suggestions in this article, and retry the request to improve the chance of allocation success.
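The difference between partial and full deallocation comes down to counting how many VMs in the availability set are deallocated. The sketch below makes that explicit. The function name and the input shape (a mapping from VM name to a deallocated flag) are assumptions made for illustration.

```python
from typing import Mapping


def deallocation_state(vm_is_deallocated: Mapping[str, bool]) -> str:
    """Classify an availability set as "none", "partial", or "full".

    vm_is_deallocated maps each VM name in the set to True when that VM
    is stopped (deallocated).
    """
    if not vm_is_deallocated:
        raise ValueError("the availability set has no VMs")
    stopped = sum(1 for flag in vm_is_deallocated.values() if flag)
    if stopped == 0:
        return "none"
    if stopped == len(vm_is_deallocated):
        # Restarting targets all clusters that support the size.
        return "full"
    # Restarting is tried only at the original cluster.
    return "partial"
```

A result of "partial" is the case in which stopping the remaining VMs before restarting gives the platform the chance to select a different cluster.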

Workaround

If you use older VM series or sizes, such as Dv1, DSv1, Av1, D15v2, or DS15v2, consider moving to newer versions. See the recommendations for specific VM sizes later in this article. If you don't have the option to use a different VM size, try deploying to a different region within the same geo. For more information about the available VM sizes in each region, see https://status.azure.com/status

If your allocation request is large (more than 500 cores), see the guidance in the following sections to break up the request into smaller deployments.

Try redeploying the VM. Redeploying the VM allocates the VM to a new cluster within the region.

Allocation failures for older VM sizes (Av1, Dv1, DSv1, D15v2, DS15v2, etc.)

As we expand Azure infrastructure, we deploy newer-generation hardware that's designed to support the latest virtual machine types. Some of the older series VMs do not run on our latest generation infrastructure. For this reason, customers may occasionally experience allocation failures for these legacy SKUs. To avoid this problem, we encourage customers who are using legacy series virtual machines to consider moving to the equivalent newer VMs per the following recommendations. These VMs are optimized for the latest hardware and offer better pricing and performance.

Legacy VM-series/size | Recommended newer VM-series/size | More information
Av1-series | Av2-series | https://azure.microsoft.com/blog/new-av2-series-vm-sizes/
Dv1 or DSv1-series (D1 to D5) | Dv3 or DSv3-series | https://azure.microsoft.com/blog/introducing-the-new-dv3-and-ev3-vm-sizes/
Dv1 or DSv1-series (D11 to D14) | Ev3 or ESv3-series |
D15v2 or DS15v2 | If you are using the Resource Manager deployment model in order to take advantage of the larger VM sizes, consider moving to D16v3/DS16v3 or D32v3/DS32v3. These are designed to run on the latest generation hardware. If you are using the Resource Manager deployment model to make sure your VM instance is isolated to hardware dedicated to a single customer, consider moving to the new isolated VM sizes, E64i_v3 or E64is_v3, which are designed to run on the latest generation hardware. | https://azure.microsoft.com/blog/new-isolated-vm-sizes-now-available/

Allocation failures for large deployments (more than 500 cores)

Reduce the number of instances of the requested VM size, and then retry the deployment operation. Additionally, for larger deployments, you may want to evaluate Azure virtual machine scale sets. The number of VM instances can automatically increase or decrease in response to demand or a defined schedule, and you have a greater chance of allocation success because the deployments can be spread across multiple clusters.
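One way to reduce the number of instances per request is to split a large deployment into batches that each stay at or below a core budget. The sketch below assumes every instance has the same core count and uses the 500-core figure from this article as the default budget. The function name and parameters are hypothetical.

```python
from typing import List

# Size above which this article treats a deployment as large.
LARGE_DEPLOYMENT_CORES = 500


def split_into_batches(
    instance_count: int,
    cores_per_instance: int,
    max_cores_per_batch: int = LARGE_DEPLOYMENT_CORES,
) -> List[int]:
    """Return instance counts per batch, each batch within the core budget."""
    if instance_count < 0 or cores_per_instance <= 0:
        raise ValueError("counts must be non-negative and cores positive")
    per_batch = max_cores_per_batch // cores_per_instance
    if per_batch == 0:
        raise ValueError("a single instance exceeds the core budget")
    full_batches, remainder = divmod(instance_count, per_batch)
    return [per_batch] * full_batches + ([remainder] if remainder else [])
```

For example, `split_into_batches(100, 16)` returns `[31, 31, 31, 7]`: 31 instances of 16 cores use 496 cores, which fits within the budget, and the last batch holds the remaining 7 instances. Each batch can then be submitted and retried independently.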

Background information

How allocation works

The servers in Azure datacenters are partitioned into clusters. Normally, an allocation request is attempted in multiple clusters, but certain constraints in the allocation request can force the Azure platform to attempt the request in only one cluster. In this article, we refer to this as "pinned to a cluster." Diagram 1 illustrates the case of a normal allocation that is attempted in multiple clusters. Diagram 2 illustrates the case of an allocation that's pinned to Cluster 2 because that's where the existing Cloud Service CS_1 or availability set is hosted.

(Image: allocation diagram)

Why allocation failures happen

When an allocation request is pinned to a cluster, there's a higher chance of failing to find free resources because the available resource pool is smaller. Furthermore, if your allocation request is pinned to a cluster but the type of resource you requested is not supported by that cluster, your request will fail even if the cluster has free resources. Diagram 3 illustrates the case where a pinned allocation fails because the only candidate cluster does not have free resources. Diagram 4 illustrates the case where a pinned allocation fails because the only candidate cluster does not support the requested VM size, even though the cluster has free resources.

(Image: allocation diagram)
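The pinned and unpinned cases described above can be reproduced with a toy model. This is only an illustration of the reasoning in this section, not a description of how the Azure platform is implemented. The `Cluster` class, the `allocate` function, and the notion of "free slots" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Cluster:
    name: str
    supported_sizes: FrozenSet[str]
    free_slots: int  # simplified stand-in for free capacity


def allocate(
    clusters: List[Cluster],
    vm_size: str,
    pinned_to: Optional[str] = None,
) -> Optional[str]:
    """Return the name of the cluster that accepts the VM, or None on failure.

    When pinned_to is set, only that cluster is a candidate. Otherwise
    every cluster is tried in order.
    """
    candidates = [c for c in clusters if pinned_to is None or c.name == pinned_to]
    for cluster in candidates:
        if vm_size in cluster.supported_sizes and cluster.free_slots > 0:
            cluster.free_slots -= 1
            return cluster.name
    return None  # corresponds to an allocation failure
```

With three clusters where the second one is full and the third one does not support a given size, an unpinned request succeeds on any cluster that fits. A request pinned to the full cluster fails for lack of capacity, and a request pinned to a cluster that does not support the size fails even though that cluster has free slots.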