Troubleshooting allocation failure when you deploy Cloud Services in Azure

Summary

When you deploy instances to a Cloud Service or add new web or worker role instances, Microsoft Azure allocates compute resources. You may occasionally receive errors when performing these operations, even before you reach the Azure subscription limits. This article explains the causes of some common allocation failures and suggests possible remediation. The information may also be useful when you plan the deployment of your services.

If your Azure issue is not addressed in this article, visit the Azure forums on MSDN and CSDN, where you can post your issue. You can also submit an Azure support request from the Azure support page.

Background - How allocation works

The servers in Azure datacenters are partitioned into clusters. A new cloud service allocation request is attempted in multiple clusters. When the first instance is deployed to a cloud service (in either staging or production), that cloud service gets pinned to a cluster. Any further deployments for the cloud service happen in the same cluster. In this article, we refer to this as "pinned to a cluster". Diagram 1 below illustrates a normal allocation, which is attempted in multiple clusters; Diagram 2 illustrates an allocation that is pinned to Cluster 2 because that is where the existing Cloud Service CS_1 is hosted.

[Image: Allocation diagram]

Why allocation failure happens

When an allocation request is pinned to a cluster, there is a higher chance of failing to find free resources, because the available resource pool is limited to that one cluster. Furthermore, if your allocation request is pinned to a cluster but that cluster does not support the type of resource you requested, your request fails even if the cluster has free resources. Diagram 3 below illustrates the case where a pinned allocation fails because the only candidate cluster does not have free resources. Diagram 4 illustrates the case where a pinned allocation fails because the only candidate cluster does not support the requested VM size, even though the cluster has free resources.

[Image: Pinned allocation failure]
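
The difference between a normal and a pinned allocation can be sketched with a small toy model. This is only an illustration of the behavior described above, not Azure's actual allocation algorithm. The `Cluster` class, the `allocate` function, the core counts per VM size, and the cluster contents are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical core counts per VM size, for illustration only.
CORES = {"Small": 1, "Medium": 2, "Large": 4}


@dataclass
class Cluster:
    name: str
    free_cores: int
    supported_sizes: Set[str]


def allocate(clusters: List[Cluster], vm_size: str, instances: int,
             pinned_to: Optional[str] = None) -> Optional[str]:
    """Return the name of the cluster that accepts the request, or None."""
    needed = CORES[vm_size] * instances
    # A pinned request has exactly one candidate cluster;
    # an unpinned request may be tried in every cluster.
    candidates = [c for c in clusters
                  if pinned_to is None or c.name == pinned_to]
    for cluster in candidates:
        # Both conditions must hold: the size is supported AND capacity exists.
        if vm_size in cluster.supported_sizes and cluster.free_cores >= needed:
            cluster.free_cores -= needed
            return cluster.name
    return None


if __name__ == "__main__":
    clusters = [
        Cluster("Cluster 1", 16, {"Small", "Medium", "Large"}),
        Cluster("Cluster 2", 3, {"Small", "Medium"}),
        Cluster("Cluster 3", 32, {"Small"}),
    ]
    # Normal allocation: any cluster may satisfy the request.
    print(allocate(clusters, "Medium", 2))
    # Pinned, but the only candidate lacks free capacity (as in Diagram 3).
    print(allocate(clusters, "Medium", 2, pinned_to="Cluster 2"))
    # Pinned, capacity is free but the size is unsupported (as in Diagram 4).
    print(allocate(clusters, "Medium", 1, pinned_to="Cluster 3"))
```

In this model, the same request that succeeds when every cluster is a candidate fails as soon as the candidate list shrinks to a single cluster that is either full or does not offer the requested size.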

Troubleshooting allocation failure for cloud services

Error message

You may see the following error message:

"Azure operation '{operation id}' failed with code Compute.ConstrainedAllocationFailed. Details: Allocation failed; unable to satisfy constraints in request. The requested new service deployment is bound to an Affinity Group, or it targets a Virtual Network, or there is an existing deployment under this hosted service. Any of these conditions constrains the new deployment to specific Azure resources. Please retry later or try reducing the VM size or number of role instances. Alternatively, if possible, remove the aforementioned constraints or try deploying to a different region."
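
If a deployment script captures this error text, it may help to detect the error code programmatically so that a pinned-allocation failure can be handled differently from other errors. The following is a minimal sketch that assumes the message keeps the "failed with code ..." wording quoted above. The function names are hypothetical and are not part of any Azure SDK.

```python
import re
from typing import Optional

# Error code quoted in the message above.
CONSTRAINED_CODE = "Compute.ConstrainedAllocationFailed"

# Matches a dotted identifier after "failed with code", stopping before
# the sentence-ending period.
_CODE_PATTERN = re.compile(r"failed with code (\w+(?:\.\w+)*)")


def extract_error_code(message: str) -> Optional[str]:
    """Return the error code found in the message, or None if absent."""
    match = _CODE_PATTERN.search(message)
    return match.group(1) if match else None


def is_constrained_allocation_failure(message: str) -> bool:
    """True if the message reports a constrained (pinned) allocation failure."""
    return extract_error_code(message) == CONSTRAINED_CODE
```

A script could call `is_constrained_allocation_failure` on the captured text and, when it returns `True`, fall back to one of the solutions described later in this article instead of retrying blindly.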

Common issues

Here are the common allocation scenarios that cause an allocation request to be pinned to a single cluster.

  • Deploying to the staging slot - If a cloud service has a deployment in either slot, the entire cloud service is pinned to a specific cluster. This means that if a deployment already exists in the production slot, a new staging deployment can only be allocated in the same cluster as the production slot. If the cluster is nearing capacity, the request may fail.
  • Scaling - New instances added to an existing cloud service must be allocated in the same cluster. Small scaling requests can usually be allocated, but not always. If the cluster is nearing capacity, the request may fail.
  • Affinity group - A new deployment to an empty cloud service can be allocated by the fabric in any cluster in that region, unless the cloud service is pinned to an affinity group. Deployments to the same affinity group are attempted on the same cluster. If the cluster is nearing capacity, the request may fail.
  • Affinity group VNet - Older virtual networks were tied to affinity groups instead of regions, and cloud services in these virtual networks are pinned to the affinity group's cluster. Deployments to this type of virtual network are attempted on the pinned cluster. If the cluster is nearing capacity, the request may fail.

Solutions

  1. Redeploy to a new cloud service - This solution is likely to be the most successful because it allows the platform to choose from all clusters in that region.

    • Deploy the workload to a new cloud service.
    • Update the CNAME or A record to point traffic to the new cloud service.
    • Once no traffic is going to the old site, delete the old cloud service. This solution should incur zero downtime.
  2. Delete both production and staging slots - This solution preserves your existing DNS name, but causes downtime for your application.

    • Delete the production and staging slots of the existing cloud service so that the cloud service is empty.
    • Create a new deployment in the existing cloud service. This re-attempts allocation on all clusters in the region. Ensure the cloud service is not tied to an affinity group.
  3. Reserved IP - This solution preserves your existing IP address, but causes downtime for your application.

    • Create a ReservedIP for your existing deployment using PowerShell:

      New-AzureReservedIP -ReservedIPName {new reserved IP name} -Location {location} -ServiceName {existing service name}
      
    • Follow solution 2 above, making sure to specify the new ReservedIP in the service's CSCFG.

  4. Remove the affinity group for new deployments - Affinity groups are no longer recommended. Follow the steps for solution 1 above to deploy a new cloud service. Ensure the cloud service is not in an affinity group.

  5. Convert to a regional virtual network - See How to migrate from Affinity Groups to a Regional Virtual Network (VNet).