Working with large virtual machine scale sets

You can now create Azure virtual machine scale sets with a capacity of up to 1,000 VMs. In this document, a large virtual machine scale set is defined as a scale set capable of scaling to more than 100 VMs. This capability is controlled by a scale set property (singlePlacementGroup=false).

Certain aspects of large scale sets, such as load balancing and fault domains, behave differently from those of a standard scale set. This document explains the characteristics of large scale sets and describes what you need to know to use them successfully in your applications.

A common approach for deploying cloud infrastructure at large scale is to create a set of scale units, for example by creating multiple VM scale sets across multiple VNETs and storage accounts. This approach is easier to manage than individual VMs, and multiple scale units are useful for many applications, particularly those that require other stackable components such as multiple virtual networks and endpoints. If your application requires a single large cluster, however, it can be more straightforward to deploy a single scale set of up to 1,000 VMs. Example scenarios include centralized big data deployments and compute grids that require simple management of a large pool of worker nodes. Combined with virtual machine scale set attached data disks, large scale sets let you deploy a scalable infrastructure consisting of thousands of vCPUs and petabytes of storage as a single operation.

Placement groups

What makes a large scale set special is not the number of VMs, but the number of placement groups it contains. A placement group is a construct similar to an Azure availability set, with its own fault domains and upgrade domains. By default, a scale set consists of a single placement group with a maximum size of 100 VMs. If the scale set property singlePlacementGroup is set to false, the scale set can be composed of multiple placement groups and has a range of 0-1,000 VMs. When the property is left at its default value of true, the scale set is composed of a single placement group and has a range of 0-100 VMs.
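The capacity rule above can be captured in a few lines. This is only an illustrative sketch: the function names `capacity_range` and `fits` are hypothetical helpers, not part of any Azure SDK, and the limits are simply the ones quoted in this section.

```python
from typing import Tuple


def capacity_range(single_placement_group: bool = True) -> Tuple[int, int]:
    """Return the (min, max) VM count quoted in this article.

    singlePlacementGroup=true (the default) -> 0-100 VMs
    singlePlacementGroup=false              -> 0-1,000 VMs
    """
    return (0, 100) if single_placement_group else (0, 1000)


def fits(vm_count: int, single_placement_group: bool = True) -> bool:
    """Check whether a desired VM count is inside the allowed range."""
    low, high = capacity_range(single_placement_group)
    return low <= vm_count <= high


if __name__ == "__main__":
    # 250 VMs does not fit the default single placement group...
    print(fits(250))
    # ...but fits once the scale set may span multiple placement groups.
    print(fits(250, single_placement_group=False))
```

A check like this is mainly useful as a guard in deployment scripts, so that a capacity above 100 is rejected early when singlePlacementGroup has not been set to false.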

Checklist for using large scale sets

To decide whether your application can make effective use of large scale sets, consider the following requirements:

  • If you are planning to deploy a large number of VMs, your Compute vCPU quota limits may need to be increased.
  • Scale sets created from Azure Marketplace images can scale up to 1,000 VMs.
  • Scale sets created from custom images (VM images you create and upload yourself) can currently scale up to 600 VMs.
  • Large scale sets require Azure Managed Disks. Scale sets that are not created with Managed Disks require multiple storage accounts (one for every 20 VMs). Large scale sets are designed to work exclusively with Managed Disks to reduce your storage management overhead and to avoid the risk of running into subscription limits for storage accounts.
  • Large scale sets (singlePlacementGroup=false) do not support InfiniBand networking.
  • Layer-4 load balancing with scale sets composed of multiple placement groups requires the Azure Load Balancer Standard SKU. The Standard SKU provides additional benefits, such as the ability to load balance between multiple scale sets. The Standard SKU also requires that the scale set has a Network Security Group associated with it; otherwise NAT pools don't work correctly. If you need to use the Azure Load Balancer Basic SKU, make sure the scale set is configured to use a single placement group, which is the default setting.
  • Layer-7 load balancing with the Azure Application Gateway is supported for all scale sets.
  • A scale set is defined with a single subnet - make sure your subnet has an address space large enough for all the VMs you need. By default, a scale set overprovisions (creates extra VMs at deployment time or when scaling out, which you are not charged for) to improve deployment reliability and performance. Allow for an address space 20% greater than the number of VMs you plan to scale to.
  • Fault domains and upgrade domains are only consistent within a placement group. This architecture does not change the overall availability of a scale set, because VMs are evenly distributed across distinct physical hardware. It does mean, however, that if you need to guarantee that two VMs are on different hardware, you must make sure they are in different fault domains in the same placement group. For more information, see Availability options.
  • The fault domain and placement group ID are shown in the instance view of a scale set VM.
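Two items in the checklist involve simple arithmetic: the 20% address-space headroom for overprovisioning, and the one-storage-account-per-20-VMs rule for scale sets that do not use Managed Disks. The following sketch works through both. The function names are hypothetical, and the figure of 5 reserved addresses per subnet is an assumption taken from general Azure virtual network behavior rather than from this article.

```python
# Assumption (not stated in this article): Azure reserves 5 IP addresses
# in every subnet, so they are added on top of the VM addresses.
RESERVED_IPS_PER_SUBNET = 5


def addresses_to_plan_for(target_vms: int) -> int:
    """Target VM count plus 20% headroom, rounded up (integer math only)."""
    return -(-target_vms * 6 // 5)  # ceil(target_vms * 1.2)


def smallest_subnet_prefix(target_vms: int) -> int:
    """Longest IPv4 prefix (smallest subnet) that still holds the VMs."""
    needed = addresses_to_plan_for(target_vms) + RESERVED_IPS_PER_SUBNET
    host_bits = (needed - 1).bit_length()  # ceil(log2(needed))
    return 32 - host_bits


def storage_accounts_without_managed_disks(vm_count: int) -> int:
    """One storage account for every 20 VMs, rounded up."""
    return -(-vm_count // 20)


if __name__ == "__main__":
    for vms in (100, 600, 1000):
        print(vms, "VMs ->", addresses_to_plan_for(vms), "addresses,",
              "subnet /%d" % smallest_subnet_prefix(vms))
    print("Storage accounts for 100 VMs without Managed Disks:",
          storage_accounts_without_managed_disks(100))
```

Under these assumptions, a scale set planned for 1,000 VMs needs room for 1,200 addresses, which means a /21 subnet; a /22 (1,024 addresses) would be too small.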

Creating a large scale set

When you create a scale set in the Azure portal, specify an Instance count value of up to 1,000. If the value is more than 100 instances, the Enable scaling beyond 100 instances option is set to Yes, which allows the scale set to span multiple placement groups.

(Image: the Instances blade of the Azure portal.)

You can create a large virtual machine scale set using the Azure CLI az vmss create command. This command sets intelligent defaults, such as subnet size, based on the instance-count argument:

az group create -l chinanorth2 -n biginfra
az vmss create -g biginfra -n bigvmss --image ubuntults --instance-count 1000

The vmss create command applies defaults for certain configuration values if you do not specify them. To see the available options that you can override, try:

az vmss create --help

If you are creating a large scale set by composing an Azure Resource Manager template, make sure the template creates a scale set based on Azure Managed Disks. You can set the singlePlacementGroup property to false in the properties section of the Microsoft.Compute/virtualMachineScaleSets resource. The following JSON fragment shows the beginning of a scale set template, including the 1,000 VM capacity and the "singlePlacementGroup" : false setting:

{
  "type": "Microsoft.Compute/virtualMachineScaleSets",
  "location": "chinanorth2",
  "name": "bigvmss",
  "sku": {
    "name": "Standard_DS1_v2",
    "tier": "Standard",
    "capacity": 1000
  },
  "properties": {
    "singlePlacementGroup": false,
    "upgradePolicy": {
      "mode": "Automatic"
    }

For a complete example of a large scale set template, refer to https://github.com/gbowerman/azure-myriad/blob/main/bigtest/bigbottle.json.
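If you generate templates programmatically, the same top-level fields can be assembled and checked before deployment. The sketch below is an assumption-laden illustration: `scale_set_resource` is a hypothetical helper, and it emits only the fields shown in the fragment in this section, so the result is not a complete, deployable resource (it omits, for example, the API version and the VM profile).

```python
import json


def scale_set_resource(name: str, location: str, vm_size: str,
                       capacity: int,
                       single_placement_group: bool = True) -> dict:
    """Build the opening fields of a scale set resource as a dict.

    Raises ValueError when the capacity exceeds what the chosen
    singlePlacementGroup setting allows (100 or 1,000 VMs).
    """
    limit = 100 if single_placement_group else 1000
    if not 0 <= capacity <= limit:
        raise ValueError(
            "capacity %d is outside 0-%d for singlePlacementGroup=%s"
            % (capacity, limit, str(single_placement_group).lower()))
    return {
        "type": "Microsoft.Compute/virtualMachineScaleSets",
        "location": location,
        "name": name,
        "sku": {"name": vm_size, "tier": "Standard", "capacity": capacity},
        "properties": {
            "singlePlacementGroup": single_placement_group,
            "upgradePolicy": {"mode": "Automatic"},
        },
    }


if __name__ == "__main__":
    resource = scale_set_resource("bigvmss", "chinanorth2",
                                  "Standard_DS1_v2", 1000,
                                  single_placement_group=False)
    print(json.dumps(resource, indent=2))
```

Validating the capacity against the singlePlacementGroup setting in the generator catches the mismatch locally rather than at deployment time.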

Converting an existing scale set to span multiple placement groups

To make an existing virtual machine scale set capable of scaling to more than 100 VMs, change the singlePlacementGroup property to false in the scale set model. Find the existing scale set, select Edit, and change the singlePlacementGroup property. If you do not see this property, you may be viewing the scale set with an older version of the Microsoft.Compute API.

Note

You can change a scale set from supporting only a single placement group (the default behavior) to supporting multiple placement groups, but you cannot convert in the other direction. Make sure you understand the properties of large scale sets before converting.
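Because the conversion is one-way, automation that edits the scale set model may want to refuse the reverse change up front. A minimal sketch, using a hypothetical helper name and modeling only the rule stated in the note:

```python
def conversion_allowed(current_spg: bool, requested_spg: bool) -> bool:
    """Return True if singlePlacementGroup may go from current to requested.

    Allowed:     true -> false, or no change at all.
    Not allowed: false -> true (multiple groups back to a single group).
    """
    return not (current_spg is False and requested_spg is True)


if __name__ == "__main__":
    print(conversion_allowed(True, False))   # single -> multiple
    print(conversion_allowed(False, True))   # multiple -> single
```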