Azure Stack compute capacity planning

The VM sizes supported on Azure Stack are a subset of those supported on Azure. Azure imposes resource limits along many vectors to avoid overconsumption of resources (server-local and service-level). Without some limits on tenant consumption, the experience of every tenant suffers when other tenants overconsume resources. For network egress from a VM, Azure Stack enforces bandwidth caps that match the Azure limits. For storage, Azure Stack enforces storage IOPS limits so that tenants cannot overconsume resources through storage access.

VM placement and virtual-to-physical core overprovisioning

In Azure Stack, a tenant cannot specify a particular server for VM placement. The only consideration when placing a VM is whether the host has enough memory for that VM type. Azure Stack does not overcommit memory; however, it does allow the number of cores to be overcommitted. Because the placement algorithms do not take the existing virtual-to-physical core overprovisioning ratio into account, each host can end up with a different ratio.
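The effect of memory-only placement on core ratios can be illustrated with a minimal Python sketch. It is not the Azure Stack placement algorithm: the `Host` class, the `place_vm` function, the first-fit host order, and the host and VM sizes are all hypothetical. The only rule taken from the text is that a VM is placed wherever enough memory is free and that the core ratio is never checked.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical model of one server in a scale unit."""
    name: str
    physical_cores: int
    free_memory_gb: float
    vcpus_placed: int = 0

    @property
    def core_ratio(self) -> float:
        # Virtual-to-physical core ratio currently on this host.
        return self.vcpus_placed / self.physical_cores


def place_vm(hosts, vm_memory_gb, vm_vcpus):
    """Place a VM on the first host with enough free memory.

    First-fit ordering is an assumption made for illustration.
    Memory is never overcommitted; the core ratio is not consulted.
    Returns the chosen host, or None if no host has enough memory.
    """
    for host in hosts:
        if host.free_memory_gb >= vm_memory_gb:
            host.free_memory_gb -= vm_memory_gb
            host.vcpus_placed += vm_vcpus
            return host
    return None


# Two identical hypothetical hosts: 16 physical cores, 64 GB free each.
hosts = [Host("node1", 16, 64), Host("node2", 16, 64)]

# Hypothetical VM requests as (memory in GB, vCPUs).
requests = [(56, 8), (7, 8), (56, 8), (56, 8)]
placed = [place_vm(hosts, mem, cpus) for mem, cpus in requests]

for host in hosts:
    print(host.name, "core ratio:", host.core_ratio)
```

In this run the two hosts finish with different core ratios even though they are identical, and the last request is rejected because no host has 56 GB free, regardless of how many cores are idle.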

In Azure, to achieve high availability of a multi-VM production system, VMs are placed in an availability set so that they are spread across multiple fault domains. In Azure Stack, a fault domain in an availability set is defined as a single node in the scale unit.

While the infrastructure of Azure Stack is resilient to failures, the underlying technology (failover clustering) still incurs some downtime for VMs on an affected physical server when hardware fails. Currently, to be consistent with Azure, Azure Stack supports availability sets with a maximum of three fault domains. VMs placed in an availability set are physically isolated from each other by spreading them as evenly as possible over multiple fault domains (Azure Stack nodes). If hardware fails, VMs from the failed fault domain are restarted on other nodes but, where possible, kept in fault domains separate from the other VMs in the same availability set. When the hardware comes back online, the VMs are rebalanced to maintain high availability.
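As a rough illustration of "as evenly as possible", the following Python sketch spreads the VMs of one availability set over at most three fault domains. The round-robin rule and the names `assign_fault_domains` and `MAX_FAULT_DOMAINS` are assumptions made for this example; the text does not describe how Azure Stack actually picks a fault domain for each VM.

```python
from collections import Counter

# Upper bound stated in the text: at most three fault domains per availability set.
MAX_FAULT_DOMAINS = 3


def assign_fault_domains(vm_names, node_count):
    """Map each VM to a fault domain index using round-robin (an assumption).

    Each fault domain corresponds to one node, so the number of usable
    fault domains cannot exceed the number of nodes in the scale unit.
    """
    if node_count < 1:
        raise ValueError("a scale unit needs at least one node")
    fd_count = min(MAX_FAULT_DOMAINS, node_count)
    return {vm: index % fd_count for index, vm in enumerate(vm_names)}


# Five hypothetical VMs in one availability set on a four-node scale unit.
assignment = assign_fault_domains(["vm1", "vm2", "vm3", "vm4", "vm5"], node_count=4)
per_domain = Counter(assignment.values())
print(assignment)
print(per_domain)
```

With five VMs and three fault domains, no fault domain holds more than two VMs, so the loss of a single node affects at most two of the five.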

Azure also provides high availability through update domains in availability sets. An update domain is a logical group of underlying hardware that can undergo maintenance or be rebooted at the same time. In Azure Stack, VMs are live migrated to the other online hosts in the cluster before their underlying host is updated. Because a host update causes no tenant downtime, the update domain feature on Azure Stack exists only for template compatibility with Azure.

Azure Stack resiliency resources

To allow an Azure Stack integrated system to be patched and updated, and to make it resilient to physical hardware failures, a portion of the total server memory is reserved and is unavailable for tenant virtual machine (VM) placement.

If a server fails, the VMs hosted on it are restarted on the remaining available servers to preserve VM availability. Similarly, during the patch and update process, all VMs running on a server are live migrated to other available servers. This VM management or movement is possible only if there is reserved capacity to allow the restart or migration to occur.

The following calculation gives the total available memory that can be used for tenant VM placement. This memory capacity applies to the entire Azure Stack scale unit.

Available Memory for VM placement = Total Server Memory - Resiliency Reserve - Memory used by running VMs - Azure Stack Infrastructure Overhead [1]

Resiliency reserve = H + R * ((N-1) * H) + V * (N-2)

Where:

  • H = Size of single server memory
  • N = Size of scale unit (number of servers)
  • R = Operating system reserve for OS overhead [2]
  • V = Largest VM in the scale unit

[1] Azure Stack Infrastructure Overhead = 230 GB

[2] Operating system reserve for overhead = 15% of node memory. The operating system reserve value is an estimate and varies with the physical memory capacity of the server and general operating system overhead.

The value V, the largest VM in the scale unit, is dynamic and reflects the memory size of the largest tenant VM. For example, the largest VM value could be 7 GB or 112 GB or any other VM memory size supported in the Azure Stack solution.

The above calculation is an estimate and is subject to change based on the current version of Azure Stack. The ability to deploy tenant VMs and services depends on the specifics of the deployed solution. This example calculation is only a guide, not a definitive answer on whether VMs can be deployed.
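The two formulas can be worked through with a small Python sketch. The function names, the four-server scale unit, and the 384 GB per-server memory are hypothetical inputs chosen for illustration. The sketch also assumes that Total Server Memory means N * H for the whole scale unit and that R is expressed as a fraction (0.15 for 15% of node memory); neither is stated explicitly in the text.

```python
def resiliency_reserve_gb(h_gb, n, v_gb, r=0.15):
    """Resiliency reserve = H + R * ((N-1) * H) + V * (N-2), in GB.

    h_gb: memory of a single server (H)
    n:    number of servers in the scale unit (N)
    v_gb: memory of the largest VM in the scale unit (V)
    r:    OS reserve as a fraction of node memory (R), assumed 0.15
    """
    return h_gb + r * ((n - 1) * h_gb) + v_gb * (n - 2)


def available_memory_gb(h_gb, n, v_gb, running_vm_gb=0.0,
                        r=0.15, infra_overhead_gb=230.0):
    """Estimate the memory available for tenant VM placement, in GB.

    Assumes total server memory is N * H for the whole scale unit.
    """
    total_gb = n * h_gb
    reserve_gb = resiliency_reserve_gb(h_gb, n, v_gb, r)
    return total_gb - reserve_gb - running_vm_gb - infra_overhead_gb


# Hypothetical scale unit: 4 servers with 384 GB each, largest VM 112 GB,
# no tenant VMs running yet.
reserve = resiliency_reserve_gb(h_gb=384, n=4, v_gb=112)
available = available_memory_gb(h_gb=384, n=4, v_gb=112)
print(f"Resiliency reserve: {reserve:.1f} GB")
print(f"Available for placement: {available:.1f} GB")
```

For these inputs the reserve is 384 + 0.15 * 1152 + 112 * 2 = 780.8 GB, which leaves 1536 - 780.8 - 230 = 525.2 GB for tenant VMs. As with the formula itself, the result is an estimate rather than a guarantee that a given VM can be deployed.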

Next steps

Storage capacity planning