Plan capacity and scaling for VMware disaster recovery to Azure

Use this article to plan for capacity and scaling when you replicate on-premises VMware VMs and physical servers to Azure by using Azure Site Recovery.

How do I start capacity planning?

To learn about Azure Site Recovery infrastructure requirements, gather information about your replication environment by running Azure Site Recovery Deployment Planner for VMware replication. For more information, see About Site Recovery Deployment Planner for VMware to Azure.

Site Recovery Deployment Planner provides a report that has complete information about compatible and incompatible VMs, disks per VM, and data churn per disk. The tool also summarizes the network bandwidth that's required to meet the target RPO and the Azure infrastructure that's required for successful replication and test failover.

Capacity considerations

Component | Details
Replication | Maximum daily change rate: A protected machine can use only one process server. A single process server can handle a daily change rate of up to 2 TB. So, 2 TB is the maximum daily data change rate that's supported for a protected machine.

Maximum throughput: A replicated machine can belong to one storage account in Azure. A standard Azure Storage account can handle a maximum of 20,000 requests per second. We recommend that you limit the number of input/output operations per second (IOPS) across a source machine to 20,000. For example, if you have a source machine that has five disks and each disk generates 120 IOPS (8 KB in size) on the source machine, the source machine is within the Azure per-disk IOPS limit of 500. (The number of storage accounts required is equal to the total source machine IOPS divided by 20,000.)
Configuration server | The configuration server must be able to handle the daily change rate capacity across all workloads running on protected machines. The configuration server must have sufficient bandwidth to continuously replicate data to Azure Storage.

A best practice is to place the configuration server on the same network and LAN segment as the machines that you want to protect. You can place the configuration server on a different network, but the machines that you want to protect should have layer 3 network visibility to it.

Size recommendations for the configuration server are summarized in the table in the following section.
Process server | The first process server is installed by default on the configuration server. You can deploy additional process servers to scale your environment.

The process server receives replication data from protected machines. The process server optimizes data by using caching, compression, and encryption. Then, the process server sends the data to Azure. The process server machine must have sufficient resources to perform these tasks.

The process server uses a disk-based cache. Use a separate cache disk of 600 GB or more to handle data changes that are stored if a network bottleneck or outage occurs.
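
The storage account rule in the Replication row (total source machine IOPS divided by 20,000) can be sketched as a small calculation. This is an illustrative sketch only: the function name is hypothetical, and rounding up to a whole account (with a minimum of one) is an assumption that the text doesn't state explicitly.

```python
import math

# Request limit for a standard Azure Storage account, as stated in the table above.
STORAGE_ACCOUNT_MAX_IOPS = 20_000


def storage_accounts_required(total_source_iops: int) -> int:
    """Estimate how many standard storage accounts a given total IOPS needs.

    Assumption: the quotient is rounded up, and at least one account is used.
    """
    if total_source_iops < 0:
        raise ValueError("total_source_iops must not be negative")
    return max(1, math.ceil(total_source_iops / STORAGE_ACCOUNT_MAX_IOPS))


# The example from the text: five disks that each generate 120 IOPS.
machine_iops = 5 * 120
print(machine_iops)                             # 600
print(storage_accounts_required(machine_iops))  # 1
print(storage_accounts_required(45_000))        # 3
```

Use the IOPS figures from the Deployment Planner report as input rather than estimates.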

Size recommendations for the configuration server and inbuilt process server

A configuration server that uses an inbuilt process server to protect the workload can handle up to 200 virtual machines based on the following configurations:

CPU | Memory | Cache disk size | Data change rate | Protected machines
8 vCPUs (2 sockets * 4 cores @ 2.5 GHz) | 16 GB | 300 GB | 500 GB or less | Use to replicate fewer than 100 machines.
12 vCPUs (2 sockets * 6 cores @ 2.5 GHz) | 18 GB | 600 GB | 501 GB to 1 TB | Use to replicate 100 to 150 machines.
16 vCPUs (2 sockets * 8 cores @ 2.5 GHz) | 32 GB | 1 TB | >1 TB to 2 TB | Use to replicate 151 to 200 machines.
Deploy another configuration server by using an OVF template. | | | | Deploy a new configuration server if you're replicating more than 200 machines.
Deploy another process server. | | | >2 TB | Deploy a new scale-out process server if the overall daily data change rate is greater than 2 TB.

In these configurations:

  • Each source machine has three disks of 100 GB each.
  • We used benchmarking storage of eight SAS (Serial Attached SCSI) drives of 10 K RPM with RAID 10 for cache disk measurements.
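
The configuration server table above can be read as a lookup: pick the smallest row whose data change rate and machine count both cover your workload. The following is a minimal sketch of that lookup, not an official sizing tool. The class and function names are hypothetical, and treating 1 TB as 1,000 GB is an assumption made for the comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfigServerTier:
    vcpus: int
    memory_gb: int
    cache_disk_gb: int
    max_daily_change_gb: int  # upper bound of the "Data change rate" column
    max_machines: int         # upper bound of the "Protected machines" column


# Rows of the table above (assumption: 1 TB is treated as 1,000 GB).
CONFIG_SERVER_TIERS = [
    ConfigServerTier(8, 16, 300, 500, 99),      # fewer than 100 machines
    ConfigServerTier(12, 18, 600, 1000, 150),   # 100 to 150 machines
    ConfigServerTier(16, 32, 1000, 2000, 200),  # 151 to 200 machines
]


def pick_config_server_tier(daily_change_gb: float, machines: int) -> Optional[ConfigServerTier]:
    """Return the smallest tier that covers both limits, or None if no tier fits."""
    for tier in CONFIG_SERVER_TIERS:
        if daily_change_gb <= tier.max_daily_change_gb and machines <= tier.max_machines:
            return tier
    return None


print(pick_config_server_tier(400, 80).vcpus)    # 8
print(pick_config_server_tier(400, 120).vcpus)   # 12
print(pick_config_server_tier(1500, 200).vcpus)  # 16
print(pick_config_server_tier(2500, 50))         # None
```

A result of None corresponds to the last two rows of the table: deploy another configuration server for more than 200 machines, or a scale-out process server when the daily change rate exceeds 2 TB.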

Size recommendations for the process server

The process server is the component that handles data replication in Azure Site Recovery. If the daily change rate is greater than 2 TB, you must add scale-out process servers to handle the replication load. To scale out, you can:

  • Increase the number of configuration servers by deploying them from an OVF template. For example, you can protect up to 400 machines by using two configuration servers.
  • Add scale-out process servers. Use the scale-out process servers to handle replication traffic instead of (or in addition to) the configuration server.

The following table describes this scenario:

  • You set up a scale-out process server.
  • You configured protected virtual machines to use the scale-out process server.
  • Each protected source machine has three disks of 100 GB each.

Additional process server | Cache disk size | Data change rate | Protected machines
4 vCPUs (2 sockets * 2 cores @ 2.5 GHz), 8 GB of memory | 300 GB | 250 GB or less | Use to replicate 85 or fewer machines.
8 vCPUs (2 sockets * 4 cores @ 2.5 GHz), 12 GB of memory | 600 GB | 251 GB to 1 TB | Use to replicate 86 to 150 machines.
12 vCPUs (2 sockets * 6 cores @ 2.5 GHz), 24 GB of memory | 1 TB | >1 TB to 2 TB | Use to replicate 151 to 225 machines.
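
As a rough planning aid, the table above can be turned into an estimate of how many scale-out process servers a workload needs and which row each one should match. The sketch below is an assumption-laden illustration: the names are hypothetical, the load is assumed to spread evenly across servers, 1 TB is treated as 1,000 GB, and the strategy (fewest servers first, then the smallest row that fits) is only one possible choice.

```python
import math
from typing import NamedTuple, Tuple


class ProcessServerTier(NamedTuple):
    vcpus: int
    memory_gb: int
    cache_disk_gb: int
    max_daily_change_gb: int
    max_machines: int


# Rows of the additional process server table (assumption: 1 TB = 1,000 GB).
PROCESS_SERVER_TIERS = [
    ProcessServerTier(4, 8, 300, 250, 85),
    ProcessServerTier(8, 12, 600, 1000, 150),
    ProcessServerTier(12, 24, 1000, 2000, 225),
]


def plan_scale_out(total_change_gb: float, total_machines: int) -> Tuple[int, ProcessServerTier]:
    """Return (server count, tier per server) for an evenly distributed load."""
    largest = PROCESS_SERVER_TIERS[-1]
    # Fewest servers such that no server exceeds the largest row's limits.
    count = max(
        1,
        math.ceil(total_change_gb / largest.max_daily_change_gb),
        math.ceil(total_machines / largest.max_machines),
    )
    change_per_server = total_change_gb / count
    machines_per_server = math.ceil(total_machines / count)
    # Smallest row that still covers the per-server share.
    for tier in PROCESS_SERVER_TIERS:
        if (change_per_server <= tier.max_daily_change_gb
                and machines_per_server <= tier.max_machines):
            return count, tier
    return count, largest


count, tier = plan_scale_out(5000, 400)
print(count, tier.vcpus)  # 3 12
count, tier = plan_scale_out(200, 60)
print(count, tier.vcpus)  # 1 4
```

Real workloads are rarely distributed evenly, so treat the result as a starting point and confirm it against the Deployment Planner report.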

How you scale your servers depends on your preference for a scale-up or scale-out model. To scale up, deploy a few high-end configuration servers and process servers. To scale out, deploy more servers that have fewer resources. For example, if you want to protect 200 machines with an overall daily data change rate of 1.5 TB, you could take one of the following actions:

  • Set up a single process server (16 vCPUs, 24 GB of RAM).
  • Set up two process servers (2 x 8 vCPUs, 2 x 12 GB of RAM).

Control network bandwidth

After you use Site Recovery Deployment Planner to calculate the bandwidth you need for replication (initial replication and then the delta), you have two options for controlling the amount of bandwidth that's used for replication:

  • Throttle bandwidth: VMware traffic that replicates to Azure goes through a specific process server. You can throttle bandwidth on the machines that are running as process servers.
  • Influence bandwidth: You can influence the bandwidth that's used for replication by using two registry values:
    • The HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Azure Backup\Replication\UploadThreadsPerVM registry value specifies the number of threads that are used for data transfer (initial or delta replication) of a disk. A higher value increases the network bandwidth that's used for replication.
    • The HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Azure Backup\Replication\DownloadThreadsPerVM registry value specifies the number of threads that are used for data transfer during failback.

Throttle bandwidth

  1. Open the Azure Backup MMC snap-in on the machine you use as the process server. By default, a shortcut for Backup is available on the desktop or in the following folder: C:\Program Files\Microsoft Azure Recovery Services Agent\bin.

  2. In the snap-in, select Change Properties.

    Screenshot of the Azure Backup MMC snap-in option to change properties

  3. On the Throttling tab, select Enable internet bandwidth usage throttling for backup operations. Set the limits for work and non-work hours. Valid ranges are from 512 Kbps to 1,023 Mbps.

    Screenshot of the Azure Backup Properties dialog box

You can also use the Set-OBMachineSetting cmdlet to set throttling. Here's an example:

# Throttle on Mondays and Tuesdays, with work hours from 09:00 to 18:00.
$mon = [System.DayOfWeek]::Monday
$tue = [System.DayOfWeek]::Tuesday
Set-OBMachineSetting -WorkDay $mon, $tue -StartWorkHour "9:00:00" -EndWorkHour "18:00:00" -WorkHourBandwidth (512*1024) -NonWorkHourBandwidth (2048*1024)

Set-OBMachineSetting -NoThrottle indicates that no throttling is required.

Alter the network bandwidth for a VM

  1. In the VM's registry, go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Azure Backup\Replication.
    • To alter the bandwidth traffic on a replicating disk, modify the value of UploadThreadsPerVM. Create the value if it doesn't exist.
    • To alter the bandwidth for failback traffic from Azure, modify the value of DownloadThreadsPerVM.
  2. The default for each of these registry values is 4. In an "overprovisioned" network, change these registry values from their defaults. The maximum value you can use is 32. Monitor traffic to optimize the value.

Set up the Site Recovery infrastructure to protect more than 500 VMs

Before you set up the Site Recovery infrastructure, assess the environment to measure the following factors: compatible virtual machines, the daily data change rate, the required network bandwidth for the RPO you want to achieve, the number of Site Recovery components that are required, and the time it takes to complete the initial replication. Complete the following steps to gather the required information:

  1. To measure these parameters, run Site Recovery Deployment Planner on your environment. For helpful guidelines, see About Site Recovery Deployment Planner for VMware to Azure.
  2. Deploy a configuration server that meets the size recommendations for the configuration server. If your production workload exceeds 650 virtual machines, deploy another configuration server.
  3. Based on the measured daily data change rate, deploy scale-out process servers with the help of the size guidelines.
  4. If you expect the data change rate for a virtual machine disk to exceed 2 MBps, ensure that you use premium managed disks. Site Recovery Deployment Planner runs for a specific time period, so peaks in the data change rate at other times might not be captured in the report.
  5. Set the network bandwidth based on the RPO you want to achieve.
  6. When the infrastructure is set up, enable disaster recovery for your workload. To learn how, see Set up the source environment for VMware to Azure replication.

Deploy additional process servers

If you scale out your deployment beyond 200 source machines or if you have a total daily churn rate of more than 2 TB, you must add process servers to handle the traffic volume. Starting in version 9.24, the product raises process server alerts that indicate when to set up a scale-out process server. Set up the process server to protect new source machines or balance the load.

Migrate machines to use the new process server

  1. Select Settings > Site Recovery servers. Select the configuration server, and then expand Process servers.

    Screenshot of the Process Server dialog box

  2. Right-click the process server currently in use, and then select Switch.

    Screenshot of the Configuration server dialog box

  3. In Select target process server, select the new process server you want to use. Then, select the virtual machines that the server will handle. To get information about the server, select the information icon. To help you make load decisions, the average space that's required to replicate each selected virtual machine to the new process server is shown. Select the check mark to begin replicating to the new process server.

Deploy additional master target servers

In the following scenarios, more than one master target server is required:

  • You want to protect a Linux-based virtual machine.
  • The master target server available on the configuration server doesn't have access to the datastore of the VM.
  • The total number of disks on the master target server (the number of local disks on the server plus the number of disks to be protected) is greater than 60.

To learn how to add a master target server for a Linux-based virtual machine, see Install a Linux master target server for failback.

To add a master target server for a Windows-based virtual machine:

  1. Go to Recovery Services Vault > Site Recovery Infrastructure > Configuration servers.

  2. Select the required configuration server, and then select Master Target Server.

    Screenshot that shows the Add Master Target Server button

  3. Download the unified setup file, and then run the file on the VM to set up the master target server.

  4. Select Install master target > Next.

    Screenshot that shows the Install master target option

  5. Select the default installation location, and then select Install.

    Screenshot that shows the default installation location

  6. To register the master target with the configuration server, select Proceed To Configuration.

    Screenshot that shows the Proceed To Configuration button

  7. Enter the IP address of the configuration server, and then enter the passphrase. To learn how to generate a passphrase, see Generate a configuration server passphrase.

    Screenshot that shows where to enter the IP address and passphrase for the configuration server

  8. Select Register. When registration is finished, select Finish.

When registration finishes successfully, the server is listed in the Azure portal at Recovery Services Vault > Site Recovery Infrastructure > Configuration servers, in the list of master target servers for the configuration server.

Next steps

Download and run Site Recovery Deployment Planner.