Optimize performance on the Lsv2-series virtual machines

Lsv2-series virtual machines support a variety of workloads that need high I/O and throughput on local storage, across a wide range of applications and industries. The Lsv2-series is ideal for Big Data, SQL, NoSQL databases, data warehousing, and large transactional databases, including Cassandra, MongoDB, Cloudera, and Redis.

The design of the Lsv2-series virtual machines (VMs) maximizes the AMD EPYC™ 7551 processor to provide the best performance between the processor, memory, NVMe devices, and the VMs. Working with partners in Linux, several builds optimized for Lsv2-series performance are available in the Azure Marketplace, currently including:

  • Ubuntu 18.04

  • Ubuntu 16.04

  • Debian 9

  • Debian 10

This article provides tips and suggestions to ensure your workloads and applications achieve the maximum performance designed into the VMs. The information on this page will be updated continuously as more Lsv2-optimized images are added to the Azure Marketplace.

AMD EPYC™ chipset architecture

Lsv2-series VMs use AMD EPYC™ server processors based on the Zen microarchitecture. AMD developed Infinity Fabric (IF) for EPYC™ as a scalable interconnect for its NUMA model that can be used for on-die, on-package, and multi-package communications. Compared with QPI (Quick-Path Interconnect) and UPI (Ultra-Path Interconnect) used on Intel's modern monolithic-die processors, AMD's many-NUMA small-die architecture may bring both performance benefits and challenges. The actual impact of memory bandwidth and latency constraints varies depending on the type of workload being run.

Tips to maximize performance

  • If you are uploading a custom Linux guest OS for your workload, note that Accelerated Networking is OFF by default. If you intend to enable Accelerated Networking, enable it at VM creation time for best performance.

  • The hardware that powers the Lsv2-series VMs uses NVMe devices with eight I/O queue pairs (QPs). Every NVMe device I/O queue is actually a pair: a submission queue and a completion queue. The NVMe driver is set up to optimize the utilization of these eight I/O QPs by distributing I/Os in a round-robin schedule. To gain maximum performance, run eight jobs per device to match.

  • Avoid mixing NVMe admin commands (for example, NVMe SMART info queries) with NVMe I/O commands during active workloads. Lsv2 NVMe devices are backed by Hyper-V NVMe Direct technology, which switches into "slow mode" whenever any NVMe admin commands are pending. Lsv2 users could see a dramatic drop in NVMe I/O performance if that happens.

  • Lsv2 users should not rely on the device NUMA information (all 0) reported from within the VM for data drives to decide NUMA affinity for their apps. For better performance, spread workloads across CPUs where possible.

  • The maximum supported queue depth per I/O queue pair for an Lsv2 VM NVMe device is 1024 (versus the Amazon i3 QD limit of 32). Lsv2 users should limit their (synthetic) benchmarking workloads to queue depth 1024 or lower to avoid triggering queue-full conditions, which can reduce performance.
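The eight-jobs tip above can be sketched as a fio job file; the device path /dev/nvme0n1, block size, and run time are illustrative assumptions rather than prescribed values:

```ini
; Hedged sketch: drive one local NVMe disk with eight jobs to match
; its eight I/O queue pairs. /dev/nvme0n1 is a placeholder device.
[global]
ioengine=libaio
direct=1
bs=4k
rw=randread
time_based=1
runtime=60
group_reporting=1

[nvme0]
filename=/dev/nvme0n1
; one job per I/O queue pair
numjobs=8
; 8 jobs x 64 outstanding I/Os stays well below the 1024 QD limit
iodepth=64
```

Save this as, say, nvme0.fio and run it with `fio nvme0.fio`; add one such section per device to exercise all the local disks.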

Utilizing local NVMe storage

Local storage on the 1.92 TB NVMe disk on all Lsv2 VMs is ephemeral. During a successful standard reboot of the VM, the data on the local NVMe disk persists. The data does not persist if the VM is redeployed, deallocated, or deleted, or if another issue causes the VM, or the hardware it is running on, to become unhealthy. When this happens, any data on the old host is securely erased.

There will also be cases when the VM needs to be moved to a different host machine, for example, during a planned maintenance operation. Planned maintenance operations and some hardware failures can be anticipated with Scheduled Events. Use Scheduled Events to stay informed about any predicted maintenance and recovery operations.
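As a sketch of how a VM can watch for these events, the Azure Instance Metadata Service (IMDS) exposes a Scheduled Events endpoint; the api-version below is taken from IMDS documentation, but verify it against the current docs for your environment:

```shell
# Poll the Instance Metadata Service (IMDS) for pending Scheduled Events.
# This endpoint is only reachable from inside an Azure VM; the connect
# timeout prevents the command from hanging when run elsewhere.
IMDS_URL="http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
curl -s --connect-timeout 2 -H "Metadata:true" "$IMDS_URL" || true
```

An empty `Events` array in the JSON response means no maintenance is currently scheduled.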

If a planned maintenance event requires the VM to be recreated on a new host with empty local disks, the data will need to be resynchronized (again, with any data on the old host securely erased). This occurs because Lsv2-series VMs do not currently support live migration on the local NVMe disk.

There are two modes for planned maintenance.

Standard VM customer-controlled maintenance

  • The VM is moved to an updated host during a 30-day window.
  • Lsv2 local storage data could be lost, so backing up data prior to the event is recommended.

Automatic maintenance

  • Occurs if the customer does not execute customer-controlled maintenance, or in the event of emergency procedures such as a security zero-day event.
  • Intended to preserve customer data, but there is a small risk of a VM freeze or reboot.
  • Lsv2 local storage data could be lost, so backing up data prior to the event is recommended.

For any upcoming service events, use the controlled maintenance process to select a time convenient for the update. Prior to the event, you can back up your data to premium storage. After the maintenance event completes, you can return your data to the refreshed Lsv2 VM's local NVMe storage.

Scenarios that maintain data on local NVMe disks include:

  • The VM is running and healthy.
  • The VM is rebooted in place (by you or by Azure).
  • The VM is paused (stopped without deallocation).
  • The majority of planned maintenance servicing operations.

Scenarios that securely erase data to protect the customer include:

  • The VM is redeployed, stopped (deallocated), or deleted (by you).
  • The VM becomes unhealthy and must be service healed to another node because of a hardware issue.
  • The small number of planned maintenance servicing operations that require the VM to be reallocated to another host for servicing.

To learn more about options for backing up data in local storage, see Backup and disaster recovery for Azure IaaS disks.

Frequently asked questions

  • How do I start deploying Lsv2-series VMs?
    Much like any other VM, use the Azure portal, the Azure CLI, or PowerShell to create a VM.
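A minimal Azure CLI sketch, assuming the CLI is installed and you are logged in; the resource group, VM name, and image alias are hypothetical placeholders:

```shell
# Hypothetical names for illustration; pick any Lsv2 size (L8s_v2 is the
# smallest). The guard keeps this a no-op where the Azure CLI is absent.
RG="myResourceGroup"
VM_NAME="myLsv2VM"
VM_SIZE="Standard_L8s_v2"
if command -v az >/dev/null 2>&1; then
  az vm create \
    --resource-group "$RG" \
    --name "$VM_NAME" \
    --size "$VM_SIZE" \
    --image Ubuntu2204 \
    --accelerated-networking true   # enable at creation time, per the tip above
fi
```

The image alias varies by CLI version; list current aliases with `az vm image list --output table` before relying on one.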

  • Will a single NVMe disk failure cause all VMs on the host to fail?
    If a disk failure is detected on the hardware node, the hardware is in a failed state. When this occurs, all VMs on the node are automatically deallocated and moved to a healthy node. For Lsv2-series VMs, this means that the customer's data on the failing node is also securely erased and will need to be recreated by the customer on the new node. As noted, until live migration becomes available on Lsv2, the data on the failing node will not be proactively moved with the VMs as they are transferred to another node.

  • Do I need to make any adjustments to rq_affinity for performance?
    The rq_affinity setting is a minor adjustment when pushing for the absolute maximum input/output operations per second (IOPS). Once everything else is working well, try setting rq_affinity to 0 to see whether it makes a difference.

  • Do I need to change the blk_mq settings?
    CentOS 7.x automatically uses blk-mq for the NVMe devices; no configuration changes or settings are necessary. The scsi_mod.use_blk_mq setting applies to SCSI only and was used during the Lsv2 preview, when the NVMe devices were visible in the guest VMs as SCSI devices. The NVMe devices are now visible as NVMe devices, so the SCSI blk-mq setting is irrelevant.

  • Do I need to change "fio"?
    To get maximum IOPS with a performance-measuring tool such as fio on the L64s_v2 and L80s_v2 VM sizes, set rq_affinity to 0 on each NVMe device. For example, the following command line sets rq_affinity to zero for all 10 NVMe devices in an L80s_v2 VM:

    for i in `seq 0 9`; do echo 0 >/sys/block/nvme${i}n1/queue/rq_affinity; done

    Also note that the best performance is obtained when I/O is done directly to each of the raw NVMe devices: no partitioning, no file systems, no RAID 0 configuration, and so on. Before starting a testing session, ensure the configuration is in a known fresh/clean state by running blkdiscard on each of the NVMe devices.
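Putting the preparation advice together, here is a hedged dry-run helper: it prints the per-device commands (blkdiscard plus the rq_affinity setting) instead of executing them, so it is safe to review anywhere and pipe to `sh` only on a real Lsv2 VM, as root:

```shell
# Generate preparation commands for n local NVMe data disks.
# Printing instead of executing makes this safe to dry-run anywhere.
gen_prep_commands() {
  n="$1"
  i=0
  while [ "$i" -lt "$n" ]; do
    echo "blkdiscard /dev/nvme${i}n1"
    echo "echo 0 > /sys/block/nvme${i}n1/queue/rq_affinity"
    i=$((i + 1))
  done
}

# An L80s_v2 VM has 10 local NVMe data disks:
gen_prep_commands 10
```

Review the printed commands, then run `gen_prep_commands 10 | sh` on the VM itself to apply them.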

Next steps