Benchmark a disk

Benchmarking is the process of simulating different workloads on your application and measuring the application performance for each workload. Using the steps described in the designing for high performance article, you have gathered the application performance requirements. By running benchmarking tools on the VMs hosting the application, you can determine the performance levels that your application can achieve with premium SSDs. In this article, we provide examples of benchmarking a Standard_D8ds_v4 VM provisioned with Azure premium SSDs.

We use the common benchmarking tools DiskSpd and FIO, for Windows and Linux respectively. These tools spawn multiple threads that simulate a production-like workload and measure the system performance. Using these tools, you can also configure parameters like block size and queue depth, which you normally cannot change for an application. This gives you more flexibility to drive the maximum performance on a high-scale VM provisioned with premium SSDs for different types of application workloads. To learn more about each benchmarking tool, visit DiskSpd and FIO.

To follow the examples below, create a Standard_D8ds_v4 VM and attach four premium SSDs to it. Of the four disks, configure three with host caching set to "None" and stripe them into a volume called NoCacheWrites. Configure host caching as "ReadOnly" on the remaining disk and create a volume called CacheReads with this disk. With this setup, you can see the maximum read and write performance from a Standard_D8ds_v4 VM. For detailed steps about creating a Standard_D8ds_v4 VM with premium SSDs, see Designing for high performance.
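If you prefer to script the disk attachment, the Azure CLI can attach new premium SSDs with the required caching settings. The following is a minimal sketch, assuming an existing VM and resource group; the resource group, VM, and disk names, as well as the 1024-GB size, are placeholders rather than values from this article.

# Three premium SSDs with host caching disabled (to be striped into NoCacheWrites)
az vm disk attach --resource-group myResourceGroup --vm-name myD8dsVM --name datadisk1 --new --size-gb 1024 --sku Premium_LRS --caching None
az vm disk attach --resource-group myResourceGroup --vm-name myD8dsVM --name datadisk2 --new --size-gb 1024 --sku Premium_LRS --caching None
az vm disk attach --resource-group myResourceGroup --vm-name myD8dsVM --name datadisk3 --new --size-gb 1024 --sku Premium_LRS --caching None
# One premium SSD with ReadOnly host caching (becomes the CacheReads volume)
az vm disk attach --resource-group myResourceGroup --vm-name myD8dsVM --name datadisk4 --new --size-gb 1024 --sku Premium_LRS --caching ReadOnly

After the disks are attached, stripe the three uncached disks into the NoCacheWrites volume and format the cached disk as the CacheReads volume inside the guest OS.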

Warm up the cache

The disk with ReadOnly host caching can deliver higher IOPS than the disk limit. To get this maximum read performance from the host cache, you must first warm up the cache of this disk. This ensures that the read IOs that the benchmarking tool drives on the CacheReads volume actually hit the cache, and not the disk directly. The cache hits result in more IOPS from the single cache-enabled disk.

Important

You must warm up the cache before running benchmarks every time the VM is rebooted.

DiskSpd

Download the DiskSpd tool on the VM. DiskSpd is a tool that you can customize to create your own synthetic workloads. We use the same setup described above to run the benchmarking tests. You can change the specifications to test different workloads.

In this example, we use the following set of baseline parameters (the fully assembled command is sketched after the list):

  • -c200G: Creates (or recreates) the sample file used in the test. It can be set in bytes, KiB, MiB, GiB, or blocks. In this case, a large 200-GiB target file is used to minimize memory caching.
  • -w100: Specifies the percentage of operations that are write requests (-w0 is equivalent to 100% read).
  • -b4K: Indicates the block size in bytes, KiB, MiB, or GiB. In this case, a 4K block size is used to simulate a random I/O test.
  • -F4: Sets a total of four threads.
  • -r: Indicates the random I/O test (overrides the -s parameter).
  • -o128: Indicates the number of outstanding I/O requests per target per thread. This is also known as the queue depth. In this case, 128 is used to stress the CPU.
  • -W7200: Specifies the duration of the warm-up time before measurements start.
  • -d30: Specifies the duration of the test, not including warm-up.
  • -Sh: Disables software and hardware write caching (equivalent to -Suw).
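Put together, the baseline parameters above form the following command; this is only an illustrative sketch, and the write and read examples later in this article vary -w, -b, and -W from this baseline to target specific limits.

diskspd -c200G -w100 -b4K -F4 -r -o128 -W7200 -d30 -Sh testfile.dat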

For a complete list of parameters, see the GitHub repository.

Maximum write IOPS

We use a high queue depth of 128, a small block size of 8 KB, and four worker threads to drive write operations. The write workers drive traffic on the "NoCacheWrites" volume, which has three disks with cache set to "None".

Run the following command for 30 seconds of warm-up and 30 seconds of measurement:

diskspd -c200G -w100 -b8K -F4 -r -o128 -W30 -d30 -Sh testfile.dat

Results show that the Standard_D8ds_v4 VM is delivering its maximum write IOPS limit of 12,800.

For a total of 3,208,642,560 bytes, the total I/O count was 391,680, at 101.97 MiB/s and 13,052.65 I/Os per second overall.

Maximum read IOPS

We use a high queue depth of 128, a small block size of 4 KB, and four worker threads to drive read operations. The read workers drive traffic on the "CacheReads" volume, which has one disk with cache set to "ReadOnly".

Run the following command for two hours of warm-up and 30 seconds of measurement:

diskspd -c200G -b4K -F4 -r -o128 -W7200 -d30 -Sh testfile.dat

Results show that the Standard_D8ds_v4 VM is delivering its maximum read IOPS limit of 77,000.

For a total of 9,652,785,152 bytes, the total I/O count was 2,356,637, at 306.72 MiB/s and 78,521.23 I/Os per second overall.

Maximum throughput

To get the maximum read and write throughput, you can change to a larger block size of 64 KB.
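For example, the write test above could be rerun with only the block size changed. The following command is a sketch of that variation; with the larger block size, IOPS drop but MiB/s rises until the VM or disk throughput limit is reached.

diskspd -c200G -w100 -b64K -F4 -r -o128 -W30 -d30 -Sh testfile.dat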

FIO

FIO is a popular tool for benchmarking storage on Linux VMs. It has the flexibility to select different I/O sizes and sequential or random reads and writes. It spawns worker threads or processes to perform the specified I/O operations. You can specify the type of I/O operations each worker thread must perform using job files. We created one job file per scenario illustrated in the examples below. You can change the specifications in these job files to benchmark different workloads running on Premium Storage. In the examples, we are using a Standard_D8ds_v4 running Ubuntu. Use the same setup described at the beginning of the benchmark section and warm up the cache before running the benchmark tests.

Before you begin, download FIO and install it on your virtual machine.

Run the following command for Ubuntu:

apt-get install fio
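With FIO installed, one possible way to warm up the ReadOnly cache (as required in the warm-up section above) is a long read-only pass against the cached volume, which the examples below mount at /mnt/readcache. This is a sketch rather than a prescribed procedure; the job name and the two-hour duration (mirroring the DiskSpd warm-up) are assumptions.

sudo fio --name=warmup --directory=/mnt/readcache --rw=randread --bs=4k --iodepth=256 --ioengine=libaio --direct=1 --numjobs=4 --size=30g --runtime=7200 --time_based

The --time_based flag keeps the job running for the full runtime even after the files have been read once, which gives the host cache time to fill.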

We use four worker threads to drive write operations and four worker threads to drive read operations on the disks. The write workers drive traffic on the "nocache" volume, which has three disks with cache set to "None". The read workers drive traffic on the "readcache" volume, which has one disk with cache set to "ReadOnly".
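The article doesn't show how these volumes are assembled on Linux. As one possible sketch, the three uncached disks could be striped with mdadm and the cached disk mounted on its own; the device names /dev/sdc through /dev/sdf are assumptions, so check your actual device names with lsblk first.

# Stripe the three uncached disks (RAID 0) and mount the result as /mnt/nocache
sudo mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdc /dev/sdd /dev/sde
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/nocache
sudo mount /dev/md0 /mnt/nocache
# Format and mount the single ReadOnly-cached disk as /mnt/readcache
sudo mkfs.ext4 /dev/sdf
sudo mkdir -p /mnt/readcache
sudo mount /dev/sdf /mnt/readcache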

Maximum write IOPS

Create the job file with the following specifications to get maximum write IOPS. Name it "fiowrite.ini".

[global]
size=30g
direct=1
iodepth=256
ioengine=libaio
bs=4k
numjobs=4

[writer1]
rw=randwrite
directory=/mnt/nocache

Note the following key things that are in line with the design guidelines discussed in the previous sections. These specifications are essential to drive maximum IOPS:

  • A high queue depth of 256.
  • A small block size of 4 KB.
  • Multiple threads performing random writes.

Run the following command to kick off the FIO test for 30 seconds:

sudo fio --runtime 30 fiowrite.ini

While the test runs, you are able to see the number of write IOPS the VM and Premium disks are delivering. As shown in the sample below, the Standard_D8ds_v4 VM is delivering its maximum write IOPS limit of 12,800 IOPS.

Screenshot of the write IOPS delivered by the VM and premium SSDs, showing 13.1k write IOPS.
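If you want a second view of these numbers while the test runs, you can watch the block devices directly from another shell. This assumes the sysstat package (which provides iostat) is installed; it isn't part of the original setup.

# Extended per-device statistics refreshed every second; the w/s column shows write IOPS
iostat -dxm 1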

Maximum read IOPS

Create the job file with the following specifications to get maximum read IOPS. Name it "fioread.ini".

[global]
size=30g
direct=1
iodepth=256
ioengine=libaio
bs=4k
numjobs=4

[reader1]
rw=randread
directory=/mnt/readcache

Note the following key things that are in line with the design guidelines discussed in the previous sections. These specifications are essential to drive maximum IOPS:

  • A high queue depth of 256.
  • A small block size of 4 KB.
  • Multiple threads performing random reads.

Run the following command to kick off the FIO test for 30 seconds:

sudo fio --runtime 30 fioread.ini

While the test runs, you are able to see the number of read IOPS the VM and Premium disks are delivering. As shown in the sample below, the Standard_D8ds_v4 VM is delivering more than 77,000 read IOPS. This is a combination of the disk and the cache performance.
Screenshot of the read IOPS delivered by the VM and premium SSDs, showing 78.6k read IOPS.

Maximum read and write IOPS

Create the job file with the following specifications to get maximum combined read and write IOPS. Name it "fioreadwrite.ini".

[global]
size=30g
direct=1
iodepth=128
ioengine=libaio
bs=4k
numjobs=4

[reader1]
rw=randread
directory=/mnt/readcache

[writer1]
rw=randwrite
directory=/mnt/nocache
rate_iops=3200

Note the following key things that are in line with the design guidelines discussed in the previous sections. These specifications are essential to drive maximum IOPS:

  • A high queue depth of 128.
  • A small block size of 4 KB.
  • Multiple threads performing random reads and writes.
  • rate_iops=3200 on the writer, which caps each of the four write jobs at 3,200 IOPS so that the total write load stays at the VM's 12,800 uncached write IOPS limit and leaves headroom for the cached reads.

Run the following command to kick off the FIO test for 30 seconds:

sudo fio --runtime 30 fioreadwrite.ini

While the test runs, you are able to see the number of combined read and write IOPS the VM and Premium disks are delivering. As shown in the sample below, the Standard_D8ds_v4 VM is delivering more than 90,000 combined read and write IOPS. This is a combination of the disk and the cache performance.
Screenshot of combined read and write IOPS, showing 78.3k read IOPS and 12.6k write IOPS.

Maximum combined throughput

To get the maximum combined read and write throughput, use a larger block size and a large queue depth, with multiple threads performing reads and writes. You can use a block size of 64 KB and a queue depth of 128.
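As a sketch, the combined job file above can be adapted by changing only the block size; the file name fiothroughput.ini and the removal of the write rate cap are assumptions for this example, not part of the original article.

[global]
size=30g
direct=1
iodepth=128
ioengine=libaio
bs=64k
numjobs=4

[reader1]
rw=randread
directory=/mnt/readcache

[writer1]
rw=randwrite
directory=/mnt/nocache

Run it the same way as the other tests, for example: sudo fio --runtime 30 fiothroughput.ini. With 64-KB blocks, throughput (MiB/s) rather than IOPS becomes the limiting factor.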

Next steps

Proceed to our article on designing for high performance.

In that article, you create a checklist similar to your existing application for the prototype. Using benchmarking tools, you can simulate the workloads and measure performance on the prototype application. By doing so, you can determine which disk offering can match or surpass your application performance requirements. Then you can implement the same guidelines for your production application.