Azure premium storage: design for high performance

This article provides guidelines for building high-performance applications with Azure Premium Storage. You can combine these guidelines with the performance best practices that apply to the technologies your application uses. To illustrate the guidelines, this article uses SQL Server running on Premium Storage as an example throughout.

This article addresses performance scenarios for the storage layer, but you also need to optimize the application layer. For example, if you host a SharePoint farm on Azure Premium Storage, you can use the SQL Server examples from this article to optimize the database server. Additionally, optimize the SharePoint farm's web server and application server to get the most performance.

This article helps answer the following common questions about optimizing application performance on Azure Premium Storage:

  • How do you measure your application performance?
  • Why are you not seeing the expected high performance?
  • Which factors influence your application performance on Premium Storage?
  • How do these factors influence the performance of your application on Premium Storage?
  • How can you optimize for IOPS, bandwidth, and latency?

These guidelines are written specifically for Premium Storage because workloads running on Premium Storage are highly performance sensitive. Examples are provided where appropriate. You can also apply some of these guidelines to applications running on IaaS VMs with Standard Storage disks.

Note

Sometimes, what appears to be a disk performance issue is actually a network bottleneck. In these situations, you should optimize your network performance.

If you are looking to benchmark your disk, see our article on Benchmarking a disk.

If your VM supports accelerated networking, make sure it is enabled. If it is not enabled, you can enable it on already deployed VMs on both Windows and Linux.

Before you begin, if you are new to Premium Storage, first read Select an Azure disk type for IaaS VMs and Azure Storage scalability and performance targets for storage accounts.

Application performance indicators

We assess whether an application is performing well by using performance indicators such as how fast it processes a user request, how much data it processes per request, how many requests it processes in a specific period of time, and how long a user has to wait for a response after submitting a request. The technical terms for these performance indicators are IOPS, throughput or bandwidth, and latency.

This section discusses the common performance indicators in the context of Premium Storage. The following section, Gathering Application Requirements, explains how to measure these performance indicators for your application. Later, Optimizing Application Performance covers the factors that affect these performance indicators and gives recommendations for optimizing them.

IOPS

IOPS, or input/output operations per second, is the number of requests that your application sends to the storage disks in one second. An input/output operation can be a read or a write, and sequential or random. Online Transaction Processing (OLTP) applications, such as an online retail website, need to process many concurrent user requests immediately. The user requests are insert- and update-intensive database transactions, which the application must process quickly. Therefore, OLTP applications require very high IOPS. Such applications handle millions of small, random IO requests. If you have such an application, you must design the application infrastructure to optimize for IOPS. The later section, Optimizing Application Performance, discusses in detail all the factors that you must consider to get high IOPS.

When you attach a premium storage disk to your high scale VM, Azure provisions a guaranteed number of IOPS according to the disk specification. For example, a P50 disk provisions 7,500 IOPS. Each high scale VM size also has a specific IOPS limit that it can sustain.

Throughput

Throughput, or bandwidth, is the amount of data that your application sends to the storage disks in a specified interval. If your application performs input/output operations with large IO unit sizes, it requires high throughput. Data warehouse applications tend to issue scan-intensive operations that access large portions of data at a time, and they commonly perform bulk operations. In other words, such applications require higher throughput. If you have such an application, you must design its infrastructure to optimize for throughput. The next section discusses in detail the factors you must tune to achieve this.

When you attach a premium storage disk to a high scale VM, Azure provisions throughput according to that disk specification. For example, a P50 disk provisions 250 MB per second of disk throughput. Each high scale VM size also has a specific throughput limit that it can sustain.

There is a relation between throughput and IOPS, as shown in the formula below.

IOPS × IO size = Throughput

Therefore, it is important to determine the optimal throughput and IOPS values that your application requires. As you try to optimize one, the other is also affected. A later section, Optimizing Application Performance, discusses optimizing IOPS and throughput in more detail.

Latency

Latency is the time it takes an application to receive a single request, send it to the storage disks, and send the response to the client. It is a critical measure of an application's performance, in addition to IOPS and throughput. The latency of a premium storage disk is the time it takes to retrieve the information for a request and communicate it back to your application. Premium Storage provides consistently low latencies. Premium disks are designed to provide single-digit millisecond latencies for most IO operations. If you enable ReadOnly host caching on premium storage disks, you can get much lower read latency. Disk caching is discussed in more detail in the later section on Optimizing Application Performance.

Optimizing your application for higher IOPS and throughput affects its latency. After tuning the application performance, always evaluate the latency of the application to avoid unexpected high-latency behavior.

The following control plane operations on Managed Disks may involve moving the disk from one storage location to another. The move is orchestrated through a background copy of data that can take several hours to complete, typically less than 24 hours, depending on the amount of data in the disks. During that time, your application can experience higher than usual read latency, because some reads can be redirected to the original location and take longer to complete. There is no impact on write latency during this period.

  • Update the storage type.
  • Detach and attach a disk from one VM to another.
  • Create a managed disk from a VHD.
  • Create a managed disk from a snapshot.
  • Convert unmanaged disks to managed disks.

Performance Application Checklist for disks

The first step in designing high-performance applications running on Azure Premium Storage is understanding the performance requirements of your application. After you have gathered the performance requirements, you can optimize your application to achieve the best possible performance.

The previous section explained the common performance indicators: IOPS, throughput, and latency. You must identify which of these performance indicators are critical to your application to deliver the desired user experience. For example, high IOPS matters most to OLTP applications that process millions of transactions in a second. High throughput, on the other hand, is critical for data warehouse applications that process large amounts of data in a second. Extremely low latency is crucial for real-time applications such as live video streaming websites.

Next, measure the maximum performance requirements of your application throughout its lifetime. Use the sample checklist below as a start. Record the maximum performance requirements during normal, peak, and off-hours workload periods. By identifying the requirements for all workload levels, you can determine the overall performance requirement of your application. For example, the normal workload of an e-commerce website is the transactions it serves during most days in a year. The peak workload of the website is the transactions it serves during the holiday season or special sale events. The peak workload is typically experienced for a limited period, but it can require your application to scale to two or more times its normal operation. Find out the 50th percentile, 90th percentile, and 99th percentile requirements. This helps filter out any outliers in the performance requirements, so that you can focus your efforts on optimizing for the right values.
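As a rough illustration of the percentile step, the sketch below computes the 50th, 90th, and 99th percentile of a set of measurements using the nearest-rank method. The sample values and the `percentile` helper are hypothetical and only for illustration; a monitoring tool may use a different percentile definition.

```python
def percentile(samples, pct):
    """Return the nearest-rank percentile of samples (pct is an integer from 1 to 100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest rank: ceil(pct * n / 100), computed with integer arithmetic.
    rank = -(-pct * len(ordered) // 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical transactions-per-second readings, one per measurement interval.
# The single 900 reading is an outlier that only the 99th percentile picks up.
tps_samples = [120, 135, 128, 140, 132, 138, 125, 130, 900, 131]

requirements = {p: percentile(tps_samples, p) for p in (50, 90, 99)}
print(requirements)
```

Comparing the 90th and 99th percentile values shows how much a few outliers would inflate the requirement if you sized for the absolute maximum.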

Application performance requirements checklist

| Performance requirements | 50th percentile | 90th percentile | 99th percentile |
|---|---|---|---|
| Max. transactions per second | | | |
| % Read operations | | | |
| % Write operations | | | |
| % Random operations | | | |
| % Sequential operations | | | |
| IO request size | | | |
| Average throughput | | | |
| Max. throughput | | | |
| Min. latency | | | |
| Average latency | | | |
| Max. CPU | | | |
| Average CPU | | | |
| Max. memory | | | |
| Average memory | | | |
| Queue depth | | | |

Note

Consider scaling these numbers based on the expected future growth of your application. It is a good idea to plan for growth ahead of time, because it could be harder to change the infrastructure to improve performance later.

If you have an existing application and want to move to Premium Storage, first build the checklist above for the existing application. Then, build a prototype of your application on Premium Storage and design the application based on the guidelines described in Optimizing Application Performance, in a later section of this document. The next section describes the tools you can use to gather the performance measurements.

Counters to measure application performance requirements

The best way to measure the performance requirements of your application is to use the performance-monitoring tools provided by the server's operating system. You can use PerfMon for Windows and iostat for Linux. These tools capture counters that correspond to each measure explained in the section above. You must capture the values of these counters while your application is running its normal, peak, and off-hours workloads.

The PerfMon counters are available for the processor, memory, and each logical disk and physical disk of your server. When you use premium storage disks with a VM, the physical disk counters are for each premium storage disk, and the logical disk counters are for each volume created on the premium storage disks. You must capture the values for the disks that host your application workload. If there is a one-to-one mapping between logical and physical disks, you can refer to the physical disk counters; otherwise, refer to the logical disk counters. On Linux, the iostat command generates a CPU and disk utilization report. The disk utilization report provides statistics per physical device or partition. If you have a database server with its data and logs on separate disks, collect this data for both disks. The table below describes the counters for disks, processors, and memory:

| Counter | Description | PerfMon | iostat |
|---|---|---|---|
| IOPS or Transactions per second | Number of I/O requests issued to the storage disk per second. | Disk Reads/sec, Disk Writes/sec | tps, r/s, w/s |
| Disk Reads and Writes | % of read and write operations performed on the disk. | % Disk Read Time, % Disk Write Time | r/s, w/s |
| Throughput | Amount of data read from or written to the disk per second. | Disk Read Bytes/sec, Disk Write Bytes/sec | kB_read/s, kB_wrtn/s |
| Latency | Total time to complete a disk IO request. | Average Disk sec/Read, Average Disk sec/Write | await, svctm |
| IO size | The size of I/O requests issued to the storage disks. | Average Disk Bytes/Read, Average Disk Bytes/Write | avgrq-sz |
| Queue Depth | Number of outstanding I/O requests waiting to be read from or written to the storage disk. | Current Disk Queue Length | avgqu-sz |
| Max. Memory | Amount of memory required to run the application smoothly. | % Committed Bytes in Use | Use vmstat |
| Max. CPU | Amount of CPU required to run the application smoothly. | % Processor Time | %util |

Learn more about iostat and PerfMon.
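The counters above can be combined into the checklist values. The following is a minimal sketch, assuming you have already read per-second read and write rates (for example, r/s, w/s, kB_read/s, and kB_wrtn/s from iostat) into plain numbers. The function name and the sample figures are hypothetical, and the read percentage here is derived from operation counts rather than from the PerfMon time-based counters.

```python
def summarize_disk_counters(reads_per_sec, writes_per_sec,
                            read_kb_per_sec, write_kb_per_sec):
    """Derive checklist values from per-second disk counters."""
    total_iops = reads_per_sec + writes_per_sec
    total_kb_per_sec = read_kb_per_sec + write_kb_per_sec
    if total_iops == 0:
        # Idle disk: avoid dividing by zero.
        return {"iops": 0, "read_pct": 0.0, "write_pct": 0.0,
                "throughput_kb_per_sec": total_kb_per_sec,
                "avg_io_size_kb": 0.0}
    return {
        "iops": total_iops,
        "read_pct": reads_per_sec / total_iops * 100,
        "write_pct": writes_per_sec / total_iops * 100,
        "throughput_kb_per_sec": total_kb_per_sec,
        # Average IO size follows from throughput divided by IOPS.
        "avg_io_size_kb": total_kb_per_sec / total_iops,
    }


# Hypothetical sample: 3,000 reads/s and 1,000 writes/s of 8 KB each.
summary = summarize_disk_counters(3000, 1000, 24000, 8000)
print(summary)
```

Running this for samples taken during normal, peak, and off-hours periods gives one row of checklist inputs per period.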

Optimize application performance

The main factors that influence the performance of an application running on Premium Storage are the nature of IO requests, VM size, disk size, number of disks, disk caching, multithreading, and queue depth. You can control some of these factors with knobs provided by the system. Most applications may not give you an option to alter the IO size and queue depth directly. For example, if you are using SQL Server, you cannot choose the IO size and queue depth. SQL Server chooses the optimal IO size and queue depth values to get the most performance. It is important to understand the effects of both types of factors on your application performance, so that you can provision appropriate resources to meet performance needs.

Throughout this section, refer to the application requirements checklist that you created to identify how much you need to optimize your application performance. Based on that, you can determine which factors from this section you need to tune. To observe the effects of each factor on your application performance, run benchmarking tools on your application setup. Refer to the Benchmarking article, linked at the end, for steps to run common benchmarking tools on Windows and Linux VMs.

Optimize IOPS, throughput, and latency at a glance

The table below summarizes the performance factors and the steps necessary to optimize IOPS, throughput, and latency. The sections following this summary describe each factor in much more depth.

For more information on VM sizes and on the IOPS, throughput, and latency available for each type of VM, see Linux VM sizes or Windows VM sizes.

| | IOPS | Throughput | Latency |
|---|---|---|---|
| Example scenario | Enterprise OLTP application requiring a very high transactions-per-second rate. | Enterprise data warehousing application processing large amounts of data. | Near real-time applications requiring instant responses to user requests, like online gaming. |
| Performance factors | | | |
| IO size | Smaller IO size yields higher IOPS. | Larger IO size yields higher throughput. | |
| VM size | Use a VM size that offers IOPS greater than your application requirement. | Use a VM size with a throughput limit greater than your application requirement. | Use a VM size that offers scale limits greater than your application requirement. |
| Disk size | Use a disk size that offers IOPS greater than your application requirement. | Use a disk size with a throughput limit greater than your application requirement. | Use a disk size that offers scale limits greater than your application requirement. |
| VM and disk scale limits | The IOPS limit of the chosen VM size should be greater than the total IOPS driven by the premium storage disks attached to it. | The throughput limit of the chosen VM size should be greater than the total throughput driven by the premium storage disks attached to it. | The scale limits of the chosen VM size must be greater than the total scale limits of the attached premium storage disks. |
| Disk caching | Enable ReadOnly cache on premium storage disks with read-heavy operations to get higher read IOPS. | | Enable ReadOnly cache on premium storage disks with read-heavy operations to get very low read latencies. |
| Disk striping | Use multiple disks and stripe them together to get a combined higher IOPS and throughput limit. The combined limit per VM should be higher than the combined limits of the attached premium disks. | | |
| Stripe size | Use a smaller stripe size for the random small IO pattern seen in OLTP applications. For example, use a stripe size of 64 KB for a SQL Server OLTP application. | Use a larger stripe size for the sequential large IO pattern seen in data warehouse applications. For example, use a 256 KB stripe size for a SQL Server data warehouse application. | |
| Multithreading | Use multithreading to push a higher number of requests to Premium Storage, which leads to higher IOPS and throughput. For example, on SQL Server set a high MAXDOP value to allocate more CPUs to SQL Server. | | |
| Queue depth | Larger queue depth yields higher IOPS. | Larger queue depth yields higher throughput. | Smaller queue depth yields lower latencies. |

Nature of IO requests

An IO request is a unit of input/output operation that your application performs. Identifying the nature of IO requests (random or sequential, read or write, small or large) helps you determine the performance requirements of your application. It is important to understand the nature of IO requests in order to make the right decisions when designing your application infrastructure. IOs must be distributed evenly to achieve the best possible performance.

IO size is one of the more important factors. The IO size is the size of the input/output operation request generated by your application. The IO size has a significant impact on performance, especially on the IOPS and bandwidth that the application can achieve. The following formula shows the relationship between IOPS, IO size, and bandwidth/throughput.

IOPS × IO size = Throughput

Some applications allow you to alter their IO size, while others do not. For example, SQL Server determines the optimal IO size itself and does not provide users with any knobs to change it. On the other hand, Oracle provides a parameter called DB_BLOCK_SIZE, which you can use to configure the I/O request size of the database.

If you are using an application that does not allow you to change the IO size, use the guidelines in this article to optimize the performance KPI that is most relevant to your application. For example:

  • An OLTP application generates millions of small, random IO requests. To handle these types of IO requests, you must design your application infrastructure to get higher IOPS.
  • A data warehousing application generates large, sequential IO requests. To handle these types of IO requests, you must design your application infrastructure to get higher bandwidth or throughput.

If you are using an application that allows you to change the IO size, use this rule of thumb for the IO size in addition to the other performance guidelines:

  • Use a smaller IO size to get higher IOPS. For example, 8 KB for an OLTP application.
  • Use a larger IO size to get higher bandwidth/throughput. For example, 1024 KB for a data warehouse application.

Here is an example of how you can calculate the IOPS and throughput/bandwidth for your application. Consider an application using a P30 disk. The maximum IOPS and throughput/bandwidth a P30 disk can achieve are 5,000 IOPS and 200 MB per second, respectively. If your application requires the maximum IOPS from the P30 disk and you use a smaller IO size like 8 KB, the resulting bandwidth is 40 MB per second. However, if your application requires the maximum throughput/bandwidth from the P30 disk and you use a larger IO size like 1024 KB, the resulting IOPS is lower: 200 IOPS. Therefore, tune the IO size so that it meets both your application's IOPS requirement and its throughput/bandwidth requirement. The following table summarizes the different IO sizes and their corresponding IOPS and throughput for a P30 disk.

| Application requirement | I/O size | IOPS | Throughput/Bandwidth |
|---|---|---|---|
| Max IOPS | 8 KB | 5,000 | 40 MB per second |
| Max throughput | 1024 KB | 200 | 200 MB per second |
| Max throughput + high IOPS | 64 KB | 3,200 | 200 MB per second |
| Max IOPS + high throughput | 32 KB | 5,000 | 160 MB per second |
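The P30 figures above follow from the relationship IOPS × IO size = Throughput, capped by whichever disk limit is reached first. The sketch below reproduces that arithmetic. The function name is hypothetical, and the conversion of 1 MB = 1024 KB is an assumption: with it, the 8 KB and 32 KB cases come out slightly below the rounded 40 MB and 160 MB per second shown in the table.

```python
P30_MAX_IOPS = 5000            # from the P30 example in this article
P30_MAX_THROUGHPUT_MB = 200    # MB per second, from the same example
KB_PER_MB = 1024               # assumption: binary units


def achievable(io_size_kb, max_iops=P30_MAX_IOPS,
               max_throughput_mb=P30_MAX_THROUGHPUT_MB):
    """Return (iops, throughput_mb_per_sec) a disk can sustain at one IO size."""
    # IOPS at which the throughput limit would be reached for this IO size.
    iops_at_throughput_cap = max_throughput_mb * KB_PER_MB / io_size_kb
    # The disk is throttled by whichever limit is hit first.
    iops = min(max_iops, iops_at_throughput_cap)
    throughput_mb = iops * io_size_kb / KB_PER_MB
    return iops, throughput_mb


for size_kb in (8, 32, 64, 1024):
    iops, mb_per_sec = achievable(size_kb)
    print(f"{size_kb:>5} KB -> {iops:,.0f} IOPS, {mb_per_sec:.2f} MB/s")
```

Small IO sizes hit the IOPS limit first and large IO sizes hit the throughput limit first, which is why the table shows 5,000 IOPS at 8 KB and 32 KB but only 200 IOPS at 1024 KB.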

To get IOPS and bandwidth higher than the maximum value of a single premium storage disk, use multiple premium disks striped together. For example, stripe two P30 disks to get a combined 10,000 IOPS or a combined throughput of 400 MB per second. As explained in the next section, you must use a VM size that supports the combined disk IOPS and throughput.

Note

As you increase either IOPS or throughput, the other also increases. Make sure you do not hit the throughput or IOPS limits of the disk or VM when increasing either one.

To observe the effects of IO size on application performance, you can run benchmarking tools on your VM and disks. Create multiple test runs and use a different IO size for each run to see the impact. Refer to the Benchmarking article, linked at the end, for more details.

High scale VM sizes

When you start designing an application, one of the first things to do is choose a VM to host your application. Premium Storage comes with high scale VM sizes that can run applications requiring higher compute power and high local disk I/O performance. These VMs provide faster processors, a higher memory-to-core ratio, and a solid-state drive (SSD) for the local disk. Examples of high scale VMs supporting Premium Storage are the DS and DSv2 VMs.

High scale VMs are available in different sizes, with different numbers of CPU cores and different memory, OS disk, and temporary disk sizes. Each VM size also has a maximum number of data disks that you can attach to the VM. Therefore, the chosen VM size affects how much processing, memory, and storage capacity is available for your application. It also affects the compute and storage cost. For example, below are the specifications of the largest VM size in a DS series, and DSv2 series:

| VM size | CPU cores | Memory | VM disk sizes | Max. data disks | Cache size | IOPS | Bandwidth | Cache IO limits |
|---|---|---|---|---|---|---|---|---|
| Standard_DS14 | 16 | 112 GB | OS = 1023 GB, Local SSD = 224 GB | 32 | 576 GB | 50,000 IOPS | 512 MB per second | 4,000 IOPS and 33 MB per second |

To view a complete list of all available Azure VM sizes, refer to Windows VM sizes or Linux VM sizes. Choose a VM size that can meet and scale to your desired application performance requirements. In addition, take the following important considerations into account when choosing VM sizes.

Scale Limits
The maximum IOPS limits per VM and per disk are different and independent of each other. Make sure that the application drives IOPS within the limits of the VM as well as the premium disks attached to it. Otherwise, application performance will be throttled.

As an example, suppose an application requires a maximum of 4,000 IOPS. To achieve this, you provision a P30 disk on a DS1 VM. The P30 disk can deliver up to 5,000 IOPS. However, the DS1 VM is limited to 3,200 IOPS. Consequently, the application performance is constrained by the VM limit of 3,200 IOPS, and performance is degraded. To prevent this situation, choose a VM and disk size that both meet the application requirements.
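The scale-limit check described above amounts to taking the lower of the VM limit and the combined disk limits. The following is a minimal sketch of that check, using the DS1, DS13, and P30 figures quoted in this article; the function name and return shape are hypothetical.

```python
def effective_iops(vm_iops_limit, disk_iops_limits, required_iops):
    """Return (achievable_iops, bottleneck, meets_requirement)."""
    disk_total = sum(disk_iops_limits)
    # Throttling happens at whichever limit is lower.
    achievable = min(vm_iops_limit, disk_total)
    bottleneck = "VM" if vm_iops_limit < disk_total else "disks"
    return achievable, bottleneck, achievable >= required_iops


# One P30 disk (5,000 IOPS) on a DS1 VM (3,200 IOPS), application needs 4,000.
print(effective_iops(3200, [5000], 4000))

# Four P30 disks on a DS13 VM (25,600 IOPS), application needs 16,000.
print(effective_iops(25600, [5000] * 4, 16000))
```

In the first case the VM is the bottleneck and the requirement is not met, even though the disk alone could serve it. The same check applies to throughput limits.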

运行成本Cost of Operation
在许多情况下,使用高级存储的总体运行成本可能会低于使用标准存储。In many cases, it is possible that your overall cost of operation using Premium Storage is lower than using Standard Storage.

例如,以需要 16,000 IOPS 的应用程序为考虑对象。For example, consider an application requiring 16,000 IOPS. 若要达到此性能,需要使用 Standard_D14 Azure IaaS VM,该 VM 可以使用 32 个标准存储 1TB 磁盘来实现 16,000 的最大 IOPS。To achieve this performance, you will need a Standard_D14 Azure IaaS VM, which can give a maximum IOPS of 16,000 using 32 standard storage 1TB disks. 每个 1 TB 标准存储磁盘最多可以实现 500 IOPS。Each 1TB standard storage disk can achieve a maximum of 500 IOPS. 此 VM 每月的估计成本将是 CNY10,171。The estimated cost of this VM per month will be CNY10,171. 32 个标准存储磁盘每月的成本将是 CNY8,847。The monthly cost of 32 standard storage disks will be CNY8,847. 每月估计的总成本将是 CNY19,018。The estimated total monthly cost will be CNY19,018.

但是,如果将同一应用程序置于高级存储上,则所需 VM 大小和高级存储磁盘数都会减少,从而降低总体成本。However, if you hosted the same application on Premium Storage, you will need a smaller VM size and fewer premium storage disks, thus reducing the overall cost. Standard_DS13 VM 可以使用 4 个 P30 磁盘来满足 16,000 IOPS 的要求。A Standard_DS13 VM can meet the 16,000 IOPS requirement using four P30 disks. DS13 VM 的最大 IOPS 为 25,600,每个 P30 磁盘的最大 IOPS 为 5,000。The DS13 VM has a maximum IOPS of 25,600 and each P30 disk has a maximum IOPS of 5,000. 总起来说,此配置可以达到 5,000 x 4 = 20,000 的 IOPS。Overall, this configuration can achieve 5,000 x 4 = 20,000 IOPS. 此 VM 每月的估计成本将是 CNY5,081。The estimated cost of this VM per month will be CNY5,081. 4 个 P30 高级存储磁盘每月的成本是 CNY3,625。The monthly cost of four P30 premium storage disks will be CNY3,625. 每月估计的总成本将是 CNY8,706。The estimated total monthly cost will be CNY8,706.

The table below summarizes the cost breakdown of this scenario for Standard and Premium Storage.

|   | Standard | Premium |
|---|---|---|
| Cost of VM per month | CNY10,171.48 (Standard_D14) | CNY5,081.52 (Standard_DS13) |
| Cost of disks per month | CNY8,847.36 (32 x 1 TB disks) | CNY3,625.16 (4 x P30 disks) |
| Overall cost per month | CNY19,018.84 | CNY8,706.68 |
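As a worked check of the figures in this scenario, the totals and the resulting difference can be written out as follows. The saving and its percentage are derived here from the quoted prices and are not stated in the original.

```latex
\begin{align*}
\text{Standard total} &= 10{,}171.48 + 8{,}847.36 = 19{,}018.84 \ \text{CNY} \\
\text{Premium total}  &= 5{,}081.52 + 3{,}625.16 = 8{,}706.68 \ \text{CNY} \\
\text{Monthly saving} &= 19{,}018.84 - 8{,}706.68 = 10{,}312.16 \ \text{CNY}
\end{align*}
```

At these prices the Premium configuration costs roughly 54% less per month than the Standard one.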

Linux Distros

With Azure Premium Storage, you get the same level of performance for VMs running Windows and Linux. Many Linux distros are supported, and the complete list is published separately. Note that different distros are better suited to different types of workloads, so you will see different levels of performance depending on the distro your workload runs on. Test the Linux distros with your application and choose the one that works best.

When running Linux with Premium Storage, check the latest updates about required drivers to ensure high performance.

Premium storage disk sizes

Azure Premium Storage offers a variety of sizes so you can choose the one that best suits your needs. Each disk size has a different scale limit for IOPS, bandwidth, and storage. Choose the right Premium Storage disk size depending on the application requirements and the high scale VM size. The table below shows the disk sizes and their capabilities. P4, P6, P15, P60, P70, and P80 sizes are currently supported only for Managed Disks.

| Premium SSD sizes | P4 | P6 | P10 | P15 | P20 | P30 | P40 | P50 | P60 | P70 | P80 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Disk size in GiB | 32 | 64 | 128 | 256 | 512 | 1,024 | 2,048 | 4,096 | 8,192 | 16,384 | 32,767 |
| IOPS per disk | 120 | 240 | 500 | 1,100 | 2,300 | 5,000 | 7,500 | 7,500 | 16,000 | 18,000 | 20,000 |
| Throughput per disk (MiB/sec) | 25 | 50 | 100 | 125 | 150 | 200 | 250 | 250 | 500 | 750 | 900 |

How many disks you choose depends on the disk size chosen. You could use a single P50 disk or multiple P10 disks to meet your application requirement. Take the considerations listed below into account when making the choice.

Scale Limits (IOPS and Throughput)
The IOPS and Throughput limits of each Premium disk size are different and are independent of the VM scale limits. Make sure that the total IOPS and Throughput from the disks are within the scale limits of the chosen VM size.

For example, suppose an application requires a maximum Throughput of 250 MB/sec and you are using a DS4 VM with a single P30 disk. The DS4 VM can give up to 256 MB/sec Throughput. However, a single P30 disk has a Throughput limit of 200 MB/sec. Consequently, the application is constrained at 200 MB/sec by the disk limit. To overcome this limit, provision more than one data disk for the VM or resize your disk to P40 or P50.
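The sizing step in this example can be sketched as a shell calculation. This is a rough illustration under two assumptions: the helper name `disks_needed` is hypothetical, and the Throughput of several disks is assumed to add up when they are used together (for instance, in a striped volume).

```shell
#!/bin/sh
# disks_needed REQUIRED PER_DISK_LIMIT
# Prints how many disks are needed to reach REQUIRED, rounding up.
# Assumption: per-disk limits add up when the disks are used together.
disks_needed() {
    required=$1
    per_disk=$2
    echo $(( (required + per_disk - 1) / per_disk ))
}

required=250      # MB/sec needed by the application
p30_limit=200     # MB/sec per P30 disk
vm_limit=256      # MB/sec for the DS4 VM

echo "P30 disks needed: $(disks_needed "$required" "$p30_limit")"

# The VM limit still applies, whatever the number of disks.
if [ "$required" -le "$vm_limit" ]; then
    echo "DS4 VM limit of $vm_limit MB/sec covers the requirement"
fi
```

Rounding up matters here: one P30 disk falls short of 250 MB/sec, so the calculation asks for two.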

Note

Reads served by the cache are not included in the disk IOPS and Throughput, and hence are not subject to disk limits. The cache has its own separate IOPS and Throughput limit per VM.

For example, initially your reads and writes are 60 MB/sec and 40 MB/sec respectively. Over time, the cache warms up and serves more and more of the reads. You can then get higher write Throughput from the disk.

Number of Disks
Determine the number of disks you need by assessing application requirements. Each VM size also has a limit on the number of disks that you can attach to the VM. Typically, this is twice the number of cores. Ensure that the VM size you choose can support the number of disks needed.

Remember that Premium Storage disks have higher performance capabilities than Standard Storage disks. Therefore, if you are migrating your application from an Azure IaaS VM using Standard Storage to Premium Storage, you will likely need fewer premium disks to achieve the same or higher performance for your application.

Disk caching

High scale VMs that leverage Azure Premium Storage have a multi-tier caching technology called BlobCache. BlobCache uses a combination of the virtual machine RAM and local SSD for caching. This cache is available for the Premium Storage persistent disks and the VM local disks. By default, this cache setting is set to ReadWrite for OS disks and ReadOnly for data disks hosted on Premium Storage. With disk caching enabled on the Premium Storage disks, high scale VMs can achieve extremely high levels of performance that exceed the underlying disk performance.

Warning

Disk caching is not supported for disks 4 TiB and larger. If multiple disks are attached to your VM, each disk that is smaller than 4 TiB supports caching.

Changing the cache setting of an Azure disk detaches and re-attaches the target disk. If it is the operating system disk, the VM is restarted. Stop all applications and services that might be affected by this disruption before changing the disk cache setting.

To learn more about how BlobCache works, refer to the Inside Azure Premium Storage blog post.

It is important to enable the cache on the right set of disks. Whether you should enable disk caching on a premium disk depends on the workload pattern that the disk will handle. The table below shows the default cache settings for OS and data disks.

| Disk type | Default cache setting |
|---|---|
| OS disk | ReadWrite |
| Data disk | ReadOnly |

The following are the recommended disk cache settings for data disks:

| Disk caching setting | Recommendation on when to use this setting |
|---|---|
| None | Configure host-cache as None for write-only and write-heavy disks. |
| ReadOnly | Configure host-cache as ReadOnly for read-only and read-write disks. |
| ReadWrite | Configure host-cache as ReadWrite only if your application properly handles writing cached data to persistent disks when needed. |

ReadOnly
By configuring ReadOnly caching on Premium Storage data disks, you can achieve low read latency and get very high read IOPS and Throughput for your application. This is due to two reasons:

  1. Reads performed from the cache, which is in the VM memory and on the local SSD, are much faster than reads from the data disk, which is on Azure blob storage.
  2. Premium Storage does not count the reads served from the cache towards the disk IOPS and Throughput. Therefore, your application is able to achieve higher total IOPS and Throughput.

ReadWrite
By default, the OS disks have ReadWrite caching enabled. Support for ReadWrite caching on data disks has also been added recently. If you are using ReadWrite caching, you must have a proper way to write the data from the cache to persistent disks. For example, SQL Server handles writing cached data to the persistent storage disks on its own. Using the ReadWrite cache with an application that does not handle persisting the required data can lead to data loss if the VM crashes.

None
Currently, None is supported only on data disks. It is not supported on OS disks. If you set None on an OS disk, the setting is overridden internally and set to ReadOnly.

As an example, you can apply these guidelines to SQL Server running on Premium Storage by doing the following:

  1. Configure "ReadOnly" cache on premium storage disks hosting data files.
    a. The fast reads from the cache lower the SQL Server query time, since data pages are retrieved much faster from the cache than directly from the data disks.
    b. Serving reads from the cache means there is additional Throughput available from the premium data disks. SQL Server can use this additional Throughput for retrieving more data pages and for other operations like backup/restore, batch loads, and index rebuilds.
  2. Configure "None" cache on premium storage disks hosting the log files.
    a. Log files have primarily write-heavy operations. Therefore, they do not benefit from the ReadOnly cache.
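The SQL Server guidance above can be restated as a small lookup, for example as part of a provisioning script. This is only a sketch of the recommendation, not an Azure command: the function name `sql_disk_cache` and the role names `data` and `log` are hypothetical.

```shell
#!/bin/sh
# sql_disk_cache ROLE
# Prints the host cache setting recommended above for a SQL Server disk.
# ROLE is "data" (data files) or "log" (log files); both names are
# illustrative labels, not Azure identifiers.
sql_disk_cache() {
    case "$1" in
        data) echo "ReadOnly" ;;   # read-heavy: benefits from cached reads
        log)  echo "None" ;;       # write-heavy: gains nothing from ReadOnly
        *)    echo "unknown file role: $1" >&2; return 1 ;;
    esac
}

for role in data log; do
    echo "$role disk -> host cache $(sql_disk_cache "$role")"
done
```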

Optimize performance on Linux VMs

For all premium SSDs with cache set to ReadOnly or None, you must disable "barriers" when you mount the file system. You don't need barriers in this scenario because the writes to premium storage disks are durable for these cache settings. When the write request successfully finishes, data has been written to the persistent store. To disable "barriers," use the method that applies to your file system:

  • For reiserFS, to disable barriers, use the barrier=none mount option. (To enable barriers, use barrier=flush.)
  • For ext3/ext4, to disable barriers, use the barrier=0 mount option. (To enable barriers, use barrier=1.)
  • For XFS, to disable barriers, use the nobarrier mount option. (To enable barriers, use barrier.)
  • For premium storage disks with cache set to ReadWrite, enable barriers for write durability.
  • For volume labels to persist after you restart the VM, you must update /etc/fstab with the universally unique identifier (UUID) references to the disks. For more information, see Add a managed disk to a Linux VM.
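The mount options listed above can be collected into one helper that picks the option for a given file system and cache setting. This is a minimal sketch: the function name `barrier_mount_option`, the mount point `/datadrive`, and the `<disk-uuid>` placeholder are all illustrative assumptions, and the script only prints an /etc/fstab line rather than mounting anything.

```shell
#!/bin/sh
# barrier_mount_option FILESYSTEM CACHE_SETTING
# Prints the barrier-related mount option from the list above.
# Barriers are disabled for ReadOnly/None caching and enabled for ReadWrite.
barrier_mount_option() {
    fs=$1
    case "$2" in
        ReadOnly|None) want=off ;;
        ReadWrite)     want=on ;;
        *) echo "unknown cache setting: $2" >&2; return 1 ;;
    esac
    case "$fs:$want" in
        reiserfs:off)      echo "barrier=none" ;;
        reiserfs:on)       echo "barrier=flush" ;;
        ext3:off|ext4:off) echo "barrier=0" ;;
        ext3:on|ext4:on)   echo "barrier=1" ;;
        xfs:off)           echo "nobarrier" ;;
        xfs:on)            echo "barrier" ;;
        *) echo "unsupported file system: $fs" >&2; return 1 ;;
    esac
}

# Print (do not write) an /etc/fstab entry for an ext4 data disk with
# ReadOnly caching. Replace <disk-uuid> with the UUID of the real disk.
opt=$(barrier_mount_option ext4 ReadOnly)
echo "UUID=<disk-uuid> /datadrive ext4 defaults,nofail,$opt 0 2"
```

Referencing the disk by UUID in the printed entry follows the last bullet above, so that the mount survives a VM restart.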

The following Linux distributions have been validated for premium SSDs. For better performance and stability with premium SSDs, we recommend that you upgrade your VMs to one of these versions or newer.

Some of the versions require the latest Linux Integration Services (LIS), v4.0, for Azure. To download and install a distribution, follow the link listed in the following table. We add images to the list as we complete validation. Our validations show that performance varies for each image. Performance depends on workload characteristics and your image settings. Different images are tuned for different kinds of workloads.

| Distribution | Version | Supported kernel | Details |
|---|---|---|---|
| Ubuntu | 12.04 or newer | 3.2.0-75.110+ | Ubuntu-12_04_5-LTS-amd64-server-20150119-en-us-30GB |
| Ubuntu | 14.04 or newer | 3.13.0-44.73+ | Ubuntu-14_04_1-LTS-amd64-server-20150123-en-us-30GB |
| Debian | 7.x, 8.x or newer | 3.16.7-ckt4-1+ |   |
| SUSE | SLES 12 or newer | 3.12.36-38.1+ | suse-sles-12-priority-v20150213, suse-sles-12-v20150213 |
| SUSE | SLES 11 SP4 or newer | 3.0.101-0.63.1+ |   |
| CoreOS | 584.0.0+ or newer | 3.18.4+ | CoreOS 584.0.0 |
| CentOS | 6.5, 6.6, 6.7, 7.0, or newer |   | LIS4 required. See note in the next section |
| CentOS | 7.1+ or newer | 3.10.0-229.1.2.el7+ | LIS4 recommended. See note in the next section |

LIS drivers for OpenLogic CentOS

If you're running OpenLogic CentOS VMs, run the following commands to install the latest drivers:

sudo rpm -e hypervkvpd  ## (Might return an error if not installed. That's OK.)
sudo yum install microsoft-hyper-v

To activate the new drivers, restart the VM.

Disk striping

When a high scale VM is attached with several premium storage persistent disks, the disks can be striped together to aggregate their IOPS, bandwidth, and storage capacity.

On Windows, you can use Storage Spaces to stripe disks together. You must configure one column for each disk in a pool. Otherwise, the overall performance of the striped volume can be lower than expected because traffic is distributed unevenly across the disks.

Important: Using the Server Manager UI, you can set the total number of columns to at most 8 for a striped volume. When attaching more than eight disks, use PowerShell to create the volume. Using PowerShell, you can set the number of columns equal to the number of disks. For example, if there are 16 disks in a single stripe set, specify 16 columns in the NumberOfColumns parameter of the New-VirtualDisk PowerShell cmdlet.

On Linux, use the MDADM utility to stripe disks together. For detailed steps on striping disks on Linux, refer to Configure Software RAID on Linux.

Stripe Size
An important configuration in disk striping is the stripe size. The stripe size or block size is the smallest chunk of data that an application can address on a striped volume. The stripe size you configure depends on the type of application and its request pattern. If you choose the wrong stripe size, it could lead to IO misalignment, which degrades the performance of your application.

For example, if an IO request generated by your application is bigger than the disk stripe size, the storage system writes it across stripe unit boundaries on more than one disk. When it is time to access that data, the system has to seek across more than one stripe unit to complete the request. The cumulative effect of such behavior can be substantial performance degradation. On the other hand, if the IO request size is smaller than the stripe size and the requests are random in nature, the IO requests may pile up on the same disk, causing a bottleneck and ultimately degrading IO performance.

Depending on the type of workload your application is running, choose an appropriate stripe size. For random small IO requests, use a smaller stripe size. For large sequential IO requests, use a larger stripe size. Find out the stripe size recommendations for the application you will be running on Premium Storage. For SQL Server, configure a stripe size of 64 KB for OLTP workloads and 256 KB for data warehousing workloads. See Performance best practices for SQL Server on Azure VMs to learn more.
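On Linux, the stripe size corresponds to the chunk size given to MDADM when the RAID 0 array is created. The following sketch builds such a command from the SQL Server recommendations above and only prints it, so it can be reviewed before being run as root. The function names, the workload labels `oltp` and `dw`, the array name `/dev/md0`, and the device names are all illustrative assumptions.

```shell
#!/bin/sh
# stripe_chunk_kb WORKLOAD
# Prints the stripe (chunk) size in KB recommended above for SQL Server.
stripe_chunk_kb() {
    case "$1" in
        oltp) echo 64 ;;    # random, small IO
        dw)   echo 256 ;;   # data warehousing: large, sequential IO
        *)    echo "unknown workload: $1" >&2; return 1 ;;
    esac
}

# build_mdadm_command WORKLOAD DEVICE...
# Prints (does not run) an mdadm command that stripes the given devices
# as RAID 0. mdadm interprets --chunk in kibibytes by default.
build_mdadm_command() {
    workload=$1
    shift
    chunk=$(stripe_chunk_kb "$workload") || return 1
    echo "mdadm --create /dev/md0 --level=0 --raid-devices=$# --chunk=$chunk $*"
}

# Four data disks for an OLTP workload; device names are placeholders.
build_mdadm_command oltp /dev/sdc /dev/sdd /dev/sde /dev/sdf
```

Printing the command instead of executing it is a deliberate choice here, because creating an array destroys existing data on the member disks.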

Note

You can stripe together a maximum of 32 premium storage disks on a DS series VM.

Multi-threading

Azure has designed the Premium Storage platform to be massively parallel. Therefore, a multi-threaded application achieves much higher performance than a single-threaded application. A multi-threaded application splits its tasks across multiple threads and executes more efficiently by using the VM and disk resources to the maximum.

For example, if your application is running on a single core VM using two threads, the CPU can switch between the two threads to achieve efficiency. While one thread is waiting for a disk IO to complete, the CPU can switch to the other thread. In this way, two threads can accomplish more than a single thread would. If the VM has more than one core, running time decreases further, since each core can execute tasks in parallel.

You may not be able to change the way an off-the-shelf application implements single threading or multi-threading. For example, SQL Server is capable of handling multi-CPU and multi-core systems. However, SQL Server decides under what conditions it will use one or more threads to process a query. It can run queries and build indexes using multi-threading. For a query that involves joining large tables and sorting data before returning to the user, SQL Server will likely use multiple threads. However, a user cannot control whether SQL Server executes a query using a single thread or multiple threads.

There are configuration settings that you can alter to influence this multi-threading or parallel processing of an application. For example, in the case of SQL Server, it is the maximum Degree of Parallelism configuration. This setting, called MAXDOP, allows you to configure the maximum number of processors SQL Server can use for parallel processing. You can configure MAXDOP for individual queries or index operations. This is beneficial when you want to balance the resources of your system for a performance critical application.

For example, say your application using SQL Server is executing a large query and an index operation at the same time. Assume that you want the index operation to be more performant than the large query. In such a case, you can set the MAXDOP value of the index operation to be higher than the MAXDOP value for the query. This way, SQL Server can use more processors for the index operation than it can dedicate to the large query. Remember, you do not control the number of threads SQL Server will use for each operation. You can control only the maximum number of processors dedicated to multi-threading.

Learn more about Degrees of Parallelism in SQL Server. Find out which settings of this kind influence multi-threading in your application, and how to configure them, to optimize performance.

Queue depth

The queue depth, queue length, or queue size is the number of pending IO requests in the system. The value of queue depth determines how many IO operations your application can line up for the storage disks to process. It affects all three application performance indicators discussed in this article, namely IOPS, throughput, and latency.

Queue depth and multi-threading are closely related. The queue depth value indicates how much multi-threading the application can achieve. If the queue depth is large, the application can execute more operations concurrently; in other words, more multi-threading. If the queue depth is small, the application will not have enough requests lined up for concurrent execution, even if it is multi-threaded.

Typically, off-the-shelf applications do not allow you to change the queue depth, because an incorrect setting does more harm than good. Applications set the right value of queue depth to get optimal performance. However, it is important to understand this concept so that you can troubleshoot performance issues with your application. You can also observe the effects of queue depth by running benchmarking tools on your system.

Some applications provide settings to influence the queue depth. An example is the MAXDOP (maximum degree of parallelism) setting in SQL Server explained in the previous section. MAXDOP is a way to influence queue depth and multi-threading, although it does not directly change the queue depth value of SQL Server.

High queue depth
A high queue depth lines up more operations on the disk. The disk knows the next request in its queue ahead of time. Consequently, the disk can schedule operations ahead of time and process them in an optimal sequence. Since the application is sending more requests to the disk, the disk can process more parallel IOs. Ultimately, the application will be able to achieve higher IOPS. Since the application is processing more requests, its total Throughput also increases.

Typically, an application can achieve maximum Throughput with 8-16+ outstanding IOs per attached disk. If the queue depth is one, the application is not pushing enough IOs to the system, and it will process fewer IOs in a given period. In other words, Throughput is lower.

For example, in SQL Server, setting the MAXDOP value for a query to "4" informs SQL Server that it can use up to four cores to execute the query. SQL Server determines the best queue depth value and the number of cores for the query execution.

Optimal queue depth
A very high queue depth value also has its drawbacks. If the queue depth value is too high, the application will try to drive very high IOPS. Unless the application has persistent disks with sufficient provisioned IOPS, this can negatively affect application latencies. The following formula shows the relationship between IOPS, latency, and queue depth.

IOPS x Latency = Queue Depth

You should not configure the queue depth to an arbitrarily high value, but to an optimal value that can deliver enough IOPS for the application without affecting latencies. For example, if the application latency needs to be 1 millisecond, the queue depth required to achieve 5,000 IOPS is QD = 5000 x 0.001 = 5.
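The calculation in this example can be sketched as a shell function. This is an illustration only: the function name `required_queue_depth` is hypothetical, the latency is passed in microseconds so that the shell's integer arithmetic suffices, and the result is rounded up to a whole number of outstanding IOs.

```shell
#!/bin/sh
# required_queue_depth TARGET_IOPS LATENCY_US
# Prints the queue depth needed to sustain TARGET_IOPS at the given
# latency, using queue depth = IOPS x latency (latency in seconds).
# LATENCY_US is in microseconds; the result is rounded up.
required_queue_depth() {
    iops=$1
    latency_us=$2
    echo $(( (iops * latency_us + 999999) / 1000000 ))
}

# 5,000 IOPS at 1 millisecond (1,000 microseconds) of latency.
required_queue_depth 5000 1000
```

For the values in the example, the function prints 5.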

Queue Depth for Striped Volume
For a striped volume, maintain a high enough queue depth that every disk individually reaches its peak queue depth. For example, consider an application that pushes a queue depth of 2 when there are four disks in the stripe. The two IO requests go to two disks, and the remaining two disks are idle. Therefore, configure the queue depth so that all the disks can be busy. The formula below shows how to determine the queue depth of striped volumes.

QD per disk x number of columns per volume = QD of striped volume
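As a minimal sketch of this sizing rule, assume that the queue depth of a striped volume is the per-disk queue depth multiplied by the number of disks (columns) in the stripe. The function name `striped_queue_depth` is hypothetical.

```shell
#!/bin/sh
# striped_queue_depth PER_DISK_QD DISK_COUNT
# Prints the volume-level queue depth that lets every disk in the stripe
# reach PER_DISK_QD outstanding IOs.
# Assumption: IOs are spread evenly across the disks in the stripe.
striped_queue_depth() {
    echo $(( $1 * $2 ))
}

# Four disks, each kept at 8 outstanding IOs.
striped_queue_depth 8 4
```

In the four-disk example from the text, a volume queue depth of 2 leaves two disks idle; a per-disk target of 8 would call for a volume queue depth of 32.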

Throttling

Azure Premium Storage provisions a specified number of IOPS and Throughput depending on the VM sizes and disk sizes you choose. Any time your application tries to drive IOPS or Throughput above the limits of what the VM or disk can handle, Premium Storage throttles it. This manifests as degraded performance in your application, which can mean higher latency, lower Throughput, or lower IOPS. If Premium Storage did not throttle, your application could fail completely by exceeding what its resources are capable of achieving. So, to avoid performance issues due to throttling, always provision sufficient resources for your application. Take into consideration what was discussed in the VM sizes and disk sizes sections above. Benchmarking is the best way to figure out what resources you need to host your application.

Next steps

If you are looking to benchmark your disk, see our article on Benchmarking a disk.

Learn more about the available disk types: Select a disk type

For SQL Server users, read articles on Performance Best Practices for SQL Server: