Troubleshoot Azure file share performance issues

This article lists some common problems related to Azure file shares. It provides potential causes and workarounds for when you encounter these problems.

High latency, low throughput, and general performance issues

Cause 1: Share was throttled

Requests are throttled when the I/O operations per second (IOPS), ingress, or egress limits for a file share are reached. To understand the limits for standard and premium file shares, see File share and file scale targets.

To confirm whether your share is being throttled, you can use Azure metrics in the portal.

  1. In the Azure portal, go to your storage account.

  2. On the left pane, under Monitoring, select Metrics.

  3. Select File as the metric namespace for your storage account scope.

  4. Select Transactions as the metric.

  5. Add a filter for Response type, and then check whether any requests have either of the following response codes:

    • SuccessWithThrottling: for Server Message Block (SMB)
    • ClientThrottlingError: for REST

    Screenshot of the metrics options for premium file shares, showing the Response type property filter.

    Note

    To receive an alert, see the "How to create an alert if a file share is throttled" section later in this article.
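If you note the Transactions counts for each Response type value (for example, from the metrics chart), a check like the following flags which protocol saw throttling. This is a minimal sketch: the dictionary input and the function name throttled_protocols are hypothetical; only the two response type values come from the steps above.

```python
from typing import List, Mapping

# Response type values that indicate throttling, and the protocol each applies to.
THROTTLING_RESPONSE_TYPES = {
    "SuccessWithThrottling": "SMB",
    "ClientThrottlingError": "REST",
}


def throttled_protocols(transactions_by_response_type: Mapping[str, int]) -> List[str]:
    """Return the protocols (SMB and/or REST) that had throttled requests.

    The input maps a Response type value to the number of transactions
    recorded for it over the period you examined.
    """
    return sorted(
        protocol
        for response_type, protocol in THROTTLING_RESPONSE_TYPES.items()
        if transactions_by_response_type.get(response_type, 0) > 0
    )


# Hypothetical counts read from the Transactions chart.
sample = {"Success": 120_000, "SuccessWithThrottling": 350}
print(throttled_protocols(sample))  # ['SMB']
```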

Solution

  • If you're using a standard file share, enable large file shares on your storage account. Large file shares support up to 10,000 IOPS per share.
  • If you're using a premium file share, increase the provisioned file share size to increase the IOPS limit. To learn more, see the "Understanding provisioning for premium file shares" section in the Azure Files planning guide.

Cause 2: Metadata or namespace heavy workload

If the majority of your requests are metadata-centric (such as createfile, openfile, closefile, queryinfo, or querydirectory), latency will be higher than for read/write operations.

To determine whether most of your requests are metadata-centric, start by following steps 1-4 as outlined in Cause 1. For step 5, instead of adding a filter for Response type, add a property filter for API name.

Screenshot of the metrics options for premium file shares, showing the API name property filter.
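Once you have per-API transaction counts, the share of metadata operations is a simple ratio. This is a sketch under assumptions: the operation names are the ones listed in Cause 2, matched case-insensitively, but the exact strings reported in the API name dimension may differ, so adjust METADATA_APIS to the values you actually see.

```python
from typing import Mapping

# Metadata-centric operations named in Cause 2 (lowercase for matching).
# The real API name dimension values may be spelled differently.
METADATA_APIS = {"createfile", "openfile", "closefile", "queryinfo", "querydirectory"}


def metadata_fraction(transactions_by_api: Mapping[str, int]) -> float:
    """Return the fraction of all transactions that are metadata operations."""
    total = sum(transactions_by_api.values())
    if total == 0:
        return 0.0
    metadata = sum(
        count
        for api, count in transactions_by_api.items()
        if api.lower() in METADATA_APIS
    )
    return metadata / total


# Hypothetical per-API counts.
sample = {"Read": 2000, "Write": 1000, "QueryInfo": 4000, "CreateFile": 3000}
print(metadata_fraction(sample))  # 0.7
```

A result well above 0.5, as in this example, suggests the workload is metadata-heavy and the workarounds below are worth trying.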

Workaround

  • Check whether the application can be modified to reduce the number of metadata operations.
  • Add a virtual hard disk (VHD) on the file share and mount the VHD over SMB from the client to perform file operations against the data. This approach works for single-writer, multiple-reader scenarios and allows metadata operations to be local. The setup offers performance similar to that of directly attached local storage.

Cause 3: Single-threaded application

If the application that you're using is single-threaded, this setup can result in significantly lower IOPS throughput than the maximum possible throughput, depending on your provisioned share size.

Solution

  • Increase application parallelism by increasing the number of threads.
  • Switch to applications where parallelism is possible. For example, for copy operations, you could use AzCopy or Robocopy from Windows clients or the parallel command from Linux clients.
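For a custom application, the same idea can be sketched with a thread pool. This minimal Python example assumes the share is already mounted at a local path; the function name parallel_copy and the default of 8 workers are illustrative choices, not tuned recommendations. For plain copy jobs, the tools named above remain the simpler option.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(src_dir: str, dst_dir: str, max_workers: int = 8) -> int:
    """Copy every file under src_dir into dst_dir using a pool of threads.

    Several copies are in flight at once, so the client issues more
    concurrent I/O than a single-threaded loop would. Returns the number
    of files copied.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    files = [p for p in src.rglob("*") if p.is_file()]

    # Create destination directories up front, on one thread, so the
    # worker threads only copy file contents.
    for path in files:
        (dst / path.relative_to(src)).parent.mkdir(parents=True, exist_ok=True)

    def copy_one(path: Path) -> None:
        shutil.copy2(path, dst / path.relative_to(src))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes the results so that an exception raised in any
        # worker is re-raised here instead of being silently dropped.
        list(pool.map(copy_one, files))
    return len(files)
```

For example, parallel_copy("/data/export", "/mnt/myshare/export", max_workers=16) would copy a local tree to a hypothetical mount point. Raising max_workers helps only until the share's IOPS or throughput limit is reached.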

Very high latency for requests

Cause

The client virtual machine (VM) could be located in a different region than the file share.

Solution

  • Run the application from a VM that's located in the same region as the file share.

Client unable to achieve maximum throughput supported by the network

Cause

One potential cause is a lack of SMB Multichannel support for standard file shares. Currently, Azure Files supports only a single channel, so there's only one connection from the client VM to the server. This single connection is pegged to a single core on the client VM, so the maximum throughput achievable from a VM is bound by a single core.

Workaround

  • Using a VM with more powerful cores might help improve throughput.
  • Running the client application from multiple VMs increases aggregate throughput.
  • Use REST APIs where possible.

Throughput on Linux clients is significantly lower than that of Windows clients

Cause

This is a known issue with the implementation of the SMB client on Linux.

Workaround

  • Spread the load across multiple VMs.
  • On the same VM, use multiple mount points with the nosharesock option, and spread the load across these mount points.
  • On Linux, try mounting with the nostrictsync option to avoid forcing an SMB flush on every fsync call. For Azure Files, this option doesn't interfere with data consistency, but it might result in stale file metadata on directory listings (ls -l command). Directly querying file metadata by using the stat command returns the most up-to-date file metadata.
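The multiple-mount-point workaround can be scripted. This sketch only builds the mount commands as argument lists. The storage account, share, mount points, and credentials file are hypothetical, and the options other than nosharesock and nostrictsync (vers=3.1.1, credentials=, serverino) are assumptions modeled on a typical Azure Files mount, so check them against the Linux mount guidance for your distribution.

```python
from typing import List


def cifs_mount_commands(
    storage_account: str,
    share: str,
    mount_points: List[str],
    credentials_file: str,
) -> List[List[str]]:
    """Build one mount command (as an argv list) per mount point.

    nosharesock gives each mount its own connection instead of reusing an
    existing one, and nostrictsync avoids forcing an SMB flush on every fsync.
    """
    unc = f"//{storage_account}.file.core.windows.net/{share}"
    options = ",".join(
        [
            "vers=3.1.1",  # assumed SMB dialect; adjust for your kernel
            f"credentials={credentials_file}",
            "serverino",
            "nosharesock",
            "nostrictsync",
        ]
    )
    return [
        ["mount", "-t", "cifs", unc, mount_point, "-o", options]
        for mount_point in mount_points
    ]


for argv in cifs_mount_commands(
    "contoso", "share1", ["/mnt/share1-a", "/mnt/share1-b"], "/etc/smbcredentials/contoso.cred"
):
    print(" ".join(argv))
```

Each argv list could then be run as root, for example with subprocess.run(argv, check=True), after the mount point directories exist. The application then spreads its files or threads across the mount points.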

High latencies for metadata-heavy workloads involving extensive open/close operations

Cause

Lack of support for directory leases.

Workaround

  • If possible, avoid excessive opening and closing of handles on the same directory within a short period of time.
  • For Linux VMs, increase the directory entry cache timeout by specifying actimeo=<sec> as a mount option. By default, the timeout is 1 second, so a larger value, such as 3 or 5 seconds, might help.
  • For CentOS Linux or Red Hat Enterprise Linux (RHEL) VMs, upgrade the system to CentOS Linux 8.2 or RHEL 8.2. For other Linux VMs, upgrade the kernel to 5.0 or later.
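A quick way to check the kernel guidance above on a non-CentOS, non-RHEL VM is to compare the kernel release string against 5.0. For CentOS Linux and RHEL, the guidance is by distribution release rather than kernel version, so this check doesn't apply there. The helper names are hypothetical, and actimeo_option deliberately rejects values below the 1-second default, because the goal here is to raise it.

```python
import re
from typing import Tuple


def kernel_at_least(release: str, minimum: Tuple[int, int] = (5, 0)) -> bool:
    """Return True if a kernel release string such as "5.15.0-1034-azure"
    is at least `minimum` (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def actimeo_option(seconds: int) -> str:
    """Build the actimeo mount option for a cache timeout above the default."""
    if seconds < 1:
        raise ValueError("actimeo must be at least 1 second in this sketch")
    return f"actimeo={seconds}"
```

On the VM itself, kernel_at_least(platform.release()) reports whether the running kernel meets the 5.0 guidance, and actimeo_option(5) yields "actimeo=5" to append to the mount options.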

Low IOPS on CentOS Linux or RHEL

Cause

An I/O depth greater than 1 is not supported on CentOS Linux or RHEL releases earlier than version 8.

Workaround

  • Upgrade to CentOS Linux 8 or RHEL 8.
  • Change to Ubuntu.
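To tell whether a VM is affected, you can read /etc/os-release, whose ID and VERSION_ID fields identify the distribution (CentOS Linux uses ID="centos" and RHEL uses ID="rhel"). This is a minimal sketch with hypothetical function names. It takes the file contents as a string so it can be tested anywhere. On Python 3.10 or later, platform.freedesktop_os_release() returns a similar dictionary.

```python
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release content (KEY=value lines, values optionally quoted)."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")
    return info


def io_depth_limited(os_release_text: str) -> bool:
    """Return True for CentOS Linux or RHEL releases older than version 8."""
    info = parse_os_release(os_release_text)
    if info.get("ID") not in {"centos", "rhel"}:
        return False
    major = info.get("VERSION_ID", "").split(".")[0]
    return major.isdigit() and int(major) < 8
```

On the VM, io_depth_limited(open("/etc/os-release").read()) returns True when one of the workarounds above applies.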

Slow file copying to and from Azure file shares in Linux

If you're experiencing slow file copying, see the "Slow file copying to and from Azure file shares in Linux" section in the Linux troubleshooting guide.

Jittery or sawtooth pattern for IOPS

Cause

The client application consistently exceeds baseline IOPS. Currently, there's no service-side smoothing of the request load. If the client exceeds baseline IOPS, it gets throttled by the service. The throttling can result in the client experiencing a jittery or sawtooth IOPS pattern. In this case, the average IOPS achieved by the client might be lower than the baseline IOPS.

Workaround

  • Reduce the request load from the client application so that the share doesn't get throttled.
  • Increase the quota of the share so that the share doesn't get throttled.
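One way to reduce the request load is to pace requests on the client so that they stay below the baseline IOPS. The following is a simple client-side pacer, not an Azure API: it spaces request starts at least 1/rate seconds apart. The target rate is an assumption you have to choose, and one application-level request may translate into more than one I/O operation on the share, so leave headroom.

```python
import threading
import time
from typing import Callable


class RequestPacer:
    """Spaces calls so that at most `rate` of them start per second.

    `clock` and `sleep` are injectable so the pacer can be tested without
    real waiting; they default to the monotonic clock and time.sleep.
    """

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_start = clock()
        self._lock = threading.Lock()

    def acquire(self) -> None:
        """Block until the caller may start its next request."""
        with self._lock:
            now = self._clock()
            # After an idle period, don't let unused slots pile up into a burst.
            start = max(now, self._next_start)
            self._next_start = start + self._interval
        # Sleep outside the lock so other threads can reserve later slots.
        wait = start - now
        if wait > 0:
            self._sleep(wait)
```

For example, with a hypothetical baseline of 1,000 IOPS, a pacer created with RequestPacer(900) and shared by all worker threads, each calling pacer.acquire() before issuing a request, keeps the steady-state request rate about 10 percent under the baseline. Even pacing trades a small amount of peak rate for avoiding the throttling-induced sawtooth.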

Excessive DirectoryOpen/DirectoryClose calls

Cause

If DirectoryOpen/DirectoryClose calls are among the top API calls and you don't expect the client to make that many calls, the issue might be caused by antivirus software installed on the Azure client VM.

Workaround

File creation is slower than expected

Cause

Workloads that rely on creating a large number of files won't see a substantial difference in performance between premium file shares and standard file shares.

Workaround

  • None.

Slow performance from Windows 8.1 or Server 2012 R2

Cause

Accessing Azure file shares has higher-than-expected latency for I/O-intensive workloads.

Workaround

How to create an alert if a file share is throttled

  1. In the Azure portal, go to your storage account.

  2. In the Monitoring section, select Alerts, and then select New alert rule.

  3. Select Edit resource, select the File resource type for the storage account, and then select Done. For example, if the storage account name is contoso, select the contoso/file resource.

  4. Select Select Condition to add a condition.

  5. In the list of signals that are supported for the storage account, select the Transactions metric.

  6. On the Configure signal logic pane, in the Dimension name drop-down list, select Response type.

  7. In the Dimension values drop-down list, select SuccessWithThrottling (for SMB) or ClientThrottlingError (for REST).

    Note

    If neither the SuccessWithThrottling nor the ClientThrottlingError dimension value is listed, the resource hasn't been throttled yet. To add the dimension value, next to the Dimension values drop-down list, select Add custom value, enter SuccessWithThrottling or ClientThrottlingError, select OK, and then repeat step 7.

  8. In the Dimension name drop-down list, select File Share.

  9. In the Dimension values drop-down list, select the file share or shares that you want to alert on.

    Note

    If the file share is a standard file share, select All current and future values. The Dimension values drop-down list doesn't list the file shares, because per-share metrics aren't available for standard file shares. Throttling alerts for standard file shares are triggered if any file share within the storage account is throttled, and the alert doesn't identify which file share was throttled. Because per-share metrics aren't available for standard file shares, we recommend that you use one file share per storage account.

  10. Define the alert parameters by entering the Threshold value, Operator, Aggregation granularity, and Frequency of evaluation, and then select Done.

    Tip

    If you're using a static threshold and the file share is currently being throttled, the metric chart can help you determine a reasonable threshold value. If you're using a dynamic threshold, the metric chart displays the calculated thresholds based on recent data.

  11. Select Select action group, and then add an action group (for example, email or SMS) to the alert either by selecting an existing action group or by creating a new one.

  12. Enter the alert details, such as Alert rule name, Description, and Severity.

  13. Select Create alert rule to create the alert.
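If you manage alerts from the command line, the condition configured in the steps above can be expressed as a --condition string for the Azure CLI command az monitor metrics alert create. This sketch assumes the CLI's condition grammar of the form "total METRIC > THRESHOLD where DIMENSION includes VALUE [or VALUE] [and DIMENSION includes ...]", with ResponseType and FileShare as the dimension names; confirm the grammar with az monitor metrics alert create --help before relying on it. The function name is hypothetical.

```python
from typing import List, Optional

# Response type values that indicate throttling, per protocol.
THROTTLING_RESPONSE_TYPE = {
    "SMB": "SuccessWithThrottling",
    "REST": "ClientThrottlingError",
}


def throttling_condition(
    protocol: str = "SMB",
    threshold: int = 0,
    file_shares: Optional[List[str]] = None,
) -> str:
    """Build a --condition string for a throttling alert.

    The alert fires when the total number of throttled transactions in the
    evaluation window exceeds `threshold`. When `file_shares` is empty, no
    FileShare filter is added (assumed to evaluate across all shares).
    """
    try:
        response_type = THROTTLING_RESPONSE_TYPE[protocol]
    except KeyError:
        raise ValueError(f"protocol must be SMB or REST, got {protocol!r}") from None
    condition = f"total Transactions > {threshold} where ResponseType includes {response_type}"
    if file_shares:
        condition += " and FileShare includes " + " or ".join(file_shares)
    return condition


print(throttling_condition())
# total Transactions > 0 where ResponseType includes SuccessWithThrottling
```

The resulting string would be passed to --condition, with --scopes set to the file service resource ID (/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<account>/fileServices/default) and --window-size and --evaluation-frequency playing the roles of Aggregation granularity and Frequency of evaluation.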

To learn more about configuring alerts in Azure Monitor, see Overview of alerts in Azure.

How to create an alert if a premium file share is trending toward being throttled

  1. In the Azure portal, go to your storage account.

  2. In the Monitoring section, select Alerts, and then select New alert rule.

  3. Select Edit resource, select the File resource type for the storage account, and then select Done. For example, if the storage account name is contoso, select the contoso/file resource.

  4. Select Select Condition to add a condition.

  5. In the list of signals that are supported for the storage account, select the Egress metric.

    Note

    You have to create three separate alerts to be alerted when the ingress, egress, or transaction values exceed the thresholds you set. This is because an alert is triggered only when all of its conditions are met. For example, if you put all the conditions in one alert, you would be alerted only if ingress, egress, and transactions all exceed their thresholds.

  6. Scroll down. In the Dimension name drop-down list, select File Share.

  7. In the Dimension values drop-down list, select the file share or shares that you want to alert on.

  8. Define the alert parameters by selecting values in the Operator, Threshold value, Aggregation granularity, and Frequency of evaluation drop-down lists, and then select Done.

    Egress, ingress, and transactions metrics are expressed per minute, whereas provisioned egress, ingress, and IOPS are expressed per second. Therefore, for example, if your provisioned egress is 90 mebibytes per second (MiB/s) and you want your threshold to be 80 percent of provisioned egress, select the following alert parameters:

    • Threshold value: 75497472
    • Operator: greater than or equal to
    • Aggregation type: average

    Depending on how noisy you want your alert to be, you can also select values for Aggregation granularity and Frequency of evaluation. For example, if you want your alert to look at the average egress over a 1-hour period and you want your alert rule to run every hour, select the following:

    • Aggregation granularity: 1 hour
    • Frequency of evaluation: 1 hour
  9. Select Select action group, and then add an action group (for example, email or SMS) to the alert either by selecting an existing action group or by creating a new one.

  10. Enter the alert details, such as Alert rule name, Description, and Severity.

  11. Select Create alert rule to create the alert.

    Note

    • To be notified that your premium file share is close to being throttled because of provisioned ingress, follow the preceding instructions, but with the following change:

      • In step 5, select the Ingress metric instead of Egress.
    • To be notified that your premium file share is close to being throttled because of provisioned IOPS, follow the preceding instructions, but with the following changes:

      • In step 5, select the Transactions metric instead of Egress.
      • In step 8, the only option for Aggregation type is Total. Therefore, the threshold value depends on your selected aggregation granularity. For example, if you want your threshold to be 80 percent of provisioned baseline IOPS and you select 1 hour for Aggregation granularity, your Threshold value would be your baseline IOPS × 0.8 × 3600 (the number of seconds in 1 hour).
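The two threshold calculations above reduce to simple arithmetic, sketched here with integer math so that no floating-point rounding creeps into the value you type into the portal. The function names are hypothetical, and the 3,000 baseline IOPS in the second example is an illustrative figure, not a value from this article.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def bandwidth_threshold_bytes(provisioned_mib_per_s: int, percent: int) -> int:
    """Egress or ingress threshold from the example: `percent` of the
    provisioned MiB/s, expressed in bytes."""
    return provisioned_mib_per_s * MIB * percent // 100


def transactions_threshold(baseline_iops: int, percent: int, granularity_seconds: int) -> int:
    """Transactions threshold when Aggregation type is Total: `percent` of the
    baseline IOPS, summed over the aggregation granularity window."""
    return baseline_iops * percent * granularity_seconds // 100


print(bandwidth_threshold_bytes(90, 80))      # 75497472
print(transactions_threshold(3000, 80, 3600))  # 8640000
```

The first call reproduces the 75497472 value used in the egress example. For the Transactions alert, changing Aggregation granularity changes granularity_seconds, so recompute the threshold whenever you change the window.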

To learn more about configuring alerts in Azure Monitor, see Overview of alerts in Azure.

See also