Troubleshoot Azure file share performance issues

This article lists some common problems related to Azure file shares. It provides potential causes and workarounds for these problems.

High latency, low throughput, and general performance issues

Cause 1: Share was throttled

Requests are throttled when the I/O operations per second (IOPS), ingress, or egress limits for a file share are reached. To understand the limits for standard and premium file shares, see File share and file scale targets.

To confirm whether your share is being throttled, you can use Azure metrics in the portal.

  1. In the Azure portal, go to your storage account.

  2. On the left pane, under Monitoring, select Metrics.

  3. Select File as the metric namespace for your storage account scope.

  4. Select Transactions as the metric.

  5. Add a filter for Response type, and then check whether any requests have been throttled.

    For standard file shares, the following response types are logged if a request is throttled:

    • SuccessWithThrottling
    • ClientThrottlingError

    For premium file shares, the following response types are logged if a request is throttled:

    • SuccessWithShareEgressThrottling
    • SuccessWithShareIngressThrottling
    • SuccessWithShareIopsThrottling
    • ClientShareEgressThrottlingError
    • ClientShareIngressThrottlingError
    • ClientShareIopsThrottlingError

    To learn more about each response type, see Metric dimensions.

    Screenshot of the metrics options for premium file shares, showing the "Response type" property filter.

    Note

    To receive an alert, see the "How to create an alert if a file share is throttled" section later in this article.
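If you export or copy the transaction counts per response type from the metrics chart, the check in step 5 can be sketched in a few lines of Python. This is a minimal illustration, not part of any Azure SDK: the function name and the sample counts are hypothetical, and only the response type names come from the lists above.

```python
# Response types that the article lists as throttling indicators.
STANDARD_THROTTLING = {"SuccessWithThrottling", "ClientThrottlingError"}
PREMIUM_THROTTLING = {
    "SuccessWithShareEgressThrottling",
    "SuccessWithShareIngressThrottling",
    "SuccessWithShareIopsThrottling",
    "ClientShareEgressThrottlingError",
    "ClientShareIngressThrottlingError",
    "ClientShareIopsThrottlingError",
}
THROTTLING_TYPES = STANDARD_THROTTLING | PREMIUM_THROTTLING


def throttled_fraction(counts: dict[str, int]) -> float:
    """Return the fraction of transactions whose response type signals throttling."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    throttled = sum(n for name, n in counts.items() if name in THROTTLING_TYPES)
    return throttled / total


# Hypothetical counts, as they might be read from the Transactions metric
# split by the Response type dimension.
sample = {
    "Success": 9500,
    "SuccessWithShareIopsThrottling": 400,
    "ClientShareIopsThrottlingError": 100,
}
print(f"{throttled_fraction(sample):.1%} of requests were throttled")
```

Any nonzero fraction means the share reached one of its limits during the selected time range.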

Solution

  • If you're using a standard file share, enable large file shares on your storage account. Large file shares support up to 10,000 IOPS per share.
  • If you're using a premium file share, increase the provisioned file share size to raise the IOPS limit. To learn more, see Understanding provisioning for premium file shares.

Cause 2: Metadata or namespace heavy workload

If the majority of your requests are metadata-centric (such as createfile, openfile, closefile, queryinfo, or querydirectory), the latency will be worse than that of read/write operations.

To determine whether most of your requests are metadata-centric, start by following steps 1-4 as outlined in Cause 1. For step 5, instead of adding a filter for Response type, add a property filter for API name.

Screenshot of the metrics options for premium file shares, showing the "API name" property filter.
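Once you have the transaction counts per API name, a rough way to judge whether the workload is metadata-centric is to compute the share of metadata operations. The Python sketch below assumes you have copied the counts into a dictionary. The function name and sample numbers are hypothetical, and the set of metadata operations contains only the examples named above, so it is not exhaustive.

```python
# Metadata operations named in the article, compared case-insensitively.
METADATA_APIS = {"createfile", "openfile", "closefile", "queryinfo", "querydirectory"}


def metadata_fraction(counts: dict[str, int]) -> float:
    """Return the fraction of transactions that are metadata operations."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    metadata = sum(n for api, n in counts.items() if api.lower() in METADATA_APIS)
    return metadata / total


# Hypothetical counts from the Transactions metric split by API name.
sample = {"CreateFile": 3000, "QueryDirectory": 2500, "Read": 2500, "Write": 2000}
print(f"{metadata_fraction(sample):.0%} of requests are metadata operations")
```

A value above one half indicates that most requests are metadata-centric, which is the situation this cause describes.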

Workaround

  • Check whether the application can be modified to reduce the number of metadata operations.
  • Add a virtual hard disk (VHD) on the file share and mount the VHD over SMB from the client to perform file operations against the data. This approach works for single-writer, multiple-reader scenarios and allows metadata operations to be local. The setup offers performance similar to that of locally attached storage.

Cause 3: Single-threaded application

If the application that you're using is single-threaded, it can achieve significantly lower IOPS and throughput than the maximum possible for your provisioned share size.

Solution

  • Increase application parallelism by increasing the number of threads.
  • Switch to applications where parallelism is possible. For example, for copy operations, you could use AzCopy or RoboCopy from Windows clients or the parallel command from Linux clients.
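For an application you control, increasing parallelism can be as simple as issuing file operations from a thread pool instead of a single loop. The following Python sketch copies a directory tree with several worker threads. It is an illustration under assumptions, not a replacement for AzCopy or RoboCopy: the function name and the default of 16 workers are arbitrary choices, and the best thread count depends on your share and client.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(src: Path, dst: Path, workers: int = 16) -> int:
    """Copy every file under src to dst using a pool of worker threads.

    Returns the number of files copied.
    """
    files = [p for p in src.rglob("*") if p.is_file()]

    def copy_one(path: Path) -> None:
        target = dst / path.relative_to(src)
        # exist_ok avoids a race when two threads create the same directory.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so an exception in any worker is raised here.
        list(pool.map(copy_one, files))
    return len(files)
```

A call such as `copy_tree_parallel(Path("/data/source"), Path("/mnt/share/target"))` would copy into a mounted share; both paths are placeholders. Threads suit this workload because each copy spends most of its time waiting on I/O.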

Very high latency for requests

Cause

The client virtual machine (VM) could be located in a different region than the file share. High latency can also be caused by the client or the network.

Solution

  • Run the application from a VM that's located in the same region as the file share.
  • For your storage account, review the transaction metrics SuccessE2ELatency and SuccessServerLatency via Azure Monitor in the Azure portal. A large difference between the SuccessE2ELatency and SuccessServerLatency values indicates latency that is likely caused by the network or the client. See Transaction metrics in the Azure Files monitoring data reference.
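The comparison of the two latency metrics is simple arithmetic, shown here as a short Python sketch. The function name and sample values are hypothetical. The metric names are the ones given above, and the values are assumed to be in milliseconds, as they appear in the metrics chart.

```python
def latency_outside_service(success_e2e_ms: float, success_server_ms: float) -> tuple[float, float]:
    """Return (gap in ms, gap as a fraction of end-to-end latency).

    The gap is the part of SuccessE2ELatency that SuccessServerLatency does
    not account for, which points at the network or the client.
    """
    gap = success_e2e_ms - success_server_ms
    fraction = gap / success_e2e_ms if success_e2e_ms > 0 else 0.0
    return gap, fraction


# Hypothetical averages read from Azure Monitor.
gap_ms, share = latency_outside_service(success_e2e_ms=48.0, success_server_ms=6.0)
print(f"{gap_ms:.1f} ms ({share:.0%}) of the latency is outside the service")
```

In this example most of the end-to-end latency lies outside the service, so moving the VM to the share's region or investigating the network path would be the first things to try.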

Client unable to achieve maximum throughput supported by the network

Cause

One potential cause is a lack of SMB multi-channel support for standard file shares. Currently, Azure Files supports only a single channel, so there's only one connection from the client VM to the server. This single connection is pegged to a single core on the client VM, so the maximum throughput achievable from a VM is bound by a single core.

Workaround

  • Using a VM with a more powerful core might help improve throughput.
  • Running the client application from multiple VMs will increase throughput.
  • Use REST APIs where possible.

Throughput on Linux clients is significantly lower than that of Windows clients

Cause

This is a known issue with the implementation of the SMB client on Linux.

Workaround

  • Spread the load across multiple VMs.
  • On the same VM, use multiple mount points with the nosharesock option, and spread the load across these mount points.
  • On Linux, try mounting with the nostrictsync option to avoid forcing an SMB flush on every fsync call. For Azure Files, this option doesn't interfere with data consistency, but it might result in stale file metadata in directory listings (ls -l command). Directly querying file metadata by using the stat command returns the most up-to-date file metadata.
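To make the second and third workarounds concrete, the Python sketch below only builds the mount commands for several mount points of the same share and prints them. It does not run them. The storage account, share name, credentials file, and mount directories are placeholders. `nosharesock` and `nostrictsync` are the options discussed above, while `vers=3.0` and `credentials=` are ordinary mount.cifs options that you may need to adjust for your environment.

```python
def cifs_mount_commands(
    unc_path: str,
    base_dir: str,
    count: int,
    credentials_file: str,
    extra_options: tuple[str, ...] = ("nosharesock", "nostrictsync"),
) -> list[str]:
    """Build one `mount -t cifs` command per mount point.

    With nosharesock, each mount point gets its own connection instead of
    sharing a socket with the other mounts of the same share.
    """
    options = ",".join(("vers=3.0", f"credentials={credentials_file}", *extra_options))
    return [
        f"mount -t cifs {unc_path} {base_dir}/mount{i} -o {options}"
        for i in range(count)
    ]


# Placeholder account, share, and paths; replace them with your own values.
for command in cifs_mount_commands(
    unc_path="//mystorageaccount.file.core.windows.net/myshare",
    base_dir="/mnt/myshare",
    count=4,
    credentials_file="/etc/smbcredentials/mystorageaccount.cred",
):
    print(command)
```

The application then needs to distribute its files or work items across the resulting directories, for example by hashing file names to a mount index.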

High latencies for metadata-heavy workloads involving extensive open/close operations

Cause

Lack of support for directory leases.

Workaround

  • If possible, avoid opening and closing handles excessively on the same directory within a short period of time.
  • For Linux VMs, increase the directory entry cache timeout by specifying actimeo=<sec> as a mount option. By default, the timeout is 1 second, so a larger value, such as 3 or 5 seconds, might help.
  • For CentOS Linux or Red Hat Enterprise Linux (RHEL) VMs, upgrade the system to CentOS Linux 8.2 or RHEL 8.2. For other Linux VMs, upgrade the kernel to 5.0 or later.

Low IOPS on CentOS Linux or RHEL

Cause

An I/O depth greater than 1 is not supported on CentOS Linux or RHEL.

Workaround

  • Upgrade to CentOS Linux 8 or RHEL 8.
  • Change to Ubuntu.

Slow file copying to and from Azure file shares in Linux

If you're experiencing slow file copying, see the "Slow file copying to and from Azure file shares in Linux" section in the Linux troubleshooting guide.

Jittery or sawtooth pattern for IOPS

Cause

The client application consistently exceeds baseline IOPS. Currently, there's no service-side smoothing of the request load. If the client exceeds baseline IOPS, it is throttled by the service. The throttling can cause the client to see a jittery or sawtooth IOPS pattern. In this case, the average IOPS achieved by the client might be lower than the baseline IOPS.

Workaround

  • Reduce the request load from the client application so that the share doesn't get throttled.
  • Increase the quota of the share so that the share doesn't get throttled.

Excessive DirectoryOpen/DirectoryClose calls

Cause

If DirectoryOpen/DirectoryClose calls are among the top API calls and you don't expect the client to make that many calls, the issue might be caused by antivirus software that's installed on the Azure client VM.

Workaround

File creation is slower than expected

Cause

Workloads that rely on creating a large number of files won't see a substantial difference in performance between premium file shares and standard file shares.

Workaround

  • None.

Slow performance from Windows 8.1 or Server 2012 R2

Cause

Higher than expected latency when accessing Azure file shares for I/O-intensive workloads.

Workaround

How to create an alert if a file share is throttled

  1. Go to your storage account in the Azure portal.

  2. In the Monitoring section, click Alerts, and then click + New alert rule.

  3. Click Edit resource, select the File resource type for the storage account, and then click Done. For example, if the storage account name is contoso, select the contoso/file resource.

  4. Click Add condition to add a condition.

  5. In the list of signals supported for the storage account, select the Transactions metric.

  6. On the Configure signal logic blade, click the Dimension name drop-down and select Response type.

  7. Click the Dimension values drop-down and select the appropriate response types for your file share.

    For standard file shares, select the following response types:

    • SuccessWithThrottling
    • ClientThrottlingError

    For premium file shares, select the following response types:

    • SuccessWithShareEgressThrottling
    • SuccessWithShareIngressThrottling
    • SuccessWithShareIopsThrottling
    • ClientShareEgressThrottlingError
    • ClientShareIngressThrottlingError
    • ClientShareIopsThrottlingError

    Note

    If the response types are not listed in the Dimension values drop-down, the resource has not been throttled. To add the dimension values, next to the Dimension values drop-down list, select Add custom value, enter the response type (for example, SuccessWithThrottling), select OK, and then repeat these steps to add all applicable response types for your file share.

  8. For premium file shares, click the Dimension name drop-down and select File Share. For standard file shares, skip to step 10.

    Note

    If the file share is a standard file share, the File Share dimension will not list the file share(s) because per-share metrics are not available for standard file shares. Throttling alerts for standard file shares are triggered if any file share within the storage account is throttled, and the alert does not identify which file share was throttled. Because per-share metrics are not available for standard file shares, we recommend having one file share per storage account.

  9. Click the Dimension values drop-down and select the file share(s) that you want to alert on.

  10. Define the alert parameters (threshold value, operator, aggregation granularity, and frequency of evaluation), and then click Done.

    Tip

    If you are using a static threshold, the metric chart can help determine a reasonable threshold value if the file share is currently being throttled. If you are using a dynamic threshold, the metric chart displays the calculated thresholds based on recent data.

  11. Click Add action groups to add an action group (email, SMS, etc.) to the alert, either by selecting an existing action group or by creating a new one.

  12. Fill in the alert details, such as Alert rule name, Description, and Severity.

  13. Click Create alert rule to create the alert.

To learn more about configuring alerts in Azure Monitor, see Overview of alerts in Azure.

How to create alerts if a premium file share is trending toward being throttled

  1. In the Azure portal, go to your storage account.

  2. In the Monitoring section, select Alerts, and then select New alert rule.

  3. Select Edit resource, select the File resource type for the storage account, and then select Done. For example, if the storage account name is contoso, select the contoso/file resource.

  4. Select Select Condition to add a condition.

  5. In the list of signals that are supported for the storage account, select the Egress metric.

    Note

    You have to create three separate alerts to be alerted when the ingress, egress, or transaction values exceed the thresholds you set. This is because an alert is triggered only when all of its conditions are met. For example, if you put all the conditions in one alert, you would be alerted only if ingress, egress, and transactions all exceed their thresholds.

  6. Scroll down. In the Dimension name drop-down list, select File Share.

  7. In the Dimension values drop-down list, select the file share or shares that you want to alert on.

  8. Define the alert parameters by selecting values in the Operator, Threshold value, Aggregation granularity, and Frequency of evaluation drop-down lists, and then select Done.

    Egress, ingress, and transactions metrics are expressed per minute, though you're provisioned egress, ingress, and I/O per second. Therefore, for example, if your provisioned egress is 90 mebibytes per second (MiB/s) and you want your threshold to be 80 percent of provisioned egress, select the following alert parameters:

    • For Threshold value: 75497472
    • For Operator: greater than or equal to
    • For Aggregation type: average

    Depending on how noisy you want your alert to be, you can also select values for Aggregation granularity and Frequency of evaluation. For example, if you want your alert to look at the average ingress over a period of 1 hour, and you want your alert rule to run every hour, select the following:

    • For Aggregation granularity: 1 hour
    • For Frequency of evaluation: 1 hour

  9. Select Add action groups, and then add an action group (for example, email or SMS) to the alert either by selecting an existing action group or by creating a new one.

  10. Enter the alert details, such as Alert rule name, Description, and Severity.

  11. Select Create alert rule to create the alert.

    Note

    • To be notified that your premium file share is close to being throttled because of provisioned ingress, follow the preceding instructions, but with the following change:

      • In step 5, select the Ingress metric instead of Egress.
    • To be notified that your premium file share is close to being throttled because of provisioned IOPS, follow the preceding instructions, but with the following changes:

      • In step 5, select the Transactions metric instead of Egress.
      • In step 8, the only option for Aggregation type is Total. Therefore, the threshold value depends on your selected aggregation granularity. For example, if you want your threshold to be 80 percent of provisioned baseline IOPS and you select 1 hour for Aggregation granularity, your Threshold value would be your baseline IOPS × 0.8 × 3600.
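The threshold arithmetic in this procedure can be checked with a short Python sketch. The function names are hypothetical and the baseline of 1,000 IOPS is an assumed example value. The 90 MiB/s egress figure and the 80 percent target come from the example in the procedure, and IOPS is treated as a plain count of operations per second.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def egress_threshold_bytes(provisioned_mib_per_s: int, percent: int) -> int:
    """Threshold for the Egress or Ingress alert with Average aggregation."""
    return provisioned_mib_per_s * MIB * percent // 100


def transactions_threshold(baseline_iops: int, percent: int, granularity_seconds: int) -> int:
    """Threshold for the Transactions alert, whose only aggregation is Total.

    Because the metric is summed over the aggregation granularity, the
    per-second baseline is multiplied by the length of that window.
    """
    return baseline_iops * percent * granularity_seconds // 100


# 80 percent of 90 MiB/s, matching the threshold value given in the procedure.
print(egress_threshold_bytes(90, 80))

# 80 percent of an assumed baseline of 1,000 IOPS over a 1-hour granularity.
print(transactions_threshold(1000, 80, 3600))
```

Integer arithmetic is used so that the results are exact values that can be entered directly in the Threshold value field.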

To learn more about configuring alerts in Azure Monitor, see Overview of alerts in Azure.

See also