Troubleshoot latency using Storage Analytics logs

Diagnosing and troubleshooting are key skills for building and supporting client applications that use Azure Storage.

Because an Azure application is distributed by nature, diagnosing and troubleshooting errors and performance issues can be more complex than in traditional environments.

The following steps show how to use Azure Storage Analytics logs to identify and troubleshoot latency issues and to optimize the client application.

Recommended steps

  1. Download the Storage Analytics logs.

  2. Use the following PowerShell script to convert the raw-format logs into tabular format:

    $Columns = 
         (   "version-number",
             "request-start-time",
             "operation-type",
             "request-status",
             "http-status-code",
             "end-to-end-latency-in-ms",
             "server-latency-in-ms",
             "authentication-type",
             "requester-account-name",
             "owner-account-name",
             "service-type",
             "request-url",
             "requested-object-key",
             "request-id-header",
             "operation-count",
             "requester-ip-address",
             "request-version-header",
             "request-header-size",
             "request-packet-size",
             "response-header-size",
             "response-packet-size",
             "request-content-length",
             "request-md5",
             "server-md5",
             "etag-identifier",
             "last-modified-time",
             "conditions-used",
             "user-agent-header",
             "referrer-header",
             "client-request-id"
         )
    
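    # Import the semicolon-delimited log file, naming the fields with the column list above.
    $logs = Import-Csv "REPLACE THIS WITH FILE PATH" -Delimiter ";" -Header $Columns
    
    # Show the entries in a grid window that can be sorted and filtered by column.
    $logs | Out-GridView -Title "Storage Analytic Log Parser"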
    
  3. The script launches a GUI window where you can filter the information by column, as shown below.

    Screenshot: Storage Analytic Log Parser window

  4. Narrow down the log entries based on "operation-type", and look for the log entries created during the time frame of the issue.

    Screenshot: operation-type log entries

  5. For the time when the issue occurred, the following values are important:

    • operation-type = GetBlob
    • request-status = SASNetworkError
    • end-to-end-latency-in-ms = 8453
    • server-latency-in-ms = 391

    End-to-end latency is calculated using the following equation:

    • End-to-end latency = Server latency + Client latency

    Calculate the client latency using the log entry:

    • Client latency = End-to-end latency - Server latency

       * Example: 8453 - 391 = 8062 ms
      

    The following table shows which recommendation applies to each high-latency combination of OperationType and RequestStatus:

    OperationType   RequestStatus = Success   RequestStatus = (SAS)NetworkError   Recommendation
    GetBlob         Yes                       No                                  GetBlob Operation: RequestStatus = Success
    GetBlob         No                        Yes                                 GetBlob Operation: RequestStatus = (SAS)NetworkError
    PutBlob         Yes                       No                                  Put Operation: RequestStatus = Success
    PutBlob         No                        Yes                                 Put Operation: RequestStatus = (SAS)NetworkError
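
The client latency calculation from step 5 can also be applied to every log entry at once. The following is a minimal sketch and not part of the original procedure: the `Add-ClientLatency` function name, the `client-latency-in-ms` property, and the 1000 ms threshold are illustrative assumptions, and the sample entry only reuses the values listed in step 5.

```powershell
# Hypothetical helper: adds a computed 'client-latency-in-ms' property to each log entry.
function Add-ClientLatency {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        $Entry
    )
    process {
        # Import-Csv returns every field as a string, so cast before subtracting.
        $endToEnd = [int]$Entry.'end-to-end-latency-in-ms'
        $server   = [int]$Entry.'server-latency-in-ms'

        # Client latency = End-to-end latency - Server latency
        $clientLatency = $endToEnd - $server
        $Entry | Add-Member -NotePropertyName 'client-latency-in-ms' -NotePropertyValue $clientLatency -Force -PassThru
    }
}

# Sample entry built from the step 5 values; with real data, pipe $logs instead.
$sample = [pscustomobject]@{
    'operation-type'           = 'GetBlob'
    'request-status'           = 'SASNetworkError'
    'end-to-end-latency-in-ms' = '8453'
    'server-latency-in-ms'     = '391'
}

# Keep GetBlob entries whose client latency exceeds an assumed 1000 ms threshold,
# slowest first.
$slow = $sample | Add-ClientLatency |
    Where-Object { $_.'operation-type' -eq 'GetBlob' -and $_.'client-latency-in-ms' -gt 1000 } |
    Sort-Object -Property 'client-latency-in-ms' -Descending

$slow.'client-latency-in-ms'   # 8062
```

Sorting by the computed column puts the entries with the largest client-side share at the top, which is where the cases in the "Status results" section apply.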

Status results

GetBlob Operation: RequestStatus = Success

Check the following values as described in step 5 of the "Recommended steps" section:

  • End-to-end latency
  • Server latency
  • Client latency

In a GetBlob operation with RequestStatus = Success, if most of the time is spent in client latency, Azure Storage is spending a large amount of time writing data to the client. This delay indicates a client-side issue.

Recommendation:

  • Investigate the code in your client.
  • Use Wireshark, Microsoft Message Analyzer, or Tcping to investigate network connectivity issues from the client.
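
As a rough complement to tools such as Tcping, the time needed to open a TCP connection from the client to the storage endpoint can be measured in PowerShell. The following is a sketch under stated assumptions: `Measure-TcpConnectTime` is a hypothetical helper, and the default port 443 assumes that the client uses HTTPS. It only times the TCP connection attempt, so it does not replace a packet capture.

```powershell
# Hypothetical helper: times a single TCP connection attempt, similar in spirit to Tcping.
function Measure-TcpConnectTime {
    param(
        [Parameter(Mandatory)]
        [string] $HostName,
        [int] $Port = 443,        # assumption: the client talks to the service over HTTPS
        [int] $TimeoutMs = 5000   # assumption: give up after five seconds
    )

    $client    = New-Object System.Net.Sockets.TcpClient
    $timer     = [System.Diagnostics.Stopwatch]::StartNew()
    $succeeded = $false
    try {
        # Wait returns $false on timeout and throws if the connection attempt fails.
        $succeeded = $client.ConnectAsync($HostName, $Port).Wait($TimeoutMs)
    }
    catch {
        $succeeded = $false
    }
    finally {
        $timer.Stop()
        $client.Close()
    }

    [pscustomobject]@{
        HostName  = $HostName
        Port      = $Port
        Succeeded = $succeeded
        ElapsedMs = $timer.ElapsedMilliseconds
    }
}
```

For example, `Measure-TcpConnectTime -HostName 'mystorageaccount.blob.core.windows.net'` checks the Blob endpoint of a storage account, where `mystorageaccount` is a placeholder name. Repeated slow or failed attempts would point toward the network path from the client.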

GetBlob Operation: RequestStatus = (SAS)NetworkError

Check the following values as described in step 5 of the "Recommended steps" section:

  • End-to-end latency
  • Server latency
  • Client latency

In a GetBlob operation with RequestStatus = (SAS)NetworkError, if most of the time is spent in client latency, the most common cause is that the client disconnects before a timeout expires in the storage service.

Recommendation:

  • Investigate the code in your client to understand why and when the client disconnects from the storage service.
  • Use Wireshark, Microsoft Message Analyzer, or Tcping to investigate network connectivity issues from the client.

Put Operation: RequestStatus = Success

Check the following values as described in step 5 of the "Recommended steps" section:

  • End-to-end latency
  • Server latency
  • Client latency

In a Put operation with RequestStatus = Success, if most of the time is spent in client latency, the client is taking a long time to send data to Azure Storage. This delay indicates a client-side issue.

Recommendation:

  • Investigate the code in your client.
  • Use Wireshark, Microsoft Message Analyzer, or Tcping to investigate network connectivity issues from the client.

Put Operation: RequestStatus = (SAS)NetworkError

Check the following values as described in step 5 of the "Recommended steps" section:

  • End-to-end latency
  • Server latency
  • Client latency

In a PutBlob operation with RequestStatus = (SAS)NetworkError, if most of the time is spent in client latency, the most common cause is that the client disconnects before a timeout expires in the storage service.

Recommendation:

  • Investigate the code in your client to understand why and when the client disconnects from the storage service.
  • Use Wireshark, Microsoft Message Analyzer, or Tcping to investigate network connectivity issues from the client.
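
The four cases above can be summarized as a small lookup over a log entry. The following sketch is illustrative only: `Get-LatencyDiagnosis` is a hypothetical function, and it interprets "most of the time is spent in client latency" as client latency being greater than server latency, which is an assumption rather than a rule stated in this article.

```powershell
# Hypothetical helper: maps one log entry to the matching case in "Status results".
function Get-LatencyDiagnosis {
    param(
        [Parameter(Mandatory)]
        $Entry
    )

    $endToEnd = [int]$Entry.'end-to-end-latency-in-ms'
    $server   = [int]$Entry.'server-latency-in-ms'
    $client   = $endToEnd - $server

    # Assumption: client latency "dominates" when it exceeds server latency.
    if ($client -le $server) {
        return 'Client latency does not dominate; the cases in this article do not apply.'
    }

    $operation = $Entry.'operation-type'
    $status    = $Entry.'request-status'

    if ($status -in @('NetworkError', 'SASNetworkError')) {
        if ($operation -in @('GetBlob', 'PutBlob')) {
            return 'The client is probably disconnecting before a storage service timeout expires.'
        }
    }
    elseif ($status -eq 'Success') {
        if ($operation -eq 'GetBlob') {
            return 'Azure Storage is spending most of the time writing data to the client.'
        }
        if ($operation -eq 'PutBlob') {
            return 'The client is spending most of the time sending data to Azure Storage.'
        }
    }

    return 'Not covered by the cases in this article.'
}

# Entry built from the step 5 values.
$entry = [pscustomobject]@{
    'operation-type'           = 'GetBlob'
    'request-status'           = 'SASNetworkError'
    'end-to-end-latency-in-ms' = '8453'
    'server-latency-in-ms'     = '391'
}

Get-LatencyDiagnosis -Entry $entry
```

In every covered case the recommendation is the same starting point: investigate the client code and the client's network connectivity.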