Debug WASB file operations in Azure HDInsight

There are times when you may want to understand what operations the WASB driver started with Azure Storage. For the client side, the WASB driver produces logs for each file system operation at DEBUG level. The WASB driver uses log4j to control the logging level, and the default is INFO level. For Azure Storage server-side analytics logs, see Azure Storage analytics logging.

A produced log will look similar to:

18/05/13 04:15:55 DEBUG NativeAzureFileSystem: Moving wasb://xxx@yyy.blob.core.chinacloudapi.cn/user/livy/ulysses.txt/_temporary/0/_temporary/attempt_20180513041552_0000_m_000000_0/part-00000 to wasb://xxx@yyy.blob.core.chinacloudapi.cn/user/livy/ulysses.txt/part-00000

Turn on WASB debug logging for file operations

  1. From a web browser, navigate to https://CLUSTERNAME.azurehdinsight.cn/#/main/services/SPARK2/configs, where CLUSTERNAME is the name of your Spark cluster.

  2. Navigate to Advanced spark2-log4j-properties.

  3. Modify log4j.appender.console.Threshold=INFO to log4j.appender.console.Threshold=DEBUG.

    Add log4j.logger.org.apache.hadoop.fs.azure.NativeAzureFileSystem=DEBUG.

  4. Navigate to Advanced livy2-log4j-properties.

    Add log4j.logger.org.apache.hadoop.fs.azure.NativeAzureFileSystem=DEBUG.

  5. Save changes. The resulting log4j lines are sketched after these steps.
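
For reference, after these changes the relevant lines in Advanced spark2-log4j-properties look like the minimal sketch below (all other properties in the file stay as they are); Advanced livy2-log4j-properties only needs the NativeAzureFileSystem logger line:

# Let the console appender emit DEBUG-level messages
log4j.appender.console.Threshold=DEBUG
# Log every WASB (NativeAzureFileSystem) file system operation at DEBUG level
log4j.logger.org.apache.hadoop.fs.azure.NativeAzureFileSystem=DEBUG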

Additional logging

The above logs should provide a high-level understanding of the file system operations. If the above logs still don't provide useful information, or if you want to investigate Blob storage API calls, add fs.azure.storage.client.logging=true to the core-site. This setting enables the Java SDK logs for the WASB storage driver and prints each call to the Blob storage server. Remove the setting after the investigation because it can fill up the disk quickly and can slow down the process.
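
If you apply this setting by editing core-site.xml directly rather than through the Ambari UI, the entry takes the standard Hadoop property form; this is a minimal sketch:

<!-- Enable Azure Storage Java SDK client logging for the WASB driver; remove after the investigation. -->
<property>
  <name>fs.azure.storage.client.logging</name>
  <value>true</value>
</property>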

If the backend is Azure Data Lake based, then use the following log4j settings for the component (for example, spark/tez/hdfs):

log4j.logger.com.microsoft.azure.datalake.store=ALL,adlsFile
log4j.additivity.com.microsoft.azure.datalake.store=true
log4j.appender.adlsFile=org.apache.log4j.FileAppender
log4j.appender.adlsFile.File=/var/log/adl/adl.log
log4j.appender.adlsFile.layout=org.apache.log4j.PatternLayout
log4j.appender.adlsFile.layout.ConversionPattern=%p\t%d{ISO8601}\t%r\t%c\t[%t]\t%m%n

Look for the logs in /var/log/adl/adl.log.

Next steps

If you didn't see your problem or are unable to solve your issue, visit the following channel for more support:

  • If you need more help, you can submit a support request from the Azure portal.