"NativeAzureFileSystem...RequestBodyTooLarge" appears in Apache Spark streaming app log in HDInsight
This article describes troubleshooting steps and possible resolutions for issues when using Apache Spark components in Azure HDInsight clusters.
Issue
The error

```
NativeAzureFileSystem ... RequestBodyTooLarge
```

appears in the driver log for an Apache Spark streaming app.
Cause
Your Spark event log file is probably hitting the file length limit for WASB.

In Spark 2.3, each Spark app generates one Spark event log file, and the event log file for a Spark streaming app continues to grow while the app is running. Today a file on WASB has a 50,000-block limit, and the default block size is 4 MB, so in the default configuration the maximum file size is 195 GB. However, Azure Storage has increased the maximum block size to 100 MB, which effectively raises the single-file limit to 4.75 TB. For more information, see Scalability and performance targets for Blob storage.
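The limits above follow directly from the block arithmetic:

```
50,000 blocks ×   4 MB = 200,000 MB   ≈ 195 GB   (default block size)
50,000 blocks × 100 MB = 5,000,000 MB ≈ 4.75 TB  (maximum block size)
```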
Resolution
There are three solutions available for this error:
1. Increase the block size to up to 100 MB. In the Ambari UI, modify the HDFS configuration property fs.azure.write.request.size, or create it in the Custom core-site section if it doesn't already exist. Set the property to a larger value, for example 33554432 (32 MB). Save the updated configuration and restart the affected components. One way to verify the change is sketched below.
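After the configuration is saved in Ambari, it's pushed to core-site.xml on the cluster nodes. As a minimal sketch, assuming the standard HDP config path on a head node, the effective value can be checked like this:

```bash
# Print the property name and value from the deployed Hadoop config
# (the value is in bytes; 33554432 bytes = 32 MB).
grep -A 1 'fs.azure.write.request.size' /etc/hadoop/conf/core-site.xml
```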
2. Periodically stop and resubmit the spark-streaming job; see the sketch below.
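A hypothetical kill-and-resubmit cycle is shown below; the application name MyStreamingApp, the class, and the jar path are placeholders rather than values from this article:

```bash
# Look up the running streaming app's YARN application ID by its (placeholder) name.
APP_ID=$(yarn application -list | grep 'MyStreamingApp' | awk '{print $1}')

# Stop the app; its current Spark event log file stops growing.
yarn application -kill "$APP_ID"

# Resubmit; the new application writes a fresh event log file.
spark-submit --master yarn --deploy-mode cluster \
  --class com.example.MyStreamingApp my-streaming-app.jar
```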
3. Use HDFS to store Spark event logs. Note that using HDFS for storage may result in loss of Spark event data during cluster scaling or Azure upgrades.
Make changes to spark.eventLog.dir and spark.history.fs.logDirectory via the Ambari UI:

```
spark.eventLog.dir = hdfs://mycluster/hdp/spark2-events
spark.history.fs.logDirectory = hdfs://mycluster/hdp/spark2-events
```
Create directories on HDFS:
```bash
# Create the event log directory and hand it to the spark user.
hadoop fs -mkdir -p hdfs://mycluster/hdp/spark2-events
hadoop fs -chown -R spark:hadoop hdfs://mycluster/hdp

# Open permissions, with the sticky bit so users can delete only their own files.
hadoop fs -chmod -R 777 hdfs://mycluster/hdp/spark2-events
hadoop fs -chmod -R o+t hdfs://mycluster/hdp/spark2-events
```
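As an optional check (not part of the original steps), you can confirm the directory exists with the expected owner and mode before restarting:

```bash
# Should show spark2-events owned by spark:hadoop, with mode drwxrwxrwt.
hadoop fs -ls hdfs://mycluster/hdp
```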
Restart all affected services via the Ambari UI.
Next steps
If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:

Get answers from Azure experts through Azure Community Support.

If you need more help, you can submit a support request from the Azure portal.