Apache Spark streaming job that reads Apache Kafka data fails with NoClassDefFoundError in HDInsight

This article describes troubleshooting steps and possible resolutions for issues that occur when using Apache Spark components in Azure HDInsight clusters.

Issue

The Apache Spark cluster runs a Spark streaming job that reads data from an Apache Kafka cluster. The job fails if Kafka stream compression is turned on. In this case, the Spark streaming YARN application application_1525986016285_0193 failed with the following error:

18/05/17 20:01:33 WARN YarnAllocator: Container marked as failed: container_e25_1525986016285_0193_01_000032 on host: wn87-Scaled.2ajnsmlgqdsutaqydyzfzii3le.cx.internal.chinacloudapp.cn. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e25_1525986016285_0193_01_000032
Exit code: 50
Stack trace: ExitCodeException exitCode=50: 
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)

Cause

This error can occur when the version of the spark-streaming-kafka jar file that you specify is different from the version of the Kafka cluster that you are running.

For example, if you are running Kafka cluster version 0.10.1, the following command results in an error, because it requests the package built for Kafka 0.8:

spark-submit \
--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 \
--conf spark.executor.instances=16 \
...
~/Kafka_Spark_SQL.py <bootstrap server details>

Resolution

Use the spark-submit command with the --packages option, and ensure that the version of the spark-streaming-kafka jar file matches the version of the Kafka cluster that you are running.
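
As an illustration of the resolution, the following sketch builds the package coordinate for a Kafka 0.10.x cluster and prints the resulting spark-submit command. The variable names are hypothetical, and the Scala and Spark versions (2.11 and 2.2.0) are assumptions carried over from the failing example; replace them with the versions of your own clusters. Options elided in the failing example are omitted here as well.

```shell
#!/bin/sh
# Assumed values (not from the cluster itself); adjust to your environment.
SCALA_VERSION="2.11"   # Scala build used by the Spark cluster
SPARK_VERSION="2.2.0"  # Spark version of the cluster
KAFKA_API="0-10"       # integration package built for Kafka 0.10.x brokers

# Maven coordinate of the spark-streaming-kafka package that matches the broker.
KAFKA_PACKAGE="org.apache.spark:spark-streaming-kafka-${KAFKA_API}_${SCALA_VERSION}:${SPARK_VERSION}"

# Print the command for review; remove the leading "echo" to submit the job.
echo spark-submit \
  --packages "$KAFKA_PACKAGE" \
  --conf spark.executor.instances=16 \
  "$HOME/Kafka_Spark_SQL.py" "<bootstrap server details>"
```

Keeping the versions in variables makes a mismatch easier to spot when the cluster is upgraded. One caveat, to be verified against the Spark documentation for your version: the 0-10 DStream integration in Spark 2.2 appears to offer only Scala and Java APIs, so a Python job may need the Structured Streaming package (spark-sql-kafka-0-10) instead.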

Next steps

If you didn't see your problem here or are unable to resolve your issue, visit the following channel for more support:

  • If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub.