Common errors in notebooks

Some common issues occur when using notebooks. This section outlines frequently asked questions and best practices you should follow.

Spark job fails with java.lang.NoClassDefFoundError

Sometimes you may come across an error like:

java.lang.NoClassDefFoundError: Could not initialize class line.....$read$

This can occur with a Spark Scala 2.11 cluster and a Scala notebook if you mix a case class definition and Dataset/DataFrame operations in the same notebook cell, and later use the case class in a Spark job in a different cell. For example, in the first cell, say you define a case class MyClass and also create a Dataset.

case class MyClass(value: Int)

val dataset = spark.createDataset(Seq(1))

Then, in a later cell, you create instances of MyClass inside a Spark job.

dataset.map { i => MyClass(i) }.count()

Solution

Move the case class definition to a cell of its own.

case class MyClass(value: Int)   // no other code in this cell
val dataset = spark.createDataset(Seq(1))
dataset.map { i => MyClass(i) }.count()

Spark job fails with java.lang.UnsupportedOperationException

Sometimes you may come across an error like:

java.lang.UnsupportedOperationException: Accumulator must be registered before send to executor

This can occur with a Spark Scala 2.10 cluster and a Scala notebook. The cause and solution for this error are the same as for the java.lang.NoClassDefFoundError failure described above: a case class defined in a cell that also contains Dataset/DataFrame operations.
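
The same restructuring fixes this error. A minimal sketch, reusing the MyClass example from above (assuming `spark` is the notebook's preconfigured SparkSession); the key point is that the cell containing the case class definition holds no other code:

```scala
// Cell 1: only the case class definition, nothing else in this cell
case class MyClass(value: Int)

// Cell 2: Dataset operations that reference the case class
val dataset = spark.createDataset(Seq(1))
dataset.map { i => MyClass(i) }.count()
```

Keeping the definition isolated ensures the class is compiled into its own wrapper object, so executors can initialize it without pulling in the rest of the cell's state.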