Failure Spark job debugging with Azure Toolkit for IntelliJ (preview)

This article provides step-by-step guidance on how to use HDInsight Tools in Azure Toolkit for IntelliJ to run Spark Failure Debug applications.

Prerequisites

Create a project with the debugging template

Create a Spark 2.3.2 project to continue with failure debugging, using the failure task debugging sample file described in this document.

  1. Open IntelliJ IDEA. Open the New Project window.

    a. Select Azure Spark/HDInsight from the left pane.

    b. Select Spark Project with Failure Task Debugging Sample (Preview) (Scala) from the main window.

    Create a debugging project

    c. Select Next.

  2. In the New Project window, do the following steps:

    Select the Spark SDK

    a. Enter a project name and project location.

    b. In the Project SDK drop-down list, select Java 1.8 for the Spark 2.3.2 cluster.

    c. In the Spark Version drop-down list, select Spark 2.3.2 (Scala 2.11.8).

    d. Select Finish.

  3. Select src > main > scala to open your code in the project. This example uses the AgeMean_Div() script; a sketch of a comparable failing job follows this list.
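The template's sample script isn't reproduced in this article. Purely as an illustration of the kind of job involved, here is a minimal sketch of a Spark job that deliberately fails a task; the object name and data are hypothetical, not the actual AgeMean_Div() sample:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical failing job for illustration only -- this is not the
// template's actual AgeMean_Div() sample.
object FailingJobSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FailingJobSketch").getOrCreate()
    val sc = spark.sparkContext

    // (age, divisor) pairs; the zero divisor makes one task throw an
    // ArithmeticException, failing the job so there is something to debug.
    val records = sc.parallelize(Seq((30, 2), (40, 0), (50, 5)))
    val perMember = records.map { case (age, divisor) => age / divisor }

    println(s"sum = ${perMember.sum()}")
    spark.stop()
  }
}
```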

Run a Spark Scala/Java application on an HDInsight cluster

Create a Spark Scala/Java application, then run the application on a Spark cluster by doing the following steps:

  1. Click Add Configuration to open the Run/Debug Configurations window.

    Edit configurations

  2. In the Run/Debug Configurations dialog box, select the plus sign (+). Then select the Apache Spark on HDInsight option.

    Add a new configuration

  3. Switch to the Remotely Run in Cluster tab. Enter information for Name, Spark cluster, and Main class name. The tools support debugging with executors; the default value for numExecutors is 5, and it's best not to set it higher than 3. To reduce the run time, you can add spark.yarn.maxAppAttempts to the job Configurations and set the value to 1 (see the sketch after this list). Click the OK button to save the configuration.

    Run debug configuration

  4. The configuration is now saved with the name you provided. To view the configuration details, select the configuration name. To make changes, select Edit Configurations.

  5. After you complete the configuration settings, you can run the project against the remote cluster.

    Remote run button

  6. You can check the application ID from the output window.

    Application ID in the output window
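Outside of the IntelliJ dialog, the two settings called out in step 3 correspond to ordinary Spark properties. A minimal sketch of the equivalents, assuming they are supplied at submission time (for example, through the job configuration or a spark-submit --conf flag); setting spark.yarn.maxAppAttempts inside an already running driver is too late for YARN to honor it:

```scala
import org.apache.spark.SparkConf

object ConfSketch {
  // Equivalent Spark properties for the IntelliJ settings in step 3; pass
  // them at submission time (for example, spark-submit --conf key=value).
  val conf: SparkConf = new SparkConf()
    .set("spark.executor.instances", "3")  // keep the executor count at 3 or below
    .set("spark.yarn.maxAppAttempts", "1") // fail fast: no YARN retry of the application
}
```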

Download the failed job profile

If the job submission fails, you can download the failed job profile to your local machine for further debugging.

  1. Open Microsoft Azure Storage Explorer, locate the HDInsight storage account of the cluster for the failed job, and then download the failed job resources from the corresponding location, \hdp\spark2-events\.spark-failures\<application ID>, to a local folder. The Activities window shows the download progress. If you prefer a command-line tool, see the sketch after this step.

    Download the failure files
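As an alternative to the Storage Explorer UI, you can download the same directory with a command-line copy tool such as AzCopy. A sketch, assuming a Blob storage account and placeholder account, container, and destination names (use your cluster's actual primary storage; for ADLS Gen2 accounts, the dfs endpoint applies instead):

```
azcopy copy "https://<storage-account>.blob.core.windows.net/<container>/hdp/spark2-events/.spark-failures/<application ID>/" "C:\failed-job" --recursive
```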

Configure the local debugging environment and debug on failure

  1. Open the original project, or create a new project and associate it with the original source code. Only Spark 2.3.2 is currently supported for failure debugging.

  2. In IntelliJ IDEA, create a Spark Failure Debug configuration file. For the Spark Job Failure Context location field, select the FTD file from the failed job resources that you downloaded earlier.

    Spark Failure Debug configuration

  3. Click the local run button in the toolbar; the error is displayed in the Run window.

    Error displayed in the Run window

  4. Set a breakpoint where the log indicates, and then click the local debug button to debug locally, just as you would for normal Scala/Java projects in IntelliJ.

  5. After debugging, if the project completes successfully, you can resubmit the failed job to your Spark on HDInsight cluster.

Next steps

Scenarios

Create and run applications

Tools and extensions

Manage resources