Use Azure Toolkit for IntelliJ to create Apache Spark applications for an HDInsight cluster

This article demonstrates how to develop Apache Spark applications on Azure HDInsight using the Azure Toolkit plug-in for the IntelliJ IDE. Azure HDInsight is a managed, open-source analytics service in the cloud. The service allows you to use open-source frameworks like Hadoop, Apache Spark, Apache Hive, and Apache Kafka.

You can use the Azure Toolkit plug-in in a few ways:

  • Develop and submit a Scala Spark application to an HDInsight Spark cluster.
  • Access your Azure HDInsight Spark cluster resources.
  • Develop and run a Scala Spark application locally.

In this article, you learn how to:

  • Use the Azure Toolkit for IntelliJ plug-in
  • Develop Apache Spark applications
  • Submit an application to an Azure HDInsight cluster

Prerequisites

Install Scala plugin for IntelliJ IDEA

Steps to install the Scala plugin:

  1. Open IntelliJ IDEA.

  2. On the welcome screen, navigate to Configure > Plugins to open the Plugins window.

    Enable the Scala plugin

  3. Select Install for the Scala plugin that is featured in the new window.

    Install the Scala plugin

  4. After the plugin installs successfully, you must restart the IDE.

Create a Spark Scala application for an HDInsight Spark cluster

  1. Start IntelliJ IDEA, and select Create New Project to open the New Project window.

  2. Select Azure Spark/HDInsight from the left pane.

  3. Select Spark Project (Scala) from the main window.

  4. From the Build tool drop-down list, select one of the following options:

    • Maven for Scala project-creation wizard support.

    • SBT for managing the dependencies and building the Scala project.

      The New Project dialog box

  5. Select Next.

  6. In the New Project window, provide the following information:

    Property          Description
    Project name      Enter a name. This article uses myApp.
    Project location  Enter the location to save your project.
    Project SDK       This field might be blank on your first use of IDEA. Select New... and navigate to your JDK.
    Spark Version     The creation wizard integrates the proper version for the Spark SDK and Scala SDK. If the Spark cluster version is earlier than 2.0, select Spark 1.x. Otherwise, select Spark 2.x. This example uses Spark 2.3.0 (Scala 2.11.8).

    Selecting the Spark SDK

  7. Select Finish. It may take a few minutes before the project becomes available.

  8. The Spark project automatically creates an artifact for you. To view the artifact, follow these steps:

    a. From the menu bar, navigate to File > Project Structure....

    b. From the Project Structure window, select Artifacts.

    c. Select Cancel after viewing the artifact.

    Artifact information in the dialog box

  9. Add your application source code by following these steps:

    a. From Project, navigate to myApp > src > main > scala.

    b. Right-click scala, and then navigate to New > Scala Class.

    The command for creating a Scala class from the project

    c. In the Create New Scala Class dialog box, provide a name, select Object in the Kind drop-down list, and then select OK.

    The Create New Scala Class dialog box

    d. The myApp.scala file then opens in the main view. Replace the default code with the following code:

     ```scala
     import org.apache.spark.SparkConf
     import org.apache.spark.SparkContext

     object myApp {
         def main(arg: Array[String]): Unit = {
             val conf = new SparkConf().setAppName("myApp")
             val sc = new SparkContext(conf)

             val rdd = sc.textFile("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv")

             // find the rows that have only one digit in the seventh column in the CSV file
             val rdd1 = rdd.filter(s => s.split(",")(6).length() == 1)

             rdd1.saveAsTextFile("wasbs:///HVACOut")
         }
     }
     ```
    

    The code reads the data from HVAC.csv (available on all HDInsight Spark clusters), retrieves the rows that have only one digit in the seventh column of the CSV file, and writes the output to /HVACOut under the default storage container for the cluster.
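
    To spot-check the output after the job finishes, you can read it back with a couple of standard SparkContext calls. The following is a minimal sketch you could run in one of the Spark consoles described later in this article; it assumes the job above has already written /HVACOut and that sc is the console's SparkContext:

     ```scala
     // Read the job output back from the cluster's default storage container
     val out = sc.textFile("wasbs:///HVACOut")

     // Count the filtered rows and preview the first few of them
     println(s"rows written: ${out.count()}")
     out.take(5).foreach(println)
     ```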

Connect to your HDInsight cluster

You can either sign in to your Azure subscription or link an HDInsight cluster. Use your Ambari username and password or domain-joined credentials to connect to your HDInsight cluster.

Sign in to your Azure subscription

  1. From the menu bar, navigate to View > Tool Windows > Azure Explorer.

    Show Azure Explorer

  2. From Azure Explorer, right-click the Azure node, and then select Sign In.

    Right-click the Azure node in Azure Explorer

  3. In the Azure Sign In dialog box, choose Device Login, and then select Sign in.


  4. In the Azure Device Login dialog box, click Copy&Open.


  5. In the browser interface, paste the code, and then click Next.


  6. Enter your Azure credentials, and then close the browser.


  7. After you're signed in, the Select Subscriptions dialog box lists all the Azure subscriptions that are associated with the credentials. Select your subscription, and then select the Select button.

    The Select Subscriptions dialog box

  8. From Azure Explorer, expand HDInsight to view the HDInsight Spark clusters that are in your subscriptions.


  9. To view the resources (for example, storage accounts) that are associated with the cluster, you can further expand a cluster-name node.

    An expanded cluster-name node

Link a cluster

You can link an HDInsight cluster by using the Apache Ambari managed username. Similarly, for a domain-joined HDInsight cluster, you can link by using the domain and username, such as user1@contoso.com. You can also link a Livy Service cluster.

  1. From the menu bar, navigate to View > Tool Windows > Azure Explorer.

  2. From Azure Explorer, right-click the HDInsight node, and then select Link A Cluster.

    The Link A Cluster context menu

  3. The available options in the Link A Cluster window will vary depending on which value you select from the Link Resource Type drop-down list. Enter your values and then select OK.

    • HDInsight Cluster

      Property             Value
      Link Resource Type   Select HDInsight Cluster from the drop-down list.
      Cluster Name/URL     Enter the cluster name.
      Authentication Type  Leave as Basic Authentication.
      User Name            Enter the cluster user name; the default is admin.
      Password             Enter the password for the user name.

      The Link HDInsight Cluster dialog box

    • Livy Service

      Property             Value
      Link Resource Type   Select Livy Service from the drop-down list.
      Livy Endpoint        Enter the Livy endpoint.
      Cluster Name         Enter the cluster name.
      Yarn Endpoint        Optional.
      Authentication Type  Leave as Basic Authentication.
      User Name            Enter the cluster user name; the default is admin.
      Password             Enter the password for the user name.

      The Link Livy Service dialog box

  4. You can see your linked cluster from the HDInsight node.

    The linked cluster

  5. You can also unlink a cluster from Azure Explorer.

    The unlinked cluster

Run a Spark Scala application on an HDInsight Spark cluster

After creating a Scala application, you can submit it to the cluster.

  1. From Project, navigate to myApp > src > main > scala > myApp. Right-click myApp, and select Submit Spark Application (it will likely be located at the bottom of the list).

    The Submit Spark Application to HDInsight command

  2. In the Submit Spark Application dialog window, select 1. Spark on HDInsight.

  3. In the Edit configuration window, provide the following values, and then select OK:

    Property                              Value
    Spark clusters (Linux only)           Select the HDInsight Spark cluster on which you want to run your application.
    Select an Artifact to submit          Leave the default setting.
    Main class name                       The default value is the main class from the selected file. You can change the class by selecting the ellipsis (...) and choosing another class.
    Job configurations                    You can change the default keys and/or values. For more information, see the Apache Livy REST API.
    Command line arguments                You can enter arguments separated by spaces for the main class, if needed.
    Referenced Jars and Referenced Files  You can enter the paths for the referenced Jars and files, if any. You can also browse files in the Azure virtual file system, which currently only supports ADLS Gen 2 clusters. For more information, see Apache Spark Configuration and How to upload resources to cluster.
    Job Upload Storage                    Expand to reveal additional options.
    Storage Type                          Select Use Azure Blob to upload from the drop-down list.
    Storage Account                       Enter your storage account.
    Storage Key                           Enter your storage key.
    Storage Container                     Select your storage container from the drop-down list once Storage Account and Storage Key have been entered.

    The Spark Submission dialog box

  4. Select SparkJobRun to submit your project to the selected cluster. The Remote Spark Job in Cluster tab displays the job execution progress at the bottom. You can stop the application by clicking the red button.

    The Spark Submission window

Debug Apache Spark applications locally or remotely on an HDInsight cluster

We also recommend another way of submitting the Spark application to the cluster. You can do so by setting the parameters in the IDE's Run/Debug Configurations. For more information, see Debug Apache Spark applications locally or remotely on an HDInsight cluster with Azure Toolkit for IntelliJ through SSH.

Access and manage HDInsight Spark clusters by using Azure Toolkit for IntelliJ

You can perform various operations by using Azure Toolkit for IntelliJ. Most of the operations are started from Azure Explorer. From the menu bar, navigate to View > Tool Windows > Azure Explorer.

Access the job view

  1. From Azure Explorer, navigate to HDInsight > <Your Cluster> > Jobs.

    The job view node

  2. In the right pane, the Spark Job View tab displays all the applications that were run on the cluster. Select the name of the application for which you want to see more details.

    Application details

  3. To display basic running job information, hover over the job graph. To view the stages graph and information that every job generates, select a node on the job graph.

    Job stage details

  4. To view frequently used logs, such as Driver Stderr, Driver Stdout, and Directory Info, select the Log tab.

    Log details

  5. You can view the Spark history UI and the YARN UI (at the application level). Select a link at the top of the window.

Access the Spark history server

  1. From Azure Explorer, expand HDInsight, right-click your Spark cluster name, and then select Open Spark History UI.

  2. When you're prompted, enter the cluster's admin credentials, which you specified when you set up the cluster.

  3. On the Spark history server dashboard, you can use the application name to look for the application that you just finished running. In the preceding code, you set the application name by using val conf = new SparkConf().setAppName("myApp"). Your Spark application name is myApp.

Start the Ambari portal

  1. From Azure Explorer, expand HDInsight, right-click your Spark cluster name, and then select Open Cluster Management Portal(Ambari).

  2. When you're prompted, enter the admin credentials for the cluster. You specified these credentials during the cluster setup process.

Manage Azure subscriptions

By default, Azure Toolkit for IntelliJ lists the Spark clusters from all your Azure subscriptions. If necessary, you can specify the subscriptions that you want to access.

  1. From Azure Explorer, right-click the Azure root node, and then select Select Subscriptions.

  2. From the Select Subscriptions window, clear the check boxes next to the subscriptions that you don't want to access, and then select Close.

Spark Console

You can run the Spark Local Console (Scala) or the Spark Livy Interactive Session Console (Scala).

Spark Local Console (Scala)

Ensure you've satisfied the WINUTILS.EXE prerequisite.

  1. From the menu bar, navigate to Run > Edit Configurations....

  2. From the Run/Debug Configurations window, in the left pane, navigate to Apache Spark on HDInsight > [Spark on HDInsight] myApp.

  3. From the main window, select the Locally Run tab.

  4. Provide the following values, and then select OK:

    Property               Value
    Job main class         The default value is the main class from the selected file. You can change the class by selecting the ellipsis (...) and choosing another class.
    Environment variables  Ensure the value for HADOOP_HOME is correct.
    WINUTILS.exe location  Ensure the path is correct.

    Local console set configuration
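
    If you're unsure whether HADOOP_HOME is visible to the console you launch with this configuration, a quick, optional check is to read it from the process environment. This is a minimal sketch, assuming the console inherits the run configuration's environment variables:

     ```scala
     // Print the HADOOP_HOME value visible to the JVM process, if any
     println(sys.env.getOrElse("HADOOP_HOME", "HADOOP_HOME is not set"))
     ```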

  5. From Project, navigate to myApp > src > main > scala > myApp.

  6. From the menu bar, navigate to Tools > Spark Console > Run Spark Local Console(Scala).

  7. Two dialog boxes may then be displayed asking whether you want to auto-fix dependencies. If so, select Auto Fix.

    Spark auto fix dialog 1

    Spark auto fix dialog 2

  8. The console should look similar to the picture below. In the console window, type sc.appName, and then press Ctrl+Enter. The result will be shown. You can terminate the local console by clicking the red button.

    Local console result
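
    Beyond sc.appName, you can try other SparkContext calls interactively. The following is a minimal sketch of commands you might type into the local console; the small in-memory RDD and the sample numbers are illustrative only:

     ```scala
     // Confirm which application the console is bound to
     sc.appName

     // Build a small in-memory RDD and run a couple of actions on it
     val nums = sc.parallelize(1 to 100)
     nums.count()
     nums.filter(_ % 7 == 0).take(5)
     ```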

Spark Livy Interactive Session Console (Scala)

  1. From the menu bar, navigate to Run > Edit Configurations....

  2. From the Run/Debug Configurations window, in the left pane, navigate to Apache Spark on HDInsight > [Spark on HDInsight] myApp.

  3. From the main window, select the Remotely Run in Cluster tab.

  4. Provide the following values, and then select OK:

    Property                     Value
    Spark clusters (Linux only)  Select the HDInsight Spark cluster on which you want to run your application.
    Main class name              The default value is the main class from the selected file. You can change the class by selecting the ellipsis (...) and choosing another class.

    Interactive console set configuration

  5. From Project, navigate to myApp > src > main > scala > myApp.

  6. From the menu bar, navigate to Tools > Spark Console > Run Spark Livy Interactive Session Console(Scala).

  7. The console should look similar to the picture below. In the console window, type sc.appName, and then press Ctrl+Enter. The result will be shown. You can end the console by clicking the red button.

    Interactive console result
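
    Because the Livy interactive session runs on the cluster, it can also read files from the cluster's default storage. The following is a minimal sketch using the sample HVAC.csv file referenced earlier in this article; the row preview count is illustrative:

     ```scala
     // Read the sample CSV that ships with HDInsight Spark clusters
     val hvac = sc.textFile("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv")

     // Count the rows and preview the first few lines
     hvac.count()
     hvac.take(3).foreach(println)
     ```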

Send Selection to Spark Console

You can preview the result of a script by sending some code to the local console or the Livy Interactive Session Console (Scala). Highlight some code in the Scala file, and then right-click Send Selection To Spark Console. The selected code is sent to the console, and the result is displayed after the code. The console also checks for any errors.

    Send Selection to Spark Console
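
For example, you might highlight a short snippet like the following in myApp.scala and send it to the console to check the filtering logic before submitting the whole job. The sample CSV row is illustrative only:

```scala
// Quick check of the column-splitting logic used in myApp:
// does the seventh column of a sample CSV row contain exactly one digit?
val sample = "6/1/13,0:00:01,66,58,13,20,4"
sample.split(",")(6).length == 1
```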

Reader-only role

When users submit a job to a cluster with reader-only role permission, Ambari credentials are required.

  1. Sign in with a reader-only role account.

  2. From Azure Explorer, expand HDInsight to view the HDInsight clusters that are in your subscription. The clusters marked "Role:Reader" only have reader-only role permission.


  3. Right-click the cluster with reader-only role permission. Select Link this cluster from the context menu to link the cluster. Enter the Ambari username and password.


  4. If the cluster is linked successfully, HDInsight will be refreshed. The status of the cluster will become linked.


  1. Click the Jobs node. The Cluster Job Access Denied window pops up.

  2. Click Link this cluster to link the cluster.


  1. Create an HDInsight Configuration. Then select Remotely Run in Cluster.

  2. For Spark clusters (Linux only), select a cluster that has reader-only role permission. A warning message appears. You can click Link this cluster to link the cluster.

    Create configuration

View Storage Accounts

  • For clusters with reader-only role permission, click the Storage Accounts node. The Storage Access Denied window pops up. You can click Open Azure Storage Explorer to open Storage Explorer.



  • For linked clusters, click the Storage Accounts node. The Storage Access Denied window pops up. You can click Open Azure Storage to open Storage Explorer.



Convert existing IntelliJ IDEA applications to use Azure Toolkit for IntelliJ

You can convert the existing Spark Scala applications that you created in IntelliJ IDEA to be compatible with Azure Toolkit for IntelliJ. You can then use the plug-in to submit the applications to an HDInsight Spark cluster.

  1. For an existing Spark Scala application that was created through IntelliJ IDEA, open the associated .iml file.

  2. At the root level is a module element like the following text:

     ```
     <module org.jetbrains.idea.maven.project.MavenProjectsManager.isMavenModule="true" type="JAVA_MODULE" version="4">
     ```
    

    Edit the element to add UniqueKey="HDInsightTool" so that the module element looks like the following text:

     ```
     <module org.jetbrains.idea.maven.project.MavenProjectsManager.isMavenModule="true" type="JAVA_MODULE" version="4" UniqueKey="HDInsightTool">
     ```
    
  3. Save the changes. Your application should now be compatible with Azure Toolkit for IntelliJ. You can test it by right-clicking the project name in Project. The pop-up menu now has the option Submit Spark Application to HDInsight.

Clean up resources

If you're not going to continue to use this application, delete the cluster that you created with the following steps:

  1. Sign in to the Azure portal.

  2. In the Search box at the top, type HDInsight.

  3. Select HDInsight clusters under Services.

  4. In the list of HDInsight clusters that appears, select the ... next to the cluster that you created for this article.

  5. Select Delete. Select Yes.

    Delete an HDInsight cluster

Next steps

In this tutorial, you learned how to use the Azure Toolkit for IntelliJ plug-in to develop Apache Spark applications written in Scala, and then submit them to an HDInsight Spark cluster directly from the IntelliJ integrated development environment (IDE). Advance to the next article to see how the data you registered in Apache Spark can be pulled into a BI analytics tool such as Power BI.