Use HDInsight Tools for IntelliJ with Hortonworks Sandbox

Learn how to use HDInsight Tools for IntelliJ to develop Apache Spark Scala applications, and then test the applications on Hortonworks Sandbox running on your computer.

IntelliJ IDEA is a Java integrated development environment (IDE) for developing computer software. After you develop and test your applications on Hortonworks Sandbox, you can move the applications to Azure HDInsight.

Prerequisites

Before you begin this tutorial, you must have the following items:

To install the plug-ins:

  1. Open IntelliJ IDEA.
  2. On the Welcome page, select Configure, and then select Plugins.
  3. In the lower-left corner, select Install JetBrains plugin.
  4. Use the search function to search for Scala, and then select Install.
  5. To complete the installation, select Restart IntelliJ IDEA.
  6. Repeat steps 4 and 5 to install Azure Toolkit for IntelliJ. For more information, see Install Azure Toolkit for IntelliJ.

Create an Apache Spark Scala application

In this section, you create a sample Scala project by using IntelliJ IDEA. In the next section, you link IntelliJ IDEA to the Hortonworks Sandbox (emulator) before you submit the project.

  1. Open IntelliJ IDEA on your computer. In the New Project dialog box, complete these steps:

    1. Select HDInsight > Spark on HDInsight (Scala).

    2. In the Build tool list, select one of the following, based on your scenario:

      • Maven: For Scala project-creation wizard support.
      • SBT: For managing dependencies and building the Scala project.

    The New Project dialog box

  2. Select Next.

  3. In the next New Project dialog box, complete the following steps:

    1. In the Project name box, enter a project name.

    2. In the Project location box, enter a project location.

    3. Next to the Project SDK drop-down list, select New, select JDK, and then specify the folder of a Java JDK, version 1.7 or later. Select Java 1.8 for a Spark 2.x cluster and Java 1.7 for a Spark 1.x cluster. The default location is C:\Program Files\Java\jdk1.8.x_xxx.

    4. In the Spark version drop-down list, select a version. The Scala project-creation wizard integrates the matching versions of the Spark SDK and Scala SDK. If the Spark cluster version is earlier than 2.0, select Spark 1.x. Otherwise, select Spark 2.x. This example uses Spark 1.6.2 (Scala 2.10.5). Ensure that you are using the repository marked Scala 2.10.x. Do not use the repository marked Scala 2.11.x.

      Create the IntelliJ Scala project properties

  4. Select Finish.

  5. If the Project view is not already open, press Alt+1 to open it.

  6. In Project Explorer, expand the project, and then select src.

  7. Right-click src, point to New, and then select Scala class.

  8. In the Name box, enter a name. In the Kind box, select Object. Then, select OK.

    The Create New Scala Class dialog box

  9. In the .scala file, paste the following code:

     import java.util.Random
     import org.apache.spark.{SparkConf, SparkContext}
     import org.apache.spark.SparkContext._

     /**
      * Usage: GroupByTest [numMappers] [numKVPairs] [valSize] [numReducers]
      * Note: this sample ignores the arguments and uses the fixed values below.
      */
     object GroupByTest {
         def main(args: Array[String]): Unit = {
             val sparkConf = new SparkConf().setAppName("GroupBy Test")
             val numMappers = 3
             val numKVPairs = 10
             val valSize = 10
             val numReducers = 2

             val sc = new SparkContext(sparkConf)

             // Each of the numMappers partitions generates numKVPairs random
             // (key, byte array) pairs.
             val pairs1 = sc.parallelize(0 until numMappers, numMappers).flatMap { p =>
                 val ranGen = new Random
                 val arr1 = new Array[(Int, Array[Byte])](numKVPairs)
                 for (i <- 0 until numKVPairs) {
                     val byteArr = new Array[Byte](valSize)
                     ranGen.nextBytes(byteArr)
                     arr1(i) = (ranGen.nextInt(Int.MaxValue), byteArr)
                 }
                 arr1
             }.cache()
             // Enforce that everything has been calculated and is in the cache.
             pairs1.count()

             // Group the pairs by key and print the number of distinct keys.
             println(pairs1.groupByKey(numReducers).count())

             // Release the resources held by the Spark context.
             sc.stop()
         }
     }
    
  10. On the Build menu, select Build project. Ensure that the compilation finishes successfully.
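Before submitting the job, it can help to see what the sample computes without a cluster. The following is a minimal sketch, not part of the original sample, that mirrors the GroupByTest logic with plain Scala collections. The object name GroupByLocalSketch, the helper names, and the fixed seed are assumptions made for illustration only.

```scala
import scala.util.Random

// Hypothetical local stand-in for GroupByTest: same parameters,
// plain Scala collections instead of Spark RDDs.
object GroupByLocalSketch {

  // Builds numMappers * numKVPairs random (key, payload) pairs,
  // mirroring the flatMap body of GroupByTest. The seed is an
  // illustrative addition so that runs are repeatable.
  def generatePairs(numMappers: Int, numKVPairs: Int, valSize: Int,
                    seed: Long): Seq[(Int, Array[Byte])] = {
    val ranGen = new Random(seed)
    for {
      _ <- 0 until numMappers
      _ <- 0 until numKVPairs
    } yield {
      val byteArr = new Array[Byte](valSize)
      ranGen.nextBytes(byteArr)
      (ranGen.nextInt(Int.MaxValue), byteArr)
    }
  }

  // Counts the distinct keys, which is what groupByKey(...).count
  // reports in the Spark version.
  def distinctKeyCount(pairs: Seq[(Int, Array[Byte])]): Int =
    pairs.groupBy(_._1).size

  def main(args: Array[String]): Unit = {
    val pairs = generatePairs(numMappers = 3, numKVPairs = 10, valSize = 10, seed = 42L)
    println(s"pairs generated: ${pairs.size}")
    println(s"distinct keys:   ${distinctKeyCount(pairs)}")
  }
}
```

With the values used in the tutorial (3 mappers, 10 pairs each), the job generates 30 pairs, so the number printed by the Spark application is at most 30. Key collisions are unlikely but possible, so a slightly smaller number is not an error.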

Before you can link to a Hortonworks Sandbox (emulator), you must have an existing IntelliJ application.

To link to an emulator:

  1. Open the project in IntelliJ.

  2. On the View menu, select Tool Windows, and then select Azure Explorer.

  3. Expand Azure, right-click HDInsight, and then select Link an Emulator.

  4. In the Link A New Emulator dialog box, enter the password that you've set for the root account of the Hortonworks Sandbox. Next, enter values similar to those used in the following screenshot. Then, select OK.

    The Link A New Emulator dialog box

  5. To configure the emulator, select Yes.

When the emulator is successfully connected, the emulator (Hortonworks Sandbox) is listed on the HDInsight node.
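If the link fails, one thing worth ruling out is that the sandbox is simply not reachable from your computer. The following is a minimal sketch of a TCP reachability check. It is not part of HDInsight Tools, and the object and method names are hypothetical. The host and port to probe are whatever you entered in the Link A New Emulator dialog box; the demo in main probes a throwaway local listener rather than assuming any sandbox port.

```scala
import java.net.{InetSocketAddress, ServerSocket, Socket}
import scala.util.Try

// Hypothetical helper, not part of the tutorial's tooling.
object SandboxPortCheck {

  // Returns true if a TCP connection to host:port succeeds
  // within timeoutMs milliseconds, and false otherwise.
  def isReachable(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Demo only: open a local listener on a free port chosen by the OS
    // and probe it. Replace the host and port with your sandbox values.
    val listener = new ServerSocket(0)
    try {
      println(isReachable("127.0.0.1", listener.getLocalPort))
    } finally {
      listener.close()
    }
  }
}
```

A false result for a sandbox port usually points to a virtual machine that is not running or to port forwarding that is not configured, rather than to a problem in IntelliJ.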

Submit the Spark Scala application to the Hortonworks Sandbox

After you have linked IntelliJ IDEA to the emulator, you can submit your project.

To submit a project to an emulator:

  1. In Project Explorer, right-click the project, and then select Submit Spark application to HDInsight.

  2. Complete the following steps:

    1. In the Spark cluster (Linux only) drop-down list, select your local Hortonworks Sandbox.
    2. In the Main class name box, select or enter the main class name. For this tutorial, the name is GroupByTest.
  3. Select Submit. The job submission logs are shown in the Spark submission tool window.

Next steps