Get started with an Apache Hadoop sandbox, an emulator on a virtual machine

Learn how to install the Apache Hadoop sandbox from Hortonworks on a virtual machine to explore the Hadoop ecosystem. The sandbox provides a local development environment for learning about Hadoop, the Hadoop Distributed File System (HDFS), and job submission. Once you are familiar with Hadoop, you can start using Hadoop on Azure by creating an HDInsight cluster. For more information on how to get started, see Get started with Hadoop on HDInsight.

Prerequisites

Download and install the virtual machine

  1. Browse to the Cloudera downloads page.

  2. Click VIRTUALBOX under Choose Installation Type to download the latest Hortonworks Sandbox on a VM. Sign in or complete the product interest form.

  3. Click the HDP SANDBOX (LATEST) button to begin the download.

For instructions on setting up the sandbox, see Sandbox Deployment and Install Guide.

To download an older version of the HDP sandbox, see the links under Older Versions.

Start the virtual machine

  1. Open Oracle VM VirtualBox.

  2. From the File menu, click Import Appliance, and then specify the Hortonworks Sandbox image.

  3. Select the Hortonworks Sandbox, click Start, and then Normal Start. Once the virtual machine has finished the boot process, it displays login instructions.

  4. Open a web browser and navigate to the URL displayed (usually http://127.0.0.1:8888).
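Before opening the browser, you can check from the host whether the sandbox page is responding. The following is a minimal sketch, assuming curl is installed on the host and that the sandbox uses the usual URL mentioned above; the variable names are illustrative only.

```shell
# URL shown by the sandbox after boot; adjust it if your VM displays a different one.
SANDBOX_URL="http://127.0.0.1:8888"

# --fail makes curl return a non-zero exit code on HTTP errors,
# --max-time stops the request from hanging while the VM is still booting.
if command -v curl >/dev/null 2>&1 &&
   curl --silent --fail --max-time 5 --output /dev/null "$SANDBOX_URL"; then
  sandbox_status="reachable"
else
  sandbox_status="unreachable"
fi

echo "Sandbox at $SANDBOX_URL is $sandbox_status"
```

If the status is unreachable, the virtual machine may still be starting; wait until it displays the login instructions and try again.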

Set Sandbox passwords

  1. From the Get Started step of the Hortonworks Sandbox page, select View Advanced Options. Use the information on this page to log in to the sandbox using SSH. Use the name and password provided.

    Note

    If you do not have an SSH client installed, you can use the web-based SSH provided by the virtual machine at http://localhost:4200/.

    The first time you connect using SSH, you are prompted to change the password for the root account. Enter a new password, which you use when you log in using SSH.

  2. Once logged in, enter the following command:

     ambari-admin-password-reset
    

    When prompted, provide a password for the Ambari admin account. You use this password when you access the Ambari Web UI.
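The SSH login in step 1 can be sketched as follows. The port number 2222 is an assumption (it is not stated on this page), so use the host, port, and user name shown under View Advanced Options instead. The sketch only assembles and prints the command so you can review it before running it.

```shell
# Connection details: replace these with the values from View Advanced Options.
SANDBOX_HOST="127.0.0.1"
SANDBOX_SSH_PORT="2222"   # assumption: forwarded SSH port, verify on the sandbox page
SANDBOX_USER="root"

# Assemble the ssh command line; -p selects the port to connect to.
ssh_cmd="ssh -p ${SANDBOX_SSH_PORT} ${SANDBOX_USER}@${SANDBOX_HOST}"

echo "$ssh_cmd"
```

Running the printed command opens the session in which you change the root password and then run ambari-admin-password-reset.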

Use Hive commands

  1. From an SSH connection to the sandbox, use the following command to start the Hive shell:

     hive
    
  2. Once the shell has started, use the following command to view the tables that are provided with the sandbox:

     show tables;
    
  3. Use the following command to retrieve 10 rows from the sample_07 table:

     select * from sample_07 limit 10;
    

Next steps