Get started with an Apache Hadoop sandbox, an emulator on a virtual machine

Learn how to install the Apache Hadoop sandbox from Hortonworks on a virtual machine. The sandbox provides a local development environment for exploring the Hadoop ecosystem, including Hadoop Distributed File System (HDFS) and job submission. Once you are familiar with Hadoop, you can start using Hadoop on Azure by creating an HDInsight cluster. For more information, see Get started with Hadoop on HDInsight.

Prerequisites

Download and install the virtual machine

  1. Browse to the Cloudera downloads page.

  2. Click VIRTUALBOX under Choose Installation Type to download the latest Hortonworks Sandbox on a VM. Sign in or complete the product interest form.

  3. Click the HDP SANDBOX (LATEST) button to begin the download.

For instructions on setting up the sandbox, see Sandbox Deployment and Install Guide.

To download an older HDP version sandbox, see the links under Older Versions.

Start the virtual machine

  1. Open Oracle VM VirtualBox.

  2. From the File menu, click Import Appliance, and then specify the Hortonworks Sandbox image.

  3. Select the Hortonworks Sandbox, click Start, and then Normal Start. Once the virtual machine has finished the boot process, it displays login instructions.

  4. Open a web browser and navigate to the URL displayed (usually http://127.0.0.1:8888).
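The boot process can take several minutes, so the page may not respond right away. As a minimal sketch, the following shell function polls the URL until it answers. The function name `wait_for_url`, its arguments, and the default retry values are hypothetical and chosen only for this illustration; it assumes `curl` is installed on the host.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Arguments: URL, number of attempts (default 30), seconds between attempts (default 10).
wait_for_url() {
  wf_url="$1"
  wf_attempts="${2:-30}"
  wf_delay="${3:-10}"
  wf_i=1
  while [ "$wf_i" -le "$wf_attempts" ]; do
    # -s: quiet, -f: treat HTTP errors as failure, -o /dev/null: discard the body
    if curl -sf -o /dev/null --max-time 5 "$wf_url"; then
      echo "Sandbox page is reachable at $wf_url"
      return 0
    fi
    sleep "$wf_delay"
    wf_i=$((wf_i + 1))
  done
  echo "No response from $wf_url after $wf_attempts attempts" >&2
  return 1
}

# Example call (assumes the URL displayed by the virtual machine is the usual one):
# wait_for_url "http://127.0.0.1:8888" 30 10
```

If the virtual machine displays a different URL, pass that URL instead.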

Set Sandbox passwords

  1. From the get started step of the Hortonworks Sandbox page, select View Advanced Options. Use the information on this page to log in to the sandbox using SSH. Use the name and password provided.

    Note

    If you do not have an SSH client installed, you can use the web-based SSH client provided by the virtual machine at http://localhost:4200/.

    The first time you connect using SSH, you are prompted to change the password for the root account. Enter a new password, and use it for all later SSH logins.

  2. Once logged in, enter the following command:

    ambari-admin-password-reset
    

    When prompted, provide a password for the Ambari admin account. You use this password when you access the Ambari Web UI.
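From a command-line SSH client, the two steps above might look like the following sketch. The host, port, and user values are assumptions for illustration (port 2222 in particular is not taken from this article); use the values shown on the View Advanced Options page. The helper name `sandbox_ssh` is hypothetical.

```shell
#!/bin/sh
# Assumed connection values: replace them with the ones shown on the
# View Advanced Options page. Port 2222 is an assumption.
SANDBOX_HOST="127.0.0.1"
SANDBOX_SSH_PORT="2222"
SANDBOX_USER="root"

# Hypothetical helper: open an interactive shell on the sandbox, or run
# a single command there when one is given.
# -t requests a terminal so that interactive prompts (such as a password
# change) can be answered.
sandbox_ssh() {
  ssh -t -p "$SANDBOX_SSH_PORT" "$SANDBOX_USER@$SANDBOX_HOST" "$@"
}

# Example calls (not run automatically):
# sandbox_ssh                              # interactive login
# sandbox_ssh ambari-admin-password-reset  # reset the Ambari admin password
```

Keeping the connection values in variables makes it easy to adjust them if the sandbox page lists different settings.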

Use Hive commands

  1. From an SSH connection to the sandbox, use the following command to start the Hive shell:

    hive
    
  2. Once the shell has started, use the following statement to view the tables that are provided with the sandbox:

    show tables;
    
  3. Use the following statement to retrieve 10 rows from the sample_07 table:

    select * from sample_07 limit 10;
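
The same statements can also be run without entering the interactive Hive shell. The sketch below wraps the `hive -e` option, which executes a quoted statement and then exits. The helper name `run_hive` is hypothetical, and the calls assume they are run from an SSH session on the sandbox, where the `hive` command is available.

```shell
#!/bin/sh
# Hypothetical helper: run one HiveQL statement non-interactively.
# hive -e executes the quoted statement and then exits.
run_hive() {
  hive -e "$1"
}

# Example calls (run these inside the sandbox):
# run_hive "show tables;"
# run_hive "select * from sample_07 limit 10;"
```

This form is convenient for scripts, while the interactive shell is better suited to exploring the sample tables.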
    

Next steps