Quickstart: Query Apache HBase in Azure HDInsight with Apache Phoenix

In this quickstart, you learn how to use Apache Phoenix to run HBase queries in Azure HDInsight. Apache Phoenix is a SQL query engine for Apache HBase. It's accessed as a JDBC driver and lets you query and manage HBase tables by using SQL. SQLLine is a command-line utility for executing SQL. In this quickstart, you use it to run Phoenix commands.

If you don't have an Azure subscription, create a trial account before you begin.

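Prerequisites

To complete this quickstart, you need an Apache HBase cluster in Azure HDInsight, an SSH client, and curl.
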
Identify a ZooKeeper node

When you connect to an HBase cluster, you need to connect to one of the Apache ZooKeeper nodes. Each HDInsight cluster has three ZooKeeper nodes. You can use curl to query the cluster's Ambari REST API and quickly identify one. Edit the following curl command by replacing PASSWORD with the password for the cluster login account (admin) and both occurrences of CLUSTERNAME with the name of your cluster. Then enter the command at a command prompt:

    curl -u admin:PASSWORD -sS -G https://CLUSTERNAME.azurehdinsight.cn/api/v1/clusters/CLUSTERNAME/services/ZOOKEEPER/components/ZOOKEEPER_SERVER

A portion of the output will look similar to:

    {
      "href" : "http://hn*.432dc3rlshou3ocf251eycoapa.bx.internal.chinacloudapp.cn:8080/api/v1/clusters/myCluster/hosts/<zookeepername1>.432dc3rlshou3ocf251eycoapa.bx.internal.chinacloudapp.cn/host_components/ZOOKEEPER_SERVER",
      "HostRoles" : {
        "cluster_name" : "myCluster",
        "component_name" : "ZOOKEEPER_SERVER",
        "host_name" : "<zookeepername1>.432dc3rlshou3ocf251eycoapa.bx.internal.chinacloudapp.cn"
      }

Note the value of host_name. You use it later when you launch SQLLine.

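If you'd rather not read the JSON by eye, the following Python sketch extracts the ZooKeeper host names from the curl output and builds the ZOOKEEPER:2181:/hbase-unsecure argument that SQLLine takes when you launch it. It assumes that the full Ambari response wraps host entries like the portion shown above in a host_components array. The sample response, host names, and the zookeeper_hosts and sqlline_target helpers are hypothetical. The port and znode defaults are the values this quickstart uses.

```python
import json
from typing import List

# Hypothetical sample shaped like the Ambari response portion shown above.
# Assumption: the full response wraps host entries in a "host_components" array.
SAMPLE_RESPONSE = """
{
  "host_components": [
    {"HostRoles": {"cluster_name": "myCluster",
                   "component_name": "ZOOKEEPER_SERVER",
                   "host_name": "zk0-mycluster.example.internal"}},
    {"HostRoles": {"cluster_name": "myCluster",
                   "component_name": "ZOOKEEPER_SERVER",
                   "host_name": "zk1-mycluster.example.internal"}}
  ]
}
"""


def zookeeper_hosts(response_text: str) -> List[str]:
    """Return every ZooKeeper host_name listed in an Ambari component response."""
    data = json.loads(response_text)
    return [
        entry["HostRoles"]["host_name"]
        for entry in data.get("host_components", [])
        if entry.get("HostRoles", {}).get("component_name") == "ZOOKEEPER_SERVER"
    ]


def sqlline_target(host: str, port: int = 2181, znode: str = "/hbase-unsecure") -> str:
    """Build the HOST:PORT:ZNODE argument passed to sqlline.py."""
    return f"{host}:{port}:{znode}"


if __name__ == "__main__":
    hosts = zookeeper_hosts(SAMPLE_RESPONSE)
    # Any one of the ZooKeeper nodes works, so take the first.
    print(sqlline_target(hosts[0]))
```

To try it against your own cluster, save the response with curl's -o option, read the file, and pass its contents to zookeeper_hosts.
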
Create a table and manipulate data

You can use SSH to connect to HBase clusters, and then use Apache Phoenix to create HBase tables, insert data, and query data.

  1. Use the ssh command to connect to your HBase cluster. Edit the following command by replacing CLUSTERNAME with the name of your cluster (and sshuser, if you chose a different SSH user name), and then enter the command:

    ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.cn
    
  2. Change to the Phoenix client directory. Enter the following command:

    cd /usr/hdp/current/phoenix-client/bin
    
  3. Launch SQLLine. Edit the following command by replacing ZOOKEEPER with the host_name value you noted earlier, and then enter the command:

    ./sqlline.py ZOOKEEPER:2181:/hbase-unsecure
    
  4. Create an HBase table. Enter the following command:

    CREATE TABLE Company (company_id INTEGER PRIMARY KEY, name VARCHAR(225));
    
  5. Use the SQLLine !tables command to list all tables in HBase. Enter the following command:

    !tables
    
  6. Insert values into the table. Enter the following commands:

    UPSERT INTO Company VALUES(1, 'Microsoft');
    UPSERT INTO Company VALUES(2, 'Apache');
    
  7. Query the table. Enter the following command:

    SELECT * FROM Company;
    
  8. Delete a record. Enter the following command:

    DELETE FROM Company WHERE COMPANY_ID=1;
    
  9. Drop the table. Enter the following command:

    DROP TABLE Company;
    
  10. Use the SQLLine !quit command to exit SQLLine. Enter the following command:

    !quit
    
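The steps above write the same names in different cases: Company, company_id, and COMPANY_ID. This works because Phoenix treats unquoted identifiers as case-insensitive and normalizes them to uppercase, which is also why !tables lists the table as COMPANY. Identifiers in double quotes keep their exact case. The following Python function is a simplified sketch of that rule for illustration only. normalize_identifier is a hypothetical helper, not part of Phoenix, and it ignores escaped quotes inside names.

```python
def normalize_identifier(name: str) -> str:
    """Approximate how Phoenix resolves a single SQL identifier.

    Simplified sketch: a double-quoted name keeps its case (quotes removed);
    an unquoted name is upper-cased. Escaped quotes are not handled.
    """
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name.upper()


if __name__ == "__main__":
    # Both spellings used in the steps above resolve to the same column.
    print(normalize_identifier("company_id"))  # COMPANY_ID
    print(normalize_identifier("COMPANY_ID"))  # COMPANY_ID
    print(normalize_identifier("Company"))     # COMPANY
    print(normalize_identifier('"Company"'))   # Company
```

One consequence: if you create a table as "Company" (quoted), later statements must quote the name too, because an unquoted Company resolves to COMPANY.
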

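Phoenix uses UPSERT instead of INSERT. The statement inserts a row if no row has that primary key yet, and otherwise updates the existing row. To make that concrete, the following Python sketch models the Company table as an in-memory dictionary keyed by company_id. It's a conceptual model under simplifying assumptions (whole-row values, no HBase storage or column families), and the CompanyTable class is hypothetical.

```python
from typing import Dict, List, Tuple


class CompanyTable:
    """Hypothetical in-memory model of UPSERT/DELETE semantics on the Company table."""

    def __init__(self) -> None:
        # The primary key (company_id) maps to the rest of the row (name).
        self._rows: Dict[int, str] = {}

    def upsert(self, company_id: int, name: str) -> None:
        # Insert when the key is new; overwrite when it already exists.
        self._rows[company_id] = name

    def delete(self, company_id: int) -> None:
        # Deleting a key that isn't present affects no rows; it isn't an error.
        self._rows.pop(company_id, None)

    def select_all(self) -> List[Tuple[int, str]]:
        # Sorted by key only so that this sketch's output is deterministic.
        return sorted(self._rows.items())


if __name__ == "__main__":
    table = CompanyTable()
    table.upsert(1, "Microsoft")
    table.upsert(2, "Apache")
    table.upsert(2, "Apache Software Foundation")  # same key: update, no duplicate row
    print(table.select_all())
    table.delete(1)
    print(table.select_all())
```

Running it prints the two rows after the upserts (with row 2 updated in place), then only row 2 after the delete.
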
Clean up resources

After you complete the quickstart, you might want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're charged for an HDInsight cluster even when it isn't in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters that you aren't using.

To delete a cluster, see Delete an HDInsight cluster using your browser, PowerShell, or the Azure CLI.

Next steps

In this quickstart, you learned how to use Apache Phoenix to run HBase queries in Azure HDInsight. To learn more about Apache Phoenix, see the next article, which examines it in more depth.