Quickstart: Execute Apache Hive queries in Azure HDInsight with Apache Zeppelin

In this quickstart, you learn how to use Apache Zeppelin to run Apache Hive queries in Azure HDInsight. HDInsight Interactive Query clusters include Apache Zeppelin notebooks that you can use to run interactive Hive queries.

If you don't have an Azure subscription, create a trial account before you begin.

Prerequisites

An HDInsight Interactive Query cluster. To create one, see Create cluster, and make sure to choose the Interactive Query cluster type.

Create an Apache Zeppelin note

  1. In the following URL, replace CLUSTERNAME with the name of your cluster: https://CLUSTERNAME.azurehdinsight.cn/zeppelin. Then enter the URL in a web browser.

  2. Enter your cluster login username and password. From the Zeppelin page, you can either create a new note or open an existing one. The HiveSample note contains some sample Hive queries.


  3. Select Create new note.

  4. In the Create new note dialog, type or select the following values:

    • Note Name: Enter a name for the note.
    • Default interpreter: Select jdbc from the drop-down list.
  5. Select Create Note.

  6. Enter the following Hive query in the code section, and then press Shift + Enter:

    %jdbc(hive)
    show tables
    


    The %jdbc(hive) statement in the first line tells the notebook to use the Hive JDBC interpreter.

    The query returns one Hive table, called hivesampletable.

    The following are two additional Hive queries that you can run against hivesampletable:

    %jdbc(hive)
    select * from hivesampletable limit 10
    
    %jdbc(hive)
    select ${group_name}, count(*) as total_count
    from hivesampletable
    group by ${group_name=market,market|deviceplatform|devicemake}
    limit ${total_count=10}
    

    Compared with traditional Hive, the query results come back much faster.
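
As a rough illustration of what the last query above sends to Hive, here is the same statement with the form defaults filled in. This sketch assumes that Zeppelin replaces each ${group_name} placeholder with the value selected in the form (market by default) and ${total_count} with 10 before running the paragraph. The expanded text is written by hand, not captured from Zeppelin.

    %jdbc(hive)
    select market, count(*) as total_count
    from hivesampletable
    group by market
    limit 10

Under the same assumption, choosing deviceplatform or devicemake in the group_name drop-down would change both the selected column and the group by column to that value.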

Clean up resources

After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are charged for an HDInsight cluster even when it is not in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters when they are not in use.

To delete a cluster, see Delete an HDInsight cluster using your browser, PowerShell, or the Azure CLI.

Next steps

In this quickstart, you learned how to use Apache Zeppelin to run Apache Hive queries in Azure HDInsight. To learn more about Hive queries, continue to the next article, which shows how to execute queries with Visual Studio.

See also