Use Apache Ambari Hive View with Apache Hadoop in HDInsight

Learn how to run Hive queries by using Apache Ambari Hive View. The Hive View allows you to author, optimize, and run Hive queries from your web browser.

Prerequisites

  • A Linux-based Apache Hadoop on HDInsight cluster, version 3.4 or greater.

    Important

    Linux is the only operating system used on HDInsight version 3.4 or greater. For more information, see HDInsight retirement on Windows.

  • A web browser

Run a Hive query

  1. Open the Azure portal.

  2. Select your HDInsight cluster, and then select Ambari Views from the Quick Links section.

    Image of the Quick Links section of the portal

    When prompted to authenticate, use the cluster login account name (the default is admin) and password that you provided when you created the cluster.

  3. From the list of views, select Hive View.

    Image of Hive View selected

    The Hive View page is similar to the following image:

    Image of the Hive View query worksheet

  4. From the Query tab, paste the following HiveQL statements into the worksheet:

    DROP TABLE log4jLogs;
    CREATE EXTERNAL TABLE log4jLogs(
        t1 string,
        t2 string,
        t3 string,
        t4 string,
        t5 string,
        t6 string,
        t7 string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
    STORED AS TEXTFILE LOCATION '/example/data/';
    SELECT t4 AS loglevel, COUNT(*) AS count FROM log4jLogs 
        WHERE t4 = '[ERROR]' 
        GROUP BY t4;
    

    These statements perform the following actions:

    • DROP TABLE: Deletes the table, in case it already exists. (Dropping an external table doesn't remove the underlying data files.)

    • CREATE EXTERNAL TABLE: Creates a new "external" table in Hive. External tables store only the table definition in Hive; the data is left in its original location.

    • ROW FORMAT: Tells Hive how the data is formatted. In this case, the fields in each log entry are separated by a space.

    • STORED AS TEXTFILE LOCATION: Tells Hive where the data is stored, and that it's stored as text.

    • SELECT: Selects a count of all rows where column t4 contains the value [ERROR]. (A variation that counts every log level is sketched after this procedure.)

      Important

      Leave the Database selection at default. The examples in this document use the default database included with HDInsight.

  5. To start the query, use the Execute button below the worksheet. The button turns orange and its text changes to Stop.

  6. After the query has finished, the Results tab displays the results of the operation. The following text is the result of the query:

     loglevel       count
     [ERROR]        3
    

    You can use the Logs tab to view the logging information that the job created.

    Tip

    Download or save results from the Save results drop-down dialog box in the upper left of the Query Process Results section.
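
As a variation on the query in step 4, the following sketch counts rows for every log level instead of only [ERROR]. It assumes the same log4jLogs table and simply omits the WHERE clause:

SELECT t4 AS loglevel, COUNT(*) AS count FROM log4jLogs
    GROUP BY t4;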

Visual explain

To display a visualization of the query plan, select the Visual Explain tab below the worksheet.

The Visual Explain view of the query can be helpful in understanding the flow of complex queries. You can see a textual equivalent of this view by using the Explain button in the Query Editor.
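
For example, a minimal sketch of that textual form, using the log4jLogs query from earlier, is to prefix the statement with the EXPLAIN keyword:

EXPLAIN SELECT t4 AS loglevel, COUNT(*) AS count FROM log4jLogs
    WHERE t4 = '[ERROR]'
    GROUP BY t4;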

Tez UI

To display the Tez UI for the query, select the Tez tab below the worksheet.

Important

Tez is not used to resolve all queries; many queries can be resolved without it.

If Tez was used to resolve the query, the Directed Acyclic Graph (DAG) is displayed. If you want to view the DAG for queries you've run in the past, or if you want to debug the Tez process, use the Tez View instead.

View job history

The Jobs tab displays a history of Hive queries.

Image of the job history

Database tables

You can use the Tables tab to work with tables within a Hive database.

Image of the Tables tab
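
If you prefer to work from the Query tab, much of the same information is available through standard HiveQL; for example, a minimal sketch against the log4jLogs table created earlier:

SHOW TABLES;
DESCRIBE log4jLogs;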

Saved queries

From the Query tab, you can optionally save queries. After you save a query, you can reuse it from the Saved Queries tab.

Image of the Saved Queries tab

Tip

Saved queries are stored in the default cluster storage. You can find them under the path /user/<username>/hive/scripts, stored as plain-text .hql files.
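
As a sketch, you can also list those files from the Hive CLI or a Beeline session on the cluster (both accept dfs commands); the admin user name below is only an illustration, so substitute your own cluster login:

dfs -ls /user/admin/hive/scripts;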

User-defined functions

You can extend Hive through user-defined functions (UDFs). Use a UDF to implement functionality or logic that isn't easily modeled in HiveQL.

Declare and save a set of UDFs by using the UDF tab at the top of the Hive View. These UDFs can be used with the Query Editor.

Image of the UDF tab

After you've added a UDF to the Hive View, an Insert udfs button appears at the bottom of the Query Editor. Selecting it displays a drop-down list of the UDFs defined in the Hive View. Selecting a UDF adds HiveQL statements to your query to enable the UDF.

For example, if you have defined a UDF with the following properties:

  • Resource name: myudfs

  • Resource path: /myudfs.jar

  • UDF name: myawesomeudf

  • UDF class name: com.myudfs.Awesome

Using the Insert udfs button displays an entry named myudfs, with another drop-down list for each UDF defined for that resource, in this case myawesomeudf. Selecting this entry adds the following to the beginning of the query:

add jar /myudfs.jar;
create temporary function myawesomeudf as 'com.myudfs.Awesome';

You can then use the UDF in your query. For example: SELECT myawesomeudf(name) FROM people;

For more information on using UDFs with Hive on HDInsight, see the following articles:

Hive settings

You can change various Hive settings, such as changing the execution engine for Hive from Tez (the default) to MapReduce.
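
For example, hive.execution.engine is the standard Hive property that controls the execution engine. A minimal sketch of overriding it for the current session from the query worksheet, assuming session-level overrides are allowed on your cluster:

-- Switch the current session to MapReduce; set the value back to tez to return to the default engine.
SET hive.execution.engine=mr;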

Next steps

For general information on Hive on HDInsight:

For information on other ways you can work with Hadoop on HDInsight: