在 HDInsight 中通过 JDBC 驱动程序查询 Apache HiveQuery Apache Hive through the JDBC driver in HDInsight

了解如何从 Java 应用程序使用 JDBC 驱动程序将 Apache Hive 查询提交到 Azure HDInsight 中的 Apache Hadoop。Learn how to use the JDBC driver from a Java application to submit Apache Hive queries to Apache Hadoop in Azure HDInsight. 本文档中的信息演示如何以编程方式从 SQuirreL SQL 客户端进行连接。The information in this document demonstrates how to connect programmatically, and from the SQuirreL SQL client.

有关 Hive JDBC 接口的详细信息,请参阅 HiveJDBCInterfaceFor more information on the Hive JDBC Interface, see HiveJDBCInterface.

先决条件Prerequisites

JDBC 连接字符串JDBC connection string

JDBC 通过端口 443 连接到 Azure 上的 HDInsight 群集,并使用 SSL 保护通信安全。JDBC connections to an HDInsight cluster on Azure are made over port 443, and the traffic is secured using SSL. 公用网关(群集位于其后)会将通信重定向到 HiveServer2 实际进行侦听的端口。The public gateway that the clusters sit behind redirects the traffic to the port that HiveServer2 is actually listening on. 以下连接字符串显示要用于 HDInsight 的格式:The following connection string shows the format to use for HDInsight:

jdbc:hive2://CLUSTERNAME.azurehdinsight.cn:443/default;transportMode=http;ssl=true;httpPath=/hive2

CLUSTERNAME 替换为 HDInsight 群集的名称。Replace CLUSTERNAME with the name of your HDInsight cluster.

身份验证Authentication

建立连接时,必须使用 HDInsight 群集管理员名称和密码对群集网关进行身份验证。When establishing the connection, you must use the HDInsight cluster admin name and password to authenticate to the cluster gateway. 从 JDBC 客户端(如 SQuirreL SQL)进行连接时,必须在客户端设置中输入管理员名称和密码。When connecting from JDBC clients such as SQuirreL SQL, you must enter the admin name and password in client settings.

从 Java 应用程序建立连接时,必须使用该名称和密码。From a Java application, you must use the name and password when establishing a connection. 例如,以下 Java 代码使用连接字符串、管理员名称和密码打开新连接:For example, the following Java code opens a new connection using the connection string, admin name, and password:

DriverManager.getConnection(connectionString,clusterAdmin,clusterPassword);

使用 SQuirreL SQL 客户端进行连接Connect with SQuirreL SQL client

SQuirreL SQL 是一个 JDBC 客户端,可用于通过 HDInsight 群集远程运行 Hive 查询。SQuirreL SQL is a JDBC client that can be used to remotely run Hive queries with your HDInsight cluster. 以下步骤假设已安装 SQuirreL SQL。The following steps assume that you have already installed SQuirreL SQL.

  1. 创建一个目录来包含要从群集复制的某些文件。Create a directory to contain certain files to be copied from your cluster.

  2. 在下面的脚本中,将 sshuser 替换为群集的 SSH 用户帐户名。In the following script, replace sshuser with the SSH user account name for the cluster. CLUSTERNAME 替换为 HDInsight 群集名称。Replace CLUSTERNAME with the HDInsight cluster name. 从命令行将工作目录更改为上一步中创建的目录,然后输入以下命令以从 HDInsight 群集复制文件:From a command line, change your work directory to the one created in the prior step, and then enter the following command to copy files from an HDInsight cluster:

    scp sshuser@CLUSTERNAME-ssh.azurehdinsight.cn:/usr/hdp/current/hadoop-client/{hadoop-auth.jar,hadoop-common.jar,lib/log4j-*.jar,lib/slf4j-*.jar,lib/curator-*.jar} .
    
    scp sshuser@CLUSTERNAME-ssh.azurehdinsight.cn:/usr/hdp/current/hive-client/lib/{commons-codec*.jar,commons-logging-*.jar,hive-*-*.jar,httpclient-*.jar,httpcore-*.jar,libfb*.jar,libthrift-*.jar} .
    
  3. 启动 SQuirreL SQL 应用程序。Start the SQuirreL SQL application. 在窗口左侧中,选择“驱动程序” 。From the left of the window, select Drivers.

    窗口左侧的“驱动程序”选项卡

  4. 从“驱动程序” 对话框顶部的图标中,选择 + 图标创建驱动程序。From the icons at the top of the Drivers dialog, select the + icon to create a driver.

    驱动程序图标

  5. 在“添加驱动程序”对话框中,添加以下信息:In the Add Driver dialog, add the following information:

    • 名称:HiveName: Hive
    • 示例 URLjdbc:hive2://localhost:443/default;transportMode=http;ssl=true;httpPath=/hive2Example URL: jdbc:hive2://localhost:443/default;transportMode=http;ssl=true;httpPath=/hive2
    • 额外类路径:使用“添加” 按钮添加此前下载的所有 jar 文件Extra Class Path: Use the Add button to add the all of jar files downloaded earlier
    • 类名:org.apache.hive.jdbc.HiveDriverClass Name: org.apache.hive.jdbc.HiveDriver

    添加驱动程序对话框

    选择“确定” 保存这些设置。Select OK to save these settings.

  6. 在 SQuirreL SQL 窗口左侧,选择“别名” 。On the left of the SQuirreL SQL window, select Aliases. 然后选择 + 图标来创建连接别名。Then select the + icon to create a connection alias.

    SQuirreL SQL“添加新别名”对话框

  7. 将以下值用于“添加别名” 对话框。Use the following values for the Add Alias dialog.

    • 名称:Hive on HDInsightName: Hive on HDInsight

    • 驱动程序:使用下拉列表选择 Hive 驱动程序Driver: Use the drop-down to select the Hive driver

    • URL:jdbc:hive2://CLUSTERNAME.azurehdinsight.cn:443/default;transportMode=http;ssl=true;httpPath=/hive2URL: jdbc:hive2://CLUSTERNAME.azurehdinsight.cn:443/default;transportMode=http;ssl=true;httpPath=/hive2

      CLUSTERNAME 替换为 HDInsight 群集的名称。Replace CLUSTERNAME with the name of your HDInsight cluster.

    • 用户名:HDInsight 群集的群集登录帐户名。User Name: The cluster login account name for your HDInsight cluster. 默认为 adminThe default is admin.

    • 密码:群集登录帐户的密码。Password: The password for the cluster login account.

    添加别名对话框

    重要

    使用“测试” 按钮验证连接是否有效。Use the Test button to verify that the connection works. 出现“连接到: Hive on HDInsight”对话框时,选择“连接” 进行测试。When Connect to: Hive on HDInsight dialog appears, select Connect to perform the test. 如果测试成功,将会显示“连接成功” 对话框。If the test succeeds, you see a Connection successful dialog. 如果发生错误,请参阅故障排除If an error occurs, see Troubleshooting.

    若要保存连接别名,请使用“添加别名” 对话框底部的“确定” 按钮。To save the connection alias, use the Ok button at the bottom of the Add Alias dialog.

  8. 在 SQuirreL SQL 顶部的“连接到” 下拉列表中,选择“Hive on HDInsight” 。From the Connect to dropdown at the top of SQuirreL SQL, select Hive on HDInsight. 出现提示时,选择“连接” 。When prompted, select Connect.

    连接对话框

  9. 连接后,在 SQL 查询对话框中输入以下查询,然后选择“运行” 图标(一个正在跑步的人)。Once connected, enter the following query into the SQL query dialog, and then select the Run icon (a running person). 结果区域会显示查询的结果。The results area should show the results of the query.

    select * from hivesampletable limit 10;
    

    sql 查询对话框,其中包括结果

从 Java 应用程序示例进行连接Connect from an example Java application

https://github.com/Azure-Samples/hdinsight-java-hive-jdbc 上提供了使用 Java 客户端查询 Hive on HDInsight 的示例。An example of using a Java client to query Hive on HDInsight is available at https://github.com/Azure-Samples/hdinsight-java-hive-jdbc. 按照存储库中的说明生成并运行该示例。Follow the instructions in the repository to build and run the sample.

故障排除Troubleshooting

尝试打开 SQL 连接时发生意外错误Unexpected Error occurred attempting to open an SQL connection

症状:连接到 HDInsight 群集版本 3.3 或更高版本时,可能会遇到意外错误。Symptoms: When connecting to an HDInsight cluster that is version 3.3 or greater, you may receive an error that an unexpected error occurred. 此错误的堆栈跟踪的开头为以下行:The stack trace for this error begins with the following lines:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.<init>(I)V
at java.util.concurrent.FutureTas...(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:206)

原因:此错误由 SQuirreL 随附的较旧版本 commons-codec.jar 文件引起。Cause: This error is caused by an older version commons-codec.jar file included with SQuirreL.

解决方法:若要解决此错误,请使用以下步骤:Resolution: To fix this error, use the following steps:

  1. 退出 SQuirreL,并转到系统上安装 SQuirreL 的目录(可能是 C:\Program Files\squirrel-sql-4.0.0\lib)。Exit SQuirreL, and then go to the directory where SQuirreL is installed on your system, perhaps C:\Program Files\squirrel-sql-4.0.0\lib. 在 SquirreL 目录的 lib 目录下,将现有的 commons-codec.jar 替换为从 HDInsight 群集下载的文件。In the SquirreL directory, under the lib directory, replace the existing commons-codec.jar with the one downloaded from the HDInsight cluster.

  2. 重新启动 SQuirreL。Restart SQuirreL. 连接到 HDInsight 上的 Hive 时,应不再会出现该错误。The error should no longer occur when connecting to Hive on HDInsight.

后续步骤Next steps

现在,已了解如何将 JDBC 与 Hive 配合使用,请使用以下链接学习 Azure HDInsight 的其他用法。Now that you have learned how to use JDBC to work with Hive, use the following links to explore other ways to work with Azure HDInsight.