Connect to Apache Beeline on HDInsight or install it locally

Apache Beeline is a Hive client that is included on the head nodes of your HDInsight cluster. This article describes how to connect to the Beeline client installed on your HDInsight cluster across different types of connections. It also discusses how to install the Beeline client locally.

Types of connections

From an SSH session

When connecting from an SSH session to a cluster head node, you can then connect to the headnodehost address on port 10001:

beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http'
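
Once connected you can run HiveQL interactively at the Beeline prompt. As a quick sanity check, a statement can also be run non-interactively with Beeline's -e option; a minimal sketch using the same connection string:

# Run a single HiveQL statement and exit, confirming the session works.
beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http' -e 'show databases;'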

Over an Azure Virtual Network

When connecting from a client to HDInsight over an Azure Virtual Network, you must provide the fully qualified domain name (FQDN) of a cluster head node. Since this connection is made directly to the cluster nodes, the connection uses port 10001:

beeline -u 'jdbc:hive2://<headnode-FQDN>:10001/;transportMode=http'

Replace <headnode-FQDN> with the fully qualified domain name of a cluster head node. To find the fully qualified domain name of a head node, use the information in the Manage HDInsight using the Apache Ambari REST API document.
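
As an illustration, one way to list the host FQDNs is the Ambari hosts endpoint described in that document; a minimal sketch, assuming the default admin login and the clustername placeholder used in the examples below:

# Query Ambari for all cluster hosts; head nodes are the entries whose FQDN starts with "hn".
# curl prompts for the admin password because only the user name is given.
curl -u admin -sS "https://clustername.azurehdinsight.cn/api/v1/clusters/clustername/hosts" \
    | grep '"host_name"'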

Over public or private endpoints

When connecting to a cluster using the public or private endpoints, you must provide the cluster login account name (default admin) and password. For example, you can use Beeline from a client system to connect to the clustername.azurehdinsight.cn address. This connection is made over port 443 and is encrypted using TLS/SSL.

Replace clustername with the name of your HDInsight cluster. Replace admin with the cluster login account for your cluster. For ESP clusters, use the full UPN (for example, user@domain.com). Replace password with the password for the cluster login account.

beeline -u 'jdbc:hive2://clustername.azurehdinsight.cn:443/;ssl=true;transportMode=http;httpPath=/hive2' -n admin -p 'password'

Or for a private endpoint:

beeline -u 'jdbc:hive2://clustername-int.azurehdinsight.cn:443/;ssl=true;transportMode=http;httpPath=/hive2' -n admin -p 'password'

Private endpoints point to a basic load balancer, which can only be accessed from VNETs peered in the same region. For more information, see the constraints on global VNet peering and load balancers. Before using Beeline, you can use the curl command with the -v option to troubleshoot any connectivity problems with public or private endpoints.
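
For example, a minimal connectivity check against the public endpoint (an HTTP 401 response here still shows that the gateway and TLS negotiation are reachable):

# Show the full TLS handshake and HTTP exchange with the cluster gateway.
curl -v https://clustername.azurehdinsight.cn/hive2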

Use Beeline with Apache Spark

Apache Spark provides its own implementation of HiveServer2, which is sometimes referred to as the Spark Thrift server. This service uses Spark SQL instead of Hive to resolve queries, and may provide better performance depending on your query.

Through public or private endpoints

The connection string used is slightly different: instead of httpPath=/hive2, it uses httpPath=/sparkhive2. Replace clustername with the name of your HDInsight cluster. Replace admin with the cluster login account for your cluster. For ESP clusters, use the full UPN (for example, user@domain.com). Replace password with the password for the cluster login account.

beeline -u 'jdbc:hive2://clustername.azurehdinsight.cn:443/;ssl=true;transportMode=http;httpPath=/sparkhive2' -n admin -p 'password'

Or for a private endpoint:

beeline -u 'jdbc:hive2://clustername-int.azurehdinsight.cn:443/;ssl=true;transportMode=http;httpPath=/sparkhive2' -n admin -p 'password'

Private endpoints point to a basic load balancer, which can only be accessed from VNETs peered in the same region. For more information, see the constraints on global VNet peering and load balancers. Before using Beeline, you can use the curl command with the -v option to troubleshoot any connectivity problems with public or private endpoints.

From the cluster head node or inside an Azure Virtual Network with Apache Spark

When connecting directly from the cluster head node, or from a resource inside the same Azure Virtual Network as the HDInsight cluster, use port 10002 for the Spark Thrift server instead of 10001. The following example shows how to connect directly to the head node:

/usr/hdp/current/spark2-client/bin/beeline -u 'jdbc:hive2://headnodehost:10002/;transportMode=http'

Install Beeline client

Although Beeline is included on the head nodes, you may want to install it locally. The installation steps for a local machine are based on Windows Subsystem for Linux (WSL).

  1. Update package lists. Enter the following command in your bash shell:

    sudo apt-get update
    
  2. Install Java if it isn't already installed. You can check with the which java command.

    1. If no java package is installed, enter the following command:

      sudo apt install openjdk-11-jre-headless
      
    2. Open the bashrc file (often found in ~/.bashrc): nano ~/.bashrc.

    3. Amend the bashrc file. Add the following line at the end of the file:

      export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64
      

      依次按 Ctrl+XY、Enter。Then press Ctrl+X, then Y, then enter.

  3. Download the Hadoop and Beeline archives by entering the following commands:

    wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
    wget https://archive.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz
    
  4. Unpack the archives by entering the following commands:

    tar -xvzf hadoop-2.7.3.tar.gz
    tar -xvzf apache-hive-1.2.1-bin.tar.gz
    
  5. Further amend the bashrc file. You'll need to identify the path where the archives were unpacked. If you're using Windows Subsystem for Linux and followed the steps exactly, your path would be /mnt/c/Users/user/, where user is your user name.

    1. Open the file: nano ~/.bashrc

    2. Modify the commands below with the appropriate path, and then enter them at the end of the bashrc file:

      export HADOOP_HOME=/path_where_the_archives_were_unpacked/hadoop-2.7.3
      export HIVE_HOME=/path_where_the_archives_were_unpacked/apache-hive-1.2.1-bin
      PATH=$PATH:$HIVE_HOME/bin
      
    3. 依次按 Ctrl+XY、Enter。Then press Ctrl+X, then Y, then enter.

  6. Close and then reopen your bash session.

  7. Test your connection. Use the connection format from Over public or private endpoints, above; a worked example follows these steps.
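
For example, a minimal end-to-end test of the local install, reusing the public-endpoint connection string and the clustername, admin, and password placeholders from above (Beeline's -e option runs one statement and exits):

# Connect through the public endpoint and list databases to confirm the install works.
beeline -u 'jdbc:hive2://clustername.azurehdinsight.cn:443/;ssl=true;transportMode=http;httpPath=/hive2' \
    -n admin -p 'password' -e 'show databases;'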

Next steps