Install and use Hue on HDInsight Hadoop clusters

Learn how to install Hue on HDInsight clusters and use SSH tunneling to route requests to Hue.

What is Hue?

Hue is a set of web applications used to interact with an Apache Hadoop cluster. You can use Hue to browse the storage associated with a Hadoop cluster (WASB, in the case of HDInsight clusters), run Hive jobs and Pig scripts, and so on. Hue installations on an HDInsight Hadoop cluster provide the following components:

  • Beeswax Hive Editor
  • Apache Pig
  • Metastore manager
  • Apache Oozie
  • FileBrowser (which talks to the WASB default container)
  • Job Browser

Warning

Components provided with the HDInsight cluster are fully supported, and Azure Support will help isolate and resolve issues related to these components.

Custom components receive commercially reasonable support to help you troubleshoot the issue further. This might resolve the issue, or you might be asked to engage the available channels for the open source technologies, where deep expertise for that technology is found. For example, there are many community sites that can be used, such as the MSDN forum for HDInsight and Azure CSDN. Apache projects also have project sites on http://apache.org, for example: Hadoop.

Install Hue using Script Actions

Use the information in the table below for your Script Action. See Customize HDInsight clusters with Script Actions for specific instructions on using Script Actions.

Note

To install Hue on HDInsight clusters, the recommended headnode size is at least A4 (8 cores, 14 GB memory).

Property          Value
Script type       Custom
Name              Install Hue
Bash script URI   https://hdiconfigactions.blob.core.windows.net/linuxhueconfigactionv02/install-hue-uber-v02.sh
Node type(s)      Head
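
The table values can also be passed to the Azure CLI instead of the portal. The following is a minimal sketch, not the documented procedure: it assumes the Azure CLI is installed and signed in to the right cloud, RESOURCE_GROUP and CLUSTER_NAME are placeholders, and hue_script_action_cmd is a hypothetical helper that only prints the command so it can be reviewed first. The action name InstallHue is likewise a placeholder. Verify the flags with az hdinsight script-action execute --help before running anything.

```shell
#!/usr/bin/env bash
# Sketch only: prints an Azure CLI command that would apply the Hue install
# script to the head nodes of an existing cluster. Nothing is executed.
# ASSUMPTIONS: resource group and cluster names are placeholders supplied by
# the caller; "InstallHue" is an arbitrary action name chosen for illustration.

HUE_SCRIPT_URI="https://hdiconfigactions.blob.core.windows.net/linuxhueconfigactionv02/install-hue-uber-v02.sh"

# hue_script_action_cmd RESOURCE_GROUP CLUSTER_NAME
# Hypothetical helper: writes the full az command line to stdout.
hue_script_action_cmd() {
  local resource_group="$1" cluster_name="$2"
  printf 'az hdinsight script-action execute --resource-group %s --cluster-name %s --name %s --script-uri %s --roles headnode\n' \
    "$resource_group" "$cluster_name" "InstallHue" "$HUE_SCRIPT_URI"
}

# Usage (review the output, then run it yourself):
#   hue_script_action_cmd RESOURCE_GROUP CLUSTER_NAME
```

Printing rather than executing keeps the sketch safe to try: the script restarts several Hadoop services when it runs, so it is worth checking the target cluster name before submitting it.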

Use Hue with HDInsight clusters

SSH tunneling is the only way to access Hue on the cluster once it is running. Tunneling via SSH allows traffic to go directly to the headnode of the cluster where Hue is running. After the cluster has finished provisioning, use the following steps to use Hue on an HDInsight cluster.

Note

We recommend using the Firefox web browser to follow the instructions below.

  1. Use the information in Use SSH Tunneling to access Apache Ambari web UI, ResourceManager, JobHistory, NameNode, Oozie, and other web UIs to create an SSH tunnel from your client system to the HDInsight cluster, and then configure your web browser to use the SSH tunnel as a proxy.

  2. Use the ssh command to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:

    ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.cn
    
  3. Once connected, use the following command to obtain the fully qualified domain name of the primary headnode:

    hostname -f
    

    This returns a name similar to the following:

     myhdi-nfebtpfdv1nubcidphpap2eq2b.ex.internal.chinacloudapp.cn
    

    This is the hostname of the primary headnode where the Hue website is located.

  4. Use the browser to open the Hue portal at http://HOSTNAME:8888. Replace HOSTNAME with the name you obtained in the previous step.

    Note

    When you log in for the first time, you are prompted to create an account for the Hue portal. The credentials you specify here are limited to the portal and are not related to the admin or SSH user credentials you specified while provisioning the cluster.

    Log in to the Hue portal
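
The steps above can be sketched as a few shell helpers. This is an illustration under stated assumptions rather than a required procedure: sshuser is the SSH account from step 2, local port 9876 is an arbitrary free port for the SOCKS proxy, and ssh_endpoint and hue_url are hypothetical helper names. The ssh calls are left as comments because they need a real cluster.

```shell
#!/usr/bin/env bash
# Sketch: derive the SSH endpoint and the Hue portal URL used in steps 1-4.
# ASSUMPTIONS: helper names are hypothetical; port 9876 is an arbitrary choice.

# ssh_endpoint CLUSTERNAME
# Prints the cluster's SSH host name, following the pattern from step 2.
ssh_endpoint() {
  printf '%s-ssh.azurehdinsight.cn\n' "$1"
}

# hue_url HOSTNAME
# Prints the Hue portal URL for the headnode FQDN obtained in step 3.
hue_url() {
  printf 'http://%s:8888\n' "$1"
}

# Example against a real cluster (uncomment and replace CLUSTERNAME):
#   # Step 1: open a SOCKS tunnel in the background (-D), with no remote
#   # command (-N), then point the browser's proxy at localhost:9876.
#   ssh -C -N -f -D 9876 "sshuser@$(ssh_endpoint CLUSTERNAME)"
#   # Steps 2-3: ask the headnode for its fully qualified domain name.
#   headnode="$(ssh "sshuser@$(ssh_endpoint CLUSTERNAME)" hostname -f)"
#   # Step 4: the URL to open in the proxied browser.
#   hue_url "$headnode"
```

The internal headnode name resolves only inside the cluster's network, which is why the URL has to be opened in a browser that sends its traffic through the tunnel.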

Run a Hive query

  1. From the Hue portal, select Query Editors, and then select Hive to open the Hive editor.

    Use Hive

  2. On the Assist tab, under Database, you should see hivesampletable. This is a sample table that ships with all Hadoop clusters on HDInsight. Enter a sample query in the right pane and see the output on the Results tab in the pane below, as shown in the screen capture.

    Run Hive query

    You can also use the Chart tab to see a visual representation of the result.

Browse the cluster storage

  1. From the Hue portal, select File Browser in the top-right corner of the menu bar.

  2. By default, the file browser opens at the /user/myuser directory. Select the forward slash right before the user directory in the path to go to the root of the Azure storage container associated with the cluster.

    Use file browser

  3. Right-click a file or folder to see the available operations. Use the Upload button in the right corner to upload files to the current directory. Use the New button to create new files or directories.

Note

The Hue file browser can only show the contents of the default container associated with the HDInsight cluster. Any additional storage accounts or containers that you have associated with the cluster are not accessible through the file browser. However, those additional containers are always accessible to Hive jobs. For example, if you enter the command dfs -ls wasb://newcontainer@mystore.blob.core.chinacloudapi.cn in the Hive editor, you can see the contents of additional containers as well. In this command, newcontainer is not the default container associated with the cluster.
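
The wasb:// address in the note follows a fixed pattern: container, then storage account, then the blob endpoint suffix. A small sketch of building it, assuming the chinacloudapi.cn suffix shown above; wasb_uri and hive_ls_cmd are hypothetical helper names, and the optional path argument is an addition for illustration:

```shell
#!/usr/bin/env bash
# Sketch: build the wasb:// URI and the matching "dfs -ls" command to paste
# into the Hive editor. Helper names and the path argument are hypothetical.

# wasb_uri CONTAINER ACCOUNT [PATH]
wasb_uri() {
  local container="$1" account="$2" path="${3:-}"
  local uri="wasb://${container}@${account}.blob.core.chinacloudapi.cn"
  # Append the path only when one is given, dropping a leading slash so the
  # result never contains a double slash after the host part.
  if [ -n "$path" ]; then
    uri="${uri}/${path#/}"
  fi
  printf '%s\n' "$uri"
}

# hive_ls_cmd CONTAINER ACCOUNT [PATH]
hive_ls_cmd() {
  printf 'dfs -ls %s\n' "$(wasb_uri "$@")"
}

# Usage:
#   hive_ls_cmd newcontainer mystore
```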

Important considerations

  1. The script used to install Hue installs it only on the primary headnode of the cluster.

  2. During installation, multiple Hadoop services (HDFS, YARN, MR2, Oozie) are restarted to update the configuration. After the script finishes installing Hue, it might take some time for the other Hadoop services to start up, which can affect Hue's performance initially. Once all services have started, Hue is fully functional.

  3. Hue does not understand Apache Tez jobs, and Tez is the current default execution engine for Hive. If you want to use MapReduce as the Hive execution engine, update your script to include the following command:

      set hive.execution.engine=mr;
    
  4. With Linux clusters, your services can be running on the primary headnode while the Resource Manager is running on the secondary headnode. Such a scenario might result in errors (shown below) when using Hue to view details of RUNNING jobs on the cluster. However, you can view the job details once the job has completed.

    Hue portal error

    This is due to a known issue. As a workaround, modify Ambari so that the active Resource Manager also runs on the primary headnode.

  5. Hue understands WebHDFS, while HDInsight clusters use Azure Storage through wasbs://. So, the custom script used with the script action installs WebWasb, a WebHDFS-compatible service for talking to WASB. As a result, even though the Hue portal says HDFS in places (such as when you move your mouse over the File Browser), it should be interpreted as WASB.
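
For item 3 above, one way to apply the engine setting is to put it in front of the query text before pasting it into the Hive editor. The helper below is a hypothetical sketch, not part of Hue or HDInsight; it only prepends the set command from item 3, and the example query against hivesampletable is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: prefix a Hive query with the MapReduce engine setting from item 3.
# with_mr_engine is a hypothetical helper name.

# with_mr_engine 'QUERY TEXT'
# Prints the set command on the first line, followed by the query unchanged.
with_mr_engine() {
  printf 'set hive.execution.engine=mr;\n%s\n' "$1"
}

# Usage:
#   with_mr_engine 'SELECT * FROM hivesampletable LIMIT 10;'
```

Because set applies to the current Hive session, the setting needs to accompany the queries in each new session rather than being issued once for the cluster.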