Add custom Apache Hive libraries when creating your HDInsight cluster

Learn how to pre-load Apache Hive libraries on HDInsight. This document contains information on using a Script Action to pre-load libraries during cluster creation. Libraries added using the steps in this document are globally available in Hive - there is no need to use ADD JAR to load them.

How it works

When creating a cluster, you can use a Script Action to modify cluster nodes as they are created. The script in this document accepts a single parameter: the location of the libraries. This location must be in an Azure Storage account, and the libraries must be stored as jar files.

During cluster creation, the script enumerates the files, copies them to the /usr/lib/customhivelibs/ directory on the head and worker nodes, then adds them to the hive.aux.jars.path property in the core-site.xml file. On Linux-based clusters, it also updates the hive-env.sh file with the location of the files.
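The copy-and-register step described above can be sketched in bash. This is a simplified local illustration, not the actual Script Action: the function name and directory arguments are hypothetical, and the real script also writes the resulting value into core-site.xml and hive-env.sh on every node.

```shell
#!/usr/bin/env bash
# Sketch of the library-staging step: copy every jar from a source
# directory into a target directory (the article uses
# /usr/lib/customhivelibs/), then emit the comma-separated list of
# file:// URIs that would go into hive.aux.jars.path.
stage_hive_libs() {
    local src_dir="$1"   # directory holding the downloaded jars
    local dest_dir="$2"  # e.g. /usr/lib/customhivelibs
    mkdir -p "$dest_dir"
    local aux_jars=""
    for jar in "$src_dir"/*.jar; do
        [ -e "$jar" ] || continue   # skip if no jars matched the glob
        cp "$jar" "$dest_dir/"
        local entry="file://$dest_dir/$(basename "$jar")"
        aux_jars="${aux_jars:+$aux_jars,}$entry"
    done
    echo "$aux_jars"   # comma-separated value for hive.aux.jars.path
}
```

For example, with two jars a.jar and b.jar in the source directory, the function copies both and prints `file://<dest>/a.jar,file://<dest>/b.jar`.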

Using the Script Action in this article makes the libraries available when using a Hive client for WebHCat, and HiveServer2.

The script

Script location

https://hdiconfigactions.blob.core.windows.net/linuxsetupcustomhivelibsv01/setup-customhivelibs-v01.sh

Requirements

  • The scripts must be applied to both the Head nodes and Worker nodes.

  • The jars you wish to install must be stored in Azure Blob Storage in a single container.

  • The storage account containing the library of jar files must be linked to the HDInsight cluster during creation. It must either be the default storage account, or an account added through Storage Account Settings.

  • The WASB path to the container must be specified as a parameter to the Script Action. For example, if the jars are stored in a container named libs on a storage account named mystorage, the parameter would be wasb://libs@mystorage.blob.core.chinacloudapi.cn/.

    Note

    This document assumes that you have already created a storage account and blob container, and uploaded the files to it.

    If you have not created a storage account, you can do so through the Azure portal. You can then use a utility such as Azure Storage Explorer to create a container in the account and upload files to it.
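The WASB parameter from the requirements above is simply the container and account name composed into a URI. A small helper (hypothetical, for illustration only) could build it; the endpoint suffix is a parameter because it differs between Azure environments:

```shell
# Build the wasb:// URI passed to the Script Action from a container
# name and a storage-account name. The endpoint suffix defaults to the
# Azure China blob endpoint used in the article's example;
# blob.core.windows.net is the global Azure equivalent.
wasb_uri() {
    local container="$1" account="$2"
    local suffix="${3:-blob.core.chinacloudapi.cn}"
    echo "wasb://${container}@${account}.${suffix}/"
}
```

For example, `wasb_uri libs mystorage` prints `wasb://libs@mystorage.blob.core.chinacloudapi.cn/`, matching the example above.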

Create a cluster using the script

  1. Start provisioning a cluster by using the steps in Provision HDInsight clusters on Linux, but don't complete provisioning. You can also use Azure PowerShell or the HDInsight .NET SDK to create a cluster using this script. For more information on using these methods, see Customize HDInsight clusters with Script Actions. In the Azure portal, from the Configuration + pricing tab, select + Add script action.

  2. For Storage, if the storage account containing the library of jar files will be different from the account used for the cluster, complete Additional storage accounts.

  3. For Script Actions, provide the following information:

    Property           Value
    Script type        Custom
    Name               Libraries
    Bash script URI    https://hdiconfigactions.blob.core.windows.net/linuxsetupcustomhivelibsv01/setup-customhivelibs-v01.sh
    Node type(s)       Head, Worker
    Parameters         Enter the WASB address to the container and storage account that contains the jars. For example, wasbs://libs@mystorage.blob.core.windows.net/.

    Note

    For Apache Spark 2.1, use this bash script URI: https://hdiconfigactions.blob.core.windows.net/linuxsetupcustomhivelibsv01/setup-customhivelibs-v00.sh.

  4. Continue provisioning the cluster as described in Provision HDInsight clusters on Linux.

Once cluster creation completes, you can use the jars added through this script from Hive without having to use the ADD JAR statement.
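To confirm the script ran, you can check that core-site.xml on a node lists your jars under hive.aux.jars.path. A minimal sketch, assuming the typical Hadoop XML layout (the helper name is illustrative; on a real node the file lives under the cluster's Hadoop configuration directory):

```shell
# Extract the value of the hive.aux.jars.path property from a
# Hadoop-style core-site.xml. The file path is a parameter so the
# snippet is self-contained; pass the node's actual core-site.xml.
get_aux_jars_path() {
    local conf_file="$1"
    # Print the <value> element on the line after the matching <name>.
    grep -A1 '<name>hive.aux.jars.path</name>' "$conf_file" \
        | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

If the output lists file:// URIs under /usr/lib/customhivelibs/, the libraries were registered and no ADD JAR is needed.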

Next steps

For more information on working with Hive, see Use Apache Hive with HDInsight.