Create Apache HBase clusters on HDInsight in Azure Virtual Network

Learn how to create Azure HDInsight Apache HBase clusters in an Azure virtual network.

With virtual network integration, you can deploy Apache HBase clusters to the same virtual network as your applications so that the applications can communicate with HBase directly. The benefits include:

  • Direct connectivity from the web application to the nodes of the HBase cluster, which enables communication through the HBase Java remote procedure call (RPC) APIs.
  • Improved performance, because traffic does not pass through multiple gateways and load balancers.
  • The ability to process sensitive information more securely, without exposing a public endpoint.

If you don't have an Azure subscription, create a trial account before you begin.

Create an Apache HBase cluster in a virtual network

In this section, you use an Azure Resource Manager template to create a Linux-based Apache HBase cluster, with its dependent Azure Storage account, in an Azure virtual network. For other cluster creation methods and an explanation of the settings, see Create HDInsight clusters. For more information about using a template to create Apache Hadoop clusters in HDInsight, see Create Apache Hadoop clusters in HDInsight using Azure Resource Manager templates.

Note

Some properties are hard-coded into the template. For example:

  • Location: China East
  • Cluster version: 3.6
  • Cluster worker node count: 2
  • Default storage account: a unique string
  • Virtual network name: <Cluster Name>-vnet
  • Virtual network address space: 10.0.0.0/16
  • Subnet name: subnet1
  • Subnet address range: 10.0.0.0/24

<Cluster Name> is replaced with the cluster name you provide when using the template.
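The relationship between these hard-coded values can be checked with a few lines of code. The following Java sketch derives the virtual network name from a cluster name and tests whether an address falls inside the subnet range. The class and method names (VnetLayout, vnetName, contains) are hypothetical and are not part of the template or of any Azure SDK; only the naming pattern and the address ranges come from the list above.

```java
final class VnetLayout {
    // Values hard-coded in the template, as listed in the note above.
    static final String ADDRESS_SPACE = "10.0.0.0/16";
    static final String SUBNET_RANGE = "10.0.0.0/24";

    /** Applies the template's naming pattern: <Cluster Name>-vnet. */
    static String vnetName(String clusterName) {
        return clusterName + "-vnet";
    }

    /** Packs a dotted-quad IPv4 address into a 32-bit integer. */
    static int parseIpv4(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Returns true when the address lies inside the given CIDR block. */
    static boolean contains(String cidr, String ip) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Not a CIDR block: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("Bad prefix length: " + cidr);
        }
        // A shift by 32 is a no-op in Java, so a /0 prefix needs its own case.
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (parseIpv4(parts[0]) & mask) == (parseIpv4(ip) & mask);
    }

    public static void main(String[] args) {
        // "myhbase" and the two addresses are illustrative values only.
        System.out.println(vnetName("myhbase"));
        System.out.println(contains(SUBNET_RANGE, "10.0.0.5"));
        System.out.println(contains(SUBNET_RANGE, "10.0.1.5"));
    }
}
```

A check like this can help confirm that a virtual machine's private address belongs to subnet1 before troubleshooting connectivity to the cluster.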

  1. Click the following image to open the template in the Azure portal. The template is located in Azure QuickStart Templates.

    Deploy to Azure

  2. From the Custom deployment dialog, select Edit template.

  3. On line 165, change the value Standard_A3 to Standard_A4_V2. Then select Save.

  4. Complete the remaining template with the following information:

    Property | Value
    Subscription | Select the Azure subscription used to create the HDInsight cluster, the dependent Storage account, and the Azure virtual network.
    Resource group | Select Create new, and specify a new resource group name.
    Location | Select a location for the resource group.
    Cluster Name | Enter a name for the HBase cluster to be created.
    Cluster Login User Name and Password | The default User Name is admin. Provide a password.
    Ssh User Name and Password | The default User Name is sshuser. Provide a password.

    Select I agree to the terms and conditions stated above.

  5. Select Purchase. It takes about 20 minutes to create a cluster. Once the cluster is created, you can select it in the portal to open it.

After you complete the article, you might want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are charged for an HDInsight cluster even when it is not in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters when they are not in use. For instructions on deleting a cluster, see Manage Apache Hadoop clusters in HDInsight by using the Azure portal.

To begin working with your new HBase cluster, you can use the procedures found in Get started using Apache HBase with Apache Hadoop in HDInsight.

Connect to the Apache HBase cluster using Apache HBase Java RPC APIs

Create a virtual machine

Create an infrastructure as a service (IaaS) virtual machine in the same Azure virtual network and the same subnet. For instructions on creating a new IaaS virtual machine, see Create a Virtual Machine Running Windows Server. When following the steps in that document, you must use the following values for the network configuration:

  • Virtual network: CLUSTERNAME-vnet
  • Subnet: subnet1

Important

Replace CLUSTERNAME with the name you used when creating the HDInsight cluster in the previous steps.

With these values, the virtual machine is placed in the same virtual network and subnet as the HDInsight cluster. This configuration allows them to communicate with each other directly. You can also create an HDInsight cluster with an empty edge node, which can be used to manage the cluster. For more information, see Use empty edge nodes in HDInsight.

Obtain the fully qualified domain name

When using a Java application to connect to HBase remotely, you must use the fully qualified domain name (FQDN). To determine the FQDN, you must get the connection-specific DNS suffix of the HBase cluster. To do that, use one of the following methods:

  • Browse to the following URL, which returns a JSON file with the DNS suffixes: https://CLUSTERNAME.azurehdinsight.cn/api/v1/clusters/CLUSTERNAME/hosts?minimal_response=true
  • Use the Ambari website:

    1. Browse to https://CLUSTERNAME.azurehdinsight.cn.
    2. Select Hosts from the top menu.
  • Use curl to make REST calls:

    curl -u <username>:<password> -k https://<clustername>.azurehdinsight.cn/ambari/api/v1/clusters/<clustername>.azurehdinsight.cn/services/hbase/components/hbrest
    

    In the JavaScript Object Notation (JSON) data returned, find the "host_name" entry. It contains the FQDN for the nodes in the cluster. For example:

"host_name" : "hn0-hbaseg.hjfrnszlumfuhfk4pi1guh410c.bx.internal.chinacloudapp.cn"

The portion of the domain name that follows the host label (hn0-hbaseg in this example) is the DNS suffix. For example, hjfrnszlumfuhfk4pi1guh410c.bx.internal.chinacloudapp.cn.
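The same lookup can be scripted from a Java application. The sketch below builds a request for the hosts URL listed in the first method, pulls the "host_name" values out of a JSON response, and strips the host label to obtain the DNS suffix. It assumes Java 11 or later for java.net.http, uses a regular expression instead of a full JSON parser for brevity, and only builds the request without sending it. The class and method names (DnsSuffixFinder, hostsRequest, hostNames, dnsSuffix) are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DnsSuffixFinder {
    // Matches "host_name" : "value" pairs in the JSON text (a simplification).
    private static final Pattern HOST_NAME =
            Pattern.compile("\"host_name\"\\s*:\\s*\"([^\"]+)\"");

    /** Builds a GET request for the hosts URL with HTTP basic authentication. */
    static HttpRequest hostsRequest(String clusterName, String user, String password) {
        String url = "https://" + clusterName + ".azurehdinsight.cn/api/v1/clusters/"
                + clusterName + "/hosts?minimal_response=true";
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Basic " + token)
                .GET()
                .build();
    }

    /** Collects every host_name value found in the response body. */
    static List<String> hostNames(String json) {
        List<String> names = new ArrayList<>();
        Matcher matcher = HOST_NAME.matcher(json);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    /** Drops the first label of an FQDN, leaving the DNS suffix. */
    static String dnsSuffix(String fqdn) {
        int dot = fqdn.indexOf('.');
        if (dot < 0 || dot == fqdn.length() - 1) {
            throw new IllegalArgumentException("Not a fully qualified name: " + fqdn);
        }
        return fqdn.substring(dot + 1);
    }

    public static void main(String[] args) {
        // Sample body containing the host_name entry shown in the text.
        String json = "{ \"host_name\" : "
                + "\"hn0-hbaseg.hjfrnszlumfuhfk4pi1guh410c.bx.internal.chinacloudapp.cn\" }";
        for (String host : hostNames(json)) {
            System.out.println(host + " -> " + dnsSuffix(host));
        }
    }
}
```

To fetch the data, the request could be passed to HttpClient.send together with a string body handler; that step is left out here because it needs a live cluster and valid credentials.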

Verify communication inside the virtual network

To verify that the virtual machine can communicate with the HBase cluster, use the command ping <head node host name>.<dns suffix> from the virtual machine, where the head node host name is the first label of the host_name value found above. For example, ping hn0-hbaseg.hjfrnszlumfuhfk4pi1guh410c.bx.internal.chinacloudapp.cn.
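A ping shows only that the host answers. Because an HBase client locates the cluster through ZooKeeper, a TCP connection test against a ZooKeeper node can be a closer approximation of what the application needs. The sketch below assumes the cluster uses the default ZooKeeper client port, 2181; the class and method names (ReachabilityCheck, canConnect) are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ReachabilityCheck {
    // Default ZooKeeper client port; assumed to be unchanged on the cluster.
    static final int ZOOKEEPER_PORT = 2181;

    /** Returns true if a TCP connection to host:port opens within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers unresolved names, refused connections, and timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java ReachabilityCheck <host FQDN>");
            return;
        }
        String host = args[0];
        System.out.println(host + " reachable on port " + ZOOKEEPER_PORT + ": "
                + canConnect(host, ZOOKEEPER_PORT, 3000));
    }
}
```

Run it from the virtual machine with a ZooKeeper node FQDN as the argument. A false result can mean a name resolution problem as well as a blocked port, so it is worth checking DNS first.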

To use this information in a Java application, you can follow the steps in Use Apache Maven to build Java applications that use Apache HBase with HDInsight (Hadoop) to create an application. To have the application connect to a remote HBase server, modify the hbase-site.xml file in that example to use the FQDN for ZooKeeper. For example:

<property>
    <name>hbase.zookeeper.quorum</name>
    <value>zookeeper0.<dns suffix>,zookeeper1.<dns suffix>,zookeeper2.<dns suffix></value>
</property>
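The quorum value follows a regular pattern, so it can be generated from the DNS suffix instead of being typed by hand. The sketch below assumes the ZooKeeper nodes are named zookeeper0, zookeeper1, and so on, as in the example above; the class and method names (ZookeeperQuorum, quorum, toPropertyXml) are hypothetical.

```java
import java.util.StringJoiner;

final class ZookeeperQuorum {
    /** Builds the comma-separated list used for hbase.zookeeper.quorum. */
    static String quorum(String dnsSuffix, int nodeCount) {
        if (dnsSuffix == null || dnsSuffix.isEmpty()) {
            throw new IllegalArgumentException("DNS suffix must not be empty");
        }
        if (nodeCount < 1) {
            throw new IllegalArgumentException("Need at least one ZooKeeper node");
        }
        StringJoiner joiner = new StringJoiner(",");
        for (int i = 0; i < nodeCount; i++) {
            // Node naming (zookeeper<N>) is taken from the example in the text.
            joiner.add("zookeeper" + i + "." + dnsSuffix);
        }
        return joiner.toString();
    }

    /** Renders the property element in the layout used by hbase-site.xml. */
    static String toPropertyXml(String quorum) {
        return "<property>\n"
                + "    <name>hbase.zookeeper.quorum</name>\n"
                + "    <value>" + quorum + "</value>\n"
                + "</property>";
    }

    public static void main(String[] args) {
        String suffix = "hjfrnszlumfuhfk4pi1guh410c.bx.internal.chinacloudapp.cn";
        System.out.println(toPropertyXml(quorum(suffix, 3)));
    }
}
```

In an HBase client, the same string could alternatively be supplied in code by calling set("hbase.zookeeper.quorum", ...) on the Hadoop Configuration object used to open the connection, which keeps the suffix out of the packaged XML file.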

Note

For more information about name resolution in Azure virtual networks, including how to use your own DNS server, see Name Resolution (DNS).

Next steps

In this article, you learned how to create an Apache HBase cluster in an Azure virtual network. To learn more, see: