Azure HDInsight virtual network architecture

This article explains the resources that are present when you deploy an HDInsight cluster into a custom Azure Virtual Network. This information helps you connect on-premises resources to your HDInsight cluster in Azure. For more information on Azure Virtual Networks, see "What is Azure Virtual Network?".

Resource types in Azure HDInsight clusters

Azure HDInsight clusters have different types of virtual machines, or nodes. Each node type plays a role in the operation of the system. The following table summarizes these node types and their roles in the cluster.

| Type | Description |
| --- | --- |
| Head node | For all cluster types except Apache Storm, the head nodes host the processes that manage execution of the distributed application. The head node is also the node that you can SSH into to run applications, which are then coordinated to run across the cluster resources. The number of head nodes is fixed at two for all cluster types. |
| ZooKeeper node | ZooKeeper coordinates tasks between the nodes that are doing data processing. It also does leader election of the head node, and keeps track of which head node is running a specific master service. The number of ZooKeeper nodes is fixed at three. |
| Worker node | Represents the nodes that support data processing functionality. Worker nodes can be added to or removed from the cluster to scale computing capability and manage costs. |
| Region node | For the HBase cluster type, the region node (also referred to as a Data Node) runs the Region Server. Region Servers serve and manage a portion of the data managed by HBase. Region nodes can be added to or removed from the cluster to scale computing capability and manage costs. |
| Nimbus node | For the Storm cluster type, the Nimbus node provides functionality similar to the head node. The Nimbus node assigns tasks to other nodes in a cluster through ZooKeeper, which coordinates the running of Storm topologies. |
| Supervisor node | For the Storm cluster type, the supervisor node executes the instructions provided by the Nimbus node to do the processing. |

Resource naming conventions

Use Fully Qualified Domain Names (FQDNs) when addressing nodes in your cluster. You can get the FQDNs for various node types in your cluster using the Ambari API.
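As a rough illustration, the sketch below pulls host FQDNs out of a response shaped like the one the Ambari REST API returns for a cluster's hosts collection (`/api/v1/clusters/<cluster-name>/hosts`). The sample payload, cluster name, and host names are invented for this example. Fetching the payload (the cluster's HTTPS endpoint and credentials) is deliberately left out.

```python
import json
from typing import List

# Hypothetical, trimmed-down sample of an Ambari "hosts" response.
# Real responses carry additional fields; all names here are made up.
SAMPLE_RESPONSE = """
{
  "items": [
    {"Hosts": {"cluster_name": "mycluster",
               "host_name": "hn0-myclus.abc123.cx.internal.chinacloudapp.cn"}},
    {"Hosts": {"cluster_name": "mycluster",
               "host_name": "wn0-myclus.abc123.cx.internal.chinacloudapp.cn"}},
    {"Hosts": {"cluster_name": "mycluster",
               "host_name": "zn0-myclus.abc123.cx.internal.chinacloudapp.cn"}}
  ]
}
"""


def host_fqdns(payload: str) -> List[str]:
    """Return the FQDN of every host listed in an Ambari hosts response."""
    data = json.loads(payload)
    return [item["Hosts"]["host_name"] for item in data.get("items", [])]


if __name__ == "__main__":
    for fqdn in host_fqdns(SAMPLE_RESPONSE):
        print(fqdn)
```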

These FQDNs are of the form <node-type-prefix><instance-number>-<abbreviated-clustername>.<unique-identifier>.cx.internal.chinacloudapp.cn

The <node-type-prefix> is hn for head nodes, wn for worker nodes, and zn for ZooKeeper nodes.

If you need just the host name, use only the first part of the FQDN: <node-type-prefix><instance-number>-<abbreviated-clustername>
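The naming convention above can be taken apart mechanically. The following is a minimal sketch, not an official parser: the `NodeName` type and `parse_node_fqdn` function are names chosen for this example, and the pattern accepts only the three prefixes listed above (hn, wn, zn).

```python
import re
from typing import NamedTuple


class NodeName(NamedTuple):
    node_type: str   # "hn", "wn", or "zn"
    instance: int    # instance number that follows the prefix
    cluster: str     # abbreviated cluster name
    host_name: str   # first label of the FQDN


# <node-type-prefix><instance-number>-<abbreviated-clustername>
_HOST_PATTERN = re.compile(r"^(hn|wn|zn)(\d+)-([^.]+)$")


def parse_node_fqdn(fqdn: str) -> NodeName:
    """Split a node FQDN (or bare host name) into its naming components."""
    host_name = fqdn.split(".", 1)[0]  # the host name is the first label
    match = _HOST_PATTERN.match(host_name)
    if match is None:
        raise ValueError(f"not a recognised node name: {fqdn!r}")
    prefix, instance, cluster = match.groups()
    return NodeName(prefix, int(instance), cluster, host_name)


if __name__ == "__main__":
    # Hypothetical FQDN used only to exercise the parser.
    print(parse_node_fqdn("wn12-myclus.abc123.cx.internal.chinacloudapp.cn"))
```

Passing a bare host name such as `hn0-myclus` works as well, because only the first label is inspected.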

Basic virtual network resources

The following diagram shows the placement of HDInsight nodes and network resources in Azure.

Diagram of HDInsight entities created in an Azure custom VNET

The default resources in an Azure Virtual Network include the cluster node types mentioned in the previous table, as well as network devices that support communication between the virtual network and outside networks.

The following table summarizes the nine cluster nodes created when HDInsight is deployed into a custom Azure Virtual Network.

| Resource type | Number present | Details |
| --- | --- | --- |
| Head node | Two | |
| ZooKeeper node | Three | |
| Worker node | Two | This number can vary based on cluster configuration and scaling. A minimum of three worker nodes is needed for Apache Kafka. |
| Gateway node | Two | Gateway nodes are Azure virtual machines that are created on Azure, but aren't visible in your subscription. Contact support if you need to reboot these nodes. |

The following network resources are automatically created inside the virtual network used with HDInsight:

| Networking resource | Number present | Details |
| --- | --- | --- |
| Load balancer | Three | |
| Network interfaces | Nine | This value is based on a normal cluster, where each node has its own network interface. The nine interfaces are for the two head nodes, three ZooKeeper nodes, two worker nodes, and two gateway nodes mentioned in the previous table. |
| Public IP addresses | Two | |
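The interface count follows directly from the node counts. The sketch below reproduces that arithmetic, assuming (as the table states for a normal cluster) one network interface per node. The function name and the idea of varying only the worker count are choices made for this example.

```python
def network_interface_count(
    worker_nodes: int = 2,
    head_nodes: int = 2,
    zookeeper_nodes: int = 3,
    gateway_nodes: int = 2,
) -> int:
    """Number of network interfaces, assuming one interface per node."""
    counts = (worker_nodes, head_nodes, zookeeper_nodes, gateway_nodes)
    if any(count < 0 for count in counts):
        raise ValueError("node counts must be non-negative")
    return sum(counts)


if __name__ == "__main__":
    # Default cluster from the table: 2 + 3 + 2 + 2 nodes.
    print(network_interface_count())
    # Under the same assumption, a cluster scaled to three worker nodes.
    print(network_interface_count(worker_nodes=3))
```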

Endpoints for connecting to HDInsight

You can access your HDInsight cluster in three ways:

  • An HTTPS endpoint outside of the virtual network.
  • An SSH endpoint for connecting directly to the head node at CLUSTERNAME-ssh.azurehdinsight.cn.
  • An HTTPS endpoint within the virtual network. The URL for this endpoint contains "-int". It resolves to a private IP in that virtual network and isn't accessible from the public internet.
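One way to confirm that a name behaves like the internal endpoint described above is to check whether it resolves to a private address. The sketch below is an illustration only: the function name is made up, the resolver is injectable so the check can be exercised without a network, and the host names in the usage lines are placeholders rather than real endpoints. Python's `is_private` also reports some other reserved ranges (such as loopback) as private.

```python
import ipaddress
import socket
from typing import Callable


def resolves_to_private_ip(
    hostname: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if the hostname resolves to a private IPv4 address."""
    address = ipaddress.ip_address(resolve(hostname))
    return address.is_private


if __name__ == "__main__":
    # Stub resolvers stand in for DNS so the example runs offline.
    print(resolves_to_private_ip("internal.example", lambda _: "10.0.0.5"))
    print(resolves_to_private_ip("public.example", lambda _: "40.73.1.2"))
```

Run from a machine inside the virtual network with the default resolver, a check like this should return True for the internal endpoint.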

Each of these three endpoints is assigned a load balancer.

Public IP addresses are also provided to the two endpoints that allow connection from outside the virtual network.

  1. One public IP is assigned to the load balancer for the fully qualified domain name (FQDN) to use when connecting to the cluster from the internet.
  2. The second public IP address is used for the SSH-only domain name, CLUSTERNAME-ssh.azurehdinsight.cn.
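As a small illustration of the SSH-only domain name, the sketch below builds it from a cluster name and assembles a matching `ssh` command line. The helper names, the cluster name `mycluster`, and the user name `sshuser` are placeholders for this example; substitute the values chosen when the cluster was created.

```python
from typing import List


def ssh_domain_name(cluster_name: str) -> str:
    """Build the SSH-only domain name: CLUSTERNAME-ssh.azurehdinsight.cn."""
    return f"{cluster_name}-ssh.azurehdinsight.cn"


def ssh_command(cluster_name: str, user: str) -> List[str]:
    """Return an argument list suitable for subprocess.run()."""
    return ["ssh", f"{user}@{ssh_domain_name(cluster_name)}"]


if __name__ == "__main__":
    # Placeholder cluster and user names.
    print(" ".join(ssh_command("mycluster", "sshuser")))
```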

Next steps