Plan a virtual network for Azure HDInsight

This article provides background information on using Azure Virtual Networks (VNets) with Azure HDInsight. It also discusses the design and implementation decisions that you must make before you can implement a virtual network for your HDInsight cluster. Once the planning phase is finished, you can proceed to Create virtual networks for Azure HDInsight clusters. For more information on the HDInsight management IP addresses that are needed to properly configure network security groups (NSGs) and user-defined routes, see HDInsight management IP addresses.

Using an Azure Virtual Network enables the following scenarios:

  • Connecting to HDInsight directly from an on-premises network.
  • Connecting HDInsight to data stores in an Azure Virtual Network.
  • Directly accessing Apache Hadoop services that aren't available publicly over the internet. For example, Apache Kafka APIs or the Apache HBase Java API.

Important

Creating an HDInsight cluster in a VNet creates several networking resources, such as NICs and load balancers. Do not delete these networking resources; your cluster needs them to function correctly in the VNet.

After February 28, 2019, the networking resources (such as NICs and load balancers) for new HDInsight clusters created in a VNet are provisioned in the same resource group as the HDInsight cluster. Previously, these resources were provisioned in the VNet resource group. Clusters that are currently running, and clusters created without a VNet, are not affected.

Planning

Answer the following questions when planning to install HDInsight in a virtual network:

  • Do you need to install HDInsight into an existing virtual network, or are you creating a new network?

    If you are using an existing virtual network, you may need to modify the network configuration before you can install HDInsight. For more information, see the Add HDInsight to an existing virtual network section.

  • Do you want to connect the virtual network containing HDInsight to another virtual network or to your on-premises network?

    To easily work with resources across networks, you may need to create a custom DNS server and configure DNS forwarding. For more information, see the Connecting multiple networks section.

  • Do you want to restrict or redirect inbound or outbound traffic to HDInsight?

    HDInsight must have unrestricted communication with specific IP addresses in the Azure data center. Several ports must also be allowed through firewalls for client communication. For more information, see Control network traffic.

Add HDInsight to an existing virtual network

Use the steps in this section to add a new HDInsight cluster to an existing Azure Virtual Network.

Note

You cannot add an existing HDInsight cluster to a virtual network.

  1. Are you using a classic or Resource Manager deployment model for the virtual network?

    HDInsight 3.4 and later requires a Resource Manager virtual network. Earlier versions of HDInsight required a classic virtual network.

    If your existing network is a classic virtual network, you must create a Resource Manager virtual network and then connect the two. For more information, see Connecting classic VNets to new VNets.

    Once joined, HDInsight installed in the Resource Manager network can interact with resources in the classic network.

  2. Do you use network security groups, user-defined routes, or virtual network appliances to restrict traffic into or out of the virtual network?

    As a managed service, HDInsight requires unrestricted access to several IP addresses in the Azure data center. To allow communication with these IP addresses, update any existing network security groups or user-defined routes.

    HDInsight hosts multiple services, which use a variety of ports. Do not block traffic to these ports. For a list of ports to allow through virtual appliance firewalls, see the Security section.

    To find your existing security configuration, use the following Azure PowerShell or Azure CLI commands:

    • Network security groups

      Replace RESOURCEGROUP with the name of the resource group that contains the virtual network, and then enter the command:

      # Azure PowerShell
      Get-AzNetworkSecurityGroup -ResourceGroupName "RESOURCEGROUP"

      # Azure CLI
      az network nsg list --resource-group RESOURCEGROUP
      

      For more information, see the Troubleshoot network security groups document.

      Important

      Network security group rules are applied in order of rule priority. The first rule that matches the traffic pattern is applied, and no other rules are applied to that traffic. Order rules from most permissive to least permissive. For more information, see the Filter network traffic with network security groups document.

    • User-defined routes

      Replace RESOURCEGROUP with the name of the resource group that contains the virtual network, and then enter the command:

      # Azure PowerShell
      Get-AzRouteTable -ResourceGroupName "RESOURCEGROUP"

      # Azure CLI
      az network route-table list --resource-group RESOURCEGROUP
      

      For more information, see the Troubleshoot routes document.

  3. Create an HDInsight cluster and select the Azure Virtual Network during configuration. For the steps involved, see the HDInsight cluster creation documentation.

    Important

    Adding HDInsight to a virtual network is an optional configuration step, so be sure to select the virtual network when configuring the cluster.
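
Because network security group rules are applied in priority order (step 2 above), it can help to view the rules of an existing group in that order before adding HDInsight. The following is a minimal sketch, not a definitive procedure. The function name list_nsg_rules_by_priority and the NSGNAME placeholder are assumptions introduced for illustration; RESOURCEGROUP is the same placeholder used earlier.

```shell
#!/usr/bin/env bash
# Sketch: show the custom rules of one network security group in evaluation
# order (lowest priority number first).
# Assumption: the function name and the NSGNAME placeholder are illustrative only.
list_nsg_rules_by_priority() {
    local resource_group="$1" nsg_name="$2"
    az network nsg rule list \
        --resource-group "$resource_group" \
        --nsg-name "$nsg_name" \
        --query "sort_by(@, &priority)[].{Priority:priority,Name:name,Direction:direction,Access:access,Source:sourceAddressPrefix,Ports:destinationPortRange}" \
        --output table
}

# Usage (requires a signed-in Azure CLI):
#   list_nsg_rules_by_priority RESOURCEGROUP NSGNAME
```

The sorting is done client-side by the JMESPath sort_by expression, so the command only reads the existing configuration and changes nothing.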

Connecting multiple networks

The biggest challenge with a multi-network configuration is name resolution between the networks.

Azure provides name resolution for Azure services that are installed in a virtual network. This built-in name resolution allows HDInsight to connect to the following resources by using a fully qualified domain name (FQDN):

  • Any resource that is available on the internet. For example, microsoft.com, windowsupdate.com.

  • Any resource that is in the same Azure Virtual Network, by using the internal DNS name of the resource. For example, when using the default name resolution, the following are examples of internal DNS names assigned to HDInsight worker nodes:

    • wn0-hdinsi.0owcbllr5hze3hxdja3mqlrhhe.ex.internal.chinacloudapp.cn

    • wn2-hdinsi.0owcbllr5hze3hxdja3mqlrhhe.ex.internal.chinacloudapp.cn

    Both of these nodes can communicate directly with each other, and with other nodes in HDInsight, by using internal DNS names.

The default name resolution does not allow HDInsight to resolve the names of resources in networks that are joined to the virtual network. For example, it is common to join your on-premises network to the virtual network. With only the default name resolution, HDInsight cannot access resources in the on-premises network by name. The opposite is also true: resources in your on-premises network cannot access resources in the virtual network by name.

Warning

You must create the custom DNS server and configure the virtual network to use it before creating the HDInsight cluster.

To enable name resolution between the virtual network and resources in joined networks, you must perform the following actions:

  1. Create a custom DNS server in the Azure Virtual Network where you plan to install HDInsight.

  2. Configure the virtual network to use the custom DNS server.

  3. Find the Azure-assigned DNS suffix for your virtual network. This value is similar to 0owcbllr5hze3hxdja3mqlrhhe.ex.internal.chinacloudapp.cn. For information on finding the DNS suffix, see the Example: Custom DNS section.

  4. Configure forwarding between the DNS servers. The configuration depends on the type of remote network.

    • If the remote network is an on-premises network, configure DNS as follows:

      • Custom DNS (in the virtual network):

        • Forward requests for the DNS suffix of the virtual network to the Azure recursive resolver (168.63.129.16). Azure handles requests for resources in the virtual network.

        • Forward all other requests to the on-premises DNS server. The on-premises DNS handles all other name resolution requests, even requests for internet resources such as microsoft.com.

      • On-premises DNS: Forward requests for the virtual network DNS suffix to the custom DNS server. The custom DNS server then forwards them to the Azure recursive resolver.

        This configuration routes requests for fully qualified domain names that contain the DNS suffix of the virtual network to the custom DNS server. All other requests (even for public internet addresses) are handled by the on-premises DNS server.

    • If the remote network is another Azure Virtual Network, configure DNS as follows:

      • Custom DNS (in each virtual network):

        • Forward requests for the DNS suffix of the other virtual network to that network's custom DNS server. The DNS server in each virtual network is responsible for resolving resources within its own network.

        • Forward all other requests to the Azure recursive resolver. The recursive resolver is responsible for resolving local and internet resources.

        The DNS server for each network forwards requests to the other, based on DNS suffix. Other requests are resolved using the Azure recursive resolver.

      For an example of each configuration, see the Example: Custom DNS section.

For more information, see the Name Resolution for VMs and Role Instances document.
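
The forwarding rule for the on-premises case can be sketched as a small decision function. This is an illustration of the logic only, not DNS server configuration: the function names dns_suffix_of and pick_forwarder are assumptions, as is the on-premises DNS address 10.0.0.4 in the usage comment. The resolver address 168.63.129.16 and the sample suffix come from the text above.

```shell
#!/usr/bin/env bash
# Azure recursive resolver address, as given in this article.
AZURE_RESOLVER="168.63.129.16"

# Derive the DNS suffix from a node's internal FQDN by dropping the host label.
dns_suffix_of() {
    printf '%s\n' "${1#*.}"
}

# Decide where the custom DNS server in the virtual network forwards a query
# when the remote network is on-premises.
#   $1 = name being resolved
#   $2 = DNS suffix of the virtual network
#   $3 = on-premises DNS server IP (hypothetical value supplied by the caller)
pick_forwarder() {
    local name vnet_suffix onprem_dns
    # DNS names are case-insensitive, so compare in lowercase.
    name="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    vnet_suffix="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
    onprem_dns="$3"
    case "$name" in
        "$vnet_suffix" | *."$vnet_suffix") printf '%s\n' "$AZURE_RESOLVER" ;;
        *) printf '%s\n' "$onprem_dns" ;;
    esac
}

# Usage (10.0.0.4 is a made-up on-premises DNS address):
#   suffix="$(dns_suffix_of wn0-hdinsi.0owcbllr5hze3hxdja3mqlrhhe.ex.internal.chinacloudapp.cn)"
#   pick_forwarder "wn2-hdinsi.$suffix" "$suffix" 10.0.0.4
#   pick_forwarder microsoft.com "$suffix" 10.0.0.4
```

A real DNS server expresses the same rule as a conditional forwarder for the virtual network suffix plus a default forwarder for everything else. The sketch does not handle names written with a trailing dot.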

Directly connect to Apache Hadoop services

You can connect to the cluster at https://CLUSTERNAME.azurehdinsight.cn. This address uses a public IP, which may not be reachable if you have used NSGs to restrict incoming traffic from the internet. Additionally, when you deploy the cluster in a VNet, you can access it by using the private endpoint https://CLUSTERNAME-int.azurehdinsight.cn. This endpoint resolves to a private IP inside the VNet for cluster access.
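
As a small illustration of the two addresses described above, the following sketch builds both endpoint URLs from a cluster name. The function name cluster_endpoints and the cluster name mycluster are assumptions for the example; the URL patterns are the ones given in this section.

```shell
#!/usr/bin/env bash
# Sketch: print the public and private endpoint URLs for a cluster name.
# Assumption: the function name is illustrative, not part of any Azure tool.
cluster_endpoints() {
    local cluster_name="$1"
    printf 'public=https://%s.azurehdinsight.cn\n' "$cluster_name"
    printf 'private=https://%s-int.azurehdinsight.cn\n' "$cluster_name"
}

# Usage, with a made-up cluster name:
#   cluster_endpoints mycluster
# From a machine inside the VNet, a DNS lookup of each host name shows which
# one resolves to a private IP, for example:
#   nslookup mycluster-int.azurehdinsight.cn
```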

To connect to Apache Ambari and other web pages through the virtual network, use the following steps:

  1. To discover the internal fully qualified domain names (FQDNs) of the HDInsight cluster nodes, use one of the following methods:

    Replace RESOURCEGROUP with the name of the resource group that contains the virtual network, and then enter the command:

    # Azure PowerShell: find the cluster NICs, then list each node's type, private IP, and internal FQDN
    $clusterNICs = Get-AzNetworkInterface -ResourceGroupName "RESOURCEGROUP" | Where-Object { $_.Name -like "*node*" }

    $nodes = foreach ($nic in $clusterNICs) {
        [PSCustomObject]@{
            # The node type is the second dash-separated part of the NIC name
            Type         = $nic.Name.Split('-')[1]
            InternalIP   = $nic.IpConfigurations.PrivateIpAddress
            InternalFQDN = $nic.DnsSettings.InternalFqdn
        }
    }
    $nodes | Sort-Object Type

    # Azure CLI: the same information as a table
    az network nic list --resource-group RESOURCEGROUP --output table --query "[?contains(name,'node')].{NICname:name,InternalIP:ipConfigurations[0].privateIpAddress,InternalFQDN:dnsSettings.internalFqdn}"
    

    In the list of nodes returned, find the FQDNs of the head nodes and use them to connect to Ambari and other web services. For example, use http://<headnode-fqdn>:8080 to access Ambari.

    Important

    Some services hosted on the head nodes are only active on one node at a time. If you try accessing a service on one head node and it returns a 404 error, switch to the other head node.

  2. To determine the node and port that a service is available on, see the Ports used by Hadoop services on HDInsight document.
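
The head node check described in step 1 can be sketched as a loop that tries each head node URL and keeps the first one that does not return 404. This is a minimal sketch under assumptions: the function name first_active_headnode is illustrative, and treating a connection failure (which curl reports as status 000) the same as a 404 is a choice made here, not something the text above prescribes.

```shell
#!/usr/bin/env bash
# Sketch: print the first URL whose HTTP status is neither 404 nor a
# connection failure (000). Returns 1 if no candidate qualifies.
# Assumption: the function name is illustrative only.
first_active_headnode() {
    local url status
    for url in "$@"; do
        # --write-out '%{http_code}' prints only the status code; the body is discarded.
        status="$(curl --silent --output /dev/null --max-time 10 \
            --write-out '%{http_code}' "$url" || true)"
        if [ "$status" != "404" ] && [ "$status" != "000" ]; then
            printf '%s\n' "$url"
            return 0
        fi
    done
    return 1
}

# Usage, run from a machine inside the virtual network:
#   first_active_headnode "http://<headnode0-fqdn>:8080" "http://<headnode1-fqdn>:8080"
```

Note that a status such as 401 or 403 still counts as active in this sketch, because it shows the service is answering on that node even though no credentials were supplied.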

Load balancing

When you create an HDInsight cluster, a load balancer is created as well. This load balancer uses the basic SKU, which has certain constraints. One of these constraints is that you cannot connect to a basic load balancer across two virtual networks that are in different regions. For more information, see Virtual networks FAQ: constraints on global VNet peering.

Next steps