Connect HDInsight to your on-premises network

Learn how to connect HDInsight to your on-premises network by using Azure Virtual Networks and a VPN gateway. This document provides planning information on:

  • Using HDInsight in an Azure Virtual Network that connects to your on-premises network.

  • Configuring DNS name resolution between the virtual network and your on-premises network.

  • Configuring network security groups to restrict internet access to HDInsight.

  • Ports provided by HDInsight on the virtual network.

Overview

To allow HDInsight and resources in the joined network to communicate by name, you must perform the following actions:

  • Create an Azure Virtual Network.

  • Create a custom DNS server in the Azure Virtual Network.

  • Configure the virtual network to use the custom DNS server instead of the default Azure Recursive Resolver.

  • Configure forwarding between the custom DNS server and your on-premises DNS server.

This configuration enables the following behavior:

  • Requests for fully qualified domain names that have the DNS suffix for the virtual network are forwarded to the custom DNS server. The custom DNS server then forwards these requests to the Azure Recursive Resolver, which returns the IP address.

  • All other requests are forwarded to the on-premises DNS server. Even requests for public internet resources, such as microsoft.com, are forwarded to the on-premises DNS server for name resolution.
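The forwarding rules above reduce to a suffix match on the requested name. The following POSIX shell sketch models that decision for illustration only: the function name `resolver_for` and its output labels are hypothetical, and in practice Bind performs this routing through the zone and forwarder configuration described in the following sections.

```shell
#!/bin/sh
# Hypothetical helper that models which resolver ultimately answers a query.
#   $1 = name being looked up
#   $2 = DNS suffix of the virtual network
resolver_for() {
  # Drop an optional trailing dot and lowercase both names (DNS names are case-insensitive)
  _name=$(printf '%s' "${1%.}" | tr '[:upper:]' '[:lower:]')
  _suffix=$(printf '%s' "${2%.}" | tr '[:upper:]' '[:lower:]')
  case "$_name" in
    # Names in the virtual network's DNS suffix: custom DNS server, then Azure Recursive Resolver
    "$_suffix" | *."$_suffix") echo "azure-recursive-resolver" ;;
    # Everything else, including public names such as microsoft.com: on-premises DNS server
    *) echo "on-premises-dns" ;;
  esac
}

# Example:
#   resolver_for dnsproxy.icb0d0thtw0ebifqt0g1jycdxd.ex.internal.chinacloudapp.cn \
#                icb0d0thtw0ebifqt0g1jycdxd.ex.internal.chinacloudapp.cn
```

The match requires a dot before the suffix, so a name that merely ends with the same characters is still sent on-premises.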

In the following diagram, green lines are requests for resources that end in the DNS suffix of the virtual network. Blue lines are requests for resources in the on-premises network or on the public internet.

Diagram of how DNS requests are resolved in this configuration

Prerequisites

Create virtual network configuration

Use the following documents to learn how to create an Azure Virtual Network that is connected to your on-premises network:

Create custom DNS server

Important

You must create and configure the DNS server before installing HDInsight into the virtual network.

These steps use the Azure portal to create an Azure Virtual Machine. For other ways to create a virtual machine, see Create VM - Azure CLI and Create VM - Azure PowerShell. To create a Linux VM that uses the Bind DNS software, use the following steps:

  1. Sign in to the Azure portal.

  2. From the left menu, navigate to + Create a resource > Compute > Ubuntu Server 18.04 LTS.

    Create an Ubuntu virtual machine

  3. From the Basics tab, enter the following information:

    Field | Value
    Subscription | Select your appropriate subscription.
    Resource group | Select the resource group that contains the virtual network created earlier.
    Virtual machine name | Enter a friendly name that identifies this virtual machine. This example uses DNSProxy.
    Region | Select the same region as the virtual network created earlier. Not all VM sizes are available in all regions.
    Availability options | Select your desired level of availability. Azure offers a range of options for managing availability and resiliency for your applications. Architect your solution to use replicated VMs in Availability Zones or Availability Sets to protect your apps and data from datacenter outages and maintenance events. This example uses No infrastructure redundancy required.
    Image | Leave as Ubuntu Server 18.04 LTS.
    Authentication type | Password or SSH public key: the authentication method for the SSH account. We recommend using public keys, as they are more secure. This example uses Password. For more information, see the Create and use SSH keys for Linux VMs document.
    User name | Enter the administrator username for the VM. This example uses sshuser.
    Password or SSH public key | The available field is determined by your choice for Authentication type. Enter the appropriate value.
    Public inbound ports | Select Allow selected ports. Then select SSH (22) from the Select inbound ports drop-down list.

    Virtual machine basic configuration

    Leave other entries at the default values, and then select the Networking tab.

  4. From the Networking tab, enter the following information:

    Field | Value
    Virtual network | Select the virtual network that you created earlier.
    Subnet | Select the default subnet for the virtual network that you created earlier. Do not select the subnet used by the VPN gateway.
    Public IP | Use the autopopulated value.

    HDInsight virtual network settings

    Leave other entries at the default values, and then select Review + create.

  5. From the Review + create tab, select Create to create the virtual machine.
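If you prefer the Azure CLI to the portal, the following sketch creates a comparable VM. It is an approximate equivalent under stated assumptions, not a setting-for-setting translation of the portal steps: the function name `create_dns_vm` is hypothetical, the resource group, virtual network, and subnet names are placeholders you pass in, it uses SSH keys (`--generate-ssh-keys`) instead of the password used in the portal example, and it pins the Ubuntu 18.04 image URN to match the steps above. If that image is no longer offered in your region, choose a newer Ubuntu image.

```shell
#!/bin/sh
# Sketch: create the DNS proxy VM with the Azure CLI instead of the portal.
# For Azure China, run `az cloud set --name AzureChinaCloud` before `az login`.
# The VM location defaults to the resource group's location; add --location if the
# virtual network is in a different region.
#   $1 = resource group that contains the virtual network
#   $2 = virtual network name
#   $3 = subnet name (the default subnet, not the VPN gateway subnet)
create_dns_vm() {
  az vm create \
    --resource-group "$1" \
    --name DNSProxy \
    --image Canonical:UbuntuServer:18.04-LTS:latest \
    --vnet-name "$2" \
    --subnet "$3" \
    --admin-username sshuser \
    --generate-ssh-keys
}

# Example (hypothetical names):
#   create_dns_vm myResourceGroup myVNet default
```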

Review IP addresses

Once the virtual machine has been created, you receive a Deployment succeeded notification with a Go to resource button. Select Go to resource to go to your new virtual machine. From the default view for your new virtual machine, follow these steps to identify the associated IP addresses:

  1. From Settings, select Properties.

  2. Note the values for PUBLIC IP ADDRESS/DNS NAME LABEL and PRIVATE IP ADDRESS for later use.

    Public and private IP addresses
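The same two addresses can also be read with the Azure CLI. A minimal sketch, assuming the VM is named DNSProxy as in this example; the function name `show_dns_vm_ips` is hypothetical, and `--show-details` is what adds the `publicIps` and `privateIps` fields used in the query.

```shell
#!/bin/sh
# Sketch: print the public and private IP addresses of the DNS proxy VM.
#   $1 = resource group, $2 = VM name (DNSProxy in this example)
show_dns_vm_ips() {
  az vm show --show-details \
    --resource-group "$1" \
    --name "$2" \
    --query "{public:publicIps, private:privateIps}" \
    --output table
}

# Example (hypothetical resource group):
#   show_dns_vm_ips myResourceGroup DNSProxy
```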

Install and configure Bind (DNS software)

  1. Use SSH to connect to the public IP address of the virtual machine. Replace sshuser with the SSH user account you specified when creating the VM. The following example connects to a virtual machine at 40.68.254.142:

    ssh sshuser@40.68.254.142
    
  2. To install Bind, use the following commands from the SSH session:

    sudo apt-get update -y
    sudo apt-get install bind9 -y
    
  3. To configure Bind to forward name resolution requests to your on-premises DNS server, use the following text as the contents of the /etc/bind/named.conf.options file:

     acl goodclients {
         10.0.0.0/16; # Replace with the IP address range of the virtual network
         10.1.0.0/16; # Replace with the IP address range of the on-premises network
         localhost;
         localnets;
     };
    
     options {
             directory "/var/cache/bind";
    
             recursion yes;
    
             allow-query { goodclients; };
    
             forwarders {
             192.168.0.1; # Replace with the IP address of the on-premises DNS server
             };
    
             dnssec-validation auto;
    
             auth-nxdomain no;    # conform to RFC1035
             listen-on { any; };
     };
    

    Important

    Replace the values in the goodclients section with the IP address ranges of the virtual network and the on-premises network. This section defines the addresses that this DNS server accepts requests from.

    Replace the 192.168.0.1 entry in the forwarders section with the IP address of your on-premises DNS server. This entry routes DNS requests to your on-premises DNS server for resolution.

    To edit this file, use the following command:

    sudo nano /etc/bind/named.conf.options
    

    To save the file, press Ctrl+X, then Y, and then Enter.

  4. From the SSH session, use the following command:

    hostname -f
    

    This command returns a value similar to the following text:

    dnsproxy.icb0d0thtw0ebifqt0g1jycdxd.ex.internal.chinacloudapp.cn
    

    The icb0d0thtw0ebifqt0g1jycdxd.ex.internal.chinacloudapp.cn text is the DNS suffix for this virtual network. Save this value, as it is used later.

  5. To configure Bind to resolve DNS names for resources within the virtual network, use the following text as the contents of the /etc/bind/named.conf.local file:

     // Replace the following with the DNS suffix for your virtual network
     zone "icb0d0thtw0ebifqt0g1jycdxd.ex.internal.chinacloudapp.cn" {
         type forward;
         forwarders {168.63.129.16;}; # The Azure recursive resolver
     };
    

    Important

    You must replace icb0d0thtw0ebifqt0g1jycdxd.ex.internal.chinacloudapp.cn with the DNS suffix you retrieved earlier.

    To edit this file, use the following command:

    sudo nano /etc/bind/named.conf.local
    

    To save the file, press Ctrl+X, then Y, and then Enter.

  6. To restart Bind so that it loads the new configuration, use the following command:

    sudo service bind9 restart
    
  7. To verify that Bind can resolve the names of resources in your on-premises network, use the following commands:

    sudo apt install dnsutils
    nslookup dns.mynetwork.net 10.0.0.4
    

    Important

    Replace dns.mynetwork.net with the fully qualified domain name (FQDN) of a resource in your on-premises network.

    Replace 10.0.0.4 with the internal IP address of your custom DNS server in the virtual network.

    The response appears similar to the following text:

    Server:         10.0.0.4
    Address:        10.0.0.4#53
    
    Non-authoritative answer:
    Name:   dns.mynetwork.net
    Address: 192.168.0.4
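Steps 3 through 6 can be scripted so the same values are not typed twice. This is a sketch under stated assumptions: the variable values are this article's example values (replace them with your own), the helper names `dns_suffix`, `render_options`, and `render_local` are hypothetical, and the rendered text mirrors the configuration shown in steps 3 and 5. The commented usage lines write both files on the VM and check their syntax with `named-checkconf`, which is installed alongside Bind on Ubuntu, before restarting the service.

```shell
#!/bin/sh
# Sketch: render /etc/bind/named.conf.options and /etc/bind/named.conf.local.
# Example values from this article; replace them with your own.
VNET_CIDR="10.0.0.0/16"         # IP address range of the virtual network
ONPREM_CIDR="10.1.0.0/16"       # IP address range of the on-premises network
ONPREM_DNS="192.168.0.1"        # IP address of the on-premises DNS server
AZURE_RESOLVER="168.63.129.16"  # Azure recursive resolver

# DNS suffix = output of `hostname -f` with the first label (the host name) removed.
dns_suffix() {
  printf '%s\n' "${1#*.}"
}

# named.conf.options: accept queries from both networks, forward other names on-premises.
render_options() {
  cat <<EOF
acl goodclients {
    $VNET_CIDR;
    $ONPREM_CIDR;
    localhost;
    localnets;
};

options {
    directory "/var/cache/bind";
    recursion yes;
    allow-query { goodclients; };
    forwarders {
        $ONPREM_DNS;
    };
    dnssec-validation auto;
    auth-nxdomain no;    # conform to RFC1035
    listen-on { any; };
};
EOF
}

# named.conf.local: forward the virtual network's suffix ($1) to the Azure recursive resolver.
render_local() {
  cat <<EOF
zone "$1" {
    type forward;
    forwarders {$AZURE_RESOLVER;};
};
EOF
}

# Usage on the DNS VM:
#   suffix=$(dns_suffix "$(hostname -f)")
#   render_options | sudo tee /etc/bind/named.conf.options > /dev/null
#   render_local "$suffix" | sudo tee /etc/bind/named.conf.local > /dev/null
#   sudo named-checkconf && sudo service bind9 restart
```

Running `named-checkconf` first means a typo in either file is reported before Bind is restarted with a broken configuration.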
    

Configure virtual network to use the custom DNS server

To configure the virtual network to use the custom DNS server instead of the Azure Recursive Resolver, use the following steps from the Azure portal:

  1. From the left menu, navigate to All services > Networking > Virtual networks.

  2. Select your virtual network from the list to open its default view.

  3. From the default view, under Settings, select DNS servers.

  4. Select Custom, and enter the PRIVATE IP ADDRESS of the custom DNS server.

  5. Select Save.

    Setting the custom DNS server for the network
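The portal steps above have an Azure CLI counterpart. A minimal sketch: the function name `set_vnet_dns` is hypothetical, and the arguments are your resource group, virtual network name, and the PRIVATE IP ADDRESS of the custom DNS server. VMs that already exist in the virtual network may need a restart before they use the new DNS server.

```shell
#!/bin/sh
# Sketch: point the virtual network at the custom DNS server.
#   $1 = resource group, $2 = virtual network name, $3 = private IP of the custom DNS server
set_vnet_dns() {
  az network vnet update \
    --resource-group "$1" \
    --name "$2" \
    --dns-servers "$3"
}

# Example (hypothetical names; 10.0.0.4 is this article's example DNS server IP):
#   set_vnet_dns myResourceGroup myVNet 10.0.0.4
```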

Configure on-premises DNS server

In the previous section, you configured the custom DNS server to forward requests to the on-premises DNS server. Next, you must configure the on-premises DNS server to forward requests to the custom DNS server.

For specific steps on how to configure your DNS server, consult the documentation for your DNS server software. Look for the steps on how to configure a conditional forwarder.

A conditional forwarder only forwards requests for a specific DNS suffix. In this case, you must configure a forwarder for the DNS suffix of the virtual network. Requests for this suffix should be forwarded to the IP address of the custom DNS server.

The following text is an example of a conditional forwarder configuration for the Bind DNS software:

zone "icb0d0thtw0ebifqt0g1jycdxd.ex.internal.chinacloudapp.cn" {
    type forward;
    forwarders {10.0.0.4;}; # The custom DNS server's internal IP address
};

For information on using DNS on Windows Server 2016, see the Add-DnsServerConditionalForwarderZone documentation.

Once you have configured the on-premises DNS server, you can use nslookup from the on-premises network to verify that you can resolve names in the virtual network. For example:

nslookup dnsproxy.icb0d0thtw0ebifqt0g1jycdxd.ex.internal.chinacloudapp.cn 192.168.0.4

This example uses the on-premises DNS server at 192.168.0.4 to resolve the name of the custom DNS server. Replace the IP address with the one for the on-premises DNS server. Replace the dnsproxy address with the fully qualified domain name of the custom DNS server.
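To turn an nslookup check like these into a pass/fail test, you can extract the answer address from the output. A minimal sketch, assuming the Linux (BIND dnsutils) nslookup output format shown in the Bind verification step, where the line describing the queried server ends in `#53`. Windows nslookup formats its output differently, so this filter does not apply there. The function name `answer_address` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the first address from the answer section of Linux nslookup output (stdin).
# Exits with status 1 if the output contains no answer address.
answer_address() {
  # The "Address: <server>#53" line describes the DNS server that was queried,
  # so skip address lines that carry a "#port" suffix.
  awk '/^Address:/ && $2 !~ /#/ { print $2; found = 1; exit }
       END { if (!found) exit 1 }'
}

# Example:
#   nslookup dns.mynetwork.net 10.0.0.4 | answer_address || echo "name did not resolve"
```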

Optional: Control network traffic

You can use network security groups (NSGs) or user-defined routes (UDRs) to control network traffic. NSGs let you filter inbound and outbound traffic and allow or deny it. UDRs let you control how traffic flows between resources in the virtual network, the internet, and the on-premises network.

Warning

HDInsight requires inbound access from specific IP addresses in the Azure cloud, and unrestricted outbound access. When using NSGs or UDRs to control traffic, you must perform the following steps:

  1. Find the IP addresses for the location that contains your virtual network. For a list of required IPs by location, see Required IP addresses.

  2. Allow inbound traffic from the IP addresses identified in step 1.

    • If you are using NSGs: Allow inbound traffic on port 443 from these IP addresses.
    • If you are using UDRs: Set the Next Hop type of the route to Internet for these IP addresses.
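Step 2 can be sketched with the Azure CLI as follows. Assumptions: the NSG and route table already exist and are associated with the HDInsight subnet; the function names, the rule name `AllowHDInsightManagement`, its priority of 300, and the route naming scheme are hypothetical choices; and you pass in the required IP addresses for your region from the Required IP addresses document.

```shell
#!/bin/sh
# Sketch: NSG rule that allows inbound TCP 443 from the required HDInsight IP addresses.
#   $1 = resource group, $2 = NSG name, remaining arguments = required IP addresses
allow_hdinsight_inbound() {
  _rg=$1; _nsg=$2; shift 2
  az network nsg rule create \
    --resource-group "$_rg" \
    --nsg-name "$_nsg" \
    --name AllowHDInsightManagement \
    --priority 300 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --source-address-prefixes "$@" \
    --destination-port-ranges 443
}

# Sketch: UDR whose next hop is Internet for one required IP address.
#   $1 = resource group, $2 = route table name, $3 = required IP address
route_hdinsight_ip_to_internet() {
  az network route-table route create \
    --resource-group "$1" \
    --route-table-name "$2" \
    --name "hdinsight-$(printf '%s' "$3" | sed 's/\./-/g')" \
    --address-prefix "$3/32" \
    --next-hop-type Internet
}
```

Call the route function once per required IP address; the NSG function accepts all of them in a single rule.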

For an example of using Azure PowerShell or the Azure CLI to create NSGs, see the Extend HDInsight with Azure Virtual Networks document.

Create the HDInsight cluster

Warning

You must configure the custom DNS server before installing HDInsight in the virtual network.

Use the steps in the Create an HDInsight cluster using the Azure portal document to create an HDInsight cluster.

Warning

  • During cluster creation, you must choose the location that contains your virtual network.

  • In the Advanced settings part of configuration, you must select the virtual network and subnet that you created earlier.

Connecting to HDInsight

Most documentation on HDInsight assumes that you have access to the cluster over the internet; for example, that you can connect to the cluster at https://CLUSTERNAME.azurehdinsight.cn. This address uses the public gateway, which is not available if you have used NSGs or UDRs to restrict access from the internet. Some documentation also references headnodehost when connecting to the cluster from an SSH session. This address is only available from nodes within the cluster, and is not usable on clients connected over the virtual network.

To directly connect to HDInsight through the virtual network, use the following steps:

  1. To discover the internal fully qualified domain names of the HDInsight cluster nodes, use one of the following methods:

    Azure PowerShell:

    # Replace with the resource group that contains the virtual network used with HDInsight
    $resourceGroupName = "The resource group that contains the virtual network used with HDInsight"
    
    # Keep only the network interfaces whose names contain "node" (the cluster nodes)
    $clusterNICs = Get-AzNetworkInterface -ResourceGroupName $resourceGroupName | Where-Object {$_.Name -like "*node*"}
    
    $nodes = @()
    foreach($nic in $clusterNICs) {
        # Record the node type (second dash-separated part of the NIC name), private IP, and internal FQDN
        $node = New-Object System.Object
        $node | Add-Member -MemberType NoteProperty -Name "Type" -Value $nic.Name.Split('-')[1]
        $node | Add-Member -MemberType NoteProperty -Name "InternalIP" -Value $nic.IpConfigurations.PrivateIpAddress
        $node | Add-Member -MemberType NoteProperty -Name "InternalFQDN" -Value $nic.DnsSettings.InternalFqdn
        $nodes += $node
    }
    $nodes | Sort-Object Type
    
    Azure CLI:

    az network nic list --resource-group <resourcegroupname> --output table --query "[?contains(name,'node')].{NICname:name,InternalIP:ipConfigurations[0].privateIpAddress,InternalFQDN:dnsSettings.internalFqdn}"
    
    
  2. To determine the port that a service is available on, see the Ports used by Apache Hadoop services on HDInsight document.

    Important

    Some services hosted on the head nodes are only active on one node at a time. If you try accessing a service on one head node and it fails, switch to the other head node.

    For example, Apache Ambari is only active on one head node at a time. If you try accessing Ambari on one head node and it returns a 404 error, then it is running on the other head node.
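Switching between head nodes can be automated. A sketch under these assumptions: Ambari listens on port 8080 on the head nodes (confirm this in the ports document referenced in step 2), the inactive head node answers with HTTP 404 as described above, `curl` is available on the client, and the host names you pass are the internal FQDNs found in step 1. The function names `http_status` and `active_ambari_node` are hypothetical.

```shell
#!/bin/sh
# Sketch: find the head node on which Ambari is currently active.
# Assumes Ambari listens on port 8080 on the head nodes and that the inactive
# head node answers with HTTP 404.

# Print the HTTP status code for a URL; curl prints 000 when no response arrives.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true
}

# Print the first host whose Ambari endpoint responds with something other than 404;
# return 1 if no host qualifies.
active_ambari_node() {
  for _host in "$@"; do
    _code=$(http_status "http://$_host:8080/")
    if [ "$_code" != "404" ] && [ "$_code" != "000" ]; then
      echo "$_host"
      return 0
    fi
  done
  return 1
}

# Example with the internal head node FQDNs from step 1 (hypothetical names):
#   active_ambari_node hn0-mycluster.<DNS suffix> hn1-mycluster.<DNS suffix>
```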

Next steps