Cluster creation fails with InvalidNetworkConfigurationErrorCode in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues that arise when interacting with Azure HDInsight clusters.

If you see the error code InvalidNetworkConfigurationErrorCode with the description "Virtual Network configuration is not compatible with HDInsight Requirement", it usually indicates a problem with the virtual network configuration for your cluster. Based on the rest of the error description, follow the matching section below to resolve the problem.

"HostName Resolution failed"

Issue

The error description contains "HostName Resolution failed".

Cause

This error points to a problem with the custom DNS configuration. DNS servers within a virtual network can forward DNS queries to Azure's recursive resolvers to resolve hostnames within that virtual network (see Name Resolution in Virtual Networks for details). Access to Azure's recursive resolvers is provided via the virtual IP 168.63.129.16. This IP is accessible only from Azure VMs, so it won't work if you're using an on-premises DNS server, or if your DNS server is an Azure VM that isn't part of the cluster's virtual network.

Resolution

  1. Use SSH to connect to a VM that is part of the cluster, and run the command hostname -f. This returns the host's fully qualified domain name (referred to as <host_fqdn> in the instructions below).

  2. Then, run the command nslookup <host_fqdn> (for example, nslookup hn1-hditest.5h6lujo4xvoe1kprq3azvzmwsd.hx.internal.chinacloudapp.cn). If this command resolves the name to an IP address, your DNS server is working correctly. In this case, raise a support case with HDInsight, and we will investigate your issue. Include the troubleshooting steps you executed in the support case; this helps us resolve the issue faster.

  3. If the above command doesn't return an IP address, run nslookup <host_fqdn> 168.63.129.16 (for example, nslookup hn1-hditest.5h6lujo4xvoe1kprq3azvzmwsd.hx.internal.chinacloudapp.cn 168.63.129.16). If this command resolves the IP, then either your DNS server isn't forwarding the query to Azure's DNS, or it isn't a VM that is part of the same virtual network as the cluster.

  4. If you don't have an Azure VM that can act as a custom DNS server in the cluster's virtual network, add one first. Create a VM in the virtual network to be configured as a DNS forwarder.

  5. Once the VM is deployed in your virtual network, configure the DNS forwarding rules on it. Forward all iDNS name resolution requests to 168.63.129.16, and the rest to your DNS server. Here is an example of this setup for a custom DNS server.

  6. Add the IP address of this VM as the first DNS entry in the virtual network's DNS configuration.
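
The checks in steps 1 through 3 can be scripted. The following is a minimal sketch, not an official HDInsight tool: the function names resolves and diagnose_dns and the messages they print are hypothetical, and the sketch uses dig +short in place of nslookup because its output is easier to test in a script. It assumes dig is installed on the cluster VM. The third outcome (neither resolver answers) is not covered by the steps above and is included only so that the function always reports something.

```shell
#!/bin/sh
# Sketch only: function names and messages are illustrative assumptions,
# not part of HDInsight. Mirrors steps 1-3 using dig instead of nslookup.

AZURE_DNS="168.63.129.16"   # Azure recursive resolver (virtual IP)

# Succeeds when dig prints at least one IPv4 address for the query.
resolves() {
    dig +short "$@" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'
}

# Print a diagnosis for the given fully qualified domain name.
diagnose_dns() {
    fqdn="$1"

    # Step 2: resolve through the DNS server the VM is configured to use.
    if resolves "$fqdn"; then
        echo "custom DNS resolves the name; raise a support case"
        return 0
    fi

    # Step 3: resolve directly through the Azure recursive resolver.
    if resolves "$fqdn" "@$AZURE_DNS"; then
        echo "custom DNS is not forwarding to $AZURE_DNS; continue with step 4"
        return 1
    fi

    echo "neither resolver returned an address"
    return 2
}
```

On a cluster VM, it could be called as diagnose_dns "$(hostname -f)".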


"Failed to connect to Azure Storage Account"

Issue

The error description contains "Failed to connect to Azure Storage Account" or "Failed to connect to Azure SQL".

Cause

Azure Storage and SQL don't have fixed IP addresses, so outbound connections to all IPs must be allowed in order to reach these services. The exact resolution steps depend on whether you have set up a network security group (NSG) or user-defined routes (UDR). Refer to the section on controlling network traffic with HDInsight with network security groups and user-defined routes for details on these configurations.

Resolution

  • If your cluster uses a network security group (NSG).

    Go to the Azure portal and identify the NSG that is associated with the subnet where the cluster is being deployed. In the Outbound security rules section, allow outbound access to the internet without limitation (note that a smaller priority number here means a higher priority). Also, in the Subnets section, confirm that this NSG is applied to the cluster subnet.

  • If your cluster uses user-defined routes (UDR).

    Go to the Azure portal and identify the route table that is associated with the subnet where the cluster is being deployed. Once you find the route table for the subnet, inspect its routes section.

    If there are routes defined, make sure that there are routes for the IP addresses of the region where the cluster was deployed, and that the NextHopType for each route is Internet. There should be a route defined for each required IP address documented in the aforementioned article.
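
The same inspection can be done from the command line. The following is a minimal sketch that assumes the Azure CLI (az) is installed and signed in; the function names list_outbound_rules and list_routes are hypothetical, and the resource group, NSG, and route table names passed to them are placeholders for your own values.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, and the arguments are
# placeholders. Assumes the Azure CLI (az) is installed and signed in.

# List the outbound rules of an NSG.
# $1 = resource group, $2 = NSG name
list_outbound_rules() {
    az network nsg rule list \
        --resource-group "$1" \
        --nsg-name "$2" \
        --query "[?direction=='Outbound'].{name:name, priority:priority, access:access, destination:destinationAddressPrefix}" \
        --output table
}

# List the routes of a route table with their next hop type.
# $1 = resource group, $2 = route table name
list_routes() {
    az network route-table route list \
        --resource-group "$1" \
        --route-table-name "$2" \
        --query "[].{name:name, prefix:addressPrefix, nextHop:nextHopType}" \
        --output table
}
```

For example, list_routes my-resource-group my-route-table prints one row per route, which makes it easier to spot a route whose next hop is not Internet.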


"Virtual network configuration is not compatible with HDInsight requirement"

Issue

The error description contains a message similar to the following:

ErrorCode: InvalidNetworkConfigurationErrorCode
ErrorDescription: Virtual Network configuration is not compatible with HDInsight Requirement. Error: 'Failed to connect to Azure Storage Account; Failed to connect to Azure SQL; HostName Resolution failed', Please follow https://go.microsoft.com/fwlink/?linkid=853974 to fix it.

Cause

Likely an issue with the custom DNS setup.

Resolution

Validate that 168.63.129.16 is in the custom DNS chain. DNS servers within a virtual network can forward DNS queries to Azure's recursive resolvers to resolve hostnames within that virtual network. For more information, see Name Resolution in Virtual Networks. Access to Azure's recursive resolvers is provided via the virtual IP 168.63.129.16.

  1. Use the ssh command to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:

    ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.cn
    
  2. Run the following command:

    grep nameserver /etc/resolv.conf

    You should see something like this:

    nameserver 168.63.129.16
    nameserver 10.21.34.43
    nameserver 10.21.34.44
    

    Based on the result, follow the matching subsection below.
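
The check in step 2 can also be expressed as a small shell function. This is a minimal sketch: the function name has_azure_resolver is hypothetical, and the optional file argument exists only so the function can be tried against a copy of resolv.conf.

```shell
#!/bin/sh
# Sketch only: has_azure_resolver is an illustrative name, not an
# HDInsight tool.

# Succeeds when the file lists 168.63.129.16 as a nameserver.
# $1 = path to a resolv.conf-style file (defaults to /etc/resolv.conf)
has_azure_resolver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+168\.63\.129\.16([[:space:]]|$)' \
        "${1:-/etc/resolv.conf}"
}
```

On a cluster node, has_azure_resolver && echo "in the list" || echo "not in the list" reports which of the two cases below applies.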

168.63.129.16 is not in this list

Option 1
Add 168.63.129.16 as the first custom DNS server for the virtual network using the steps described in Plan a virtual network for Azure HDInsight. These steps are applicable only if your custom DNS server runs on Linux.

Option 2
Deploy a DNS server VM for the virtual network. This involves the following steps:

  • Create a VM in the virtual network to be configured as a DNS forwarder (it can be a Linux or Windows VM).
  • Configure DNS forwarding rules on this VM (forward all iDNS name resolution requests to 168.63.129.16, and the rest to your DNS server).
  • Add the IP address of this VM as the first DNS entry in the virtual network's DNS configuration.

168.63.129.16 is in the list

In this case, create a support case with HDInsight, and we'll investigate your issue. Include the results of the commands below in your support case. This helps us investigate and resolve the issue more quickly.

From an SSH session on the head node, edit and then run the following commands:

    hostname -f
    nslookup <headnode_fqdn> (for example, nslookup hn1-hditest.5h6lujo4xvoe1kprq3azvzmwsd.hx.internal.chinacloudapp.cn)
    dig @168.63.129.16 <headnode_fqdn> (for example, dig @168.63.129.16 hn0-hditest.5h6lujo4xvoe1kprq3azvzmwsd.hx.internal.chinacloudapp.cn)

Next steps

If you didn't see your problem or are unable to solve your issue, visit the following channel for more support:

  • If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub. For more detailed information, see How to create an Azure support request. Access to subscription management and billing support is included with your Microsoft Azure subscription, and technical support is provided through one of the Azure Support Plans.