Troubleshoot Azure Load Balancer

This page provides troubleshooting information for common issues with Basic and Standard Azure Load Balancer. For more information about Standard Load Balancer, see Standard Load Balancer overview.

When Load Balancer connectivity is unavailable, the most common symptoms are as follows:

  • VMs behind the Load Balancer are not responding to health probes
  • VMs behind the Load Balancer are not responding to the traffic on the configured port

When external clients reach the backend VMs through the load balancer, the clients' IP addresses are used for the communication. Make sure the clients' IP addresses are added to the NSG allow list.

Symptom: VMs behind the Load Balancer are not responding to health probes

For the backend servers to participate in the load balancer set, they must pass the probe check. For more information about health probes, see Understanding Load Balancer Probes.

The Load Balancer backend pool VMs may not be responding to the probes for any of the following reasons:

  • Load Balancer backend pool VM is unhealthy
  • Load Balancer backend pool VM is not listening on the probe port
  • A firewall or a network security group is blocking the port on the Load Balancer backend pool VMs
  • Other misconfigurations in Load Balancer

Cause 1: Load Balancer backend pool VM is unhealthy

Validation and resolution

To resolve this issue, log in to the participating VMs and check whether each VM is healthy and can respond to PsPing or TCPing from another VM in the pool. If the VM is unhealthy, or is unable to respond to the probe, you must rectify the issue and get the VM back to a healthy state before it can participate in load balancing.

Cause 2: Load Balancer backend pool VM is not listening on the probe port

If the VM is healthy but is not responding to the probe, one possible reason is that the probe port is not open on the participating VM, or the VM is not listening on that port.

Validation and resolution

  1. Log in to the backend VM.
  2. Open a command prompt and run the following command to validate that an application is listening on the probe port:
    netstat -an
  3. If the port state is not listed as LISTENING, configure the proper port.
  4. Alternatively, select another port that is listed as LISTENING, and update the load balancer configuration accordingly.
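The LISTENING check in the steps above can also be scripted. The following is a minimal sketch, assuming the four-column TCP rows that the Windows netstat -an command prints (Proto, Local Address, Foreign Address, State). The function name listening_tcp_ports and the SAMPLE text are illustrative assumptions, not part of any Azure tool.

```python
from typing import Set


def listening_tcp_ports(netstat_output: str) -> Set[int]:
    """Return the TCP ports shown in LISTENING state in `netstat -an` text.

    Assumes Windows-style rows: Proto, Local Address, Foreign Address, State.
    """
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and UDP rows (which have no State column).
        if len(fields) != 4 or fields[0].upper() != "TCP":
            continue
        if fields[3].upper() != "LISTENING":
            continue
        # The port is whatever follows the last colon (works for [::]:80 too).
        port_text = fields[1].rsplit(":", 1)[-1]
        if port_text.isdigit():
            ports.add(int(port_text))
    return ports


# Illustrative input only; real output comes from running `netstat -an` on the VM.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    0.0.0.0:3389           0.0.0.0:0              LISTENING
  TCP    10.0.0.4:3389          10.0.0.5:50122         ESTABLISHED
  TCP    [::]:80                [::]:0                 LISTENING
  UDP    0.0.0.0:123            *:*
"""

probe_port = 8080  # hypothetical probe port configured on the load balancer
is_listening = probe_port in listening_tcp_ports(SAMPLE)
```

With this sample, is_listening is False for port 8080, which corresponds to step 3: either start a listener on that port or point the probe at a port that is listed as LISTENING.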

Cause 3: A firewall or a network security group is blocking the port on the Load Balancer backend pool VMs

If the firewall on the VM is blocking the probe port, or one or more network security groups configured on the subnet or on the VM do not allow the probe to reach the port, the VM is unable to respond to the health probe.

Validation and resolution

  • If the firewall is enabled, check whether it is configured to allow the probe port. If not, configure the firewall to allow traffic on the probe port, and test again.
  • From the list of network security groups, check whether the incoming or outgoing traffic on the probe port is being interfered with.
  • Also, check whether a Deny All network security group rule on the NIC of the VM or on the subnet has a higher priority than the default rule that allows Load Balancer probes and traffic (network security groups must allow the Load Balancer IP address 168.63.129.16).
  • If any of these rules are blocking the probe traffic, remove and reconfigure the rules to allow the probe traffic.
  • Test whether the VM has now started responding to the health probes.
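To see why a Deny All rule with a higher priority blocks the probe, the sketch below models rule evaluation in a deliberately simplified way: rules are checked in ascending priority number and the first match decides the outcome. The Rule class, its fields, and the custom rules are hypothetical and only illustrate the ordering. The two default rules use the priorities of the built-in AllowAzureLoadBalancerInBound (65001) and DenyAllInBound (65500) rules. Real network security groups also match on protocol, destination, and other service tags, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional

PROBE_SOURCE_IP = "168.63.129.16"  # source address of Load Balancer health probes


@dataclass
class Rule:
    """Hypothetical, simplified inbound rule (not the Azure SDK model)."""
    name: str
    priority: int  # lower number = evaluated first
    source: str    # "*", "AzureLoadBalancer", or a literal IP address
    port: str      # "*" or a single port number as text
    access: str    # "Allow" or "Deny"


def source_matches(rule: Rule, source_ip: str) -> bool:
    if rule.source == "*":
        return True
    if rule.source == "AzureLoadBalancer":
        return source_ip == PROBE_SOURCE_IP
    return rule.source == source_ip


def first_match(rules: List[Rule], source_ip: str, port: int) -> Optional[Rule]:
    """Return the first rule, in priority order, that matches the traffic."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if source_matches(rule, source_ip) and rule.port in ("*", str(port)):
            return rule
    return None


def probe_allowed(rules: List[Rule], port: int) -> bool:
    rule = first_match(rules, PROBE_SOURCE_IP, port)
    return rule is not None and rule.access == "Allow"


DEFAULT_RULES = [
    Rule("AllowAzureLoadBalancerInBound", 65001, "AzureLoadBalancer", "*", "Allow"),
    Rule("DenyAllInBound", 65500, "*", "*", "Deny"),
]

# Hypothetical custom rules for illustration.
custom_deny_all = Rule("CustomDenyAll", 4000, "*", "*", "Deny")
allow_probe = Rule("AllowProbe", 3900, "AzureLoadBalancer", "80", "Allow")

blocked = not probe_allowed(DEFAULT_RULES + [custom_deny_all], 80)
fixed = probe_allowed(DEFAULT_RULES + [custom_deny_all, allow_probe], 80)
```

In this model the custom Deny All rule at priority 4000 is evaluated before the default allow rule at 65001, so the probe is dropped until an allow rule with a lower priority number is added.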

Cause 4: Other misconfigurations in Load Balancer

If all the preceding causes appear to be validated and resolved correctly, and the backend VM still does not respond to the health probe, manually test for connectivity and collect some traces to understand the connectivity.

Validation and resolution

  • Use PsPing from one of the other VMs within the VNet to test the probe port response (example: .\psping.exe -t 10.0.0.4:3389) and record the results.
  • Use TCPing from one of the other VMs within the VNet to test the probe port response (example: .\tcping.exe 10.0.0.4 3389) and record the results.
  • If no response is received in these ping tests, then
    • Run a simultaneous Netsh trace on the target backend pool VM and on another test VM in the same VNet. Then run a PsPing test for some time, collect some network traces, and stop the test.
    • Analyze the network capture and see whether there are both incoming and outgoing packets related to the ping query.
      • If no incoming packets are observed on the backend pool VM, a network security group or UDR misconfiguration is potentially blocking the traffic.
      • If no outgoing packets are observed on the backend pool VM, check the VM for any unrelated issues (for example, an application blocking the probe port).
    • Verify whether the probe packets are being forced to another destination (possibly via UDR settings) before reaching the load balancer. This can cause the traffic to never reach the backend VM.
  • Change the probe type (for example, HTTP to TCP), and configure the corresponding port in the network security group ACLs and the firewall to validate whether the issue is with the configuration of the probe response. For more information about health probe configuration, see Endpoint Load Balancing health probe configuration.
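Where PsPing or TCPing is not available on the test VM, a TCP connect test similar to the ones above can be approximated with the Python standard library. This is a rough sketch, not a replacement for those tools. The function names tcp_probe and probe_many are assumptions introduced here, and the address 10.0.0.4:3389 in the usage note is the example address from this page.

```python
import socket
import time
from typing import List, Optional


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Attempt one TCP connection; return latency in ms, or None on failure."""
    start = time.perf_counter()
    try:
        # create_connection performs the TCP handshake, like a TCP health probe.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None


def probe_many(host: str, port: int, count: int = 4,
               timeout: float = 2.0) -> List[Optional[float]]:
    """Run several probes in a row and collect each result for recording."""
    return [tcp_probe(host, port, timeout) for _ in range(count)]
```

For example, calling probe_many("10.0.0.4", 3389) from another VM in the VNet returns a list of latencies, with None for each attempt that failed. A list of only None values corresponds to the "no response" case, where the Netsh trace steps apply.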

Symptom: VMs behind the Load Balancer are not responding to traffic on the configured data port

If a backend pool VM is listed as healthy and responds to the health probes, but is still not participating in load balancing or is not responding to the data traffic, it may be due to any of the following reasons:

  • Load Balancer backend pool VM is not listening on the data port
  • A network security group is blocking the port on the Load Balancer backend pool VM
  • Accessing the Load Balancer from the same VM and NIC
  • Accessing the internal Load Balancer frontend from the participating Load Balancer backend pool VM

Cause 1: Load Balancer backend pool VM is not listening on the data port

If a VM does not respond to the data traffic, it may be because the target port is not open on the participating VM, or the VM is not listening on that port.

Validation and resolution

  1. Log in to the backend VM.
  2. Open a command prompt and run the following command to validate that an application is listening on the data port: netstat -an
  3. If the port is not listed with the state LISTENING, configure the proper listener port.
  4. If the port is listed as LISTENING, check the target application on that port for any possible issues.

Cause 2: A network security group is blocking the port on the Load Balancer backend pool VM

If one or more network security groups configured on the subnet or on the VM are blocking the source IP or port, the VM is unable to respond.

For a public load balancer, the IP addresses of the Internet clients are used for communication between the clients and the load balancer backend VMs. Make sure the clients' IP addresses are allowed in the backend VM's network security group.

  1. List the network security groups configured on the backend VM. For more information, see Manage network security groups.
  2. From the list of network security groups, check whether:
    • the incoming or outgoing traffic on the data port is being interfered with.
    • a Deny All network security group rule on the NIC of the VM or on the subnet has a higher priority than the default rule that allows Load Balancer probes and traffic (network security groups must allow the Load Balancer IP address 168.63.129.16, which is the source of the probes).
  3. If any of the rules are blocking the traffic, remove and reconfigure those rules to allow the data traffic.
  4. Test whether the VM has now started to respond to the health probes.

Cause 3: Accessing the Load Balancer from the same VM and network interface

If an application hosted on a backend VM of a Load Balancer tries to access another application hosted on the same backend VM over the same network interface, the scenario is unsupported and will fail.

Resolution: You can resolve this issue via one of the following methods:

  • Configure separate backend pool VMs per application.
  • Configure the applications in dual-NIC VMs so that each application uses its own network interface and IP address.

Cause 4: Accessing the internal Load Balancer frontend from the participating Load Balancer backend pool VM

If an internal Load Balancer is configured inside a VNet, and one of the participating backend VMs tries to access the internal Load Balancer frontend, failures can occur when the flow is mapped to the originating VM. This scenario is not supported. Review limitations for a detailed discussion.

Resolution: There are several ways to unblock this scenario, including using a proxy. Evaluate Application Gateway or other third-party proxies (for example, nginx or haproxy). For more information about Application Gateway, see Overview of Application Gateway.

Symptom: Cannot change the backend port of an existing LB rule for a load balancer that has a VM Scale Set deployed in the backend pool

Cause: The backend port cannot be modified for a load balancing rule that is used by a health probe for a load balancer referenced by a VM Scale Set.

Resolution: To change the port, remove the health probe by updating the VM Scale Set, update the port, and then configure the health probe again.

Additional network captures

If you decide to open a support case, collect the following information for a quicker resolution. Choose a single backend VM to perform the following tests:

  • Use PsPing from one of the backend VMs within the VNet to test the probe port response (example: psping 10.0.0.4:3389) and record the results.
  • If no response is received in these ping tests, run a simultaneous Netsh trace on the backend VM and on the VNet test VM while you run PsPing, then stop the Netsh trace.

Next steps

If the preceding steps do not resolve the issue, open a support ticket.