Troubleshoot Azure Load Balancer health probe status

This page provides troubleshooting information for common Azure Load Balancer health probe questions.

Symptom: VMs behind the Load Balancer are not responding to health probes

Backend servers must pass the probe check before they can participate in the load balancer set. For more information about health probes, see Understanding Load Balancer Probes.

The Load Balancer backend pool VMs may fail to respond to the probes for any of the following reasons:

  • The Load Balancer backend pool VM is unhealthy
  • The Load Balancer backend pool VM is not listening on the probe port
  • A firewall or network security group is blocking the port on the Load Balancer backend pool VMs
  • Other misconfigurations in Load Balancer

Cause 1: Load Balancer backend pool VM is unhealthy

Validation and resolution

To resolve this issue, log in to each participating VM and check that the VM is healthy and can respond to PsPing or TCPing from another VM in the pool. If the VM is unhealthy or cannot respond to the probe, you must fix the underlying problem and return the VM to a healthy state before it can participate in load balancing.

Cause 2: Load Balancer backend pool VM is not listening on the probe port

If the VM is healthy but is not responding to the probe, one possible reason is that the probe port is not open on the participating VM, or that the VM is not listening on that port.

Validation and resolution

  1. Log in to the backend VM.
  2. Open a command prompt and run the following command to verify that an application is listening on the probe port: netstat -an
  3. If the port state is not listed as LISTENING, configure the proper port.
  4. Alternatively, select another port that is listed as LISTENING, and update the load balancer configuration accordingly.
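
The check in steps 2 and 3 can be automated when many VMs need to be inspected. The following is a minimal sketch, assuming the Windows `netstat -an` column layout (protocol, local address, foreign address, state). The function names and the `SAMPLE` text are illustrative and are not output captured from a real VM.

```python
def listening_ports(netstat_output: str) -> set[int]:
    """Return the TCP ports shown in the LISTENING state.

    Assumes the Windows `netstat -an` layout:
    Proto, Local Address, Foreign Address, State.
    """
    ports: set[int] = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and UDP rows (UDP rows have no state).
        if len(fields) != 4 or fields[0].upper() != "TCP":
            continue
        if fields[3].upper() != "LISTENING":
            continue
        # The port follows the last colon, which also works for "[::]:80".
        port_text = fields[1].rsplit(":", 1)[-1]
        if port_text.isdigit():
            ports.add(int(port_text))
    return ports


def is_probe_port_listening(netstat_output: str, probe_port: int) -> bool:
    """True if some application is listening on the probe port."""
    return probe_port in listening_ports(netstat_output)


# Illustrative sample text, not taken from a real machine.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    0.0.0.0:3389           0.0.0.0:0              LISTENING
  TCP    10.0.0.4:49712         10.0.0.5:443           ESTABLISHED
  TCP    [::]:80                [::]:0                 LISTENING
  UDP    0.0.0.0:123            *:*
"""

print(is_probe_port_listening(SAMPLE, 80))
print(is_probe_port_listening(SAMPLE, 8080))
```

On a real VM, the text passed to `listening_ports` would be the captured output of `netstat -an`. A port that is missing from the result corresponds to step 3 above.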

Cause 3: A firewall or network security group is blocking the port on the Load Balancer backend pool VMs

If the firewall on the VM is blocking the probe port, or if one or more network security groups configured on the subnet or on the VM do not allow the probe to reach the port, the VM cannot respond to the health probe.

Validation and resolution

  1. If the firewall is enabled, check whether it is configured to allow the probe port. If not, configure the firewall to allow traffic on the probe port, and test again.
  2. In the list of network security groups, check whether incoming or outgoing traffic on the probe port is being blocked.
  3. Also check whether a Deny All network security group rule on the VM's NIC or on the subnet has a higher priority than the default rule that allows Load Balancer probes and traffic (network security groups must allow the Load Balancer IP address 168.63.129.16).
  4. If any of these rules are blocking the probe traffic, remove them and reconfigure the rules to allow the probe traffic.
  5. Test whether the VM now responds to the health probes.
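
The priority check in step 3 can be reasoned about with a small model. The following is a simplified sketch, not the actual network security group engine: it assumes that rules are evaluated in ascending priority number and that the first matching rule decides, and it ignores protocols, port ranges, and CIDR matching. The `InboundRule` type, the function names, and the custom rule names are hypothetical. The two default rules are modeled on the Azure defaults `AllowAzureLoadBalancerInBound` (priority 65001) and `DenyAllInBound` (priority 65500).

```python
from dataclasses import dataclass
from typing import Iterable, Optional

PROBE_SOURCE_IP = "168.63.129.16"
# Source values that cover the health probe in this simplified model.
PROBE_SOURCES = {"*", "AzureLoadBalancer", PROBE_SOURCE_IP}


@dataclass(frozen=True)
class InboundRule:
    """Hypothetical, simplified view of one inbound NSG rule."""
    name: str
    priority: int  # lower number means higher priority
    source: str    # "*", a service tag, or a single IP address
    port: str      # "*" or a single port number as text
    access: str    # "Allow" or "Deny"


def deciding_rule(rules: Iterable[InboundRule], probe_port: int) -> Optional[InboundRule]:
    """Return the first rule, in priority order, that matches the probe."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.source in PROBE_SOURCES and rule.port in ("*", str(probe_port)):
            return rule
    return None


def probe_allowed(rules: Iterable[InboundRule], probe_port: int) -> bool:
    """True if the deciding rule allows the probe traffic."""
    rule = deciding_rule(rules, probe_port)
    return rule is not None and rule.access == "Allow"


DEFAULT_RULES = [
    InboundRule("AllowAzureLoadBalancerInBound", 65001, "AzureLoadBalancer", "*", "Allow"),
    InboundRule("DenyAllInBound", 65500, "*", "*", "Deny"),
]

# A custom Deny All rule that outranks the default allow rule blocks the probe.
blocked = DEFAULT_RULES + [InboundRule("CustomDenyAll", 4000, "*", "*", "Deny")]
# An explicit allow rule with an even higher priority restores the probe.
fixed = blocked + [InboundRule("AllowProbe", 3900, "AzureLoadBalancer", "80", "Allow")]

print(deciding_rule(blocked, 80).name, probe_allowed(blocked, 80))
print(deciding_rule(fixed, 80).name, probe_allowed(fixed, 80))
```

The `blocked` rule set reproduces the situation described in step 3, and `fixed` shows one way to apply step 4 without deleting the custom deny rule.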

Cause 4: Other misconfigurations in Load Balancer

If all the preceding causes appear to have been validated and resolved correctly, and the backend VM still does not respond to the health probe, test connectivity manually and collect some traces to understand where the traffic is failing.

Validation and resolution

  1. Use PsPing from one of the other VMs within the VNet to test the probe port response (example: .\psping.exe -t 10.0.0.4:3389) and record the results.
  2. Use TCPing from one of the other VMs within the VNet to test the probe port response (example: .\tcping.exe 10.0.0.4 3389) and record the results.
  3. If no response is received in these ping tests, then
    • Run a Netsh trace simultaneously on the target backend pool VM and on another test VM in the same VNet. Then run a PsPing test for some time, collect the network traces, and stop the test.
    • Analyze the network capture and check whether it contains both incoming and outgoing packets related to the ping query.
      • If no incoming packets are observed on the backend pool VM, a network security group or UDR (user-defined route) misconfiguration may be blocking the traffic.
      • If no outgoing packets are observed on the backend pool VM, check the VM for any unrelated issues (for example, an application blocking the probe port).
    • Verify whether the probe packets are being forced to another destination (possibly through UDR settings) before reaching the load balancer. This can prevent the traffic from ever reaching the backend VM.
  4. Change the probe type (for example, from HTTP to TCP), and configure the corresponding port in the network security group ACLs and the firewall to determine whether the issue lies in the configuration of the probe response. For more information about health probe configuration, see Endpoint Load Balancing health probe configuration.
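
The manual tests in steps 1, 2, and 4 can be approximated from any test VM that has Python installed. The following is a rough sketch, not a replacement for PsPing or TCPing: `tcp_probe` times a single TCP connect, and `http_probe` sends one GET request and returns the status code. Both function names and the default timeout are assumptions made for illustration.

```python
import http.client
import socket
import time
from typing import Optional


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


def http_probe(host: str, port: int, path: str = "/", timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code for GET path, or None if the request fails."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```

For example, `tcp_probe("10.0.0.4", 3389)` mirrors the PsPing example in step 1. If `tcp_probe` succeeds but `http_probe` returns a status other than 200 for the probe path, the port is reachable and the problem is more likely in the HTTP probe response, because an Azure Load Balancer HTTP probe treats only an HTTP 200 response as healthy. That is the distinction step 4 is trying to draw.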

Next steps

If the preceding steps do not resolve the issue, open a support ticket.