Troubleshooting bad gateway errors in Application Gateway

Learn how to troubleshoot bad gateway (502) errors received when using Azure Application Gateway.

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Overview

After configuring an application gateway, one of the errors that you may see is "Server Error: 502 - Web server received an invalid response while acting as a gateway or proxy server". The main causes of this error are covered in the sections that follow.

Network Security Group, User Defined Route, or Custom DNS issue

Cause

If access to the backend is blocked by an NSG, UDR, or custom DNS, application gateway instances can't reach the backend pool. The resulting probe failures cause 502 errors.

The NSG or UDR could be on either the application gateway subnet or the subnet where the application VMs are deployed.

Similarly, a custom DNS in the VNet can cause issues: an FQDN used for backend pool members might not be resolved correctly by the user-configured DNS server for the VNet.

Solution

Validate the NSG, UDR, and DNS configuration by going through the following steps:

  • Check the NSGs associated with the application gateway subnet. Ensure that communication to the backend isn't blocked.

  • Check the UDR associated with the application gateway subnet. Ensure that the UDR isn't directing traffic away from the backend subnet. For example, check for routing to network virtual appliances or default routes being advertised to the application gateway subnet via ExpressRoute/VPN.

    $vnet = Get-AzVirtualNetwork -Name vnetName -ResourceGroupName rgName
    Get-AzVirtualNetworkSubnetConfig -Name appGwSubnet -VirtualNetwork $vnet
    
  • Check the effective NSG and routes for the backend VM.

    Get-AzEffectiveNetworkSecurityGroup -NetworkInterfaceName nic1 -ResourceGroupName testrg
    Get-AzEffectiveRouteTable -NetworkInterfaceName nic1 -ResourceGroupName testrg
    
  • Check whether a custom DNS is configured in the VNet. You can check this by looking at the VNet properties in the output.

    Get-AzVirtualNetwork -Name vnetName -ResourceGroupName rgName 
    DhcpOptions            : {
                               "DnsServers": [
                                 "x.x.x.x"
                               ]
                             }
    

    If present, ensure that the DNS server can resolve the backend pool member's FQDN correctly.
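The resolution check in the last step can be scripted from a VM inside the VNet. The following is a minimal Python sketch, not part of the original guidance; the function name `resolve_fqdn` is hypothetical. It uses the system resolver of the machine it runs on, so it only reflects the VNet's custom DNS when run from a machine that uses those DNS servers.

```python
import socket


def resolve_fqdn(name):
    """Return the sorted, de-duplicated IP addresses the local resolver
    gives for `name`. An empty list means the name did not resolve."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        # Resolution failed (unknown host, DNS server unreachable, ...).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_fqdn` with each FQDN from the backend pool and comparing the result with the expected backend IP addresses shows whether the custom DNS server answers correctly.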

Problems with default health probe

Cause

502 errors can also be frequent indicators that the default health probe can't reach back-end VMs.

When an application gateway instance is provisioned, it automatically configures a default health probe for each BackendAddressPool using properties of the BackendHttpSetting. No user input is required to set this probe. Specifically, when a load-balancing rule is configured, an association is made between a BackendHttpSetting and a BackendAddressPool. A default probe is configured for each of these associations, and the application gateway starts a periodic health check connection to each instance in the BackendAddressPool at the port specified in the BackendHttpSetting element.

The following table lists the values associated with the default health probe:

Probe property | Value | Description
Probe URL | http://127.0.0.1/ | URL path
Interval | 30 | Probe interval in seconds
Time-out | 30 | Probe time-out in seconds
Unhealthy threshold | 3 | Probe retry count. The back-end server is marked down after the consecutive probe failure count reaches the unhealthy threshold.

Solution

  • Ensure that a default site is configured and is listening at 127.0.0.1.
  • If BackendHttpSetting specifies a port other than 80, the default site should be configured to listen at that port.
  • The call to http://127.0.0.1:port should return an HTTP result code of 200. This should be returned within the 30-second time-out period.
  • Ensure that the configured port is open and that no firewall rules or Azure Network Security Groups block incoming or outgoing traffic on that port.
  • If Azure classic VMs or Cloud Service is used with an FQDN or a public IP, ensure that the corresponding endpoint is opened.
  • If the VM is configured via Azure Resource Manager and is outside the VNet where the application gateway is deployed, a Network Security Group must be configured to allow access on the desired port.
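The first three checks can be approximated by running a small script on the back-end VM itself. The sketch below is an illustration under assumptions, not the gateway's actual probe implementation: the function name `probe_default_site` is hypothetical, and it treats only HTTP 200 within the time-out as healthy, as this article describes.

```python
import urllib.error
import urllib.request


def probe_default_site(port=80, timeout=30.0):
    """Request http://127.0.0.1:<port>/ and report whether it answered
    with HTTP 200 before `timeout` seconds elapsed."""
    url = f"http://127.0.0.1:{port}/"
    # Bypass any proxy configured in the environment so the request
    # really goes to the local site.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, time-outs, and 4xx/5xx responses
        # (HTTPError is a subclass of URLError).
        return False
```

A `False` result from the VM itself points at the site binding or the application; a `True` result combined with failing probes points at a firewall, NSG, or routing problem between the gateway and the VM.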

Problems with custom health probe

Cause

Custom health probes allow additional flexibility over the default probing behavior. When you use custom probes, you can configure the probe interval, the URL, the path to test, and how many failed responses to accept before marking the back-end pool instance as unhealthy.

The following additional properties are added:

Probe property | Description
Name | Name of the probe. This name is used to refer to the probe in back-end HTTP settings.
Protocol | Protocol used to send the probe. The probe uses the protocol defined in the back-end HTTP settings.
Host | Host name to send the probe to. Applicable only when multi-site is configured on the application gateway. This is different from the VM host name.
Path | Relative path of the probe. A valid path starts with '/'. The probe is sent to <protocol>://<host>:<port><path>.
Interval | Probe interval in seconds. This is the time interval between two consecutive probes.
Time-out | Probe time-out in seconds. If a valid response isn't received within this time-out period, the probe is marked as failed.
Unhealthy threshold | Probe retry count. The back-end server is marked down after the consecutive probe failure count reaches the unhealthy threshold.

Solution

Validate that the custom health probe is configured correctly according to the preceding table. In addition to the preceding troubleshooting steps, also ensure the following:

  • Ensure that the probe is correctly specified as per the guide.
  • If the application gateway is configured for a single site, by default the Host name should be specified as 127.0.0.1, unless otherwise configured in the custom probe.
  • Ensure that a call to http://<host>:<port><path> returns an HTTP result code of 200.
  • Ensure that Interval, Timeout, and UnhealthyThreshold are within the acceptable ranges.
  • If using an HTTPS probe, make sure that the backend server doesn't require SNI by configuring a fallback certificate on the backend server itself.
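To see which URL a custom probe ends up calling, it can help to assemble it from the probe properties by hand. The following sketch is illustrative only: `build_probe_url` is a hypothetical helper, and the `/health` path in the usage note is an assumed example, not a value from this article. It applies the one path rule stated in the table (the path must start with '/').

```python
def build_probe_url(protocol, host, port, path):
    """Compose the probe target as <protocol>://<host>:<port><path>."""
    if not path.startswith("/"):
        raise ValueError("probe path must start with '/'")
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"{protocol.lower()}://{host}:{port}{path}"
```

For example, `build_probe_url("Http", "127.0.0.1", 8080, "/health")` yields `http://127.0.0.1:8080/health`, which is the URL to request from the back-end VM when checking for the HTTP 200 response.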

Request time-out

Cause

When a user request is received, the application gateway applies the configured rules to the request and routes it to a back-end pool instance. It waits for a configurable interval of time for a response from the back-end instance. By default, this interval is 20 seconds. If the application gateway doesn't receive a response from the back-end application in this interval, the user request gets a 502 error.

Solution

Application Gateway lets you configure this setting through the BackendHttpSetting, which can then be applied to different pools. Different back-end pools can have different BackendHttpSetting objects, each with its own request time-out.

    New-AzApplicationGatewayBackendHttpSetting -Name 'Setting01' -Port 80 -Protocol Http -CookieBasedAffinity Enabled -RequestTimeout 60

Empty BackendAddressPool

Cause

If the application gateway has no VMs or virtual machine scale set configured in the back-end address pool, it can't route any customer request and sends a bad gateway error.

Solution

Ensure that the back-end address pool isn't empty. You can check this through PowerShell, the CLI, or the portal.

Get-AzApplicationGateway -Name "SampleGateway" -ResourceGroupName "ExampleResourceGroup"

The output from the preceding cmdlet should contain a non-empty back-end address pool. The following example shows two pools returned, configured with either IP addresses or FQDNs for the backend VMs. The provisioning state of each BackendAddressPool must be 'Succeeded'.

BackendAddressPoolsText:

[{
    "BackendAddresses": [{
        "ipAddress": "10.0.0.10"
    }, {
        "ipAddress": "10.0.0.11"
    }],
    "BackendIpConfigurations": [],
    "ProvisioningState": "Succeeded",
    "Name": "Pool01",
    "Etag": "W/\"00000000-0000-0000-0000-000000000000\"",
    "Id": "/subscriptions/<subscription id>/resourceGroups/<resource group name>/providers/Microsoft.Network/applicationGateways/<application gateway name>/backendAddressPools/pool01"
}, {
    "BackendAddresses": [{
        "Fqdn": "xyx.chinacloudapp.cn"
    }, {
        "Fqdn": "abc.chinacloudapp.cn"
    }],
    "BackendIpConfigurations": [],
    "ProvisioningState": "Succeeded",
    "Name": "Pool02",
    "Etag": "W/\"00000000-0000-0000-0000-000000000000\"",
    "Id": "/subscriptions/<subscription id>/resourceGroups/<resource group name>/providers/Microsoft.Network/applicationGateways/<application gateway name>/backendAddressPools/pool02"
}]
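When a gateway has many pools, scanning this output by eye is error-prone. The following Python sketch shows one way to flag problem pools, assuming the BackendAddressPoolsText value has been saved as a JSON string with the same property names as the example above. The function name `find_problem_pools` is hypothetical.

```python
import json


def find_problem_pools(pools_text):
    """Return (pool name, problem) tuples for pools that are empty or
    whose provisioning state is not 'Succeeded'."""
    problems = []
    for pool in json.loads(pools_text):
        name = pool.get("Name", "<unnamed>")
        # A pool counts as empty when it has neither explicit addresses
        # (IP or FQDN) nor NIC IP configurations attached.
        has_members = bool(pool.get("BackendAddresses")) or bool(
            pool.get("BackendIpConfigurations")
        )
        if not has_members:
            problems.append((name, "empty pool"))
        state = pool.get("ProvisioningState")
        if state != "Succeeded":
            problems.append((name, f"provisioning state is {state}"))
    return problems
```

An empty result means every pool has at least one member and reports 'Succeeded'; it doesn't say anything about whether those members are healthy.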

Unhealthy instances in BackendAddressPool

Cause

If all the instances of BackendAddressPool are unhealthy, then the application gateway doesn't have any back-end to route user requests to. This can also be the case when back-end instances are healthy but don't have the required application deployed.

Solution

Ensure that the instances are healthy and the application is properly configured. Check whether the back-end instances can respond to a ping from another VM in the same VNet. If configured with a public endpoint, ensure that a browser request to the web application is serviceable.
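A ping only shows that the instance is reachable, not that the application port accepts connections. As a complementary check (a TCP connect test, which is a different technique from the ping described above), the following Python sketch can be run from another VM in the same VNet. The function name `can_connect` is hypothetical, and the host and port you pass are your own back-end values.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within
    `timeout` seconds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False
```

A `False` result for the port configured in the BackendHttpSetting suggests that the application isn't listening or that a firewall or NSG is blocking the port.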

Next steps

If the preceding steps don't resolve the issue, open a support ticket.