Troubleshooting degraded state on Azure Traffic Manager

This article describes how to troubleshoot an Azure Traffic Manager profile that is showing a degraded status. The first step in troubleshooting an Azure Traffic Manager degraded state is to enable diagnostic logging. Refer to Enable diagnostic logs for more information. For this scenario, consider that you have configured a Traffic Manager profile pointing to some of your chinacloudapp.cn hosted services. If the health of your Traffic Manager displays a Degraded status, then the status of one or more endpoints may be Degraded:

(Screenshot: degraded endpoint status)

If the health of your Traffic Manager displays an Inactive status, then both endpoints may be Disabled:

(Screenshot: inactive Traffic Manager status)

Understanding Traffic Manager probes

  • Traffic Manager considers an endpoint to be ONLINE only when the probe receives an HTTP 200 response back from the probe path. If your application returns any other HTTP response code, add that response code to Expected status code ranges in your Traffic Manager profile.
  • A 30x redirect response is treated as a failure unless you have specified it as a valid response code in Expected status code ranges in your Traffic Manager profile. Traffic Manager does not probe the redirection target.
  • For HTTPS probes, certificate errors are ignored.
  • The actual content of the probe path doesn't matter, as long as a 200 is returned. Probing a URL to some static content like "/favicon.ico" is a common technique. Dynamic content, like ASP pages, may not always return 200, even when the application is healthy.
  • A best practice is to set the probe path to something that has enough logic to determine whether the site is up or down. In the previous example, by setting the path to "/favicon.ico", you are only testing that w3wp.exe is responding. This probe may not indicate that your web application is healthy. A better option would be to set the path to something such as "/Probe.aspx" that has logic to determine the health of the site. For example, you could use performance counters to check CPU utilization or measure the number of failed requests. Or you could attempt to access database resources or session state to make sure that the web application is working.
  • If all endpoints in a profile are degraded, then Traffic Manager treats all endpoints as healthy and routes traffic to all endpoints. This behavior ensures that problems with the probing mechanism do not result in a complete outage of your service.
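
The status code rule in the list above can be sketched as a small PowerShell function. This is only an illustration of the rule, not a Traffic Manager API: the function name Test-ExpectedStatusCode and the 'min-max' string format for ranges are hypothetical, and the sketch assumes that 200 stays valid when extra ranges are added to the profile.

```powershell
# Hypothetical helper that models the probe rule described above.
# It is not part of any Azure module.
function Test-ExpectedStatusCode {
    param(
        [Parameter(Mandatory)] [int] $StatusCode,
        # Extra accepted ranges as 'min-max' strings, for example '301-302'.
        # This string format is an assumption made for the sketch.
        [string[]] $ExpectedRanges = @()
    )

    # Assumption: HTTP 200 always counts as healthy.
    if ($StatusCode -eq 200) {
        return $true
    }

    # Any other code is healthy only if it falls inside a listed range.
    foreach ($range in $ExpectedRanges) {
        $min, $max = $range -split '-' | ForEach-Object { [int]$_ }
        if ($StatusCode -ge $min -and $StatusCode -le $max) {
            return $true
        }
    }
    return $false
}

Test-ExpectedStatusCode -StatusCode 200                                 # True
Test-ExpectedStatusCode -StatusCode 301                                 # False
Test-ExpectedStatusCode -StatusCode 301 -ExpectedRanges '301-302'       # True
```

The second call mirrors the redirect case: a 301 counts as a failure until a range that covers it is supplied.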

Troubleshooting

To troubleshoot a probe failure, you need a tool that shows the HTTP status code returned from the probe URL. There are many tools available that show you the raw HTTP response.

Also, you can use the Network tab of the F12 Debugging Tools in Internet Explorer to view the HTTP responses.

For this example, we want to see the response from our probe URL: http://watestsdp2008r2.chinacloudapp.cn:80/Probe. The following PowerShell example illustrates the problem.

Invoke-WebRequest 'http://watestsdp2008r2.chinacloudapp.cn/Probe' -MaximumRedirection 0 -ErrorAction SilentlyContinue | Select-Object StatusCode,StatusDescription

Example output:

StatusCode StatusDescription
---------- -----------------
       301 Moved Permanently

Notice that we received a redirect response. As stated previously, any status code other than 200 is considered a failure unless it is listed in Expected status code ranges. Traffic Manager changes the endpoint status to Offline. To resolve the problem, check the website configuration to ensure that the proper status code can be returned from the probe path. Alternatively, reconfigure the Traffic Manager probe to point to a path that returns a 200.

If your probe is using the HTTPS protocol, you may need to disable certificate checking to avoid SSL/TLS errors during your test. The following PowerShell statements disable certificate validation for the current PowerShell session:

# For testing only: trust every certificate in the current session.
Add-Type @"
using System.Net;
using System.Security.Cryptography.X509Certificates;
public class TrustAllCertsPolicy : ICertificatePolicy {
    // Returning true accepts the certificate regardless of validation problems.
    public bool CheckValidationResult(
        ServicePoint srvPoint, X509Certificate certificate,
        WebRequest request, int certificateProblem) {
        return true;
    }
}
"@
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAllCertsPolicy
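
On PowerShell 6 or later, Invoke-WebRequest offers a -SkipCertificateCheck switch, which may be a simpler way to get the same effect for a single request. The following is a minimal sketch, assuming PowerShell 6 or later and assuming that the host from the earlier example also answers over HTTPS (that HTTPS endpoint is hypothetical):

```powershell
# Assumes PowerShell 6 or later. The HTTPS URL is a hypothetical variant of the earlier example.
# -SkipCertificateCheck ignores certificate errors for this request only; use it for testing only.
Invoke-WebRequest 'https://watestsdp2008r2.chinacloudapp.cn/Probe' -SkipCertificateCheck -MaximumRedirection 0 -ErrorAction SilentlyContinue | Select-Object StatusCode,StatusDescription
```

Because the switch applies to one call, it avoids changing certificate handling for the rest of the session.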

Next steps

About Traffic Manager traffic routing methods

What is Traffic Manager

Cloud Services

Azure App Service

Operations on Traffic Manager (REST API Reference)

Azure Traffic Manager cmdlets