Troubleshooting

This article will help you troubleshoot common issues that may occur when using availability monitoring.

SSL/TLS errors

Symptom/error message | Possible causes
Could not create SSL/TLS Secure Channel | SSL version. Only TLS 1.0, 1.1, and 1.2 are supported. SSLv3 is not supported.
TLSv1.2 Record Layer: Alert (Level: Fatal, Description: Bad Record MAC) | See the StackExchange thread for more information.
The failing URL points to a CDN (Content Delivery Network) | This may be caused by a misconfiguration on your CDN.

Possible workaround

  • If the failing URLs are always dependent resources, we recommend disabling Parse dependent requests for the web test.
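To check which of the TLS versions listed in the first table row an endpoint actually accepts, you can attempt a handshake that is pinned to one version at a time. The following Python sketch uses only the standard `ssl` module. It is a local diagnostic and is not what the availability test agents run. The host name in the usage line is a placeholder.

```python
import socket
import ssl

# Versions the article lists as supported; SSLv3 is not supported.
SUPPORTED = (ssl.TLSVersion.TLSv1, ssl.TLSVersion.TLSv1_1, ssl.TLSVersion.TLSv1_2)


def context_for(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Client context that can negotiate exactly one TLS version."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe(host: str, version: ssl.TLSVersion, port: int = 443, timeout: float = 10.0) -> bool:
    """Return True if `host` completes a TLS handshake using only `version`.

    Any failure (refused connection, timeout, handshake alert, or certificate
    verification error) returns False; ssl.SSLError is a subclass of OSError.
    ValueError covers a local OpenSSL build that cannot use `version` at all.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context_for(version).wrap_socket(sock, server_hostname=host):
                return True
    except (OSError, ValueError):
        return False
```

For example, `for v in SUPPORTED: print(v.name, probe("www.example.com", v))`. A False result for TLS 1.0 or 1.1 can also come from your own machine's OpenSSL policy. Confirm the result from more than one client before you conclude that the server lacks support.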

Test fails only from certain locations

Symptom/error message: A connection attempt failed because the connected party did not properly respond after a period of time

Possible causes:
  • Test agents in certain locations are being blocked by a firewall.
  • Certain IP addresses are being rerouted (via load balancers, geo traffic managers, or Azure ExpressRoute).
  • If you use Azure ExpressRoute, packets can be dropped in scenarios where asymmetric routing occurs.
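Failures that are confined to a few locations often have causes like these rather than a site-wide outage. For that reason, an alert rule that requires failures from X out of Y locations (see the threshold advice under Common troubleshooting questions) reduces noise. The following Python sketch models that X-of-Y rule. It assumes a single evaluation window and uses hypothetical location names. The real rule is configured in the portal, not in code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LocationResult:
    location: str  # hypothetical test-location label
    success: bool


def failed_locations(results: list[LocationResult]) -> list[str]:
    return [r.location for r in results if not r.success]


def should_alert(results: list[LocationResult], threshold: int) -> bool:
    """Alert only when at least `threshold` (X) of the Y locations failed."""
    return len(failed_locations(results)) >= threshold
```

With a threshold of 3, a firewall that blocks agents in a single location does not trigger an alert. The same failure across three or more locations does.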

Intermittent test failure with a protocol violation error

Symptom/error message: The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF

Possible causes: This occurs when malformed headers are detected. Specifically, some headers might not be using CRLF to indicate the end of line, which violates the HTTP specification. Application Insights enforces this HTTP specification and fails responses with malformed headers.

Possible resolutions:
a. Contact your website host provider or CDN provider to fix the faulty servers.
b. If the failed requests are for resources (for example, style files, images, or scripts), you may consider disabling the parsing of dependent requests. Keep in mind that if you do this, you will lose the ability to monitor the availability of those files.

Note

The URL may not fail in browsers that validate HTTP headers less strictly. See this blog post for a detailed explanation of this issue: http://mehdi.me/a-tale-of-debugging-the-linkedin-api-net-and-http-protocol-violations/

Common troubleshooting questions

The site looks okay, but I see test failures. Why is Application Insights alerting me?

  • Does your test have Parse dependent requests enabled? That results in a strict check on resources such as scripts and images, and these types of failures may not be noticeable in a browser. Check all the images, scripts, style sheets, and any other files loaded by the page. If any of them fails, the test is reported as failed, even if the main HTML page loads without issue. To make the test insensitive to such resource failures, uncheck Parse dependent requests in the test configuration.

  • To reduce the odds of noise from transient network blips and similar issues, make sure the Enable retries for test failures option is checked. You can also test from more locations and set the alert rule threshold accordingly, so that location-specific issues don't cause undue alerts.

  • Select any of the red dots in the Availability experience, or any availability failure in Search explorer, to see the details of why we reported the failure. The test result, along with the correlated server-side telemetry (if enabled), should help you understand why the test failed. Transient issues are commonly caused by network or connection problems.

  • Did the test time out? We abort tests after 2 minutes. If your ping or multi-step test takes longer than 2 minutes, we report it as a failure. Consider breaking the test into multiple tests that can each complete in a shorter time.

  • Did all locations report failure, or only some of them? If only some reported failures, the cause may be network or CDN issues. As above, selecting the red dots should help you understand why those locations reported failures.
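The first and fourth bullets describe when a single run is reported as failed. A run fails when the main page fails, when any dependent resource fails while Parse dependent requests is enabled, or when the run exceeds 2 minutes. The following Python sketch models those rules. It assumes that `script src`, `img src`, and `link rel=stylesheet` are the dependent resources, although the service may collect more. The code illustrates the described behavior and does not reflect how Application Insights implements it.

```python
from __future__ import annotations

from html.parser import HTMLParser

TIMEOUT_SECONDS = 120  # the article: tests are aborted after 2 minutes


class DependentResourceParser(HTMLParser):
    """Collect URLs of scripts, images, and style sheets referenced by a page."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("script", "img") and a.get("src"):
            self.urls.append(a["src"])
        elif tag == "link" and a.get("href") and "stylesheet" in (a.get("rel") or "").lower():
            self.urls.append(a["href"])


def dependent_urls(html: str) -> list[str]:
    parser = DependentResourceParser()
    parser.feed(html)
    parser.close()
    return parser.urls


def evaluate_run(main_ok: bool, dependent_ok: dict[str, bool],
                 elapsed_s: float, parse_dependent: bool) -> str:
    """Model of the outcome rules described above (not the service's code)."""
    if elapsed_s > TIMEOUT_SECONDS:
        return "failed: timeout"
    if not main_ok:
        return "failed: main page"
    if parse_dependent:
        broken = [url for url, ok in dependent_ok.items() if not ok]
        if broken:
            # One broken image or script fails the whole run.
            return "failed: dependent request " + broken[0]
    return "passed"
```

When a test fails but the page looks fine in a browser, run `dependent_urls` over the page's HTML. It gives you a quick list of files to check by hand.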

I didn't get an email when the alert triggered, resolved, or both?

Check the classic alerts configuration to confirm that your email address is listed directly, or that a distribution list you are on is configured to receive notifications. If it is a distribution list, check the distribution list configuration to confirm that it can receive external emails. Also check whether your mail administrator has configured any policies that could cause this issue.

I didn't receive the webhook notification?

Make sure the application receiving the webhook notification is available and successfully processes the webhook requests. See this article for more information.
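One way to confirm that the receiving side works is to run a minimal receiver and point the alert's webhook at it. The following Python sketch uses only `http.server`. It makes no assumption about the payload schema: it parses the JSON body and stores it. The port is an arbitrary example. The receiver answers 200 for valid JSON and 400 otherwise.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # parsed payloads; a real receiver would hand these to a worker


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body or b"{}")
        except ValueError:  # JSONDecodeError and bad UTF-8 both subclass ValueError
            self._reply(400)
            return
        received.append(payload)
        # Reply right away so slow processing doesn't delay the response.
        self._reply(200)

    def _reply(self, status):
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def serve(port=8080):
    """Run the receiver; port 8080 is an arbitrary example."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Call `serve()` on a host that the alert can reach, trigger the alert, and then inspect `received` to see whether the notification arrived.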

Intermittent test failure with a protocol violation error?

The error ("protocol violation ... CR must be followed by LF") indicates an issue with the server (or its dependencies). It happens when malformed headers are set in the response, and it can be caused by load balancers or CDNs. Specifically, some headers might not be using CRLF to indicate the end of line. This violates the HTTP specification and therefore fails validation at the .NET WebRequest level. Inspect the response to spot headers that might be in violation.
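Browsers may tolerate these headers, so the reliable way to find the offending one is to read the raw response bytes before any HTTP library normalizes them. The following Python sketch assumes plain HTTP on port 80 by default (pass `use_tls=True` and `port=443` for HTTPS). It approximates the strict CRLF rule described above. It is not the validation that .NET itself performs.

```python
import socket
import ssl


def find_bad_line_endings(raw_head: bytes) -> list:
    """Return 1-based numbers of head lines that are not terminated by CRLF.

    `raw_head` is the status line plus headers, up to and including the blank
    line that ends the header section. A line is flagged if it ends with a bare
    LF, or if it contains a CR that is not immediately followed by LF.
    """
    bad = []
    lines = raw_head.split(b"\n")
    # The element after the last LF is not a terminated line; skip it.
    for number, line in enumerate(lines[:-1], start=1):
        if not line.endswith(b"\r") or b"\r" in line[:-1]:
            bad.append(number)
    return bad


def fetch_raw_head(host, path="/", port=80, use_tls=False, timeout=10.0) -> bytes:
    """Send a bare GET and return the raw, unparsed response head."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode()
    sock = socket.create_connection((host, port), timeout=timeout)
    if use_tls:
        sock = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
    with sock:
        sock.sendall(request)
        data = b""
        # Stop at the first blank line, whether CRLF- or bare-LF-terminated.
        while b"\r\n\r\n" not in data and b"\n\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    cut = len(data)
    for marker in (b"\r\n\r\n", b"\n\n"):
        index = data.find(marker)
        if index != -1:
            cut = min(cut, index + len(marker))
    return data[:cut]
```

For example, `find_bad_line_endings(fetch_raw_head("www.example.com"))` returns the line numbers to look at, where line 1 is the status line. The host is a placeholder. Run the check several times. The failure is intermittent and may come from only some of the servers behind a load balancer or CDN.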

Note

The URL may not fail in browsers that validate HTTP headers less strictly. See this blog post for a detailed explanation of this issue: http://mehdi.me/a-tale-of-debugging-the-linkedin-api-net-and-http-protocol-violations/

I don't see any related server-side telemetry to diagnose test failures?

If you have Application Insights set up for your server-side application, sampling may be in operation. Select a different availability result.

Can I call code from my web test?

No. The steps of the test must be in the .webtest file, and you can't call other web tests or use loops. However, there are several plug-ins that you might find helpful.

Is there a difference between "web tests" and "availability tests"?

The two terms may be used interchangeably. "Availability tests" is the more generic term: it includes the single URL ping tests in addition to the multi-step web tests.

I'd like to use availability tests on our internal server that runs behind a firewall.

There are two possible solutions:

  • Configure your firewall to permit incoming requests from the IP addresses of our web test agents.
  • Write your own code to periodically test your internal server. Run the code as a background process on a test server behind your firewall. Your test process can send its results to Application Insights by using the TrackAvailability() API in the core SDK package. This requires your test server to have outgoing access to the Application Insights ingestion endpoint, but that is a much smaller security risk than permitting incoming requests. The results will appear in the availability web tests blades, though the experience is slightly simplified compared with tests created via the portal. Custom availability tests also appear as availability results in Analytics, Search, and Metrics.
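The second option can be sketched as follows. The code is Python for illustration only. The `AvailabilityResult` fields roughly mirror the arguments of the .NET `TelemetryClient.TrackAvailability` method (name, timestamp, duration, run location, success, message). `send` is a hypothetical hook, and it is where your process would call `TrackAvailability()` through the SDK you use. The test name, location label, and 300-second interval are assumptions.

```python
from __future__ import annotations

import http.client
import time
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class AvailabilityResult:
    # Roughly mirrors the .NET TrackAvailability arguments.
    name: str
    timestamp: datetime
    duration_s: float
    run_location: str
    success: bool
    message: str = ""


# Bypass environment proxies: the target is an internal server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def run_check(name: str, url: str, run_location: str, timeout: float = 30.0) -> AvailabilityResult:
    """Probe `url` once and record the outcome as an availability result."""
    started = datetime.now(timezone.utc)
    t0 = time.monotonic()
    try:
        with _opener.open(url, timeout=timeout) as resp:
            success, message = 200 <= resp.status < 400, f"HTTP {resp.status}"
    except (OSError, http.client.HTTPException) as exc:
        # urllib.error.URLError/HTTPError and socket timeouts are OSError subclasses.
        success, message = False, str(exc)
    return AvailabilityResult(name, started, time.monotonic() - t0, run_location, success, message)


def run_forever(name: str, url: str, run_location: str,
                send: Callable[[AvailabilityResult], None], interval_s: float = 300.0) -> None:
    """Background loop; `send` is a hypothetical hook that forwards each result
    to Application Insights (for example, through TrackAvailability in the SDK)."""
    while True:
        send(run_check(name, url, run_location))
        time.sleep(interval_s)
```

While you wire up the real SDK call, you might run `run_forever("internal-home", "http://intranet.local/health", "on-prem-dc1", send=print)`. The URL and names are placeholders.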

Uploading a multi-step web test fails

Some reasons this might happen:

  • There's a size limit of 300 K.
  • Loops aren't supported.
  • References to other web tests aren't supported.
  • Data sources aren't supported.
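Before you upload, you can check a .webtest file against these limits locally. The following Python sketch also counts requests against the 100-request limit described in the next section. The element names it looks for (`Loop`, `IncludedWebTest`, `DataSource`, `Request`) are assumptions about the Visual Studio .webtest XML format. The sketch also reads "300 K" as 300 KiB.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

MAX_BYTES = 300 * 1024  # assumption: "300 K" means 300 KiB
MAX_REQUESTS = 100      # per-test request limit stated in the article


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}Request' down to 'Request'."""
    return tag.rsplit("}", 1)[-1]


def check_webtest(xml_bytes: bytes) -> list[str]:
    """Return a list of likely upload problems; empty means none were found."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file is {len(xml_bytes)} bytes; assumed limit is {MAX_BYTES}")
    names = [local_name(el.tag) for el in ET.fromstring(xml_bytes).iter()]
    if "Loop" in names:
        problems.append("contains a loop")
    if "IncludedWebTest" in names:
        problems.append("references another web test")
    if "DataSource" in names:
        problems.append("uses a data source")
    count = names.count("Request")
    if count > MAX_REQUESTS:
        problems.append(f"{count} requests; limit is {MAX_REQUESTS}")
    return problems
```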

My multi-step test doesn't complete

There's a limit of 100 requests per test. Also, the test is stopped if it runs longer than two minutes.

How can I run a test with client certificates?

This is currently not supported.

Who receives the (classic) alert notifications?

This section only applies to classic alerts and will help you optimize your alert notifications so that only your desired recipients receive them. To understand more about the difference between classic alerts and the new alerts experience, refer to the alerts overview article. To control alert notifications in the new alerts experience, use action groups.

  • We recommend using specific recipients for classic alert notifications.

  • For alerts on failures from X out of Y locations, the bulk/group check-box option, if enabled, sends notifications to users with admin/co-admin roles. Essentially, all administrators of the subscription will receive notifications.

  • For alerts on availability metrics, the bulk/group check-box option, if enabled, sends notifications to users with owner, contributor, or reader roles in the subscription. In effect, all users with access to the subscription that contains the Application Insights resource are in scope and will receive notifications.

Note

If you currently use the bulk/group check-box option and disable it, you will not be able to revert the change.

Use the new alert experience/near-realtime alerts if you need to notify users based on their roles. With action groups, you can configure email notifications to users with any of the contributor, owner, or reader roles (not combined together as a single option).

Next steps