Use Resource Health to troubleshoot connectivity for Azure SQL Database and Azure SQL Managed Instance

APPLIES TO: Azure SQL Database, Azure SQL Managed Instance

Resource Health for Azure SQL Database and Azure SQL Managed Instance helps you diagnose and get support when an Azure issue impacts your SQL resources. It informs you about the current and past health of your resources and helps you mitigate issues. When you need help with Azure service issues, Resource Health also provides access to technical support.

Health checks

Resource Health determines the health of your SQL resource by examining the success and failure of logins to the resource. Currently, Resource Health for your SQL Database resource examines only login failures caused by system errors, not those caused by user errors. The Resource Health status is updated every 1 to 2 minutes.

Health states

Available

A status of Available means that Resource Health has not detected login failures due to system errors on your SQL resource.

Degraded

A status of Degraded means that Resource Health has detected a majority of successful logins, but some failures as well. These are most likely transient login errors. To reduce the impact of connection issues caused by transient login errors, implement retry logic in your code.
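The retry logic recommended here can be sketched as follows. This is a minimal illustration of exponential backoff with jitter, not a prescribed implementation: `TransientLoginError`, `connect_with_retry`, and the `connect` callable are hypothetical names standing in for whatever your database driver provides, and the attempt count and delays are assumed values to tune for your workload.

```python
import random
import time


class TransientLoginError(Exception):
    """Hypothetical stand-in for a driver's transient login failure."""


def connect_with_retry(connect, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds or max_attempts is reached.

    connect is any zero-argument callable that returns a connection or
    raises TransientLoginError. The sleep parameter is injectable so the
    function can be exercised without real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except TransientLoginError:
            if attempt == max_attempts:
                # Out of attempts: surface the failure to the caller.
                raise
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 50% random jitter so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Only failures that are likely to be transient should be retried this way; errors caused by the caller, such as a wrong password, will not succeed on a second attempt.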

Unavailable

A status of Unavailable means that Resource Health has detected consistent login failures to your SQL resource. If your resource remains in this state for an extended period of time, contact support.

Unknown

A health status of Unknown indicates that Resource Health hasn't received information about this resource for more than 10 minutes. Although this status isn't a definitive indication of the state of the resource, it is an important data point in the troubleshooting process. If the resource is running as expected, its status changes to Available after a few minutes. If you're experiencing problems with the resource, the Unknown health status might suggest that an event in the platform is affecting the resource.
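As a quick reference, the four health states and the responses described for them can be captured in a small lookup. This is only a sketch: `HealthState`, `SUGGESTED_ACTION`, and `suggest_action` are hypothetical names, and the action strings paraphrase the guidance in this article rather than anything returned by Azure.

```python
from enum import Enum


class HealthState(Enum):
    AVAILABLE = "Available"
    DEGRADED = "Degraded"
    UNAVAILABLE = "Unavailable"
    UNKNOWN = "Unknown"


# Paraphrased guidance for each state, as described in the sections above.
SUGGESTED_ACTION = {
    HealthState.AVAILABLE: "No action needed.",
    HealthState.DEGRADED: "Likely transient login errors; rely on retry logic.",
    HealthState.UNAVAILABLE: "Contact support if the state persists.",
    HealthState.UNKNOWN: "Wait a few minutes; treat as a troubleshooting data point.",
}


def suggest_action(status: str) -> str:
    """Map a status string such as 'degraded' to the suggested response.

    Raises ValueError for a string that is not one of the four states.
    """
    state = HealthState(status.strip().capitalize())
    return SUGGESTED_ACTION[state]
```

For example, `suggest_action("Unavailable")` returns the reminder to contact support if the state persists.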

Historical information

You can access up to 14 days of health history in the Health History section of Resource Health. The section also contains the downtime reason (when available) for the downtimes reported by Resource Health. Currently, Azure shows the downtime for your database resource at a two-minute granularity. The actual downtime is likely less than a minute; the average is 8 seconds.

Downtime reasons

When your database experiences downtime, analysis is performed to determine a reason. When available, the downtime reason is reported in the Health History section of Resource Health. Downtime reasons are typically published 30 minutes after an event.

Planned maintenance

The Azure infrastructure periodically performs planned maintenance, that is, upgrades of hardware or software components in the datacenter. While the database undergoes maintenance, Azure SQL may terminate some existing connections and refuse new ones. The login failures experienced during planned maintenance are typically transient, and retry logic helps reduce the impact. If you continue to experience login errors, contact support.

Reconfiguration

Reconfigurations are considered transient conditions and are expected from time to time. These events can be triggered by load balancing or by software or hardware failures. Any production client application that connects to a cloud database should implement robust connection retry logic, which helps mitigate these situations and generally keeps the errors from being visible to the end user.
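Robust retry logic usually retries only errors that are likely to be transient. The sketch below shows one way to make that decision from a SQL error number. The set of numbers is an illustrative, non-exhaustive assumption drawn from commonly cited Azure SQL transient fault codes, so check it against the current documentation for your driver; `is_transient` and `should_retry` are hypothetical helper names.

```python
# Illustrative, non-exhaustive set of error numbers commonly treated as
# transient for Azure SQL (an assumption to verify before relying on it).
TRANSIENT_ERROR_NUMBERS = frozenset(
    {4060, 40197, 40501, 40613, 49918, 49919, 49920}
)

# Error 18456 ("Login failed for user") is a user error: retrying with the
# same credentials will not help, so it is deliberately not in the set.
LOGIN_FAILED_FOR_USER = 18456


def is_transient(error_number: int) -> bool:
    """Return True if the error number is in the assumed transient set."""
    return error_number in TRANSIENT_ERROR_NUMBERS


def should_retry(error_number: int, attempt: int, max_attempts: int = 5) -> bool:
    """Retry only transient errors, and only while attempts remain."""
    return is_transient(error_number) and attempt < max_attempts
```

Separating the classification from the retry loop keeps the list of error numbers easy to update without touching the backoff code.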

Next steps