Monitor, diagnose, and troubleshoot disconnects with Azure IoT Hub

Connectivity issues for IoT devices can be difficult to troubleshoot because there are many possible points of failure. Application logic, physical networks, protocols, hardware, IoT Hub, and other cloud services can all cause problems. The ability to detect and pinpoint the source of an issue is critical. However, an IoT solution at scale can have thousands of devices, so checking individual devices manually isn't practical. To help you detect, diagnose, and troubleshoot these issues at scale, use the monitoring capabilities that IoT Hub provides through Azure Monitor. These capabilities are limited to what IoT Hub can observe, so we also recommend that you follow monitoring best practices for your devices and other Azure services.

Get alerts and error logs

Use Azure Monitor to get alerts and write logs when devices disconnect.

Turn on logs

To log device connection events and errors, create a diagnostic setting for IoT Hub connections resource logs. We recommend creating this setting as early as possible: these logs aren't collected by default, and without them you have no information to troubleshoot device disconnects when they occur.

  1. Sign in to the Azure portal.

  2. Browse to your IoT hub.

  3. Select Diagnostic settings.

  4. Select Add diagnostic setting.

  5. Select Connections logs.

  6. For easier analysis, select Send to Log Analytics. See the example under Resolve connectivity errors.

    [Screenshot: Recommended settings]

To learn more, see Monitor IoT Hub.
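For teams that script this setting instead of using the portal, the sketch below builds the kind of request body a diagnostic setting is described with: a `properties` object holding a `workspaceId` and a `logs` array that enables the `Connections` category. The function name and the placeholder workspace ID are illustrative assumptions, and the sketch only builds the body; submitting it is left to whatever deployment tooling you use.

```python
import json


def connections_diagnostic_setting(workspace_resource_id: str) -> dict:
    """Build a diagnostic-setting body that sends Connections logs to Log Analytics."""
    return {
        "properties": {
            # Resource ID of the Log Analytics workspace that receives the logs.
            "workspaceId": workspace_resource_id,
            # Only the Connections category is enabled, matching step 5 above.
            "logs": [{"category": "Connections", "enabled": True}],
        }
    }


if __name__ == "__main__":
    # Placeholder resource ID (assumption); substitute your own workspace.
    workspace = (
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
    )
    print(json.dumps(connections_diagnostic_setting(workspace), indent=2))
```

Keeping the body in a small function makes it easy to apply the same setting to every hub you manage, so new hubs don't start life without connection logs.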

Set up alerts for device disconnect at scale

To get alerts when devices disconnect, configure alerts on the Connected devices (preview) metric.

  1. Sign in to the Azure portal.

  2. Browse to your IoT hub.

  3. Select Alerts.

  4. Select New alert rule.

  5. Select Add condition, then select "Connected devices (preview)".

  6. Follow the prompts to set the threshold and alert details.

To learn more, see What are alerts in Microsoft Azure?

Detect individual device disconnects

To detect per-device disconnects, such as when you need to know that a factory just went offline, configure device disconnect events with Event Grid.
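As a rough sketch of what a consumer of those events might do, the Python below picks out the devices that disconnected from a batch of events. It assumes the events arrive in the Event Grid schema with an `eventType` of `Microsoft.Devices.DeviceDisconnected` and the device ID under `data.deviceId`; the function name, hub name, and device IDs are hypothetical.

```python
DISCONNECTED = "Microsoft.Devices.DeviceDisconnected"


def disconnected_devices(events: list) -> list:
    """Return the device IDs found in device-disconnected events, in arrival order."""
    return [
        event["data"]["deviceId"]
        for event in events
        if event.get("eventType") == DISCONNECTED
        and "deviceId" in event.get("data", {})
    ]


if __name__ == "__main__":
    # Hypothetical sample batch; real events carry more fields than shown here.
    sample = [
        {
            "eventType": "Microsoft.Devices.DeviceConnected",
            "subject": "devices/sensor-01",
            "data": {"hubName": "example-hub", "deviceId": "sensor-01"},
        },
        {
            "eventType": "Microsoft.Devices.DeviceDisconnected",
            "subject": "devices/sensor-02",
            "data": {"hubName": "example-hub", "deviceId": "sensor-02"},
        },
    ]
    for device_id in disconnected_devices(sample):
        print(f"Device disconnected: {device_id}")
```

In practice the returned IDs would feed whatever notification or ticketing step your solution uses; the filter itself stays the same.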

Resolve connectivity errors

When you turn on logs and alerts for connected devices, you get alerts when errors occur. This section describes how to look for common issues when you receive an alert. The steps below assume you've already created a diagnostic setting to send IoT Hub connections logs to a Log Analytics workspace.

  1. Sign in to the Azure portal.

  2. Browse to your IoT hub.

  3. Select Logs.

  4. To isolate connectivity error logs for IoT Hub, enter the following query and then select Run:

    AzureDiagnostics
    | where ( ResourceType == "IOTHUBS" and Category == "Connections" and Level == "Error")
    
  5. If there are results, look for OperationName, ResultType (error code), and ResultDescription (error message) to get more detail on the error.

    [Screenshot: Example of error log]

  6. Follow the problem resolution guides for the most common errors:
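When the query in step 4 returns many rows, it can help to tally them before reading individual entries. The sketch below is one way to do that in Python over rows exported from the workspace as dictionaries. It applies the same three-part filter as the query and counts errors by OperationName and ResultType; the function names and the sample rows are illustrative assumptions, not output from a real hub.

```python
from collections import Counter


def is_connection_error(row: dict) -> bool:
    """Apply the same filter as the query: IoT Hub, Connections category, Error level."""
    return (
        row.get("ResourceType") == "IOTHUBS"
        and row.get("Category") == "Connections"
        and row.get("Level") == "Error"
    )


def summarize_errors(rows) -> Counter:
    """Count connection errors per (OperationName, ResultType) pair."""
    return Counter(
        (row.get("OperationName"), row.get("ResultType"))
        for row in rows
        if is_connection_error(row)
    )


if __name__ == "__main__":
    # Illustrative rows only; field names follow the columns named in step 5.
    rows = [
        {"ResourceType": "IOTHUBS", "Category": "Connections", "Level": "Error",
         "OperationName": "deviceDisconnect", "ResultType": "404104",
         "ResultDescription": "DeviceConnectionClosedRemotely"},
        {"ResourceType": "IOTHUBS", "Category": "Connections", "Level": "Information",
         "OperationName": "deviceConnect", "ResultType": "", "ResultDescription": ""},
    ]
    for (operation, code), count in summarize_errors(rows).most_common():
        print(f"{operation} {code}: {count}")
```

Sorting by count surfaces the error code that affects the most devices, which is usually the one to look up first.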

I tried the steps, but they didn't work

If the previous steps didn't help, try:

If this guide didn't help you, leave a comment in the feedback section below to help improve the documentation for everyone.

Next steps