Monitor, diagnose, and troubleshoot disconnects with Azure IoT Hub

Connectivity issues for IoT devices can be difficult to troubleshoot because there are many possible points of failure. Application logic, physical networks, protocols, hardware, IoT Hub, and other cloud services can all cause problems. The ability to detect and pinpoint the source of an issue is critical. However, an IoT solution at scale could have thousands of devices, so it's not practical to check individual devices manually. To help you detect, diagnose, and troubleshoot these issues at scale, use the monitoring capabilities IoT Hub provides through Azure Monitor. These capabilities are limited to what IoT Hub can observe, so we also recommend that you follow monitoring best practices for your devices and other Azure services.

Get alerts and error logs

Use Azure Monitor to get alerts and write logs when devices disconnect.

Turn on Diagnostic Logs

To log device connection events and errors, turn on diagnostics for IoT Hub. We recommend turning on these logs as early as possible: if diagnostic logs aren't enabled when a device disconnects, you won't have any information to troubleshoot the problem.

  1. Sign in to the Azure portal.

  2. Browse to your IoT hub.

  3. Select Diagnostics settings.

  4. Select Turn on diagnostics.

  5. Enable Connections logs to be collected.

  6. For easier analysis, turn on Send to Log Analytics. See the example under Resolve connectivity errors.

    Recommended settings

To learn more, see Monitor the health of Azure IoT Hub and diagnose problems quickly.

Set up alerts for device disconnect at scale

To get alerts when devices disconnect, configure alerts on the Connected devices (preview) metric.

  1. Sign in to the Azure portal.

  2. Browse to your IoT hub.

  3. Select Alerts.

  4. Select New alert rule.

  5. Select Add condition, then select "Connected devices (preview)".

  6. Follow the prompts to set the threshold and configure alerting.

To learn more, see What are alerts in Microsoft Azure?.

Detect individual device disconnects

To detect per-device disconnects, such as when you need to know a factory just went offline, configure device disconnect events with Event Grid.
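
As a rough illustration of what a subscriber to these events might do, the sketch below picks the disconnected device IDs out of a batch of events. It assumes the events arrive in the Event Grid event schema, where IoT Hub disconnects carry the event type `Microsoft.Devices.DeviceDisconnected` and the device ID sits under `data.deviceId`. The function name `disconnected_devices` and the sample device and hub names are hypothetical, and how the events reach your code (webhook, function, queue) is left out.

```python
# Event type IoT Hub publishes to Event Grid when a device disconnects.
DISCONNECTED = "Microsoft.Devices.DeviceDisconnected"


def disconnected_devices(events):
    """Return the device IDs named in device-disconnect events, in arrival order."""
    device_ids = []
    for event in events:
        if event.get("eventType") != DISCONNECTED:
            continue  # ignore connect events and anything else in the batch
        device_id = event.get("data", {}).get("deviceId")
        if device_id is not None:
            device_ids.append(device_id)
    return device_ids


# Hypothetical batch of already-parsed events; all values are made up.
events = [
    {"eventType": "Microsoft.Devices.DeviceConnected",
     "data": {"deviceId": "demo-device-1", "hubName": "demo-hub"}},
    {"eventType": "Microsoft.Devices.DeviceDisconnected",
     "data": {"deviceId": "demo-device-2", "hubName": "demo-hub"}},
]

print(disconnected_devices(events))  # ['demo-device-2']
```

Filtering on the event type keeps the handler safe if the same subscription also delivers connect events.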

Resolve connectivity errors

When you turn on diagnostic logs and alerts for connected devices, you get alerts when errors occur. This section describes how to look for common issues when you receive an alert. The steps below assume you've set up Azure Monitor logs for your diagnostic logs.

  1. Sign in to the Azure portal.

  2. Browse to your IoT hub.

  3. Select Logs.

  4. To isolate connectivity error logs for IoT Hub, enter the following query and then select Run:

    AzureDiagnostics
    | where ( ResourceType == "IOTHUBS" and Category == "Connections" and Level == "Error")
    
  5. If there are results, look for OperationName, ResultType (error code), and ResultDescription (error message) to get more detail on the error.

    Example of error log

  6. Follow the problem resolution guides for the most common errors.
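
Before turning to a resolution guide, it can help to see which errors dominate the query results. The sketch below is one way to do that offline, assuming you have exported the result rows as a list of dictionaries keyed by the column names used above (`ResourceType`, `Category`, `Level`, `OperationName`, `ResultType`). The function name `count_errors` and the sample rows are hypothetical; the values are illustrative only, not real log output.

```python
from collections import Counter


def count_errors(rows):
    """Count Connections error rows per (OperationName, ResultType), most frequent first."""
    counts = Counter()
    for row in rows:
        # Same filter as the query in step 4, applied to exported rows.
        if (row.get("ResourceType") == "IOTHUBS"
                and row.get("Category") == "Connections"
                and row.get("Level") == "Error"):
            counts[(row.get("OperationName"), row.get("ResultType"))] += 1
    return counts.most_common()


# Hypothetical exported rows; every value here is made up for illustration.
rows = [
    {"ResourceType": "IOTHUBS", "Category": "Connections", "Level": "Error",
     "OperationName": "deviceDisconnect", "ResultType": "404104"},
    {"ResourceType": "IOTHUBS", "Category": "Connections", "Level": "Informational",
     "OperationName": "deviceConnect", "ResultType": ""},
    {"ResourceType": "IOTHUBS", "Category": "Connections", "Level": "Error",
     "OperationName": "deviceConnect", "ResultType": "401003"},
    {"ResourceType": "IOTHUBS", "Category": "Connections", "Level": "Error",
     "OperationName": "deviceDisconnect", "ResultType": "404104"},
]

for (operation, code), n in count_errors(rows):
    print(operation, code, n)
```

Grouping by both the operation and the error code shows at a glance which error to look up first.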

I tried the steps, but they didn't work

If the previous steps didn't help, try:

If this guide didn't help you, leave a comment in the feedback section below to help improve the documentation for everyone.

Next steps