Monitor, diagnose, and troubleshoot disconnects with Azure IoT Hub
Connectivity issues for IoT devices can be difficult to troubleshoot because there are many possible points of failure. Application logic, physical networks, protocols, hardware, IoT Hub, and other cloud services can all cause problems, so being able to detect and pinpoint the source of an issue is critical. An IoT solution at scale can have thousands of devices, which makes checking individual devices manually impractical. To detect, diagnose, and troubleshoot these issues at scale, use the monitoring capabilities that IoT Hub provides through Azure Monitor. These capabilities are limited to what IoT Hub can observe, so we also recommend that you follow monitoring best practices for your devices and other Azure services.
Get alerts and error logs
Use Azure Monitor to get alerts and write logs when devices disconnect.
Turn on logs
To log device connection events and errors, create a diagnostic setting for IoT Hub connections resource logs. We recommend creating this setting as early as possible. These logs aren't collected by default, and without them you have no information to work from when devices disconnect.
Sign in to the Azure portal.
Browse to your IoT hub.
Select Diagnostic settings.
Select Add diagnostic setting.
Select Connections logs.
For easier analysis, select Send to Log Analytics. See the example under Resolve connectivity errors.
To learn more, see Monitor IoT Hub.
Set up alerts for device disconnect at scale
To get alerts when devices disconnect, configure alerts on the Connected devices (preview) metric.
Sign in to the Azure portal.
Browse to your IoT hub.
Select Alerts.
Select New alert rule.
Select Add condition, then select Connected devices (preview).
Follow the prompts to set the threshold and alerting.
To learn more, see What are alerts in Microsoft Azure?.
Detect individual device disconnects
To detect per-device disconnects, such as when you need to know that a factory just went offline, configure device disconnect events with Event Grid.
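As a rough illustration of what a subscriber to these events might do, the sketch below filters a batch of Event Grid events for device disconnects and pulls out the hub and device names. It assumes the events arrive as a JSON array and use the `Microsoft.Devices.DeviceDisconnected` event type with `hubName` and `deviceId` fields in `data`. The function name `disconnected_devices` and the sample payload, including its hub and device names, are hypothetical.

```python
import json

# Event type that IoT Hub publishes to Event Grid when a device disconnects.
DISCONNECTED = "Microsoft.Devices.DeviceDisconnected"


def disconnected_devices(events):
    """Return (hubName, deviceId) pairs for each disconnect event in a batch."""
    pairs = []
    for event in events:
        if event.get("eventType") != DISCONNECTED:
            continue  # ignore connect events and anything else in the batch
        data = event.get("data", {})
        pairs.append((data.get("hubName"), data.get("deviceId")))
    return pairs


# Hypothetical batch for illustration only; the names are made up.
sample_batch = json.loads("""
[
  {"eventType": "Microsoft.Devices.DeviceConnected",
   "data": {"hubName": "example-hub", "deviceId": "factory-line-2"}},
  {"eventType": "Microsoft.Devices.DeviceDisconnected",
   "data": {"hubName": "example-hub", "deviceId": "factory-line-1"}}
]
""")

for hub, device in disconnected_devices(sample_batch):
    print(f"{device} disconnected from {hub}")
```

In a real handler, the print statement would be replaced by whatever notification path your solution uses.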
Resolve connectivity errors
When you turn on logs and alerts for connected devices, you get alerts when errors occur. This section describes how to look for common issues when you receive an alert. The steps below assume you've already created a diagnostic setting to send IoT Hub connections logs to a Log Analytics workspace.
Sign in to the Azure portal.
Browse to your IoT hub.
Select Logs.
To isolate connectivity error logs for IoT Hub, enter the following query and then select Run:
AzureDiagnostics | where ( ResourceType == "IOTHUBS" and Category == "Connections" and Level == "Error")
If there are results, look at OperationName, ResultType (error code), and ResultDescription (error message) to get more detail on the error. For the most common errors, follow the corresponding problem resolution guides.
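When the query returns many rows, tallying them by operation and error code can show which problem to look at first. The following is a minimal sketch, assuming the results were exported as CSV with the OperationName, ResultType, and ResultDescription columns. The helper name `count_errors` is hypothetical, and the sample rows are illustrative rather than real log output.

```python
import csv
import io
from collections import Counter


def count_errors(csv_text):
    """Count exported log rows per (OperationName, ResultType) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["OperationName"], row["ResultType"]) for row in reader)


# Illustrative export; the values are placeholders, not real log output.
sample_export = """OperationName,ResultType,ResultDescription
deviceDisconnect,404104,illustrative message
deviceDisconnect,404104,illustrative message
deviceConnect,401003,illustrative message
"""

# Print the most frequent combinations first.
for (operation, code), count in count_errors(sample_export).most_common():
    print(operation, code, count)
```

The same grouping could be done inside the query itself; doing it offline is only one option.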
I tried the steps, but they didn't work
If the previous steps didn't help, try the following:
If you have access to the problematic devices, either physically or remotely (for example, over SSH), follow the device-side troubleshooting guide to continue troubleshooting.
Verify that your devices are Enabled in the Azure portal > your IoT hub > IoT devices.
If your device uses the MQTT protocol, verify that port 8883 is open. For more information, see Connecting to IoT Hub (MQTT).
Get help from the Microsoft Q&A question page for Azure IoT Hub, Stack Overflow, or Azure support.
If this guide didn't help, leave a comment in the feedback section below to help improve the documentation for everyone.
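For the MQTT port check listed above, one quick test is to attempt a plain TCP connection from the device's network to the hub on port 8883. This is a sketch only: the helper name `can_reach` is hypothetical, the hostname is passed on the command line, and a successful TCP connect shows that the port is reachable, not that TLS or MQTT authentication will succeed.

```python
import socket
import sys


def can_reach(host, port=8883, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass your IoT hub's hostname as the first command-line argument.
    hostname = sys.argv[1]
    state = "reachable" if can_reach(hostname) else "not reachable"
    print(f"{hostname}:8883 is {state}")
```

If the check fails, a firewall or proxy between the device and IoT Hub is a likely place to look next.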
Next steps
- To learn more about resolving transient issues, see Transient fault handling.
- To learn more about the Azure IoT SDKs and managing retries, see How to manage connectivity and reliable messaging using Azure IoT Hub device SDKs.