Common issues and resolutions for Azure IoT Edge

Use this article to find steps to resolve common issues that you may experience when deploying IoT Edge solutions. If you need to learn how to find logs and errors from your IoT Edge device, see Troubleshoot your IoT Edge device.

IoT Edge agent stops after about a minute

Observed behavior:

The edgeAgent module starts and runs successfully for about a minute, then stops. The logs indicate that the IoT Edge agent attempts to connect to IoT Hub over AMQP, and then attempts to connect using AMQP over WebSocket. When that fails, the IoT Edge agent exits.

Example edgeAgent logs:

2017-11-28 18:46:19 [INF] - Starting module management agent.
2017-11-28 18:46:19 [INF] - Version - 1.0.7516610 (03c94f85d0833a861a43c669842f0817924911d5)
2017-11-28 18:46:19 [INF] - Edge agent attempting to connect to IoT Hub via AMQP...
2017-11-28 18:46:49 [INF] - Edge agent attempting to connect to IoT Hub via AMQP over WebSocket...

Root cause:

A networking configuration on the host network is preventing the IoT Edge agent from reaching IoT Hub. The agent first attempts to connect over AMQP (port 5671). If that connection fails, it tries AMQP over WebSockets (port 443).
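To check whether the network lets traffic out on the two ports the agent tries, a quick TCP probe can help. The following Python sketch is only an illustration: the hub host name is a placeholder you supply, and a successful TCP connect shows that the port is open, not that AMQP or TLS will work end to end. Because the agent runs on the module bridge/NAT network, running the probe on the host tests only the host's own route; running it from a container on the same network is closer to what the agent sees.

```python
import socket
from typing import Dict, Iterable, Tuple

# Transports the IoT Edge agent tries, in order: AMQP first, then AMQP over WebSockets.
AGENT_PORTS: Tuple[Tuple[str, int], ...] = (("AMQP", 5671), ("AMQP over WebSockets", 443))


def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def check_agent_connectivity(
    host: str,
    ports: Iterable[Tuple[str, int]] = AGENT_PORTS,
    timeout: float = 5.0,
) -> Dict[str, bool]:
    """Probe each transport's port in order and report which ones are reachable."""
    return {name: probe(host, port, timeout) for name, port in ports}
```

For example, calling check_agent_connectivity with your hub's host name (such as the hypothetical "your-hub.azure-devices.net") would return {"AMQP": False, "AMQP over WebSockets": True} if only port 443 is open.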

The IoT Edge runtime sets up a network for the modules to communicate on. On Linux, this network is a bridge network. On Windows, it uses NAT. This issue is more common on Windows devices running Windows containers that use the NAT network.

Resolution:

Ensure that there is a route to the internet for the IP addresses assigned to this bridge/NAT network. Sometimes a VPN configuration on the host overrides the IoT Edge network.

IoT Edge agent can't access a module's image (403)

Observed behavior:

A container fails to run, and the edgeAgent logs show a 403 error.

Root cause:

The IoT Edge agent doesn't have permissions to access a module's image.

Resolution:

Make sure that your registry credentials are correctly specified in your deployment manifest.
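A common cause of this 403 is a module image hosted in a private registry that has no matching entry under registryCredentials. The Python sketch below scans a deployment manifest for such images. It assumes the usual manifest layout (modulesContent → $edgeAgent → properties.desired, with runtime.settings.registryCredentials entries that carry an address field) and treats mcr.microsoft.com as the only public registry, so images from any other registry without credentials, including public Docker Hub images, are flagged for review. It can't tell whether a stored username or password is correct; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Registries assumed to serve images without credentials; extend as needed.
PUBLIC_REGISTRIES = {"mcr.microsoft.com"}


def registry_host(image: str) -> str:
    """Return the registry host of an image reference such as 'host/repo:tag'."""
    first, sep, _ = image.partition("/")
    # Docker treats the first path segment as a registry host only if it contains
    # '.' or ':' or is 'localhost'; anything else is a Docker Hub image.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def images_missing_credentials(manifest: Dict) -> List[Tuple[str, str]]:
    """List (module name, image) pairs whose registry has no registryCredentials entry."""
    desired = manifest["modulesContent"]["$edgeAgent"]["properties.desired"]
    creds = desired.get("runtime", {}).get("settings", {}).get("registryCredentials", {})
    known = {entry["address"] for entry in creds.values()} | PUBLIC_REGISTRIES
    modules = {**desired.get("systemModules", {}), **desired.get("modules", {})}
    return [
        (name, module["settings"]["image"])
        for name, module in modules.items()
        if registry_host(module["settings"]["image"]) not in known
    ]
```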

Edge Agent module reports 'empty config file' and no modules start on the device

Observed behavior:

The device has trouble starting modules defined in the deployment. Only edgeAgent is running, and it continually reports 'empty config file...'.

Root cause:

By default, IoT Edge starts modules in their own isolated container network. The device may be having trouble with DNS name resolution within this private network.

Resolution:

Option 1: Set DNS server in container engine settings

Specify the DNS server for your environment in the container engine settings. This setting applies to all container modules that the engine starts. Create a file named daemon.json that specifies the DNS server to use. For example:

{
    "dns": ["1.1.1.1"]
}

The above example sets the DNS server to a publicly accessible DNS service. If the edge device can't reach this IP address from its environment, replace it with the address of a DNS server that is accessible.

Place daemon.json in the right location for your platform:

Platform | Location
--- | ---
Linux | /etc/docker
Windows host with Windows containers | C:\ProgramData\iotedge-moby\config

If the location already contains a daemon.json file, add the dns key to it and save the file.
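If you prefer to script this edit, the following Python sketch merges the dns key into an existing daemon.json instead of overwriting it, so other engine settings survive. The function name is hypothetical; run it with rights to write to the location listed in the table above, and then restart the container engine.

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def add_dns_to_daemon_json(path: str, dns_servers: Iterable[str]) -> Dict:
    """Add or replace the "dns" key in daemon.json while keeping every other key.

    Creates the file (and its folder) if it doesn't exist yet.
    """
    file = Path(path)
    # utf-8-sig tolerates a byte-order mark that some Windows editors add.
    text = file.read_text(encoding="utf-8-sig") if file.exists() else ""
    settings = json.loads(text) if text.strip() else {}
    settings["dns"] = list(dns_servers)
    file.parent.mkdir(parents=True, exist_ok=True)
    file.write_text(json.dumps(settings, indent=4) + "\n", encoding="utf-8")
    return settings
```

For example, add_dns_to_daemon_json("/etc/docker/daemon.json", ["1.1.1.1"]) on Linux (run as root) keeps any existing keys and sets dns to ["1.1.1.1"].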

Restart the container engine for the updates to take effect.

Platform | Command
--- | ---
Linux | sudo systemctl restart docker
Windows (Admin PowerShell) | Restart-Service iotedge-moby -Force

Option 2: Set DNS server in IoT Edge deployment per module

You can set the DNS server in each module's createOptions in the IoT Edge deployment. For example:

"createOptions": {
  "HostConfig": {
    "Dns": [
      "x.x.x.x"
    ]
  }
}

Be sure to set this configuration for the edgeAgent and edgeHub modules as well.
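In the deployment manifest itself, createOptions is stored as a JSON-encoded string (as in the edgeHub example later in this article), so the object above has to be serialized before it goes into the manifest. The following Python sketch, with hypothetical helper names, adds HostConfig.Dns to an existing createOptions string while keeping any other options, and applies it to edgeAgent, edgeHub, and every custom module. It assumes the standard modulesContent → $edgeAgent → properties.desired layout.

```python
import json
from typing import Dict, Iterable


def with_dns(create_options: str, dns_servers: Iterable[str]) -> str:
    """Return a createOptions string with HostConfig.Dns set, keeping other options."""
    options = json.loads(create_options) if create_options else {}
    options.setdefault("HostConfig", {})["Dns"] = list(dns_servers)
    # Compact separators match the escaped one-line form used in deployment manifests.
    return json.dumps(options, separators=(",", ":"))


def set_dns_for_all_modules(manifest: Dict, dns_servers: Iterable[str]) -> None:
    """Apply the DNS setting in place to edgeAgent, edgeHub, and every custom module."""
    servers = list(dns_servers)  # materialize once so a generator isn't exhausted
    desired = manifest["modulesContent"]["$edgeAgent"]["properties.desired"]
    for group in ("systemModules", "modules"):
        for module in desired.get(group, {}).values():
            settings = module.setdefault("settings", {})
            settings["createOptions"] = with_dns(settings.get("createOptions", ""), servers)
```

Merging into the existing string, rather than replacing it, keeps settings such as edgeHub's port bindings intact.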

IoT Edge hub fails to start

Observed behavior:

The edgeHub module fails to start. You may see a message like one of the following errors in the logs:

One or more errors occurred.
(Docker API responded with status code=InternalServerError, response=
{\"message\":\"driver failed programming external connectivity on endpoint edgeHub (6a82e5e994bab5187939049684fb64efe07606d2bb8a4cc5655b2a9bad5f8c80):
Error starting userland proxy: Bind for 0.0.0.0:443 failed: port is already allocated\"}\n)

Or

info: edgelet_docker::runtime -- Starting module edgeHub...
warn: edgelet_utils::logging -- Could not start module edgeHub
warn: edgelet_utils::logging --     caused by: failed to create endpoint edgeHub on network nat: hnsCall failed in Win32:  
        The process cannot access the file because it is being used by another process. (0x20)

Root cause:

Some other process on the host machine has bound a port that the edgeHub module is trying to bind. The IoT Edge hub maps ports 443, 5671, and 8883 for use in gateway scenarios. The module fails to start if another process has already bound one of those ports.
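To see which of the three ports is already taken, you can try to bind each one yourself. This Python sketch does that; a port reported as in use is held by some other process (or by the edgeHub container itself, if it is running). On Linux, binding ports below 1024 such as 443 usually requires root, so run it with sudo or it reports 'no permission' for that port. The sketch only detects the conflict; to find the owning process, tools such as sudo ss -ltnp on Linux or netstat -ano on Windows list listening ports with their process IDs. The function names are hypothetical.

```python
import errno
import socket
from typing import Dict, Iterable

# Host ports that the IoT Edge hub maps for gateway scenarios.
EDGE_HUB_PORTS = (443, 5671, 8883)

# errno values meaning "address already in use" or "not allowed" (POSIX and Windows Sockets).
_IN_USE = {errno.EADDRINUSE, getattr(errno, "WSAEADDRINUSE", errno.EADDRINUSE)}
_NO_PERMISSION = {errno.EACCES, errno.EPERM, getattr(errno, "WSAEACCES", errno.EACCES)}


def port_status(port: int, host: str = "0.0.0.0") -> str:
    """Try to bind host:port and report 'free', 'in use', or 'no permission'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno in _IN_USE:
                return "in use"
            if exc.errno in _NO_PERMISSION:
                # Ports below 1024 usually need root on Linux; rerun with sudo.
                return "no permission"
            raise
    return "free"


def check_edge_hub_ports(ports: Iterable[int] = EDGE_HUB_PORTS) -> Dict[int, str]:
    """Report the bind status of each port the edgeHub module wants."""
    return {port: port_status(port) for port in ports}
```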

Resolution:

You can resolve this issue in two ways:

If the IoT Edge device is functioning as a gateway device, then you need to find and stop the process that is using port 443, 5671, or 8883. An error for port 443 usually means that the other process is a web server.

If you don't need to use the IoT Edge device as a gateway, then you can remove the port bindings from edgeHub's module create options. You can change the create options in the Azure portal or directly in the deployment.json file.

In the Azure portal:

  1. Navigate to your IoT hub and select IoT Edge.

  2. Select the IoT Edge device that you want to update.

  3. Select Set Modules.

  4. Select Runtime Settings.

  5. In the Edge Hub module settings, delete everything from the Create Options text box.

  6. Save your changes and create the deployment.

In the deployment.json file:

  1. Open the deployment.json file that you applied to your IoT Edge device.

  2. Find the edgeHub settings in the edgeAgent desired properties section:

    "edgeHub": {
        "settings": {
            "image": "mcr.microsoft.com/azureiotedge-hub:1.0",
            "createOptions": "{\"HostConfig\":{\"PortBindings\":{\"8883/tcp\":[{\"HostPort\":\"8883\"}],\"443/tcp\":[{\"HostPort\":\"443\"}]}}}"
        },
        "type": "docker",
        "status": "running",
        "restartPolicy": "always"
    }
    
  3. Remove the createOptions line, and the trailing comma at the end of the image line before it:

    "edgeHub": {
        "settings": {
            "image": "mcr.microsoft.com/azureiotedge-hub:1.0"
        },
        "type": "docker",
        "status": "running",
        "restartPolicy": "always"
    }
    
  4. Save the file and apply it to your IoT Edge device again.
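The same edit can be scripted. This Python sketch loads a deployment manifest, deletes edgeHub's createOptions, and writes the file back. Because the JSON is re-serialized, there is no trailing comma to clean up by hand, although the file's indentation is normalized. The function names are hypothetical, and the code assumes edgeHub sits under modulesContent → $edgeAgent → properties.desired → systemModules, the standard manifest layout.

```python
import json
from typing import Dict


def remove_edge_hub_create_options(manifest: Dict) -> bool:
    """Delete edgeHub's createOptions from a manifest dict in place.

    Returns True if an entry was removed, False if there was nothing to remove.
    """
    desired = manifest["modulesContent"]["$edgeAgent"]["properties.desired"]
    settings = desired["systemModules"]["edgeHub"]["settings"]
    return settings.pop("createOptions", None) is not None


def remove_edge_hub_create_options_in_file(path: str) -> bool:
    """Apply remove_edge_hub_create_options to a deployment.json file on disk."""
    with open(path, encoding="utf-8") as f:
        manifest = json.load(f)
    removed = remove_edge_hub_create_options(manifest)
    with open(path, "w", encoding="utf-8") as f:
        # Re-serializing produces valid JSON, so no trailing comma is left behind.
        json.dump(manifest, f, indent=4)
    return removed
```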

IoT Edge security daemon fails with an invalid hostname

Observed behavior:

Attempting to check the IoT Edge security manager logs fails and prints the following message:

Error parsing user input data: invalid hostname. Hostname cannot be empty or greater than 64 characters

Root cause:

The IoT Edge runtime only supports hostnames that are shorter than 64 characters. Physical machines usually don't have long hostnames, but the issue is more common on virtual machines. In particular, the automatically generated hostnames for Windows virtual machines hosted in Azure tend to be long.
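A quick way to check a machine before setting up the runtime is to test its hostname against the limit described above. The following Python sketch applies the rule as stated in this article (non-empty and shorter than 64 characters); the helper names are hypothetical. The runtime uses the hostname configured for IoT Edge, so if you have already set one explicitly, check that value rather than the machine name returned by socket.gethostname().

```python
import socket
from typing import Optional, Tuple

# Hostnames must be shorter than this many characters (the limit described above).
MAX_HOSTNAME_LEN = 64


def hostname_problem(hostname: str) -> Optional[str]:
    """Return why the hostname would be rejected, or None if it satisfies the rule."""
    if not hostname:
        return "hostname is empty"
    if len(hostname) >= MAX_HOSTNAME_LEN:
        return f"hostname has {len(hostname)} characters; use fewer than {MAX_HOSTNAME_LEN}"
    return None


def check_this_machine() -> Tuple[str, Optional[str]]:
    """Check the current machine name and return it with any problem found."""
    name = socket.gethostname()
    return name, hostname_problem(name)
```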

Resolution:

When you see this error, you can resolve it by configuring the DNS name of your virtual machine, and then setting that DNS name as the hostname in the IoT Edge configuration file.

  1. In the Azure portal, navigate to the overview page of your virtual machine.

  2. Select Configure under DNS name. If your virtual machine already has a DNS name configured, you don't need to configure a new one.

    (Screenshot: configure the DNS name of the virtual machine)

  3. Provide a value for DNS name label and select Save.

  4. Copy the new DNS name, which should be in the format <DNSnamelabel>.<vmlocation>.cloudapp.azure.com.

  5. Inside the virtual machine, open the IoT Edge configuration file and set the hostname value to your DNS name:

    • On Linux:

      sudo nano /etc/iotedge/config.yaml
      
    • On Windows:

      notepad C:\ProgramData\iotedge\config.yaml
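In config.yaml, the value to change is the top-level hostname setting. If you'd rather not edit it by hand, the following Python sketch rewrites that one line and leaves the rest of the file untouched. It assumes the file has an uncommented hostname: line starting at the beginning of a line, and it uses a plain regular expression instead of a YAML parser so that comments are preserved. The function names and the example DNS name are hypothetical. The daemon typically needs a restart to pick up the change.

```python
import re
from pathlib import Path

# Matches the top-level "hostname:" setting (no indentation, not commented out).
_HOSTNAME_LINE = re.compile(r"^hostname:.*$", re.MULTILINE)


def set_config_hostname(text: str, dns_name: str) -> str:
    """Return config.yaml text with the top-level hostname set to dns_name."""
    new_line = f'hostname: "{dns_name}"'
    # A function replacement avoids re treating characters in dns_name as escapes.
    updated, count = _HOSTNAME_LINE.subn(lambda _match: new_line, text, count=1)
    if count == 0:
        raise ValueError("no top-level 'hostname:' line found in config.yaml")
    return updated


def update_config_file(path: str, dns_name: str) -> None:
    """Rewrite the hostname line of the config.yaml file at path in place."""
    file = Path(path)
    file.write_text(set_config_hostname(file.read_text(encoding="utf-8"), dns_name), encoding="utf-8")
```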
      

Can't get the IoT Edge daemon logs on Windows

Observed behavior:

You get an EventLogException when using Get-WinEvent on Windows.

Root cause:

The Get-WinEvent PowerShell command relies on a registry entry being present to find logs by a specific ProviderName.

Resolution:

Set a registry entry for the IoT Edge daemon. Create an iotedge.reg file with the following content, and import it into the Windows Registry by double-clicking it or by using the reg import iotedge.reg command:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\EventLog\Application\iotedged]
"CustomSource"=dword:00000001
"EventMessageFile"="C:\\ProgramData\\iotedge\\iotedged.exe"
"TypesSupported"=dword:00000007
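The backslashes in EventMessageFile are doubled because .reg string values escape backslashes. If you generate the file instead of typing it (for example, because iotedged.exe is installed at a different path on your machine), this Python sketch builds the same content and does the escaping for you. It writes UTF-16 with a byte-order mark and CRLF line endings, the format regedit itself exports. The function names and the alternative-path scenario are assumptions for illustration.

```python
# Default install location used in the .reg content above.
DEFAULT_EXE = r"C:\ProgramData\iotedge\iotedged.exe"
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\EventLog\Application\iotedged"


def reg_string(value: str) -> str:
    """Quote a .reg string value, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def iotedge_reg_text(exe_path: str = DEFAULT_EXE) -> str:
    """Build the iotedge.reg content for the given iotedged.exe path."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        '"CustomSource"=dword:00000001',
        '"EventMessageFile"=' + reg_string(exe_path),
        '"TypesSupported"=dword:00000007',
    ]
    return "\n".join(lines) + "\n"


def write_iotedge_reg(path: str = "iotedge.reg", exe_path: str = DEFAULT_EXE) -> None:
    """Write the file as UTF-16 with a BOM and CRLF line endings, like regedit exports."""
    with open(path, "w", encoding="utf-16", newline="\r\n") as f:
        f.write(iotedge_reg_text(exe_path))
```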

Stability issues on smaller devices

Observed behavior:

You may experience stability problems on resource-constrained devices like the Raspberry Pi, especially when they are used as gateways. Symptoms include out-of-memory exceptions in the IoT Edge hub module, downstream devices failing to connect, or the device failing to send telemetry messages after a few hours.

Root cause:

The IoT Edge hub, which is part of the IoT Edge runtime, is optimized for performance by default and attempts to allocate large chunks of memory. This optimization is not ideal for constrained edge devices and can cause stability problems.

Resolution:

For the IoT Edge hub, set an environment variable OptimizeForPerformance to false. There are two ways to set environment variables:

In the Azure portal:

In your IoT hub, select your IoT Edge device, and from the device details page select Set Modules > Runtime Settings. Create an environment variable for the IoT Edge hub module called OptimizeForPerformance that is set to false.

(Screenshot: OptimizeForPerformance set to false)

In the deployment manifest:

"edgeHub": {
  "type": "docker",
  "settings": {
    "image": "mcr.microsoft.com/azureiotedge-hub:1.0",
    "createOptions": <snipped>
  },
  "env": {
    "OptimizeForPerformance": {
      "value": "false"
    }
  },
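If you maintain the deployment manifest with a script, the following Python sketch adds the same env entry to edgeHub in a manifest loaded as a dictionary. It assumes the standard modulesContent → $edgeAgent → properties.desired → systemModules layout and keeps any environment variables that are already set; the function name is hypothetical.

```python
from typing import Dict


def disable_edge_hub_performance_mode(manifest: Dict) -> Dict:
    """Set OptimizeForPerformance=false on edgeHub in place; return edgeHub's env section."""
    desired = manifest["modulesContent"]["$edgeAgent"]["properties.desired"]
    edge_hub = desired["systemModules"]["edgeHub"]
    env = edge_hub.setdefault("env", {})  # keep any variables that are already set
    # Each env entry is an object with a string "value", as in the excerpt above.
    env["OptimizeForPerformance"] = {"value": "false"}
    return env
```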

IoT Edge module fails to send a message to edgeHub with 404 error

Observed behavior:

A custom IoT Edge module fails to send a message to the IoT Edge hub with a 404 Module not found error. The IoT Edge daemon prints the following message to the logs:

Error: Time:Thu Jun  4 19:44:58 2018 File:/usr/sdk/src/c/provisioning_client/adapters/hsm_client_http_edge.c Func:on_edge_hsm_http_recv Line:364 executing HTTP request fails, status=404, response_buffer={"message":"Module not found"}u, 04 )

Root cause:

For security reasons, the IoT Edge daemon enforces process identification for all modules connecting to edgeHub. It verifies that all messages sent by a module come from the module's main process ID. If a module sends a message from a different process ID than the one initially established, the daemon rejects the message with a 404 error.

Resolution:

As of version 1.0.7, all module processes are authorized to connect. For more information, see the 1.0.7 release changelog.

If upgrading to 1.0.7 isn't possible, make sure that the custom IoT Edge module always uses the same process ID to send messages to edgeHub. For instance, use ENTRYPOINT instead of CMD in your Dockerfile. The CMD command leads to one process ID for the module and another process ID for the bash command running the main program, but ENTRYPOINT leads to a single process ID.

IoT Edge module deploys successfully then disappears from device

Observed behavior:

After you set modules for an IoT Edge device, the modules are deployed successfully, but after a few minutes they disappear from the device and from the device details in the Azure portal. Modules other than the ones you defined might also appear on the device.

Root cause:

If an automatic deployment targets a device, it takes priority over manually setting the modules for a single device. The Set modules functionality in the Azure portal or the Create deployment for single device functionality in Visual Studio Code takes effect only briefly. You see the modules that you defined start on the device. Then the automatic deployment's priority kicks in and overwrites the device's desired properties.

Resolution:

Only use one type of deployment mechanism per device: either an automatic deployment or individual device deployments. If you have multiple automatic deployments targeting a device, you can change their priority or target conditions to make sure the correct one applies to a given device. You can also update the device twin so that it no longer matches the target condition of the automatic deployment.
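To reason about which automatic deployment will end up controlling a device, the following Python sketch models the rule described above: among the automatic deployments whose target condition matches the device twin, the one with the highest priority is applied (in IoT Hub, a higher priority number wins). Target conditions are simplified to Python predicates over twin tags, the class and function names are hypothetical, and ties between equal priorities are not modeled. A result of None means no automatic deployment targets the device, so a single-device deployment is not overwritten.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Tags = Dict[str, str]


@dataclass
class AutomaticDeployment:
    name: str
    priority: int                    # higher number wins
    targets: Callable[[Tags], bool]  # simplified stand-in for a target condition


def winning_deployment(deployments: List[AutomaticDeployment], twin_tags: Tags) -> Optional[str]:
    """Return the name of the deployment that would apply to a device, or None."""
    matching = [d for d in deployments if d.targets(twin_tags)]
    if not matching:
        return None
    return max(matching, key=lambda d: d.priority).name
```

For example, a device tagged for both a production deployment with priority 10 and a pilot deployment with priority 20 receives the pilot deployment; removing the pilot tag from its twin hands control back to the production one.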

For more information, see Understand IoT Edge automatic deployments for single devices or at scale.

Next steps

Do you think that you found a bug in the IoT Edge platform? Submit an issue so that we can continue to improve.

If you have more questions, create a Support request for help.