Troubleshooting message routing

This article provides guidance for monitoring and troubleshooting common issues with IoT Hub message routing.

Monitoring message routing

We recommend that you monitor the IoT Hub metrics related to message routing and endpoints to get an overview of the messages sent. You can also create a diagnostic setting to send operations for routes in IoT Hub resource logs to Azure Monitor Logs, Event Hubs, or Azure Storage for custom processing. To learn more about using metrics, resource logs, and diagnostic settings, see Monitor IoT Hub. For a tutorial, see Set up and use metrics and resource logs with an IoT hub.

We also recommend enabling the fallback route if you want to keep messages that don't match the query on any of the routes. These messages are retained in the built-in endpoint for the configured number of retention days.

Top issues

The following are the most common issues observed with message routing. Each issue is followed by detailed troubleshooting steps.

Messages from my devices are not being routed as expected

To troubleshoot this issue, analyze the following.

The routing metrics for this endpoint

All the IoT Hub metrics related to routing are prefixed with Routing. You can combine information from multiple metrics to identify the root cause of an issue. For example, use the Routing Delivery Attempts metric to identify the number of messages that were delivered to an endpoint, or that were dropped because they didn't match the query on any route while the fallback route was disabled. Check the Routing Latency metric to see whether message delivery latency is steady or increasing. Growing latency can indicate a problem with a specific endpoint, so we recommend checking the health of that endpoint. These routing metrics also have dimensions that provide details such as the endpoint type, the specific endpoint name, and the reason a message was not delivered.
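As a rough illustration of the "steady or increasing" check on Routing Latency, the sketch below fits a least-squares line to a series of latency samples and flags a positive trend. The sample values, the function names, and the 5 ms-per-interval threshold are all assumptions made for this example; they are not part of IoT Hub or Azure Monitor.

```python
from statistics import mean


def latency_slope(samples_ms):
    """Return the least-squares slope of latency (ms) per sample interval."""
    n = len(samples_ms)
    if n < 2:
        return 0.0
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(samples_ms)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples_ms))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def is_latency_growing(samples_ms, min_slope_ms=5.0):
    """Flag a series whose fitted slope exceeds a hypothetical threshold."""
    return latency_slope(samples_ms) > min_slope_ms


# Hypothetical samples exported from the Routing Latency metric.
steady = [100, 102, 99, 101, 100, 101]
growing = [100, 120, 150, 190, 240, 300]

print(is_latency_growing(steady))
print(is_latency_growing(growing))
```

A fitted slope is less sensitive to a single spike than comparing the first and last samples, which is why it is used here. The threshold would need tuning for a real workload.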

The resource logs for any operational issues

Observe the Routes resource logs to get more information about routing and endpoint operations, or to identify errors and their error codes. For example, the operation name RouteEvaluationError in the log indicates that the route could not be evaluated because of an issue with the message format. Use the tips provided for the specific operation names to mitigate the issue. When an event is logged as an error, the log also provides more information about why the operation failed. For example, if the operation name is EndpointUnhealthy, an error code of 403004 indicates that the endpoint ran out of space.
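The triage described above could be sketched as follows. The record shape (a flat dictionary with operationName and level keys) is a simplification assumed for this example and is not the actual resource log schema. The hints paraphrase the operation names table in this article.

```python
# Hints paraphrased from the operation names table in this article.
OPERATION_HINTS = {
    "RouteEvaluationError": "Check the content encoding and content type of the message.",
    "DroppedMessage": "Check route queries and the endpoint health.",
    "EndpointUnhealthy": "Check the last known error via Get Endpoint Health.",
    "EndpointDead": "Check the last known error via Get Endpoint Health.",
    "InvalidMessage": "Check the configuration of the endpoint.",
}


def errors_needing_attention(records):
    """Return (operation name, hint) pairs for records logged at Error level."""
    findings = []
    for record in records:
        if record.get("level") != "Error":
            continue
        operation = record.get("operationName", "")
        hint = OPERATION_HINTS.get(operation, "No hint available for this operation.")
        findings.append((operation, hint))
    return findings


# Hypothetical, simplified log records.
sample_records = [
    {"operationName": "EndpointHealthy", "level": "Information"},
    {"operationName": "RouteEvaluationError", "level": "Error"},
    {"operationName": "EndpointDead", "level": "Error"},
]

for operation, hint in errors_needing_attention(sample_records):
    print(operation, "->", hint)
```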

The health of the endpoint

Use the REST API Get Endpoint Health to get the health status of the endpoints. The Get Endpoint Health API also reports the last time a message was successfully sent to the endpoint, the last known error, the time of the last known error, and the last time a send attempt was made for the endpoint. Apply the possible mitigation listed for the specific last known error.
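The sketch below shows one way to build the request URL for Get Endpoint Health and to pick out endpoints that are not healthy from a response body. It makes no network call. The URL path, the api-version value, and the response field names (value, endpointId, healthStatus, lastKnownError) are assumptions based on the Azure Resource Manager conventions for IoT Hub and should be checked against the REST API reference. The subscription, resource group, and hub names are placeholders.

```python
from urllib.parse import quote, urlencode


def endpoint_health_url(subscription_id, resource_group, hub_name, api_version):
    """Build the (assumed) Azure Resource Manager URL for Get Endpoint Health."""
    path = (
        "/subscriptions/{}/resourceGroups/{}"
        "/providers/Microsoft.Devices/IotHubs/{}/routingEndpointsHealth"
    ).format(
        quote(subscription_id, safe=""),
        quote(resource_group, safe=""),
        quote(hub_name, safe=""),
    )
    return "https://management.azure.com" + path + "?" + urlencode({"api-version": api_version})


def unhealthy_endpoints(payload):
    """Return the entries whose health status is anything other than healthy."""
    return [
        entry
        for entry in payload.get("value", [])
        if entry.get("healthStatus", "").lower() != "healthy"
    ]


# Hypothetical response body; field names are assumptions.
sample_payload = {
    "value": [
        {"endpointId": "storage-archive", "healthStatus": "healthy"},
        {"endpointId": "orders-queue", "healthStatus": "unhealthy", "lastKnownError": "Throttled"},
    ]
}

print(endpoint_health_url("my-subscription", "my-rg", "my-hub", "2021-07-02"))
for entry in unhealthy_endpoints(sample_payload):
    print(entry["endpointId"], entry.get("lastKnownError"))
```

An actual request would also need an Authorization header with a bearer token for Azure Resource Manager, which is omitted here.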

I suddenly stopped getting messages at the built-in endpoint

To troubleshoot this issue, analyze the following.

Was a new route created?

Once a route is created, data stops flowing to the built-in endpoint unless a route is created to that endpoint. To ensure that messages continue to flow to the built-in endpoint when a new route is added, configure a route to the events endpoint.

Was the fallback route disabled?

The fallback route sends all messages that don't satisfy the query conditions on any of the existing routes to the built-in endpoint (messages/events), which is compatible with Event Hubs. If message routing is turned on, you can enable the fallback route capability. If there are no routes to the built-in endpoint and the fallback route is enabled, only messages that don't match any query condition on the routes are sent to the built-in endpoint. Also, if all existing routes are deleted, the fallback route must be enabled to receive all data at the built-in endpoint.
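The interaction between routes, the fallback route, and the built-in endpoint can be modeled with a small simulation. This is a simplified sketch of the rules described in this section, not IoT Hub's implementation. Routes are represented as hypothetical (predicate, endpoint name) pairs, and the built-in endpoint is referred to as events.

```python
BUILT_IN_ENDPOINT = "events"


def destinations(message, routes, fallback_enabled):
    """Return the set of endpoint names a message is delivered to.

    routes is a list of (predicate, endpoint_name) pairs. A message that
    matches no route goes to the built-in endpoint only when the fallback
    route is enabled; otherwise it is dropped (empty set).
    """
    matched = {endpoint for predicate, endpoint in routes if predicate(message)}
    if matched:
        return matched
    if fallback_enabled:
        return {BUILT_IN_ENDPOINT}
    return set()


# Hypothetical route: alerts go to a custom Service Bus queue.
routes = [(lambda m: m.get("level") == "alert", "alerts-queue")]

alert = {"level": "alert"}
telemetry = {"level": "info"}

print(destinations(alert, routes, fallback_enabled=True))       # custom endpoint only
print(destinations(telemetry, routes, fallback_enabled=True))   # built-in endpoint
print(destinations(telemetry, routes, fallback_enabled=False))  # dropped
```

In this model, adding a second route such as `(lambda m: True, "events")` restores delivery of every message to the built-in endpoint, which corresponds to configuring a route to the events endpoint.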

You can enable or disable the fallback route on the Message Routing blade in the Azure portal. You can also use Azure Resource Manager with FallbackRouteProperties to use a custom endpoint for the fallback route.

Last known errors for IoT Hub routing endpoints

Get Endpoint Health in the REST API gives the health status of the endpoints, as well as the last known error, to identify the reason an endpoint is not healthy. The following table lists the most common errors.

| Last known error | Description/when it occurs | Possible mitigation |
| --- | --- | --- |
| Transient | A transient error has occurred and IoT Hub will retry the operation. | Observe the routes resource logs. |
| InternalError | An error occurred while delivering a message to an endpoint. | This is an internal exception, but also observe the routes resource logs. |
| Unauthorized | IoT Hub is not authorized to send messages to the specified endpoint. | Validate that the connection string for the endpoint is up to date. If it has changed, update it on your IoT hub. If the endpoint uses managed identity, check that the IoT Hub principal has the required permissions on the target. |
| Throttled | IoT Hub is being throttled while writing messages to the endpoint. | Review the throttle limits for the affected endpoint. Modify the endpoint's configuration to scale up if needed. |
| Timeout | The operation timed out. | Retry the operation. |
| Not Found | The target resource does not exist. | Ensure that the target resource exists. |
| Container Not Found | The storage container does not exist. | Ensure that the storage container exists. |
| Container disabled | The storage container is disabled. | Ensure that the storage container is enabled. |
| MaxMessageSizeExceeded | Message routing has a message size limit of 256 KB. The size of the message being routed exceeded this limit. | Check whether the message size can be reduced by using fewer application properties or fewer message enrichments. |
| PartitioningAndDuplicateDetectionNotSupported | The Service Bus entity must not have duplicate detection enabled. | Disable duplicate detection in Service Bus or consider using an entity without duplicate detection. |
| SessionfulEntityNotSupported | The Service Bus entity must not have sessions enabled. | Disable sessions in Service Bus or consider using an entity without sessions. |
| NoMatchingSubscriptionsForMessage | There is no subscription on the Service Bus topic to write the message to. | Create a subscription for IoT Hub messages to be routed to. |
| EndpointExternallyDisabled | The endpoint is not in an active state, so IoT Hub cannot send messages to it. | Enable the endpoint to bring it back to an active state. |
| DeviceMaximumQueueDepthExceeded | The Service Bus size limit has been reached. | Consider removing messages from the target Service Bus entity to allow new messages to be ingested. |
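For MaxMessageSizeExceeded, a client-side estimate before sending can help spot messages that are close to the 256 KB limit. The sketch below is only a rough approximation: how IoT Hub accounts for message size (including system properties and enrichments) is not described here, and the sketch assumes 1 KB equals 1,024 bytes. The function names are hypothetical.

```python
# Assumption: the 256 KB limit is interpreted as 256 * 1024 bytes.
MAX_ROUTED_MESSAGE_BYTES = 256 * 1024


def estimated_size(body, app_properties):
    """Estimate size as body bytes plus UTF-8 bytes of property keys and values."""
    size = len(body)
    for key, value in app_properties.items():
        size += len(key.encode("utf-8")) + len(str(value).encode("utf-8"))
    return size


def fits_routing_limit(body, app_properties):
    """Return True when the rough estimate is within the assumed limit."""
    return estimated_size(body, app_properties) <= MAX_ROUTED_MESSAGE_BYTES


body = b'{"temperature": 21.5}'
properties = {"site": "plant-7", "level": "info"}

print(estimated_size(body, properties))
print(fits_routing_limit(body, properties))
```

Because message enrichments are added by IoT Hub after the device sends the message, a real check would need to leave headroom for them.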

Routes resource logs

The following are the operation names and error codes logged in the routes resource logs.

Operation names

| Operation name | Level | Description |
| --- | --- | --- |
| UndefinedRouteEvaluation | Information | The message cannot be evaluated against the given condition, for example when a property in the route query condition is absent from the message. Learn more about routing query syntax. |
| RouteEvaluationError | Error | There was an error evaluating the message because of an issue with the message format. For example, this error is logged if the content encoding is not specified or the content type is not valid in the message. These values must be set in the system properties. |
| DroppedMessage | Error | The message was dropped and not routed. Possible reasons include that the message didn't match any routing query, or that the endpoint was dead and the message could not be delivered after several retries. We recommend getting more details on the endpoint by using the REST API Get Endpoint Health. |
| EndpointUnhealthy | Error | The endpoint has not been accepting messages from IoT Hub, and IoT Hub is trying to resend the messages. We recommend observing the last known error via the REST API Get Endpoint Health. |
| EndpointDead | Error | The endpoint has not been accepting messages from IoT Hub for over an hour. We recommend observing the last known error via the REST API Get Endpoint Health. |
| EndpointHealthy | Information | The endpoint is healthy and receiving messages from IoT Hub. This message is not logged continuously; it is logged only when the endpoint becomes healthy again. It means that IoT Hub was previously unable to send messages to the endpoint, but the endpoint is now healthy. |
| OrphanedMessage | Information | The message does not match any route. |
| InvalidMessage | Error | The message is invalid because it is incompatible with the endpoint. We recommend checking the configuration of the endpoint. |

The operations UndefinedRouteEvaluation, RouteEvaluationError, and OrphanedMessage are throttled and logged no more than once a minute per IoT hub.
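Since RouteEvaluationError is tied to the content encoding and content type system properties, a pre-send check on the device side may catch the problem early. The sketch below assumes the commonly documented requirement for routing on the message body: a content type of application/json and a content encoding of UTF-8, UTF-16, or UTF-32. The dictionary keys and the function name are hypothetical and do not correspond to a specific SDK.

```python
# Assumed accepted values for routing queries on the message body.
ACCEPTED_CONTENT_TYPE = "application/json"
ACCEPTED_ENCODINGS = {"utf-8", "utf-16", "utf-32"}


def body_routing_problems(system_properties):
    """Return a list of human-readable problems; empty when both values look valid."""
    problems = []
    content_type = (system_properties.get("contentType") or "").lower()
    encoding = (system_properties.get("contentEncoding") or "").lower()
    if content_type != ACCEPTED_CONTENT_TYPE:
        problems.append("content type is missing or not application/json")
    if encoding not in ACCEPTED_ENCODINGS:
        problems.append("content encoding is missing or not UTF-8/16/32")
    return problems


print(body_routing_problems({"contentType": "application/json", "contentEncoding": "utf-8"}))
print(body_routing_problems({"contentType": "text/plain"}))
```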

Common error codes

| Error code | Description |
| --- | --- |
| 401002 | IoT Hub unauthorized access |
| 413001 | Message too large |
| 403004 | Device maximum queue depth exceeded |
| 503008 | Receive link throttled |
| 500000 | Generic server error |
| 401 | Unauthorized |
| 503 | Service unavailable |
| 500001 | Server error |
| 400103 | Invalid content encoding or content type |
| 404001 | Device not found |
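When processing routes resource logs in bulk, a small lookup built from the table above can translate error codes into descriptions. The sketch below also derives an HTTP-style status from the first three digits of the code. That reading is an assumption suggested by the codes listed here, not something this article states.

```python
ERROR_CODES = {
    "401002": "IoT Hub unauthorized access",
    "413001": "Message too large",
    "403004": "Device maximum queue depth exceeded",
    "503008": "Receive link throttled",
    "500000": "Generic server error",
    "401": "Unauthorized",
    "503": "Service unavailable",
    "500001": "Server error",
    "400103": "Invalid content encoding or content type",
    "404001": "Device not found",
}


def describe_error(code):
    """Return (status, description), where status is the assumed HTTP-style prefix."""
    text = str(code)
    description = ERROR_CODES.get(text, "Unknown error code")
    prefix = text[:3]
    status = int(prefix) if len(prefix) == 3 and prefix.isdigit() else None
    return status, description


print(describe_error(403004))
print(describe_error("503"))
```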