Manage connectivity and reliable messaging by using Azure IoT Hub device SDKs

This article provides high-level guidance to help you design device applications that are more resilient. It shows you how to take advantage of the connectivity and reliable messaging features in Azure IoT device SDKs. The goal of this guide is to help you manage the following scenarios:

  • Fixing a dropped network connection
  • Switching between different network connections
  • Reconnecting because of transient service connection errors

Implementation details may vary by language. For more information, see the API documentation or the specific SDK.

Designing for resiliency

IoT devices often rely on non-continuous or unstable network connections (for example, GSM or satellite). Errors can occur when devices interact with cloud-based services because of intermittent service availability and infrastructure-level or transient faults. An application that runs on a device has to manage the mechanisms for connection and reconnection, as well as the retry logic for sending and receiving messages. The retry strategy requirements also depend heavily on the device's IoT scenario, context, and capabilities.

The Azure IoT Hub device SDKs aim to simplify cloud-to-device and device-to-cloud connection and communication. These SDKs provide a robust way to connect to Azure IoT Hub and a comprehensive set of options for sending and receiving messages. Developers can also modify the existing implementation to build a retry strategy that is better suited to a given scenario.

The relevant SDK features that support connectivity and reliable messaging are covered in the following sections.

Connection and retry

This section gives an overview of the reconnection and retry patterns available when managing connections. It details implementation guidance for using a different retry policy in your device application and lists relevant APIs from the device SDKs.

Error patterns

Connection failures can happen at many levels:

  • Network errors: disconnected socket and name resolution errors
  • Protocol-level errors for HTTP, AMQP, and MQTT transport: detached links or expired sessions
  • Application-level errors that result from either local mistakes (for example, invalid credentials) or service behavior (for example, exceeding the quota or throttling)

The device SDKs detect errors at all three levels. The device SDKs don't detect or handle OS-related errors and hardware errors. The SDK design is based on the Transient Fault Handling guidance from the Azure Architecture Center.

Retry patterns

The following steps describe the retry process when connection errors are detected:

  1. The SDK detects the error and the associated error in the network, protocol, or application.
  2. The SDK uses the error filter to determine the error type and decide if a retry is needed.
  3. If the SDK identifies an unrecoverable error, operations like connection, send, and receive are stopped. The SDK notifies the user. Examples of unrecoverable errors include an authentication error and a bad endpoint error.
  4. If the SDK identifies a recoverable error, it retries according to the specified retry policy until the defined timeout elapses. By default, the SDK uses the exponential back-off with jitter retry policy.
  5. When the defined timeout expires, the SDK stops trying to connect or send. It notifies the user.
  6. The SDK allows the user to attach a callback to receive connection status changes.
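
The steps above can be sketched as a small retry loop. This is a minimal, SDK-independent illustration in Python, not the SDKs' actual implementation. The names UnrecoverableError, is_recoverable, run_with_retry, and the status strings are all hypothetical and chosen only for this example.

```python
import time


class UnrecoverableError(Exception):
    """Hypothetical marker for errors such as bad credentials or a bad endpoint."""


def is_recoverable(error):
    # Hypothetical error filter (step 2): anything not marked unrecoverable
    # is treated as transient in this sketch.
    return not isinstance(error, UnrecoverableError)


def run_with_retry(operation, next_delay, timeout_s, on_status,
                   sleep=time.sleep, clock=time.monotonic):
    """Run operation, retrying recoverable errors until timeout_s elapses.

    next_delay(attempt) returns the wait in seconds before the next retry.
    on_status(text) is the status callback (step 6).
    """
    deadline = clock() + timeout_s
    attempt = 0
    while True:
        try:
            result = operation()
            on_status("connected")
            return result
        except Exception as error:
            if not is_recoverable(error):
                # Step 3: stop and notify the user.
                on_status("disconnected: unrecoverable error")
                raise
            delay = next_delay(attempt)
            if clock() + delay > deadline:
                # Step 5: the defined timeout would expire; stop and notify.
                on_status("disconnected: retry timeout expired")
                raise
            # Step 4: wait according to the retry policy, then try again.
            on_status("retrying")
            sleep(delay)
            attempt += 1
```

The sleep and clock parameters are injectable only so that the loop can be exercised without real waiting; a caller would normally leave them at their defaults.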

The SDKs provide three retry policies:

  • Exponential back-off with jitter: This default retry policy tends to be aggressive at the start and slow down over time until it reaches a maximum delay. The design is based on the Retry guidance from the Azure Architecture Center.
  • Custom retry: For some SDK languages, you can design a custom retry policy that is better suited for your scenario and then inject it into the RetryPolicy. Custom retry isn't available on the C SDK.
  • No retry: You can set the retry policy to "no retry," which disables the retry logic. The SDK tries to connect once and send a message once, assuming the connection is established. This policy is typically used in scenarios with bandwidth or cost concerns. If you choose this option, messages that fail to send are lost and can't be recovered.
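
To make the difference between the three policies concrete, the following Python sketch models each one as an object with a next_delay method. It is an illustration only: the class names echo the policy names above, but the constructor parameters, the jitter formula, and the FixedInterval custom policy are assumptions made for this example and do not mirror any SDK's signatures.

```python
import random


class NoRetry:
    """No retry: never schedule another attempt."""

    def next_delay(self, attempt):
        return None  # None means "do not retry" in this sketch


class ExponentialBackoffWithJitter:
    """Delay doubles with each attempt, is randomized, and is capped at max_s."""

    def __init__(self, base_s=0.1, max_s=10.0, jitter=0.2, rng=None):
        self.base_s = base_s      # assumed first delay, in seconds
        self.max_s = max_s        # assumed maximum delay, in seconds
        self.jitter = jitter      # assumed +/- fraction of random variation
        self.rng = rng or random.Random()

    def next_delay(self, attempt):
        exponent = min(attempt, 30)  # bound the exponent to keep the math finite
        raw = min(self.base_s * (2 ** exponent), self.max_s)
        factor = 1 + self.rng.uniform(-self.jitter, self.jitter)
        return min(raw * factor, self.max_s)


class FixedInterval:
    """Hypothetical custom policy: constant delay for a limited number of retries."""

    def __init__(self, interval_s, max_attempts):
        self.interval_s = interval_s
        self.max_attempts = max_attempts

    def next_delay(self, attempt):
        return self.interval_s if attempt < self.max_attempts else None
```

The jitter spreads retries from many devices over time so that they don't all reconnect at the same instant after a shared outage, which is the usual reason for preferring it over a fixed schedule.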

Retry policy APIs

For each SDK, the following list gives the SetRetryPolicy method, the policy implementations, and the implementation guidance:

  • C/iOS
      Policy implementations: Custom: use available retryPolicy
      Implementation guidance: C/iOS implementation
  • Java
      SetRetryPolicy method: SetRetryPolicy
      Policy implementations: Default: ExponentialBackoffWithJitter class; Custom: implement RetryPolicy interface; No retry: NoRetry class
      Implementation guidance: Java implementation
  • .NET
      SetRetryPolicy method: DeviceClient.SetRetryPolicy
      Policy implementations: Default: ExponentialBackoff class; Custom: implement IRetryPolicy interface; No retry: NoRetry class
      Implementation guidance: C# implementation
  • Node
      SetRetryPolicy method: setRetryPolicy
      Policy implementations: Default: ExponentialBackoffWithJitter class; Custom: implement RetryPolicy interface; No retry: NoRetry class
      Implementation guidance: Node implementation
  • Python
      SetRetryPolicy method: Not currently supported
      Policy implementations: Not currently supported
      Implementation guidance: Not currently supported

The following code samples illustrate this flow:

.NET implementation guidance

The following code sample shows how to define and set the default retry policy:

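// Define and set the default retry policy.
// Arguments: retryCount, minBackoff, maxBackoff, deltaBackoff.
IRetryPolicy retryPolicy = new ExponentialBackoff(int.MaxValue, TimeSpan.FromMilliseconds(100), TimeSpan.FromSeconds(10), TimeSpan.FromMilliseconds(100));
deviceClient.SetRetryPolicy(retryPolicy); // deviceClient: an existing DeviceClient instance (assumed name)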

To avoid high CPU usage, the retries are throttled if the code fails immediately, for example, when there's no network or route to the destination. The minimum time to execute the next retry is 1 second.

If the service responds with a throttling error, the retry policy is different and can't be changed via the public API:

// Throttled retry policy
IRetryPolicy retryPolicy = new ExponentialBackoff(RetryCount, TimeSpan.FromSeconds(10),
  TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(5));
SetRetryPolicy(retryPolicy);

The retry mechanism stops after DefaultOperationTimeoutInMilliseconds, which is currently set at 4 minutes.
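
As a rough way to reason about how the 1-second minimum interval and the 4-minute operation timeout interact, the following Python sketch counts how many retries fit into the timeout for a given sequence of policy delays. It is back-of-the-envelope arithmetic under stated assumptions, not SDK behavior: it assumes the minimum interval acts as a simple floor on each delay, ignores jitter, and ignores the time each attempt itself takes. The function name retries_within_timeout is hypothetical.

```python
def retries_within_timeout(policy_delays_s, min_interval_s=1.0, timeout_s=240.0):
    """Return (retry_count, total_wait_s) that fit before timeout_s is exceeded.

    policy_delays_s: iterable of delays, in seconds, proposed by a retry policy.
    min_interval_s:  assumed floor applied to every delay (1 second in the text).
    timeout_s:       assumed overall budget (4 minutes in the text).
    """
    elapsed = 0.0
    count = 0
    for delay in policy_delays_s:
        wait = max(delay, min_interval_s)  # apply the minimum retry interval
        if elapsed + wait > timeout_s:
            break  # the next wait would run past the operation timeout
        elapsed += wait
        count += 1
    return count, elapsed


# Example schedule: start at 100 ms, double each time, cap at 10 s (no jitter).
schedule = [min(0.1 * 2 ** n, 10.0) for n in range(100)]
count, total_wait = retries_within_timeout(schedule)
```

With this example schedule, the early sub-second delays are all raised to 1 second, and once the 10-second cap is reached the remaining budget allows a little over twenty further retries.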

Other languages implementation guidance

For code samples in other languages, review the following implementation documents. The repository contains samples that demonstrate the use of retry policy APIs.

Next steps