Troubleshooting guide for Azure Service Bus

This article lists some of the .NET exceptions generated by the Service Bus .NET Framework APIs and offers other tips for troubleshooting issues.

Service Bus messaging exceptions

This section lists the .NET exceptions generated by .NET Framework APIs.

Exception categories

The messaging APIs generate exceptions that fall into the following categories, each listed with the action you can take to try to fix it. The meaning and causes of an exception can vary depending on the type of messaging entity:

  1. User coding error (System.ArgumentException, System.InvalidOperationException, System.OperationCanceledException, System.Runtime.Serialization.SerializationException). General action: try to fix the code before proceeding.
  2. Setup/configuration error (Microsoft.ServiceBus.Messaging.MessagingEntityNotFoundException, System.UnauthorizedAccessException). General action: review your configuration and change it if necessary.
  3. Transient exceptions (Microsoft.ServiceBus.Messaging.MessagingException, Microsoft.ServiceBus.Messaging.ServerBusyException, Microsoft.ServiceBus.Messaging.MessagingCommunicationException). General action: retry the operation or notify users.
  4. Other exceptions (System.Transactions.TransactionException, System.TimeoutException, Microsoft.ServiceBus.Messaging.MessageLockLostException, Microsoft.ServiceBus.Messaging.SessionLockLostException). General action: specific to the exception type; refer to the table in the following section.
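
The "retry the operation" advice for transient exceptions can be sketched as a retry loop with exponential backoff. This is a minimal, language-neutral illustration in Python, not code from the Service Bus SDK: TransientError, retry_transient, and the delay values are hypothetical names and numbers chosen for the example. In a real client the is_transient check would map to the transient exception types listed above.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a transient failure (not a Service Bus type)."""


def retry_transient(operation, is_transient, max_attempts=4, base_delay=0.5,
                    sleep=time.sleep):
    """Call operation(); retry only when is_transient(exc) returns True.

    Waits roughly base_delay * 2**attempt seconds between attempts and
    re-raises the exception when it is not transient or attempts run out.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_transient(exc):
                raise
            delay = base_delay * (2 ** attempt)
            # A little jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))


# Demo: the operation fails twice with a transient error, then succeeds.
attempts = {"n": 0}


def flaky_send():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("server busy")
    return "sent"


result = retry_transient(
    flaky_send,
    lambda exc: isinstance(exc, TransientError),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result, "after", attempts["n"], "attempts")  # prints: sent after 3 attempts
```

Non-transient errors (for example, coding or configuration errors) are re-raised immediately, which matches the "retry does not help" guidance for those categories.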

Exception types

The following table lists messaging exception types and their causes, and notes the suggested action you can take.

| Exception type | Description/Cause/Examples | Suggested action | Note on automatic/immediate retry |
| --- | --- | --- | --- |
| TimeoutException | The server did not respond to the requested operation within the specified time, which is controlled by OperationTimeout. The server may have completed the requested operation. This can happen due to network or other infrastructure delays. | Check the system state for consistency and retry if necessary. See the TimeoutException section. | Retry might help in some cases; add retry logic to code. |
| InvalidOperationException | The requested user operation is not allowed within the server or service. See the exception message for details. For example, Complete() generates this exception if the message was received in ReceiveAndDelete mode. | Check the code and the documentation. Make sure the requested operation is valid. | Retry does not help. |
| OperationCanceledException | An attempt is made to invoke an operation on an object that has already been closed, aborted, or disposed. In rare cases, the ambient transaction is already disposed. | Check the code and make sure it does not invoke operations on a disposed object. | Retry does not help. |
| UnauthorizedAccessException | The TokenProvider object could not acquire a token, the token is invalid, or the token does not contain the claims required to perform the operation. | Make sure the token provider is created with the correct values. Check the configuration of the Access Control service. | Retry might help in some cases; add retry logic to code. |
| ArgumentException, ArgumentNullException, ArgumentOutOfRangeException | One or more arguments supplied to the method are invalid. The URI supplied to NamespaceManager or Create contains path segments. The URI scheme supplied to NamespaceManager or Create is invalid. The property value is larger than 32 KB. | Check the calling code and make sure the arguments are correct. | Retry does not help. |
| MessagingEntityNotFoundException | The entity associated with the operation does not exist or has been deleted. | Make sure the entity exists. | Retry does not help. |
| MessageNotFoundException | An attempt was made to receive a message with a particular sequence number, and the message was not found. | Make sure the message has not been received already. Check the dead-letter queue to see whether the message has been dead-lettered. | Retry does not help. |
| MessagingCommunicationException | The client is not able to establish a connection to Service Bus. | Make sure the supplied host name is correct and the host is reachable. | Retry might help if there are intermittent connectivity issues. |
| ServerBusyException | The service is not able to process the request at this time. | The client can wait for a period of time, then retry the operation. | The client may retry after a certain interval. If a retry results in a different exception, check the retry behavior of that exception. |
| MessageLockLostException | The lock token associated with the message has expired, or the lock token is not found. | Dispose the message. | Retry does not help. |
| SessionLockLostException | The lock associated with this session is lost. | Abort the MessageSession object. | Retry does not help. |
| MessagingException | Generic messaging exception that may be thrown in the following cases: an attempt is made to create a QueueClient using a name or path that belongs to a different entity type (for example, a topic); an attempt is made to send a message larger than 256 KB; the server or service encountered an error during processing of the request. See the exception message for details. This is usually a transient exception. | Check the code and ensure that only serializable objects are used for the message body (or use a custom serializer). Check the documentation for the supported value types of the properties and only use supported types. Check the IsTransient property. If it is true, you can retry the operation. | Retry behavior is undefined and might not help. |
| MessagingEntityAlreadyExistsException | An attempt was made to create an entity with a name that is already used by another entity in that service namespace. | Delete the existing entity or choose a different name for the entity to be created. | Retry does not help. |
| QuotaExceededException | The messaging entity has reached its maximum allowable size, or the maximum number of connections to a namespace has been exceeded. | Create space in the entity by receiving messages from the entity or its subqueues. See the QuotaExceededException section. | Retry might help if messages have been removed in the meantime. |
| RuleActionException | Service Bus returns this exception if you attempt to create an invalid rule action. Service Bus attaches this exception to a dead-lettered message if an error occurs while processing the rule action for that message. | Check the rule action for correctness. | Retry does not help. |
| FilterException | Service Bus returns this exception if you attempt to create an invalid filter. Service Bus attaches this exception to a dead-lettered message if an error occurs while processing the filter for that message. | Check the filter for correctness. | Retry does not help. |
| SessionCannotBeLockedException | An attempt was made to accept a session with a specific session ID, but the session is currently locked by another client. | Make sure no other client holds the lock on the session. | Retry might help if the session has been released in the interim. |
| TransactionSizeExceededException | Too many operations are part of the transaction. | Reduce the number of operations that are part of this transaction. | Retry does not help. |
| MessagingEntityDisabledException | A runtime operation was requested on a disabled entity. | Activate the entity. | Retry might help if the entity has been activated in the interim. |
| NoMatchingSubscriptionException | Service Bus returns this exception if you send a message to a topic that has pre-filtering enabled and none of the filters match. | Make sure at least one filter matches. | Retry does not help. |
| MessageSizeExceededException | A message payload exceeds the 256-KB limit. The 256-KB limit is the total message size, which can include system properties and any .NET overhead. | Reduce the size of the message payload, then retry the operation. | Retrying without reducing the message size does not help. |
| TransactionException | The ambient transaction (Transaction.Current) is invalid. It may have been completed or aborted. The inner exception may provide additional information. | - | Retry does not help. |
| TransactionInDoubtException | An operation is attempted on a transaction that is in doubt, or an attempt is made to commit the transaction and the transaction becomes in doubt. | Your application must handle this exception (as a special case), because the transaction may have already been committed. | - |

QuotaExceededException

QuotaExceededException indicates that a quota for a specific entity has been exceeded.

Queues and topics

For queues and topics, the exceeded quota is often the size of the queue. The error message property contains further details, as in the following example:

Microsoft.ServiceBus.Messaging.QuotaExceededException
Message: The maximum entity size has been reached or exceeded for Topic: 'xxx-xxx-xxx'. 
    Size of entity in bytes:1073742326, Max entity size in bytes:
1073741824..TrackingId:xxxxxxxxxxxxxxxxxxxxxxxxxx, TimeStamp:3/15/2013 7:50:18 AM

The message states that the topic exceeded its size limit, in this case 1 GB (the default size limit).
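
To see how far over the limit an entity is, the two byte counts can be pulled out of the message text. The sketch below is a Python illustration that assumes the message keeps the wording of the sample above; the regular expression and the parse_entity_size helper are hypothetical and are not part of any Service Bus API.

```python
import re

# Message text copied from the sample above (tracking details omitted).
SAMPLE = (
    "The maximum entity size has been reached or exceeded for Topic: 'xxx-xxx-xxx'. "
    "Size of entity in bytes:1073742326, Max entity size in bytes:\n"
    "1073741824."
)

# \s* tolerates the line break that appears before the second number.
_SIZE_PATTERN = re.compile(
    r"Size of entity in bytes:\s*(\d+),\s*Max entity size in bytes:\s*(\d+)"
)


def parse_entity_size(message):
    """Return (current_bytes, max_bytes), or None if the text does not match."""
    match = _SIZE_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


current, maximum = parse_entity_size(SAMPLE)
print(f"over the limit by {current - maximum} bytes")  # over the limit by 502 bytes
print(f"limit is {maximum / 2**30:g} GiB")             # limit is 1 GiB
```

Because the message format is not a contract, treat a None result as "unknown" rather than as an error.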

Namespaces

For namespaces, QuotaExceededException can indicate that an application has exceeded the maximum number of connections to a namespace. For example:

Microsoft.ServiceBus.Messaging.QuotaExceededException: ConnectionsQuotaExceeded for namespace xxx.
<tracking-id-guid>_G12 ---> 
System.ServiceModel.FaultException`1[System.ServiceModel.ExceptionDetail]: 
ConnectionsQuotaExceeded for namespace xxx.

Common causes

There are two common causes for this error: the dead-letter queue, and non-functioning message receivers.

  1. Dead-letter queue. A reader is failing to complete messages, and the messages are returned to the queue/topic when the lock expires. This can happen if the reader encounters an exception that prevents it from calling BrokeredMessage.Complete. After a message has been read 10 times, it moves to the dead-letter queue by default. This behavior is controlled by the QueueDescription.MaxDeliveryCount property, which has a default value of 10. As messages pile up in the dead-letter queue, they take up space.

    To resolve the issue, read and complete the messages from the dead-letter queue, as you would from any other queue. You can use the FormatDeadLetterPath method to help format the dead-letter queue path.

  2. Receiver stopped. A receiver has stopped receiving messages from a queue or subscription. To identify this, look at the QueueDescription.MessageCountDetails property, which shows the full breakdown of the messages. If the ActiveMessageCount property is high or growing, the messages aren't being read as fast as they are being written.
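
As a rough illustration of what a dead-letter path looks like, the Python sketch below builds the path strings by hand. The helper names are hypothetical, and the layout (a $DeadLetterQueue suffix, with subscriptions addressed under Subscriptions/) is the commonly documented Service Bus convention rather than something stated in this article; in .NET code, prefer FormatDeadLetterPath and verify the result against your SDK version.

```python
# Assumed suffix for the dead-letter subqueue of an entity.
DEAD_LETTER_SUFFIX = "$DeadLetterQueue"


def queue_dead_letter_path(queue_path):
    """Path of the dead-letter subqueue for a queue (hypothetical helper)."""
    return f"{queue_path}/{DEAD_LETTER_SUFFIX}"


def subscription_dead_letter_path(topic_path, subscription_name):
    """Path of the dead-letter subqueue for a topic subscription (hypothetical helper)."""
    return f"{topic_path}/Subscriptions/{subscription_name}/{DEAD_LETTER_SUFFIX}"


print(queue_dead_letter_path("orders"))
print(subscription_dead_letter_path("events", "audit"))
```

A receiver opened on such a path reads and completes dead-lettered messages in the same way as on the main entity.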

TimeoutException

A TimeoutException indicates that a user-initiated operation is taking longer than the operation timeout.

You should check the value of the ServicePointManager.DefaultConnectionLimit property, as hitting this limit can also cause a TimeoutException.

Queues and topics

For queues and topics, the timeout is specified in the MessagingFactorySettings.OperationTimeout property, as part of the connection string, or through ServiceBusConnectionStringBuilder. The error message itself might vary, but it always contains the timeout value specified for the current operation.

Connectivity, certificate, or timeout issues

The following steps may help you troubleshoot connectivity/certificate/timeout issues for all services under *.servicebus.chinacloudapi.cn.

  • Browse to, or wget, https://<yournamespace>.servicebus.chinacloudapi.cn/. This helps check whether you have IP filtering, virtual network, or certificate chain issues (most common when using the Java SDK).

    An example of a successful message:

    <feed xmlns="http://www.w3.org/2005/Atom"><title type="text">Publicly Listed Services</title><subtitle type="text">This is the list of publicly-listed services currently available.</subtitle><id>uuid:27fcd1e2-3a99-44b1-8f1e-3e92b52f0171;id=30</id><updated>2019-12-27T13:11:47Z</updated><generator>Service Bus 1.1</generator></feed>
    

    An example of a failure error message:

    <Error>
        <Code>400</Code>
        <Detail>
            Bad Request. To know more visit https://aka.ms/sbResourceMgrExceptions. . TrackingId:b786d4d1-cbaf-47a8-a3d1-be689cda2a98_G22, SystemTracker:NoSystemTracker, Timestamp:2019-12-27T13:12:40
        </Detail>
    </Error>
    
  • Run the following command to check whether any port is blocked on the firewall. The ports used are 443 (HTTPS), 5671 (AMQP), and 9354 (Net Messaging/SBMP). Depending on the library you use, other ports are also used. The following sample command checks whether port 5671 is blocked.

    tnc <yournamespacename>.servicebus.chinacloudapi.cn -port 5671
    

    On Linux:

    telnet <yournamespacename>.servicebus.chinacloudapi.cn 5671
    
  • When there are intermittent connectivity issues, run the following command to check whether any packets are dropped. This command tries to establish 25 TCP connections to the service, one every second. You can then check how many of them succeeded or failed, and also see the TCP connection latency.

    .\psping.exe -n 25 -i 1 -q <yournamespace>.servicebus.chinacloudapi.cn:5671 -nobanner     
    

    You can use equivalent commands if you're using other tools such as tnc, ping, and so on.

  • If the previous steps don't help, obtain a network trace and analyze it, or contact Microsoft Support.
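
Where psping or tnc isn't available, a similar repeated TCP connection check can be approximated with a short script. The following Python sketch uses only the standard library; probe_tcp and summarize are hypothetical helper names, and the defaults (25 attempts, one per second) simply mirror the psping example above. It measures TCP connect time only and doesn't validate certificates or the AMQP handshake.

```python
import socket
import time


def probe_tcp(host, port, attempts=25, interval=1.0, timeout=3.0):
    """Try `attempts` TCP connections to host:port.

    Returns one entry per attempt: the connect latency in milliseconds,
    or None when the connection failed or timed out.
    """
    results = []
    for i in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append((time.perf_counter() - start) * 1000.0)
        except OSError:  # covers refused connections, timeouts, DNS failures
            results.append(None)
        if i < attempts - 1:
            time.sleep(interval)
    return results


def summarize(results):
    """Count successes and failures and average the successful latencies."""
    ok = [r for r in results if r is not None]
    return {
        "sent": len(results),
        "succeeded": len(ok),
        "failed": len(results) - len(ok),
        "avg_ms": sum(ok) / len(ok) if ok else None,
    }
```

For example, summarize(probe_tcp("<yournamespace>.servicebus.chinacloudapi.cn", 5671)) reports how many of the 25 connection attempts succeeded; a non-zero failed count suggests dropped connections or a blocked port.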

Next steps

For the complete Service Bus .NET API reference, see the Azure .NET API reference.

To learn more about Service Bus, see the following articles: