Event Grid message delivery and retry

This article describes how Azure Event Grid handles events when delivery isn't acknowledged.

Event Grid provides durable delivery. It delivers each message at least once for each subscription. Events are sent to the registered endpoint of each subscription immediately. If an endpoint doesn't acknowledge receipt of an event, Event Grid retries delivery of the event.

Note

Event Grid doesn't guarantee order for event delivery, so subscribers may receive events out of order.
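Because delivery is at least once, a handler may see the same event more than once. The following is a minimal sketch of an idempotent consumer that skips events whose `id` it has already processed. The function name `process_once` and the in-memory `seen_ids` set are illustrative assumptions, not part of Event Grid; a real handler would likely keep the seen IDs in durable storage.

```python
from typing import Callable, Iterable, Set


def process_once(events: Iterable[dict], handle: Callable[[dict], None], seen_ids: Set[str]) -> int:
    """Call handle() once per unique event id and return how many events were handled."""
    handled = 0
    for event in events:
        event_id = event["id"]
        if event_id in seen_ids:
            continue  # duplicate delivery: this event was already processed
        handle(event)
        seen_ids.add(event_id)
        handled += 1
    return handled


# Usage: event "a" is delivered twice but handled only once.
seen: Set[str] = set()
received = []
batch = [{"id": "a"}, {"id": "b"}, {"id": "a"}]
process_once(batch, received.append, seen)
```

The sketch deduplicates only; it doesn't restore ordering, so handlers that depend on order need their own sequencing logic.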

Batched event delivery

Event Grid defaults to sending each event individually to subscribers. The subscriber receives an array with a single event. You can configure Event Grid to batch events for delivery for improved HTTP performance in high-throughput scenarios.

Batched delivery has two settings:

  • Max events per batch - The maximum number of events Event Grid delivers per batch. This number is never exceeded, but fewer events may be delivered if no other events are available at the time of publish. Event Grid doesn't delay events to create a batch if fewer events are available. Must be between 1 and 5,000.
  • Preferred batch size in kilobytes - The target ceiling for batch size in kilobytes. Similar to max events, the batch size may be smaller if more events aren't available at the time of publish. A batch can be larger than the preferred batch size if a single event is larger than the preferred size. For example, if the preferred size is 4 KB and a 10-KB event is pushed to Event Grid, the 10-KB event is still delivered in its own batch rather than being dropped.
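The interaction between the two settings can be modeled with a short sketch. This is not Event Grid's implementation: the function name `build_batch` is hypothetical, and measuring event size as the length of its UTF-8 JSON serialization is an assumption made for illustration.

```python
import json
from typing import List


def build_batch(pending: List[dict], max_events: int, preferred_kb: int) -> List[dict]:
    """Pick events from the front of `pending` for a single batch (illustrative model)."""
    limit_bytes = preferred_kb * 1024
    batch: List[dict] = []
    size = 0
    for event in pending:
        # Assumption: size is approximated by the serialized JSON length.
        event_bytes = len(json.dumps(event).encode("utf-8"))
        if len(batch) >= max_events:
            break  # the max-events limit is never exceeded
        if batch and size + event_bytes > limit_bytes:
            break  # adding this event would pass the preferred size
        # An oversized event is still accepted when the batch is empty,
        # so it is delivered in its own batch instead of being dropped.
        batch.append(event)
        size += event_bytes
    return batch
```

For example, with `max_events=2` and five small pending events, the sketch returns two events; with a preferred size of 4 KB and a 10-KB event at the front of the queue, it returns that event alone.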

Batched delivery is configured on a per-event-subscription basis via the portal, CLI, PowerShell, or SDKs.

Azure portal:

[Screenshot: delivery settings]

Azure CLI

When creating an event subscription, use the following parameters:

  • max-events-per-batch - Maximum number of events in a batch. Must be a number between 1 and 5000.
  • preferred-batch-size-in-kilobytes - Preferred batch size in kilobytes. Must be a number between 1 and 1024.

# Look up the resource ID of the storage account that publishes the events.
storageid=$(az storage account show --name <storage_account_name> --resource-group <resource_group_name> --query id --output tsv)

# $sitename is assumed to already hold the name of the web app that hosts the event handler.
endpoint=https://$sitename.chinacloudsites.cn/api/updates

# Create the subscription with batching: at most 1000 events and a preferred size of 512 KB per batch.
az eventgrid event-subscription create \
  --source-resource-id "$storageid" \
  --name <event_subscription_name> \
  --endpoint "$endpoint" \
  --max-events-per-batch 1000 \
  --preferred-batch-size-in-kilobytes 512

For more information on using Azure CLI with Event Grid, see Route storage events to web endpoint with Azure CLI.

Retry schedule and duration

When Event Grid receives an error for an event delivery attempt, it decides, based on the type of the error, whether to retry the delivery, dead-letter the event, or drop the event.

If the error returned by the subscribed endpoint is a configuration-related error that can't be fixed with retries (for example, the endpoint was deleted), Event Grid either dead-letters the event or, if dead-lettering isn't configured, drops it.

Event Grid doesn't retry delivery for the following endpoint types and error codes:

| Endpoint type | Error codes |
| --- | --- |
| Azure Resources | 400 Bad Request, 413 Request Entity Too Large, 403 Forbidden |
| Webhook | 400 Bad Request, 413 Request Entity Too Large, 403 Forbidden, 404 Not Found, 401 Unauthorized |

Note

If dead-lettering isn't configured for the endpoint, events are dropped when the above errors happen. Consider configuring dead-lettering if you don't want these kinds of events to be dropped.
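The decision described above can be summarized in a small lookup. This is a sketch of the documented rules only, not Event Grid code; the function name `delivery_action` and the endpoint-type labels `"azure_resource"` and `"webhook"` are names chosen for this example.

```python
# Error codes that are not retried, per endpoint type (from the table above).
NON_RETRYABLE = {
    "azure_resource": {400, 413, 403},
    "webhook": {400, 413, 403, 404, 401},
}

# HTTP codes that Event Grid treats as successful delivery.
SUCCESS_CODES = {200, 201, 202, 203, 204}


def delivery_action(endpoint_type: str, status_code: int, dead_letter_configured: bool) -> str:
    """Return 'delivered', 'retry', 'dead-letter', or 'drop' for one delivery response."""
    if status_code in SUCCESS_CODES:
        return "delivered"
    if status_code in NON_RETRYABLE[endpoint_type]:
        # Retrying cannot fix these errors, so the event leaves the retry path.
        return "dead-letter" if dead_letter_configured else "drop"
    return "retry"
```

For instance, a 404 from a webhook with no dead-letter location yields `"drop"`, while the same 404 from an Azure resource endpoint yields `"retry"`.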

If the error returned by the subscribed endpoint isn't in the above list, Event Grid retries delivery using the policies described below:

Event Grid waits 30 seconds for a response after delivering a message. After 30 seconds, if the endpoint hasn't responded, the message is queued for retry. Event Grid uses an exponential backoff retry policy for event delivery. Event Grid retries delivery on the following schedule on a best-effort basis:

  • 10 seconds
  • 30 seconds
  • 1 minute
  • 5 minutes
  • 10 minutes
  • 30 minutes
  • 1 hour
  • 3 hours
  • 6 hours
  • Every 12 hours up to 24 hours

If the endpoint responds within 3 minutes, Event Grid attempts to remove the event from the retry queue on a best-effort basis, but duplicates may still be received.

Event Grid adds a small randomization to all retry steps and may opportunistically skip certain retries if an endpoint is consistently unhealthy, down for a long period, or appears to be overwhelmed.

For deterministic behavior, set the event time-to-live and max delivery attempts in the subscription retry policies.

By default, Event Grid expires all events that aren't delivered within 24 hours. You can customize the retry policy when creating an event subscription. You provide the maximum number of delivery attempts (default is 30) and the event time-to-live (default is 1440 minutes).
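To see how the schedule interacts with the two retry-policy settings, the following sketch lists nominal retry times. It rests on several assumptions that the text doesn't confirm: each schedule entry is treated as the wait between consecutive attempts, the initial delivery counts as attempt 1, and the randomization and skipped retries mentioned above are ignored. The function name `nominal_retry_offsets` is hypothetical.

```python
from itertools import chain, repeat
from typing import List

# Waits from the documented schedule, in seconds: 10 s through 6 h.
BACKOFF_SECONDS = [10, 30, 60, 300, 600, 1800, 3600, 10800, 21600]
# After the listed steps, the schedule repeats every 12 hours.
STEADY_STATE_SECONDS = 12 * 3600


def nominal_retry_offsets(max_attempts: int = 30, ttl_minutes: int = 1440) -> List[int]:
    """Return seconds after the first attempt at which each retry would nominally occur."""
    ttl_seconds = ttl_minutes * 60
    offsets: List[int] = []
    elapsed = 0
    for wait in chain(BACKOFF_SECONDS, repeat(STEADY_STATE_SECONDS)):
        elapsed += wait
        # Stop when the event would be expired or the attempt budget is used up
        # (the initial delivery plus the retries so far).
        if elapsed > ttl_seconds or len(offsets) + 1 >= max_attempts:
            break
        offsets.append(elapsed)
    return offsets
```

Under these assumptions, the defaults give ten retries, the last one about 22.8 hours after the first attempt, so the 24-hour time-to-live ends the sequence well before the 30-attempt limit does.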

Delayed delivery

As an endpoint experiences delivery failures, Event Grid begins to delay the delivery and retry of events to that endpoint. For example, if the first 10 events published to an endpoint fail, Event Grid assumes that the endpoint is experiencing issues and delays all subsequent retries and new deliveries for some time, in some cases up to several hours.

The functional purpose of delayed delivery is to protect unhealthy endpoints as well as the Event Grid system. Without back-off and delay of delivery to unhealthy endpoints, Event Grid's retry policy and volume capabilities can easily overwhelm a system.

Dead-letter events

When Event Grid can't deliver an event within a certain time period or after trying to deliver the event a certain number of times, it can send the undelivered event to a storage account. This process is known as dead-lettering. Event Grid dead-letters an event when one of the following conditions is met.

  • The event isn't delivered within the time-to-live period.
  • The number of tries to deliver the event has exceeded the limit.

If either of the conditions is met, the event is dropped or dead-lettered. By default, Event Grid doesn't turn on dead-lettering. To enable it, you must specify a storage account to hold undelivered events when creating the event subscription. You pull events from this storage account to resolve deliveries.
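The two conditions and the dead-letter setting combine as in the sketch below. It is illustrative only: the function name `final_disposition` and its parameters are hypothetical, and treating "exceeded the limit" as `attempts >= max_attempts` is an assumption about the boundary.

```python
from typing import Optional


def final_disposition(
    attempts: int,
    age_minutes: float,
    max_attempts: int = 30,
    ttl_minutes: int = 1440,
    dead_letter_container: Optional[str] = None,
) -> str:
    """Return 'retry', 'dead-letter', or 'drop' for an event that is still undelivered."""
    expired = age_minutes >= ttl_minutes
    # Assumption: the limit counts as exceeded once max_attempts tries have been made.
    exhausted = attempts >= max_attempts
    if not (expired or exhausted):
        return "retry"
    # Dead-lettering is off by default, so without a container the event is dropped.
    return "dead-letter" if dead_letter_container else "drop"
```

With the defaults and no container configured, an event that is 1,500 minutes old is dropped; passing a container resource ID changes the result to `"dead-letter"`.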

Event Grid sends an event to the dead-letter location when it has tried all of its retry attempts. If Event Grid receives a 400 (Bad Request) or 413 (Request Entity Too Large) response code, it immediately schedules the event for dead-lettering. These response codes indicate that delivery of the event will never succeed.

The time-to-live expiration is checked only at the next scheduled delivery attempt. Therefore, even if the time-to-live expires before the next scheduled delivery attempt, event expiry is checked only at the time of that attempt, and the event is then dead-lettered.

There is a five-minute delay between the last attempt to deliver an event and when it is delivered to the dead-letter location. This delay is intended to reduce the number of Blob storage operations. If the dead-letter location is unavailable for four hours, the event is dropped.

Before setting the dead-letter location, you must have a storage account with a container. You provide the endpoint for this container when creating the event subscription. The endpoint is in the format of: /subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.Storage/storageAccounts/<storage-name>/blobServices/default/containers/<container-name>
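As a small convenience, the endpoint string can be assembled from its parts. The helper name `dead_letter_endpoint` and the sample values in the usage line are made up for this sketch; only the path format comes from the text above.

```python
def dead_letter_endpoint(subscription_id: str, resource_group: str, storage_account: str, container: str) -> str:
    """Build the container resource ID in the format expected for the dead-letter location."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
        f"/blobServices/default/containers/{container}"
    )


# Usage with placeholder values.
example = dead_letter_endpoint("<subscription-id>", "my-rg", "mystorage", "deadletter")
```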

You might want to be notified when an event has been sent to the dead-letter location. To use Event Grid to respond to undelivered events, create an event subscription for the dead-letter blob storage. Every time your dead-letter blob storage receives an undelivered event, Event Grid notifies your handler. The handler responds with the actions you want to take to reconcile undelivered events. For an example of setting up a dead-letter location and retry policies, see Dead letter and retry policies.

Delivery event formats

This section gives examples of events and dead-lettered events in different delivery schema formats (Event Grid schema, CloudEvents 1.0 schema, and custom schema). For more information about these formats, see the Event Grid schema and CloudEvents 1.0 schema articles.

Event Grid schema

Event

{
    "id": "93902694-901e-008f-6f95-7153a806873c",
    "eventTime": "2020-08-13T17:18:13.1647262Z",
    "eventType": "Microsoft.Storage.BlobCreated",
    "dataVersion": "",
    "metadataVersion": "1",
    "topic": "/subscriptions/000000000-0000-0000-0000-00000000000000/resourceGroups/rgwithoutpolicy/providers/Microsoft.Storage/storageAccounts/myegteststgfoo",
    "subject": "/blobServices/default/containers/deadletter/blobs/myBlobFile.txt",    
    "data": {
        "api": "PutBlob",
        "clientRequestId": "c0d879ad-88c8-4bbe-8774-d65888dc2038",
        "requestId": "93902694-901e-008f-6f95-7153a8000000",
        "eTag": "0x8D83FACDC0C3402",
        "contentType": "text/plain",
        "contentLength": 0,
        "blobType": "BlockBlob",
        "url": "https://myegteststgfoo.blob.core.chinacloudapi.cn/deadletter/myBlobFile.txt",
        "sequencer": "00000000000000000000000000015508000000000005101c",
        "storageDiagnostics": { "batchId": "cfb32f79-3006-0010-0095-711faa000000" }
    }
}

Dead-letter event

{
    "id": "93902694-901e-008f-6f95-7153a806873c",
    "eventTime": "2020-08-13T17:18:13.1647262Z",
    "eventType": "Microsoft.Storage.BlobCreated",
    "dataVersion": "",
    "metadataVersion": "1",
    "topic": "/subscriptions/0000000000-0000-0000-0000-000000000000000/resourceGroups/rgwithoutpolicy/providers/Microsoft.Storage/storageAccounts/myegteststgfoo",
    "subject": "/blobServices/default/containers/deadletter/blobs/myBlobFile.txt",    
    "data": {
        "api": "PutBlob",
        "clientRequestId": "c0d879ad-88c8-4bbe-8774-d65888dc2038",
        "requestId": "93902694-901e-008f-6f95-7153a8000000",
        "eTag": "0x8D83FACDC0C3402",
        "contentType": "text/plain",
        "contentLength": 0,
        "blobType": "BlockBlob",
        "url": "https://myegteststgfoo.blob.core.chinacloudapi.cn/deadletter/myBlobFile.txt",
        "sequencer": "00000000000000000000000000015508000000000005101c",
        "storageDiagnostics": { "batchId": "cfb32f79-3006-0010-0095-711faa000000" }
    },

    "deadLetterReason": "MaxDeliveryAttemptsExceeded",
    "deliveryAttempts": 1,
    "lastDeliveryOutcome": "NotFound",
    "publishTime": "2020-08-13T17:18:14.0265758Z",
    "lastDeliveryAttemptTime": "2020-08-13T17:18:14.0465788Z" 
}

CloudEvents 1.0 schema

Event

{
    "id": "caee971c-3ca0-4254-8f99-1395b394588e",
    "source": "mysource",
    "dataversion": "1.0",
    "subject": "mySubject",
    "type": "fooEventType",
    "datacontenttype": "application/json",
    "data": {
        "prop1": "value1",
        "prop2": 5
    }
}

Dead-letter event

{
    "id": "caee971c-3ca0-4254-8f99-1395b394588e",
    "source": "mysource",
    "dataversion": "1.0",
    "subject": "mySubject",
    "type": "fooEventType",
    "datacontenttype": "application/json",
    "data": {
        "prop1": "value1",
        "prop2": 5
    },

    "deadletterreason": "MaxDeliveryAttemptsExceeded",
    "deliveryattempts": 1,
    "lastdeliveryoutcome": "NotFound",
    "publishtime": "2020-08-13T21:21:36.4018726Z"
}

Custom schema

Event

{
    "prop1": "my property",
    "prop2": 5,
    "myEventType": "fooEventType"
}

Dead-letter event

{
    "id": "8bc07e6f-0885-4729-90e4-7c3f052bd754",
    "eventTime": "2020-08-13T18:11:29.4121391Z",
    "eventType": "myEventType",
    "dataVersion": "1.0",
    "metadataVersion": "1",
    "topic": "/subscriptions/00000000000-0000-0000-0000-000000000000000/resourceGroups/rgwithoutpolicy/providers/Microsoft.EventGrid/topics/myCustomSchemaTopic",
    "subject": "subjectDefault",
  
    "deadLetterReason": "MaxDeliveryAttemptsExceeded",
    "deliveryAttempts": 1,
    "lastDeliveryOutcome": "NotFound",
    "publishTime": "2020-08-13T18:11:29.4121391Z",
    "lastDeliveryAttemptTime": "2020-08-13T18:11:29.4277644Z",
  
    "data": {
        "prop1": "my property",
        "prop2": 5,
        "myEventType": "fooEventType"
    }
}
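In the samples above, the dead-letter properties are camelCase in the Event Grid and custom schemas (`deadLetterReason`) and lowercase in the CloudEvents schema (`deadletterreason`). A handler that reads dead-lettered blobs in more than one schema could normalize the lookup as in this sketch. The function name `dead_letter_info` is hypothetical, and the field list is taken only from the samples in this section.

```python
from typing import Any, Dict

# Dead-letter properties as they appear in the Event Grid schema sample.
DEAD_LETTER_FIELDS = (
    "deadLetterReason",
    "deliveryAttempts",
    "lastDeliveryOutcome",
    "publishTime",
    "lastDeliveryAttemptTime",
)


def dead_letter_info(event: Dict[str, Any]) -> Dict[str, Any]:
    """Extract dead-letter properties from a parsed event, ignoring key casing."""
    lowered = {key.lower(): value for key, value in event.items()}
    # Properties missing from the event are reported as None.
    return {name: lowered.get(name.lower()) for name in DEAD_LETTER_FIELDS}
```

A property that an event doesn't carry, such as the last-attempt time in the CloudEvents sample, comes back as `None`.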

Message delivery status

Event Grid uses HTTP response codes to acknowledge receipt of events.

Success codes

Event Grid considers only the following HTTP response codes as successful deliveries. All other status codes are considered failed deliveries and are retried or dead-lettered as appropriate. Upon receiving a successful status code, Event Grid considers delivery complete.

  • 200 OK
  • 201 Created
  • 202 Accepted
  • 203 Non-Authoritative Information
  • 204 No Content

Failure codes

All other codes not in the above set (200-204) are considered failures and are retried if needed. Some have specific retry policies tied to them, outlined below; all others follow the standard exponential back-off model. Keep in mind that, due to the highly parallelized nature of Event Grid's architecture, the retry behavior is non-deterministic.

| Status code | Retry behavior |
| --- | --- |
| 400 Bad Request | Not retried |
| 401 Unauthorized | Retry after 5 minutes or more for Azure Resources endpoints |
| 403 Forbidden | Not retried |
| 404 Not Found | Retry after 5 minutes or more for Azure Resources endpoints |
| 408 Request Timeout | Retry after 2 minutes or more |
| 413 Request Entity Too Large | Not retried |
| 503 Service Unavailable | Retry after 30 seconds or more |
| All others | Retry after 10 seconds or more |
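The table can be read as a lookup from status code to the minimum wait before a retry. The sketch below encodes it that way. The function name `minimum_retry_delay` and the endpoint-type labels are illustrative, and returning `None` for 401 and 404 on webhook endpoints follows the no-retry list earlier in this article rather than this table alone.

```python
from typing import Optional

SUCCESS_CODES = {200, 201, 202, 203, 204}
NOT_RETRIED = {400, 403, 413}
# Retried only for Azure resource endpoints, after at least 5 minutes.
AZURE_RESOURCE_ONLY = {401: 300, 404: 300}
# Codes with their own minimum wait, in seconds.
SPECIFIC_MINIMUMS = {408: 120, 503: 30}
DEFAULT_MINIMUM = 10


def minimum_retry_delay(status_code: int, endpoint_type: str = "webhook") -> Optional[int]:
    """Return the minimum seconds before a retry, or None when no retry is made."""
    if status_code in SUCCESS_CODES or status_code in NOT_RETRIED:
        return None  # delivered, or an error that retrying can't fix
    if status_code in AZURE_RESOURCE_ONLY:
        if endpoint_type == "azure_resource":
            return AZURE_RESOURCE_ONLY[status_code]
        return None  # webhooks are not retried on 401 or 404
    return SPECIFIC_MINIMUMS.get(status_code, DEFAULT_MINIMUM)
```

The values are lower bounds ("or more" in the table); the actual wait also depends on the backoff schedule and on the non-deterministic behavior noted above.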

Next steps