Features and terminology in Azure Event Hubs

Azure Event Hubs is a scalable event processing service that ingests and processes large volumes of events and data, with low latency and high reliability. See What is Event Hubs? for a high-level overview.

This article builds on the information in the overview article, and provides technical and implementation details about Event Hubs components and features.


Namespace

An Event Hubs namespace provides DNS-integrated network endpoints and a range of access control and network integration management features, such as IP filtering, virtual network service endpoints, and Private Link, and serves as the management container for one of multiple Event Hub instances (or topics, in Kafka parlance).

Event publishers

Any entity that sends data to an event hub is an event publisher (synonymously used with event producer). Event publishers can publish events using HTTPS, AMQP 1.0, or the Kafka protocol. Event publishers use Azure Active Directory-based authorization with OAuth2-issued JWT tokens or an event hub-specific Shared Access Signature (SAS) token to gain publishing access.

Publishing an event

You can publish an event via AMQP 1.0, the Kafka protocol, or HTTPS. The Event Hubs service provides a REST API and .NET, Java, Python, JavaScript, and Go client libraries for publishing events to an event hub. For other runtimes and platforms, you can use any AMQP 1.0 client, such as Apache Qpid.

The choice to use AMQP or HTTPS is specific to the usage scenario. AMQP requires the establishment of a persistent bidirectional socket in addition to transport-level security (TLS) or SSL/TLS. AMQP has higher network costs when initializing the session; however, HTTPS requires additional TLS overhead for every request. AMQP has significantly higher performance for frequent publishers and can achieve much lower latencies when used with asynchronous publishing code.

You can publish events individually or batched. A single publication has a limit of 1 MB, regardless of whether it is a single event or a batch. Publishing events larger than this threshold will be rejected.
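The batching rule above can be sketched client-side. The helper below is illustrative only (`MAX_PUBLICATION_BYTES` and `build_batches` are hypothetical names, not part of any Event Hubs SDK); it groups serialized payloads into publications that stay under the 1 MB limit and rejects any single event that exceeds it:

```python
# Illustrative client-side batching against the 1 MB per-publication limit.
# MAX_PUBLICATION_BYTES and build_batches are hypothetical names, not SDK APIs.

MAX_PUBLICATION_BYTES = 1024 * 1024  # 1 MB cap per send, single event or batch

def build_batches(events, max_bytes=MAX_PUBLICATION_BYTES):
    """Group serialized event payloads into batches that each fit the limit."""
    batches, current, current_size = [], [], 0
    for payload in events:
        if len(payload) > max_bytes:
            # A single event above the limit is rejected by the service.
            raise ValueError("event of %d bytes exceeds the %d-byte limit"
                             % (len(payload), max_bytes))
        if current_size + len(payload) > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(payload)
        current_size += len(payload)
    if current:
        batches.append(current)
    return batches

batches = build_batches([b"x" * 400_000] * 5)  # five 400 KB events -> 3 sends
```

The real SDK batch objects enforce the same cap for you; the point here is only that a publisher must split its traffic along the publication limit, not the event count.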

Event Hubs throughput is scaled by using partitions and throughput-unit allocations (see below). It is a best practice for publishers to remain unaware of the specific partitioning model chosen for an event hub and to only specify a partition key that is used to consistently assign related events to the same partition.


Event Hubs ensures that all events sharing a partition key value are stored together and delivered in order of arrival. If partition keys are used with publisher policies, then the identity of the publisher and the value of the partition key must match. Otherwise, an error occurs.
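This guarantee can be illustrated with a deterministic hash. The service's actual hash function is internal and differs from this sketch, which only demonstrates the property that equal partition keys always map to the same partition:

```python
import hashlib

def partition_for_key(partition_key, partition_count):
    """Deterministically map a partition key to a partition index.
    The service's real hash differs; this only illustrates the guarantee
    that events sharing a key always land on the same partition."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# Related events share a key, so they are consistently assigned together:
p1 = partition_for_key("device-42", 4)
p2 = partition_for_key("device-42", 4)
```

Because the mapping is stable, a publisher never needs to know which concrete partition a key resolves to; it only needs to reuse the same key for related events.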

Publisher policy

Event Hubs enables granular control over event publishers through publisher policies. Publisher policies are run-time features designed to facilitate large numbers of independent event publishers. With publisher policies, each publisher uses its own unique identifier when publishing events to an event hub, using the following mechanism:

//<my namespace>.servicebus.windows.net/<event hub name>/publishers/<my publisher name>

You don't have to create publisher names ahead of time, but they must match the SAS token used when publishing an event, in order to ensure independent publisher identities. When using publisher policies, the PartitionKey value is set to the publisher name. To work properly, these values must match.


Capture

Event Hubs Capture enables you to automatically capture the streaming data in Event Hubs and save it to your choice of either a Blob storage account or an Azure Data Lake Service account. You can enable Capture from the Azure portal, and specify a minimum size and time window to perform the capture. Using Event Hubs Capture, you specify your own Azure Blob storage account and container, or Azure Data Lake Service account, one of which is used to store the captured data. Captured data is written in the Apache Avro format.


Partitions

Event Hubs provides message streaming through a partitioned consumer pattern in which each consumer reads only a specific subset, or partition, of the message stream. This pattern enables horizontal scale for event processing and provides other stream-focused features that are unavailable in queues and topics.

A partition is an ordered sequence of events that is held in an event hub. As newer events arrive, they are added to the end of this sequence. A partition can be thought of as a "commit log."


Event Hubs retains data for a configured retention time that applies across all partitions in the event hub. Events expire on a time basis; you cannot explicitly delete them. Because partitions are independent and contain their own sequence of data, they often grow at different rates.


The number of partitions is specified at creation and must be between 1 and 32. The partition count is not changeable, so you should consider long-term scale when setting partition count. Partitions are a data organization mechanism that relates to the downstream parallelism required in consuming applications. The number of partitions in an event hub directly relates to the number of concurrent readers you expect to have. You can increase the number of partitions beyond 32 by contacting the Event Hubs team.

You may want to set the partition count to the highest possible value, which is 32, at the time of creation. Remember that having more than one partition results in events being sent to multiple partitions without retaining order, unless you configure senders to send only to a single partition out of the 32, leaving the remaining 31 partitions redundant. In the former case, you have to read events across all 32 partitions. In the latter case, there is no obvious additional cost, apart from the extra configuration you have to make on Event Processor Host.

While partitions are identifiable and can be sent to directly, sending directly to a partition is not recommended. Instead, you can use the higher-level constructs introduced in the Event publishers section.

Partitions are filled with a sequence of event data that contains the body of the event, a user-defined property bag, and metadata such as its offset in the partition and its number in the stream sequence.

We recommend that you balance throughput units and partitions 1:1 to achieve optimal scale. A single partition has a guaranteed ingress and egress of up to one throughput unit. While you may be able to achieve higher throughput on a partition, performance is not guaranteed. This is why we strongly recommend that the number of partitions in an event hub be greater than or equal to the number of throughput units.

Given the total throughput you plan on needing, you know the number of throughput units you require and the minimum number of partitions, but how many partitions should you have? Choose the number of partitions based on the downstream parallelism you want to achieve as well as your future throughput needs. There is no charge for the number of partitions you have within an event hub.
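As a rough sizing sketch, assuming the documented per-throughput-unit ingress limits of 1 MB/s or 1,000 events/s (whichever is reached first), you can estimate the required throughput units and, following the 1:1 guidance above, a lower bound on the partition count. The function name and figures below are illustrative:

```python
# Rough capacity sketch. Assumes the documented per-unit ingress limits
# (1 MB/s or 1,000 events/s, whichever is reached first); names are illustrative.
import math

def required_throughput_units(ingress_mb_per_sec, events_per_sec):
    """Take the larger of the two per-unit requirements, minimum one unit."""
    return max(math.ceil(ingress_mb_per_sec / 1.0),
               math.ceil(events_per_sec / 1000.0), 1)

tus = required_throughput_units(ingress_mb_per_sec=3.5, events_per_sec=2500)
# 3.5 MB/s needs 4 units and 2,500 events/s needs 3, so provision 4 units
# and, per the 1:1 guidance above, at least 4 partitions.
min_partitions = tus
```

Since partitions are free and the count cannot be changed later, it is usually better to round this lower bound up toward your expected downstream parallelism.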

For more information about partitions and the trade-off between availability and reliability, see the Event Hubs programming guide and the Availability and consistency in Event Hubs article.

SAS tokens

Event Hubs uses Shared Access Signatures, which are available at the namespace and event hub level. A SAS token is generated from a SAS key and is an SHA hash of a URL, encoded in a specific format. Using the name of the key (policy) and the token, Event Hubs can regenerate the hash and thus authenticate the sender. Normally, SAS tokens for event publishers are created with only send privileges on a specific event hub. This SAS token URL mechanism is the basis for publisher identification introduced in the publisher policy. For more information about working with SAS, see Shared Access Signature Authentication with Service Bus.
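The token format can be reproduced with standard-library crypto. This sketch follows the documented Service Bus SAS scheme: an HMAC-SHA256 signature over the URL-encoded resource URI plus an expiry timestamp, signed with the SAS key of the named policy. The namespace, event hub, and key below are placeholders:

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

def generate_sas_token(resource_uri, policy_name, key, ttl_seconds=3600):
    """Build a SAS token: HMAC-SHA256 over the URL-encoded resource URI
    and an expiry timestamp, signed with the SAS key of the named policy."""
    expiry = str(int(time.time()) + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = (encoded_uri + "\n" + expiry).encode("utf-8")
    signature = base64.b64encode(
        hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest())
    return ("SharedAccessSignature sr=%s&sig=%s&se=%s&skn=%s"
            % (encoded_uri, urllib.parse.quote_plus(signature.decode()),
               expiry, policy_name))

# Placeholder namespace, event hub, and key -- substitute your own values.
token = generate_sas_token(
    "https://my-namespace.servicebus.windows.net/my-event-hub",
    "send-policy", "dummy-base64-key")
```

The resulting token is passed in the Authorization header (or the equivalent AMQP property); the service recomputes the same HMAC from the policy's key to verify the sender.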

Event consumers

Any entity that reads event data from an event hub is an event consumer. All Event Hubs consumers connect via an AMQP 1.0 session, and events are delivered through the session as they become available. The client does not need to poll for data availability.

Consumer groups

The publish/subscribe mechanism of Event Hubs is enabled through consumer groups. A consumer group is a view (state, position, or offset) of an entire event hub. Consumer groups enable multiple consuming applications to each have a separate view of the event stream, and to read the stream independently at their own pace and with their own offsets.
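A minimal simulation of this independence (all names are illustrative; real clients persist their positions via checkpoints rather than in memory):

```python
# Illustrative only: two consumer groups reading one partition independently.
partition = ["event-%d" % i for i in range(10)]  # the shared, ordered log

class ConsumerGroupView:
    """Each consumer group tracks its own read position (offset)."""
    def __init__(self):
        self.position = 0

    def read(self, count):
        events = partition[self.position:self.position + count]
        self.position += len(events)
        return events

storage_writer = ConsumerGroupView()  # e.g., a group archiving to storage
analytics = ConsumerGroupView()       # e.g., a separate event-processing group

storage_writer.read(8)  # one group reads ahead...
analytics.read(3)       # ...while the other lags behind, unaffected
```

Because each view holds only a position, adding another consuming application is cheap: it is a new cursor over the same log, not a copy of the data.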

In a stream processing architecture, each downstream application equates to a consumer group. If you want to write event data to long-term storage, then that storage writer application is a consumer group. Complex event processing can then be performed by another, separate consumer group. You can only access partitions through a consumer group. There is always a default consumer group in an event hub, and you can create up to 20 consumer groups for a Standard tier event hub.

There can be at most 5 concurrent readers on a partition per consumer group; however, it is recommended that there is only one active receiver on a partition per consumer group. Within a single partition, each reader receives all of the messages. If you have multiple readers on the same partition, you will process duplicate messages. You need to handle this in your code, which may not be trivial. However, it's a valid approach in some scenarios.

Some clients offered by the Azure SDKs are intelligent consumer agents that automatically manage the details of ensuring that each partition has a single reader and that all partitions for an event hub are being read from. This allows your code to focus on processing the events being read from the event hub, ignoring many of the details of the partitions. For more information, see Connect to a partition.

The following examples show the consumer group URI convention:

//<my namespace>.servicebus.windows.net/<event hub name>/<Consumer Group #1>
//<my namespace>.servicebus.windows.net/<event hub name>/<Consumer Group #2>

The following figure shows the Event Hubs stream processing architecture:


Stream offsets

An offset is the position of an event within a partition. You can think of an offset as a client-side cursor. The offset is a byte numbering of the event. This offset enables an event consumer (reader) to specify a point in the event stream from which to begin reading events. You can specify the offset as a timestamp or as an offset value. Consumers are responsible for storing their own offset values outside of the Event Hubs service. Within a partition, each event includes an offset.



Checkpointing

Checkpointing is a process by which readers mark or commit their position within a partition event sequence. Checkpointing is the responsibility of the consumer and occurs on a per-partition basis within a consumer group. This responsibility means that for each consumer group, each partition reader must keep track of its current position in the event stream, and can inform the service when it considers the data stream complete.

If a reader disconnects from a partition, when it reconnects it begins reading at the checkpoint that was previously submitted by the last reader of that partition in that consumer group. When the reader connects, it passes the offset to the event hub to specify the location at which to start reading. In this way, you can use checkpointing both to mark events as "complete" by downstream applications, and to provide resiliency if a failover between readers running on different machines occurs. It is possible to return to older data by specifying a lower offset in this checkpointing process. Through this mechanism, checkpointing enables both failover resiliency and event stream replay.
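The reconnect-and-resume behavior can be simulated with an in-memory checkpoint store standing in for durable storage such as Azure Blob Storage (all names here are illustrative, not SDK APIs):

```python
# Illustrative only: an in-memory checkpoint store standing in for durable
# storage; keys are (consumer group, partition id), values are offsets.
checkpoint_store = {}

def run_reader(consumer_group, partition_id, events, start, crash_at=None):
    """Process events from `start`, committing a checkpoint after each one;
    `crash_at` simulates the reader failing before processing that index."""
    for i in range(start, len(events)):
        if crash_at is not None and i >= crash_at:
            return  # simulated disconnect/failure
        # ... handle events[i] here ...
        checkpoint_store[(consumer_group, partition_id)] = i + 1

events = ["e%d" % i for i in range(6)]
run_reader("$Default", "0", events, start=0, crash_at=4)  # dies after e3
resume_from = checkpoint_store[("$Default", "0")]         # == 4
run_reader("$Default", "0", events, start=resume_from)    # resumes at e4
```

Checkpointing after every event minimizes replays but maximizes store writes; real processors typically checkpoint periodically and tolerate reprocessing the events between the last checkpoint and the failure.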


If you are using Azure Blob Storage as the checkpoint store in an environment that supports a different version of the Storage Blob SDK than those typically available on Azure, you'll need to use code to change the Storage service API version to the specific version supported by that environment. For example, if you are running Event Hubs on an Azure Stack Hub version 2002, the highest available version for the Storage service is version 2017-11-09. In this case, you need to use code to target the Storage service API version 2017-11-09. For an example of how to target a specific Storage API version, see the samples on GitHub.

Common consumer tasks

All Event Hubs consumers connect via an AMQP 1.0 session, a state-aware bidirectional communication channel. Each partition has an AMQP 1.0 session that facilitates the transport of events segregated by partition.

Connect to a partition

When connecting to partitions, it's common practice to use a leasing mechanism to coordinate reader connections to specific partitions. This way, it's possible for every partition in a consumer group to have only one active reader. Checkpointing, leasing, and managing readers are simplified by using the clients within the Event Hubs SDKs, which act as intelligent consumer agents.

Read events

After an AMQP 1.0 session and link are opened for a specific partition, events are delivered to the AMQP 1.0 client by the Event Hubs service. This delivery mechanism enables higher throughput and lower latency than pull-based mechanisms such as HTTP GET. As events are sent to the client, each event data instance contains important metadata such as the offset and sequence number that are used to facilitate checkpointing on the event sequence.

Event data:

  • Offset
  • Sequence number
  • Body
  • User properties
  • System properties

It is your responsibility to manage the offset.

Next steps

For more information about Event Hubs, visit the following links: