Features and terminology in Azure Event Hubs

Azure Event Hubs is a scalable event processing service that ingests and processes large volumes of events and data, with low latency and high reliability. See What is Event Hubs? for a high-level overview.

This article builds on the information in the overview article, and provides technical and implementation details about Event Hubs components and features.

Namespace

An Event Hubs namespace provides a unique scoping container, referenced by its fully qualified domain name, in which you create one or more event hubs.

Event publishers

Any entity that sends data to an event hub is an event producer, or event publisher. Event publishers can publish events using HTTPS or AMQP 1.0. Event publishers use a Shared Access Signature (SAS) token to identify themselves to an event hub, and can have a unique identity or use a common SAS token.

Publishing an event

You can publish an event via AMQP 1.0 or HTTPS. Event Hubs provides client libraries and classes for publishing events to an event hub from .NET clients. For other runtimes and platforms, you can use any AMQP 1.0 client, such as Apache Qpid. You can publish events individually or batched. A single publication (event data instance) has a limit of 256 KB, regardless of whether it is a single event or a batch. Publishing events larger than this threshold results in an error. It is a best practice for publishers to be unaware of partitions within the event hub, and to only specify a partition key (introduced in the next section), or their identity via their SAS token.
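
The following sketch, using the Microsoft.Azure.EventHubs .NET client, shows one way to publish events individually and in a batch; the connection string and event hub name are placeholders for your own values.

```csharp
// Sketch: publishing single and batched events with the
// Microsoft.Azure.EventHubs client. The connection string and event hub
// name are placeholders.
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;

class PublishSample
{
    static async Task Main()
    {
        var builder = new EventHubsConnectionStringBuilder("<namespace connection string>")
        {
            EntityPath = "<event hub name>"
        };
        var client = EventHubClient.CreateFromConnectionString(builder.ToString());

        // Publish a single event.
        await client.SendAsync(new EventData(Encoding.UTF8.GetBytes("hello")));

        // Publish a batch; TryAdd returns false once adding another event
        // would push the batch past the size limit of a single publication.
        EventDataBatch batch = client.CreateBatch();
        for (int i = 0; i < 100; i++)
        {
            if (!batch.TryAdd(new EventData(Encoding.UTF8.GetBytes($"event {i}"))))
            {
                break; // batch is full; send it and start a new one
            }
        }
        await client.SendAsync(batch);

        await client.CloseAsync();
    }
}
```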

The choice to use AMQP or HTTPS is specific to the usage scenario. AMQP requires the establishment of a persistent bidirectional socket in addition to transport level security (TLS) or SSL/TLS. AMQP has higher network costs when initializing the session, whereas HTTPS requires additional SSL overhead for every request. AMQP has higher performance for frequent publishers.

Partition key

Event Hubs ensures that all events sharing a partition key value are delivered in order, and to the same partition. If partition keys are used with publisher policies, then the identity of the publisher and the value of the partition key must match. Otherwise, an error occurs.
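
For illustration, a sender might tag related events with the same partition key; in this sketch, "device-42" is an arbitrary example key, and `client` is the EventHubClient from the publishing sketch above.

```csharp
// Sketch: events sharing a partition key land in the same partition, in
// order. The key "device-42" is illustrative, not a prescribed name.
var reading = new EventData(Encoding.UTF8.GetBytes("{\"temperature\": 21.5}"));
await client.SendAsync(reading, partitionKey: "device-42");
```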

Publisher policy

Event Hubs enables granular control over event publishers through publisher policies. Publisher policies are run-time features designed to facilitate large numbers of independent event publishers. With publisher policies, each publisher uses its own unique identifier when publishing events to an event hub, using the following mechanism:

//[my namespace].servicebus.chinacloudapi.cn/[event hub name]/publishers/[my publisher name]

You don't have to create publisher names ahead of time, but they must match the SAS token used when publishing an event, in order to ensure independent publisher identities. When using publisher policies, the PartitionKey value is set to the publisher name. To work properly, these values must match.
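
As a sketch of this mechanism, an event can be posted to a publisher endpoint over HTTPS; the namespace, event hub, publisher name, and SAS token below are placeholders, and the token must be scoped to match the publisher name in the URI.

```csharp
// Sketch: publishing through a publisher endpoint over HTTPS. Namespace,
// event hub, publisher name, and SAS token are placeholders; the SAS
// token must match the publisher name in the URI.
using System.Net.Http;
using System.Text;

var http = new HttpClient();
var uri = "https://mynamespace.servicebus.chinacloudapi.cn/myhub/publishers/device-42/messages";
var request = new HttpRequestMessage(HttpMethod.Post, uri)
{
    Content = new StringContent("{\"temperature\": 21.5}", Encoding.UTF8, "application/json")
};
request.Headers.TryAddWithoutValidation("Authorization", "<SAS token>");
var response = await http.SendAsync(request);
response.EnsureSuccessStatusCode(); // the service returns 201 Created on success
```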

Capture

Event Hubs Capture enables you to automatically capture the streaming data in Event Hubs and save it to your choice of a Blob storage account. You can enable Capture from the Azure portal, and specify a minimum size and time window to perform the capture. Using Event Hubs Capture, you specify your own Azure Blob storage account and container, which is used to store the captured data. Captured data is written in the Apache Avro format.

Partitions

Event Hubs provides message streaming through a partitioned consumer pattern in which each consumer reads only a specific subset, or partition, of the message stream. This pattern enables horizontal scale for event processing and provides other stream-focused features that are unavailable in queues and topics.

A partition is an ordered sequence of events that is held in an event hub. As newer events arrive, they are added to the end of this sequence. A partition can be thought of as a "commit log."


Event Hubs retains data for a configured retention time that applies across all partitions in the event hub. Events expire on a time basis; you cannot explicitly delete them. Because partitions are independent and contain their own sequence of data, they often grow at different rates.


The number of partitions is specified at creation and must be between 2 and 32. The partition count is not changeable, so you should consider long-term scale when setting it. Partitions are a data organization mechanism that relates to the downstream parallelism required in consuming applications. The number of partitions in an event hub directly relates to the number of concurrent readers you expect to have. You can increase the number of partitions beyond 32 by contacting the Event Hubs team.

While partitions are identifiable and can be sent to directly, sending directly to a partition is not recommended. Instead, use the higher-level constructs introduced in the Event publishers section.

Partitions are filled with a sequence of event data that contains the body of the event, a user-defined property bag, and metadata such as its offset in the partition and its number in the stream sequence.

We recommend that you balance throughput units and partitions 1:1 to achieve optimal scale. A single partition has a guaranteed ingress and egress of up to one throughput unit. While you may be able to achieve higher throughput on a partition, performance is not guaranteed. This is why we strongly recommend that the number of partitions in an event hub be greater than or equal to the number of throughput units.

Given the total throughput you plan on needing, you know the number of throughput units you require and the minimum number of partitions, but how many partitions should you have? Choose the number of partitions based on the downstream parallelism you want to achieve, as well as your future throughput needs. There is no charge for the number of partitions you have within an event hub.

For more information about partitions and the trade-off between availability and reliability, see the Event Hubs programming guide and the Availability and consistency in Event Hubs article.

SAS tokens

Event Hubs uses Shared Access Signatures, which are available at the namespace and event hub level. A SAS token is generated from a SAS key and is a SHA hash of a URL, encoded in a specific format. Using the name of the key (policy) and the token, Event Hubs can regenerate the hash and thus authenticate the sender. Normally, SAS tokens for event publishers are created with only send privileges on a specific event hub. This SAS token URL mechanism is the basis for publisher identification introduced in the Publisher policy section. For more information about working with SAS, see Shared Access Signature Authentication with Service Bus.
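
The following sketch illustrates the general shape of this computation: an HMAC-SHA256 signature over the URL-encoded resource URI plus an expiry, signed with the named policy's key. The resource URI, policy name, and key arguments are placeholders.

```csharp
// Sketch: generating a SAS token by hand. The token is an HMAC-SHA256
// signature over the URL-encoded resource URI plus an expiry, signed
// with the policy's key. All three arguments are placeholders.
using System;
using System.Security.Cryptography;
using System.Text;
using System.Web; // HttpUtility

static string CreateSasToken(string resourceUri, string keyName, string key)
{
    long expiry = DateTimeOffset.UtcNow.AddHours(1).ToUnixTimeSeconds();
    string encodedUri = HttpUtility.UrlEncode(resourceUri);
    string stringToSign = encodedUri + "\n" + expiry;

    using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(key));
    string signature = Convert.ToBase64String(
        hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign)));

    // The name of the key (policy) travels with the token as `skn`,
    // which lets the service regenerate and compare the hash.
    return $"SharedAccessSignature sr={encodedUri}" +
           $"&sig={HttpUtility.UrlEncode(signature)}&se={expiry}&skn={keyName}";
}
```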

Event consumers

Any entity that reads event data from an event hub is an event consumer. All Event Hubs consumers connect via an AMQP 1.0 session, and events are delivered through the session as they become available. The client does not need to poll for data availability.

Consumer groups

The publish/subscribe mechanism of Event Hubs is enabled through consumer groups. A consumer group is a view (state, position, or offset) of an entire event hub. Consumer groups enable multiple consuming applications to each have a separate view of the event stream, and to read the stream independently at their own pace and with their own offsets.

In a stream processing architecture, each downstream application equates to a consumer group. If you want to write event data to long-term storage, then that storage writer application is a consumer group. Complex event processing can then be performed by another, separate consumer group. You can only access partitions through a consumer group. There is always a default consumer group in an event hub, and you can create up to 20 consumer groups for a Standard tier event hub.

There can be at most 5 concurrent readers on a partition per consumer group; however, we recommend that there is only one active receiver on a partition per consumer group. Within a single partition, each reader receives all of the messages. If you have multiple readers on the same partition, you process duplicate messages. You need to handle this in your code, which may not be trivial. However, it's a valid approach in some scenarios.

The following are examples of the consumer group URI convention:

//[my namespace].servicebus.chinacloudapi.cn/[event hub name]/[Consumer Group #1]
//[my namespace].servicebus.chinacloudapi.cn/[event hub name]/[Consumer Group #2]

The following figure shows the Event Hubs stream processing architecture:

[Figure: Event Hubs stream processing architecture]

Stream offsets

An offset is the position of an event within a partition. You can think of an offset as a client-side cursor. The offset is a byte numbering of the event. This offset enables an event consumer (reader) to specify a point in the event stream from which they want to begin reading events. You can specify the offset as a timestamp or as an offset value. Consumers are responsible for storing their own offset values outside of the Event Hubs service. Within a partition, each event includes an offset.
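
As a sketch of how a reader supplies its stored position, the Microsoft.Azure.EventHubs client accepts either an offset value or a timestamp when a receiver is created; the consumer group, partition ID, and offset below are placeholders, and `client` is an EventHubClient as in the earlier sketches.

```csharp
// Sketch: a reader passes its stored position when creating a receiver.
// Consumer group, partition ID, and the offset value are placeholders.
PartitionReceiver receiver = client.CreateReceiver(
    "$Default",                          // consumer group
    "0",                                 // partition ID
    EventPosition.FromOffset("12345"));  // offset your application stored

// Or start from a point in time instead of a byte offset:
// client.CreateReceiver("$Default", "0",
//     EventPosition.FromEnqueuedTime(DateTime.UtcNow.AddHours(-1)));
```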


Checkpointing

Checkpointing is a process by which readers mark or commit their position within a partition event sequence. Checkpointing is the responsibility of the consumer and occurs on a per-partition basis within a consumer group. This responsibility means that for each consumer group, each partition reader must keep track of its current position in the event stream, and can inform the service when it considers the data stream complete.

If a reader disconnects from a partition, when it reconnects it begins reading at the checkpoint that was previously submitted by the last reader of that partition in that consumer group. When the reader connects, it passes the offset to the event hub to specify the location at which to start reading. In this way, you can use checkpointing both to mark events as "complete" by downstream applications, and to provide resiliency if a failover occurs between readers running on different machines. It is possible to return to older data by specifying a lower offset from this checkpointing process. Through this mechanism, checkpointing enables both failover resiliency and event stream replay.

Common consumer tasks

All Event Hubs consumers connect via an AMQP 1.0 session, a state-aware bidirectional communication channel. Each partition has an AMQP 1.0 session that facilitates the transport of events segregated by partition.

Connect to a partition

When connecting to partitions, it is common practice to use a leasing mechanism to coordinate reader connections to specific partitions. This way, it is possible for every partition in a consumer group to have only one active reader. Checkpointing, leasing, and managing readers are simplified by using the EventProcessorHost class for .NET clients. The Event Processor Host is an intelligent consumer agent.
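
A minimal sketch of hosting an event processor follows; the connection strings and lease container name are placeholders, and the processor class shown is illustrative.

```csharp
// Sketch: an EventProcessorHost that leases partitions and checkpoints
// through an Azure Storage blob container. Connection strings and the
// container name are placeholders.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.EventHubs.Processor;

class SampleProcessor : IEventProcessor
{
    public Task OpenAsync(PartitionContext context) => Task.CompletedTask;

    public Task CloseAsync(PartitionContext context, CloseReason reason) => Task.CompletedTask;

    public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> events)
    {
        foreach (EventData evt in events)
        {
            // ... process the event ...
        }
        await context.CheckpointAsync(); // commit this reader's position
    }

    public Task ProcessErrorAsync(PartitionContext context, Exception error) => Task.CompletedTask;
}

class HostSample
{
    static async Task Main()
    {
        var host = new EventProcessorHost(
            "<event hub name>",
            PartitionReceiver.DefaultConsumerGroupName,
            "<event hub connection string>",
            "<storage connection string>",
            "<lease container name>");

        await host.RegisterEventProcessorAsync<SampleProcessor>();
        // ... run until shutdown, then:
        await host.UnregisterEventProcessorAsync();
    }
}
```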

Read events

After an AMQP 1.0 session and link is opened for a specific partition, events are delivered to the AMQP 1.0 client by the Event Hubs service. This delivery mechanism enables higher throughput and lower latency than pull-based mechanisms such as HTTP GET. As events are sent to the client, each event data instance contains important metadata, such as the offset and sequence number, that are used to facilitate checkpointing on the event sequence.

Event data:

  • Offset
  • Sequence number
  • Body
  • User properties
  • System properties

It is your responsibility to manage the offset.
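
As an illustrative sketch, the fields listed above map to properties on each received EventData instance; the `receiver` variable carries over from the earlier offsets sketch, and the user property name is an assumption.

```csharp
// Sketch: reading the fields listed above from received events. The
// user property name "deviceId" is illustrative.
IEnumerable<EventData> events = await receiver.ReceiveAsync(100);
if (events != null) // null when no events arrived within the wait time
{
    foreach (EventData evt in events)
    {
        string body = Encoding.UTF8.GetString(
            evt.Body.Array, evt.Body.Offset, evt.Body.Count);
        string offset = evt.SystemProperties.Offset;         // position in the partition
        long sequence = evt.SystemProperties.SequenceNumber; // number in the stream sequence
        evt.Properties.TryGetValue("deviceId", out object custom); // user property bag

        // Store `offset` yourself; the service does not track it for you.
    }
}
```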

Next steps

For more information about Event Hubs, visit the following links: