Scaling with Event Hubs

Two factors influence scaling with Event Hubs:

  • Throughput units
  • Partitions

Throughput units

The throughput capacity of Event Hubs is controlled by throughput units, which are pre-purchased units of capacity. A single throughput unit lets you:

  • Ingress: Up to 1 MB per second or 1000 events per second (whichever comes first).
  • Egress: Up to 2 MB per second or 4096 events per second.

Beyond the capacity of the purchased throughput units, ingress is throttled and a ServerBusyException is returned. Egress doesn't produce throttling exceptions, but it's still limited to the capacity of the purchased throughput units. If you receive publishing rate exceptions or expect to see higher egress, check how many throughput units you've purchased for the namespace. You can manage throughput units on the Scale blade of the namespace in the Azure portal, or programmatically by using the Event Hubs APIs.
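A publisher that is throttled usually needs to wait and try again rather than fail outright. The following is a minimal sketch of retrying with exponential backoff. `ServerBusyError` is a hypothetical stand-in for whatever throttling error your client library raises, and `send` is any callable you supply; neither name comes from an Event Hubs SDK.

```python
import random
import time


class ServerBusyError(Exception):
    """Hypothetical stand-in for the throttling error raised by a real client."""


def send_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call send() and retry with exponential backoff while it is throttled."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ServerBusyError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the throttling error
            # Double the wait on each attempt, up to a cap, and add jitter so
            # that many publishers don't retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * random.uniform(0.5, 1.0))
```

The `sleep` parameter exists only so the waiting can be replaced in tests. If throttling persists across retries, the sustained load likely exceeds the purchased capacity, and adding throughput units is the better fix.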

Throughput units are pre-purchased and billed per hour. Once purchased, throughput units are billed for a minimum of one hour. You can purchase up to 20 throughput units for an Event Hubs namespace, and they're shared across all event hubs in that namespace.
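As a rough sizing aid, the per-unit limits above can be turned into a small calculation. This sketch assumes you know your expected sustained rates for the whole namespace; the function names are illustrative and not part of any Event Hubs API.

```python
import math

# Per-throughput-unit limits and the namespace cap quoted in this article.
INGRESS_MB_PER_TU = 1.0
INGRESS_EVENTS_PER_TU = 1000
EGRESS_MB_PER_TU = 2.0
EGRESS_EVENTS_PER_TU = 4096
MAX_TUS_PER_NAMESPACE = 20


def required_throughput_units(ingress_mb_s, ingress_events_s,
                              egress_mb_s=0.0, egress_events_s=0):
    """Return the smallest TU count that covers all four limits."""
    needed = max(
        ingress_mb_s / INGRESS_MB_PER_TU,
        ingress_events_s / INGRESS_EVENTS_PER_TU,
        egress_mb_s / EGRESS_MB_PER_TU,
        egress_events_s / EGRESS_EVENTS_PER_TU,
    )
    return max(1, math.ceil(needed))


def fits_in_namespace(tus):
    """True if the TU count is within the 20-unit namespace limit."""
    return tus <= MAX_TUS_PER_NAMESPACE
```

For example, a namespace that ingests 0.5 MB per second spread over 4500 small events per second is bound by the event-count limit, not the byte limit, so `required_throughput_units(0.5, 4500)` returns 5.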

The Auto-inflate feature of Event Hubs automatically scales up by increasing the number of throughput units to meet usage needs. Increasing throughput units prevents throttling scenarios, in which:

  • Data ingress rates exceed set throughput units.
  • Data egress request rates exceed set throughput units.

The Event Hubs service increases the throughput when load increases beyond the minimum threshold, without any requests failing with ServerBusy errors.
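The scale-up behavior can be pictured with a toy model. This is only an illustration of the idea, not the service's actual algorithm: it considers ingress bytes only, and the `max_tus` ceiling is an assumed parameter of the model.

```python
import math


def auto_inflate(current_tus, ingress_mb_s, max_tus):
    """Return the TU count after one scale-up decision (toy model)."""
    needed = math.ceil(ingress_mb_s / 1.0)  # 1 MB per second of ingress per TU
    # Scale up only, and never beyond the assumed ceiling.
    return min(max_tus, max(current_tus, needed))


def simulate(loads_mb_s, start_tus, max_tus):
    """Apply auto_inflate to each load sample and record the TU count."""
    tus, history = start_tus, []
    for load in loads_mb_s:
        tus = auto_inflate(tus, load, max_tus)
        history.append(tus)
    return history
```

In this model the TU count rises with the load and stays at the level it reached once the load drops again, because the function only ever scales up.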

For more information about the Auto-inflate feature, see Automatically scale throughput units.

Partitions

Event Hubs organizes sequences of events sent to an event hub into one or more partitions. As newer events arrive, they're added to the end of this sequence.

A partition can be thought of as a "commit log". Partitions hold event data that contains the body of the event, a user-defined property bag describing the event, and metadata such as its offset in the partition, its number in the stream sequence, and the service-side timestamp at which it was accepted.
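The commit-log idea can be sketched as an append-only list. The class and field names below are illustrative, and treating the offset as a running byte position is an assumption made for the example, not a statement about how the service computes offsets.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class StoredEvent:
    body: bytes
    properties: Dict[str, str]  # user-defined property bag
    offset: int                 # position in the partition (bytes, by assumption)
    sequence_number: int        # position in the partition's event sequence
    enqueued_time: datetime     # service-side acceptance timestamp


@dataclass
class Partition:
    events: List[StoredEvent] = field(default_factory=list)
    _next_offset: int = 0

    def append(self, body, properties=None):
        """Add an event at the end of the log and stamp its metadata."""
        event = StoredEvent(
            body=body,
            properties=dict(properties or {}),
            offset=self._next_offset,
            sequence_number=len(self.events),
            enqueued_time=datetime.now(timezone.utc),
        )
        self.events.append(event)
        self._next_offset += len(body)
        return event

    def read_from(self, sequence_number):
        """Return events starting at the given sequence number, oldest first."""
        return self.events[sequence_number:]
```

Existing entries are never modified or reordered. A reader only needs to remember the last sequence number it processed in order to resume.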

Figure: a sequence of events ordered from oldest to newest.

Advantages of using partitions

Event Hubs is designed to help with processing large volumes of events, and partitioning helps with that in two ways:

  • Even though Event Hubs is a PaaS service, there's a physical reality underneath. Maintaining a log that preserves the order of events requires that these events are kept together in the underlying storage and its replicas, which results in a throughput ceiling for such a log. Partitioning allows multiple parallel logs to be used for the same event hub, which multiplies the available raw IO throughput capacity.
  • Your own applications must be able to keep up with processing the volume of events that are being sent into an event hub. This can be complex and requires substantial, scaled-out, parallel processing capacity. The capacity of a single process to handle events is limited, so you need several processes. Partitions are how your solution feeds those processes while still ensuring that each event has a clear processing owner.
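The "clear processing owner" point can be illustrated with a simple assignment of partitions to workers. This is a sketch of the idea only; real consumers typically coordinate ownership dynamically, and the function name is hypothetical.

```python
def assign_partitions(partition_ids, workers):
    """Give every partition exactly one owning worker, spread evenly."""
    if not workers:
        raise ValueError("at least one worker is required")
    ownership = {worker: [] for worker in workers}
    for index, partition_id in enumerate(partition_ids):
        # Deal partitions out in turn so no worker gets more than one extra.
        ownership[workers[index % len(workers)]].append(partition_id)
    return ownership
```

With five partitions and two workers, `assign_partitions(["0", "1", "2", "3", "4"], ["a", "b"])` gives worker `a` partitions 0, 2, and 4, and worker `b` partitions 1 and 3. Each event is read by exactly one worker because its partition has exactly one owner.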

Number of partitions

The number of partitions is specified at creation and must be between 1 and 32 in Event Hubs Standard. In Event Hubs Dedicated, the partition count can be up to 2000 partitions per capacity unit.

We recommend that you choose at least as many partitions as the number of throughput units (TUs) you expect to need on a sustained basis during the peak load of your application for that particular event hub. Calculate with a single partition having a throughput capacity of 1 TU (1 MB per second in, 2 MB per second out). You can scale the TUs on your namespace or the capacity units of your cluster independently of the partition count. An event hub with 32 partitions and an event hub with 1 partition incur exactly the same cost when the namespace is set to 1 TU capacity.
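The recommendation above amounts to a short calculation per event hub. The sketch below assumes the Standard tier limit of 32 partitions by default; the function name and the choice to raise an error above the limit are illustrative.

```python
import math


def recommended_partitions(peak_ingress_mb_s, peak_egress_mb_s, tier_max=32):
    """Minimum partition count for one event hub, at 1 TU per partition."""
    # One partition is counted as 1 MB/s in and 2 MB/s out.
    peak_tus = max(peak_ingress_mb_s / 1.0, peak_egress_mb_s / 2.0)
    count = max(1, math.ceil(peak_tus))
    if count > tier_max:
        raise ValueError(
            f"{count} partitions needed, but the tier allows at most {tier_max}"
        )
    return count
```

The result is a lower bound. Because the partition count doesn't affect cost, choosing somewhat more than the minimum leaves room for growth, subject to the ordering considerations discussed in this section.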

The partition count for an event hub in a dedicated Event Hubs cluster can be increased after the event hub has been created. However, doing so changes the mapping of partition keys to partitions, and therefore the distribution of streams across partitions. If the relative order of events matters in your application, try hard to avoid such changes.
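To see why a change in partition count disturbs ordering, consider any scheme that derives the partition from a hash of the key and the partition count. The sketch below uses SHA-256 and a modulo purely for illustration; it is not the hash function that Event Hubs uses.

```python
import hashlib


def partition_for(key, partition_count):
    """Map a key to a partition with a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def moved_keys(keys, old_count, new_count):
    """Return the keys whose partition changes when the count changes."""
    return [
        key for key in keys
        if partition_for(key, old_count) != partition_for(key, new_count)
    ]
```

Calling `moved_keys` for a set of device IDs with counts 4 and 8 typically reports a large share of the keys as moved. For each of those keys, older events sit in one partition and newer events in another, so a consumer can no longer rely on a single partition to see that stream in order.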

Setting the number of partitions to the maximum permitted value is tempting, but always keep in mind that your event streams need to be structured such that you can actually take advantage of multiple partitions. If you need absolute order preservation across all events, or have only a handful of substreams, you may not be able to take advantage of many partitions. Also, many partitions make the processing side more complex.

Mapping of events to partitions

You can use a partition key to map incoming event data into specific partitions for the purpose of data organization. The partition key is a sender-supplied value passed into an event hub. It's processed through a static hashing function, which creates the partition assignment. If you don't specify a partition key when publishing an event, a round-robin assignment is used.
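The two assignment modes described above can be sketched as a small router. The hash here (SHA-256 reduced modulo the partition count) is an assumption chosen for the example and isn't the service's actual hashing function; `PartitionRouter` is a hypothetical name.

```python
import hashlib
from itertools import count


class PartitionRouter:
    """Pick a partition by key hash, or round-robin when no key is given."""

    def __init__(self, partition_count):
        self.partition_count = partition_count
        self._round_robin = count()  # 0, 1, 2, ... for keyless events

    def route(self, partition_key=None):
        if partition_key is None:
            # No key: spread events evenly across partitions in turn.
            return next(self._round_robin) % self.partition_count
        # Key given: the same key always maps to the same partition.
        digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.partition_count
```

With four partitions, five keyless calls to `route()` return 0, 1, 2, 3, 0, while repeated calls with the same key always return the same partition.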

The event publisher is only aware of its partition key, not the partition to which the events are published. This decoupling of key and partition insulates the sender from needing to know too much about the downstream processing. A unique identity per device or user makes a good partition key, but other attributes such as geography can also be used to group related events into a single partition.

Specifying a partition key keeps related events together in the same partition and in the exact order in which they were sent. The partition key is a string that is derived from your application context and identifies the interrelationship of the events. A sequence of events identified by a partition key is a stream. A partition is a multiplexed log store for many such streams.
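The relationship between a partition and the streams it carries can be shown by splitting one partition's log back into per-key streams. This sketch represents the log as a list of `(partition_key, body)` pairs, which is a simplification chosen for the example.

```python
from collections import defaultdict


def demultiplex(partition_log):
    """Split one partition's log into per-key streams, keeping send order."""
    streams = defaultdict(list)
    for partition_key, body in partition_log:
        # Appending in log order preserves each stream's original order.
        streams[partition_key].append(body)
    return dict(streams)
```

Events from different keys interleave in the partition, but within each key the order is unchanged. For example, the log `[("a", 1), ("b", 1), ("a", 2)]` yields stream `a` as `[1, 2]` and stream `b` as `[1]`.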

Note

While you can send events directly to partitions, we don't recommend it, especially when high availability is important to you. It downgrades the availability of an event hub to partition level. For more information, see Availability and Consistency.

Next steps

You can learn more about Event Hubs by visiting the following links: