Scaling with Event Hubs

Two factors influence scaling with Event Hubs:

  • Throughput units
  • Partitions

Throughput units

The throughput capacity of Event Hubs is controlled by throughput units. Throughput units are pre-purchased units of capacity. A single throughput unit provides:

  • Ingress: Up to 1 MB per second or 1000 events per second (whichever comes first).
  • Egress: Up to 2 MB per second or 4096 events per second.
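
As a rough illustration of how these limits translate into a purchase, the sketch below estimates how many throughput units a workload needs. The function name and parameters are hypothetical, and it assumes each throughput unit contributes exactly the per-unit limits listed above, so whichever of the four rates needs the most units determines the result.

```python
import math

# Per-throughput-unit limits quoted in the list above.
INGRESS_MB_PER_TU = 1.0
INGRESS_EVENTS_PER_TU = 1000
EGRESS_MB_PER_TU = 2.0
EGRESS_EVENTS_PER_TU = 4096


def required_throughput_units(ingress_mb_s, ingress_events_s,
                              egress_mb_s=0.0, egress_events_s=0):
    """Return the smallest whole number of throughput units covering all rates.

    Hypothetical helper for illustration; not part of any Event Hubs SDK.
    """
    needed = max(
        ingress_mb_s / INGRESS_MB_PER_TU,
        ingress_events_s / INGRESS_EVENTS_PER_TU,
        egress_mb_s / EGRESS_MB_PER_TU,
        egress_events_s / EGRESS_EVENTS_PER_TU,
    )
    # At least one unit is always needed; partial units round up.
    return max(1, math.ceil(needed))
```

For example, a workload sending small events at 3500 events per second is bounded by the event-count limit rather than the byte limit, even if it only moves 0.5 MB per second.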

Beyond the capacity of the purchased throughput units, ingress is throttled and a ServerBusyException is returned. Egress does not produce throttling exceptions, but is still limited to the capacity of the purchased throughput units. If you receive publishing rate exceptions or expect to see higher egress, check how many throughput units you have purchased for the namespace. You can manage throughput units on the Scale blade of the namespace in the Azure portal. You can also manage throughput units programmatically using the Event Hubs APIs.
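
One common way for a publisher to cope with ingress throttling is to retry with exponential backoff. The sketch below is a generic pattern, not the behavior of any particular Event Hubs client: `ServerBusyError` is a hypothetical stand-in for the throttling exception, and `send` is any callable you supply that publishes one event.

```python
import time


class ServerBusyError(Exception):
    """Hypothetical stand-in for the throttling error raised by a real client."""


def send_with_backoff(send, event, max_attempts=5, base_delay=0.5,
                      sleep=time.sleep):
    """Call send(event), retrying with exponential backoff when throttled.

    `send` and `sleep` are injected so the sketch stays independent of any SDK.
    """
    for attempt in range(max_attempts):
        try:
            return send(event)
        except ServerBusyError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the throttling error.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

If throttling persists after retries, that is usually a sign the namespace needs more throughput units rather than longer delays.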

Throughput units are pre-purchased and billed per hour. Once purchased, throughput units are billed for a minimum of one hour. Up to 20 throughput units can be purchased for an Event Hubs namespace, and they are shared across all event hubs in that namespace.

The Auto-inflate feature of Event Hubs automatically scales up by increasing the number of throughput units to meet usage needs. Increasing throughput units prevents throttling scenarios, in which:

  • Data ingress rates exceed the set throughput units.
  • Data egress request rates exceed the set throughput units.

The Event Hubs service increases the throughput when load increases beyond the minimum threshold, without any requests failing with ServerBusy errors.

For more information about the Auto-inflate feature, see Automatically scale throughput units.

Partitions

Event Hubs provides message streaming through a partitioned consumer pattern in which each consumer reads only a specific subset, or partition, of the message stream. This pattern enables horizontal scale for event processing and provides other stream-focused features that are unavailable in queues and topics.

A partition is an ordered sequence of events that is held in an event hub. As newer events arrive, they are added to the end of this sequence. A partition can be thought of as a "commit log."

Event Hubs retains data for a configured retention time that applies across all partitions in the event hub. Events expire on a time basis; you cannot explicitly delete them. Because partitions are independent and contain their own sequence of data, they often grow at different rates.

The number of partitions is specified when the event hub is created and must be between 2 and 32. The partition count cannot be changed afterward, so consider long-term scale when setting it. Partitions are a data organization mechanism that relates to the downstream parallelism required in consuming applications: the number of partitions in an event hub directly relates to the number of concurrent readers you expect to have. To go beyond 32 partitions, contact the Event Hubs team.

You may want to set the partition count to the highest possible value, 32, at creation time. Remember that with more than one partition, events are sent to multiple partitions without retaining order, unless you configure senders to send to only a single partition out of the 32, which leaves the remaining 31 partitions redundant. In the former case, you have to read events across all 32 partitions. In the latter case, there is no obvious additional cost apart from the extra configuration you have to make on Event Processor Host.

While partitions are identifiable and can be sent to directly, sending directly to a partition is not recommended. Instead, use the higher-level constructs introduced in the Event publishers section.

Partitions are filled with a sequence of event data that contains the body of the event, a user-defined property bag, and metadata such as its offset in the partition and its number in the stream sequence.
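
To make the "commit log" picture concrete, here is a minimal in-memory sketch of a partition as an append-only sequence of event data. The class and field names are hypothetical, and treating the offset as a running byte position is an assumption made for illustration; the service's actual offset format is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class EventData:
    body: bytes            # Payload of the event.
    properties: dict       # User-defined property bag.
    offset: int            # Position in the partition (assumed: byte position).
    sequence_number: int   # Number of the event in the partition's sequence.


@dataclass
class Partition:
    events: list = field(default_factory=list)
    next_offset: int = 0

    def append(self, body, properties=None):
        """Add an event to the end of the log; existing events never change."""
        event = EventData(
            body=body,
            properties=dict(properties or {}),
            offset=self.next_offset,
            sequence_number=len(self.events),
        )
        self.events.append(event)
        self.next_offset += len(body)
        return event
```

A reader that remembers the last sequence number it processed can resume from that point, which is what makes the ordered, append-only layout useful for streaming consumers.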

We recommend that you balance throughput units and partitions 1:1 to achieve optimal scale. A single partition has a guaranteed ingress and egress of up to one throughput unit. While you may be able to achieve higher throughput on a partition, performance is not guaranteed. This is why we strongly recommend that the number of partitions in an event hub be greater than or equal to the number of throughput units.

Given the total throughput you plan on needing, you know the number of throughput units you require and the minimum number of partitions, but how many partitions should you have? Choose the number of partitions based on the downstream parallelism you want to achieve as well as your future throughput needs. There is no charge for the number of partitions you have within an event hub.
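
The guidance above can be condensed into a small sizing helper. This is only a sketch under the rules stated in this article (at least as many partitions as throughput units, enough partitions for the expected concurrent readers, and a range of 2 to 32); the function name and parameters are hypothetical.

```python
MIN_PARTITIONS = 2
MAX_PARTITIONS = 32


def suggest_partition_count(throughput_units, concurrent_readers,
                            future_throughput_units=0):
    """Pick a partition count covering current and future needs.

    Hypothetical helper: takes the largest of the throughput unit counts and
    the expected number of concurrent readers, then enforces the 2 to 32 range.
    """
    wanted = max(throughput_units, future_throughput_units,
                 concurrent_readers, MIN_PARTITIONS)
    if wanted > MAX_PARTITIONS:
        raise ValueError(
            f"{wanted} partitions exceeds the limit of {MAX_PARTITIONS}"
        )
    return wanted
```

Because the partition count cannot be changed after creation and partitions carry no charge, rounding up toward future needs is usually the safer choice.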

For more information about partitions and the trade-off between availability and reliability, see the Event Hubs programming guide and the Availability and consistency in Event Hubs article.

Partition key

You can use a partition key to map incoming event data into specific partitions for the purpose of data organization. The partition key is a sender-supplied value passed into an event hub. It is processed through a static hashing function, which creates the partition assignment. If you don't specify a partition key when publishing an event, a round-robin assignment is used.

The event publisher is aware only of its partition key, not the partition to which the events are published. This decoupling of key and partition insulates the sender from needing to know too much about the downstream processing. A per-device or per-user unique identity makes a good partition key, but other attributes such as geography can also be used to group related events into a single partition.
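
The key-to-partition mapping described above can be sketched as follows. The hash function the service actually uses is not specified in this article, so SHA-256 is an assumption chosen only because it is stable across runs; the `PartitionRouter` class is hypothetical.

```python
import hashlib
import itertools


class PartitionRouter:
    """Illustrative router: static hash for keyed events, round-robin otherwise."""

    def __init__(self, partition_count):
        self.partition_count = partition_count
        self._round_robin = itertools.cycle(range(partition_count))

    def route(self, partition_key=None):
        if partition_key is None:
            # No key supplied: spread events evenly across partitions.
            return next(self._round_robin)
        # Assumed hash (SHA-256): the same key always maps to the same partition.
        digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.partition_count
```

Because the mapping depends only on the key and the partition count, all events from one device or user land in the same partition and keep their relative order, while the sender never needs to know which partition that is.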

Next steps

You can learn more about Event Hubs by visiting the following links: