Use Azure Event Hubs from Apache Kafka applications

Event Hubs provides an endpoint compatible with the Apache Kafka® producer and consumer APIs that most existing Apache Kafka client applications can use as an alternative to running your own Apache Kafka cluster. Event Hubs supports Apache Kafka producer and consumer API clients at version 1.0 and above.

What does Event Hubs for Kafka provide?

The Event Hubs for Apache Kafka feature provides a protocol head on top of Azure Event Hubs that is protocol-compatible with Apache Kafka clients built for Apache Kafka server versions 1.0 and later, and supports both reading from and writing to Event Hubs, which are equivalent to Apache Kafka topics.

You can often use the Event Hubs Kafka endpoint from your applications without code changes compared to your existing Kafka setup; you only modify the configuration: update the connection string in your configuration to point to the Kafka endpoint exposed by your event hub instead of pointing to your Kafka cluster. Then, you can start streaming events from your applications that use the Kafka protocol into Event Hubs.
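For example, a minimal sketch of the configuration change (the namespace name mynamespace and the original broker addresses are placeholders):

# Before: your own Apache Kafka cluster
# bootstrap.servers=broker1:9092,broker2:9092
# After: the Kafka endpoint exposed by your Event Hubs namespace (note port 9093)
bootstrap.servers=mynamespace.servicebus.chinacloudapi.cn:9093

The endpoint also requires the SASL_SSL settings shown in the security section later in this article.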

Conceptually, Kafka and Event Hubs are very similar: they're both partitioned logs built for streaming data, whereby the client controls which part of the retained log it wants to read. The following table maps concepts between Kafka and Event Hubs.

Kafka and Event Hubs conceptual mapping

Kafka Concept       Event Hubs Concept
Cluster             Namespace
Topic               Event Hub
Partition           Partition
Consumer Group      Consumer Group
Offset              Offset

Key differences between Apache Kafka and Event Hubs

While Apache Kafka is software you typically need to install and operate, Event Hubs is a fully managed, cloud-native service. There are no servers, disks, or networks to manage or monitor, and no brokers to consider or configure, ever. You create a namespace, which is an endpoint with a fully qualified domain name, and then you create Event Hubs (topics) within that namespace.

For more information about Event Hubs and namespaces, see Event Hubs features. As a cloud service, Event Hubs uses a single stable virtual IP address as the endpoint, so clients don't need to know about the brokers or machines within a cluster. Even though Event Hubs implements the same protocol, this difference means that all Kafka traffic for all partitions is predictably routed through this one endpoint rather than requiring firewall access for all brokers of a cluster.

Scale in Event Hubs is controlled by how many throughput units you purchase, with each throughput unit entitling you to 1 megabyte per second, or 1,000 events per second, of ingress and twice that volume in egress. Event Hubs can automatically scale up throughput units when you reach the throughput limit if you use the Auto-Inflate feature; this feature also works with the Apache Kafka protocol support.
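As a worked example, a hypothetical namespace provisioned with 5 throughput units would be entitled to:

ingress: 5 × 1 MB/s = 5 MB/s (or 5 × 1,000 = 5,000 events/s)
egress:  5 × 2 MB/s = 10 MB/s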

Security and authentication

Every time you publish or consume events from Event Hubs for Kafka, your client is trying to access Event Hubs resources. You want to ensure that the resources are accessed only by authorized entities. When using the Apache Kafka protocol with your clients, you can set your configuration for authentication and encryption using SASL mechanisms. Using Event Hubs for Kafka requires TLS encryption (as all data in transit with Event Hubs is TLS encrypted), which you specify with the SASL_SSL option in your configuration file.

Azure Event Hubs provides multiple options to authorize access to your secure resources.

  • OAuth 2.0
  • Shared access signature (SAS)

OAuth 2.0

Event Hubs integrates with Azure Active Directory (Azure AD), which provides an OAuth 2.0-compliant centralized authorization server. With Azure AD, you can use Azure role-based access control (Azure RBAC) to grant fine-grained permissions to your client identities. You can use this feature with your Kafka clients by specifying SASL_SSL for the protocol and OAUTHBEARER for the mechanism. For details about Azure roles and levels for scoping access, see Authorize access with Azure AD.

bootstrap.servers=NAMESPACENAME.servicebus.chinacloudapi.cn:9093
security.protocol=SASL_SSL
sasl.mechanism=OAUTHBEARER
sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required;
sasl.login.callback.handler.class=CustomAuthenticateCallbackHandler
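The CustomAuthenticateCallbackHandler referenced above must be supplied by your application; complete implementations are in the OAuth samples on GitHub linked later in this article. As a minimal sketch of what such a handler might look like, assuming the Azure Identity library (com.azure:azure-identity) and placeholder tenant, client, and namespace values:

import com.azure.core.credential.AccessToken;
import com.azure.core.credential.TokenRequestContext;
import com.azure.identity.ClientSecretCredential;
import com.azure.identity.ClientSecretCredentialBuilder;
import org.apache.kafka.common.security.auth.AuthenticateCallbackHandler;
import org.apache.kafka.common.security.oauthbearer.OAuthBearerToken;
import org.apache.kafka.common.security.oauthbearer.OAuthBearerTokenCallback;

import javax.security.auth.callback.Callback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.auth.login.AppConfigurationEntry;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative handler: acquires an Azure AD token for the Event Hubs namespace
// and hands it to the Kafka client as an OAUTHBEARER token.
public class CustomAuthenticateCallbackHandler implements AuthenticateCallbackHandler {
    // Placeholder values: supply your own tenant, app registration, and namespace.
    private static final String TENANT_ID = "<tenant-id>";
    private static final String CLIENT_ID = "<client-id>";
    private static final String CLIENT_SECRET = "<client-secret>";
    private static final String SCOPE = "https://NAMESPACENAME.servicebus.chinacloudapi.cn/.default";

    private ClientSecretCredential credential;

    @Override
    public void configure(Map<String, ?> configs, String saslMechanism,
                          List<AppConfigurationEntry> jaasConfigEntries) {
        credential = new ClientSecretCredentialBuilder()
                .tenantId(TENANT_ID)
                .clientId(CLIENT_ID)
                .clientSecret(CLIENT_SECRET)
                .build();
    }

    @Override
    public void handle(Callback[] callbacks) throws UnsupportedCallbackException {
        for (Callback callback : callbacks) {
            if (callback instanceof OAuthBearerTokenCallback) {
                // Fetch a token from Azure AD for each callback.
                AccessToken token = credential
                        .getToken(new TokenRequestContext().addScopes(SCOPE))
                        .block();
                ((OAuthBearerTokenCallback) callback).token(toKafkaToken(token));
            } else {
                throw new UnsupportedCallbackException(callback);
            }
        }
    }

    // Adapt the Azure AD access token to Kafka's OAuthBearerToken interface.
    private static OAuthBearerToken toKafkaToken(AccessToken token) {
        long expiresAtMs = token.getExpiresAt().toInstant().toEpochMilli();
        return new OAuthBearerToken() {
            @Override public String value() { return token.getToken(); }
            @Override public Set<String> scope() { return Collections.emptySet(); }
            @Override public long lifetimeMs() { return expiresAtMs; }
            @Override public String principalName() { return CLIENT_ID; }
            @Override public Long startTimeMs() { return null; }
        };
    }

    @Override
    public void close() { }
}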

Shared Access Signature (SAS)

Event Hubs also provides Shared Access Signatures (SAS) for delegated access to Event Hubs for Kafka resources. Authorizing access using the OAuth 2.0 token-based mechanism provides superior security and ease of use over SAS. The built-in roles can also eliminate the need for ACL-based authorization, which has to be maintained and managed by the user. You can use this feature with your Kafka clients by specifying SASL_SSL for the protocol and PLAIN for the mechanism.

bootstrap.servers=NAMESPACENAME.servicebus.chinacloudapi.cn:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";

Important

Replace {YOUR.EVENTHUBS.CONNECTION.STRING} with the connection string for your Event Hubs namespace. For instructions on getting the connection string, see Get an Event Hubs connection string. Here's an example configuration: sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://mynamespace.servicebus.chinacloudapi.cn/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=XXXXXXXXXXXXXXXX";

Note

When using SAS authentication with Kafka clients, established connections aren't disconnected when the SAS key is regenerated.
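As an end-to-end illustration, here is a minimal producer sketch in Java that applies the SASL_SSL/PLAIN settings shown above and sends one event (the topic name test is a placeholder for an event hub in your namespace):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class TestProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Kafka endpoint of your Event Hubs namespace (port 9093).
        props.put("bootstrap.servers", "NAMESPACENAME.servicebus.chinacloudapi.cn:9093");
        // SASL_SSL / PLAIN settings from the configuration shown above.
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"$ConnectionString\" "
                + "password=\"{YOUR.EVENTHUBS.CONNECTION.STRING}\";");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // The topic name maps to an event hub in the namespace.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "key", "hello event hubs"));
            producer.flush();
        }
    }
}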

Samples

For a tutorial with step-by-step instructions to create an event hub and access it using SAS or OAuth, see Quickstart: Data streaming with Event Hubs using the Kafka protocol.

For more samples that show how to use OAuth with Event Hubs for Kafka, see samples on GitHub.

Other Event Hubs features

The Event Hubs for Apache Kafka feature is one of three protocols concurrently available on Azure Event Hubs, complementing HTTP and AMQP. You can write with any of these protocols and read with any other, so that your current Apache Kafka producers can continue publishing via Apache Kafka while your readers benefit from the native integration with Event Hubs' AMQP interface, such as Azure Stream Analytics or Azure Functions. Conversely, you can readily integrate Azure Event Hubs into AMQP routing networks as a target endpoint, and yet read the data through Apache Kafka integrations.

Additionally, Event Hubs features such as Capture, which enables extremely cost-efficient long-term archival via Azure Blob Storage and Azure Data Lake Storage, also work with the Event Hubs for Kafka feature.

Apache Kafka feature differences

The goal of Event Hubs for Apache Kafka is to provide access to Azure Event Hubs' capabilities to applications that are locked into the Apache Kafka API and would otherwise have to be backed by an Apache Kafka cluster.

As explained above, the Azure Messaging fleet provides rich and robust coverage for a multitude of messaging scenarios, and although the following features aren't currently supported through Event Hubs' support for the Apache Kafka API, we point out where and how the desired capability is available.

Transactions

Azure Service Bus has robust transaction support that allows receiving and settling messages and sessions while sending outbound messages resulting from message processing to multiple target entities, all under the consistency protection of a transaction. This feature set not only allows for exactly-once processing of each message in a sequence, but also avoids the risk of another consumer inadvertently reprocessing the same messages, as would be the case with Apache Kafka. Service Bus is the recommended service for transactional message workloads.
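For illustration only, a minimal sketch of a Service Bus transaction in Java, assuming the azure-messaging-servicebus library, a queue named myqueue, and a connection-string placeholder; either both messages become visible or neither does:

import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusMessage;
import com.azure.messaging.servicebus.ServiceBusSenderClient;
import com.azure.messaging.servicebus.ServiceBusTransactionContext;

public class TransactionSketch {
    public static void main(String[] args) {
        ServiceBusSenderClient sender = new ServiceBusClientBuilder()
                .connectionString("{SERVICE.BUS.CONNECTION.STRING}")
                .sender()
                .queueName("myqueue")
                .buildClient();

        // Both sends are scoped to one transaction; they become visible atomically on commit.
        ServiceBusTransactionContext tx = sender.createTransaction();
        try {
            sender.sendMessage(new ServiceBusMessage("first"), tx);
            sender.sendMessage(new ServiceBusMessage("second"), tx);
            sender.commitTransaction(tx);
        } catch (RuntimeException e) {
            sender.rollbackTransaction(tx);
            throw e;
        } finally {
            sender.close();
        }
    }
}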

Compression

The client-side compression feature of Apache Kafka compresses a batch of multiple messages into a single message on the producer side and decompresses the batch on the consumer side. The Apache Kafka broker treats the batch as a special message.

This feature is fundamentally at odds with Azure Event Hubs' multi-protocol model, which allows for messages, even those sent in batches, to be individually retrievable from the broker and through any protocol.

The payload of any Event Hubs event is a byte stream, and the content can be compressed with an algorithm of your choosing. The Apache Avro encoding format supports compression natively.
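For example, a minimal sketch using GZIP from the Java standard library (the algorithm choice is illustrative; consumers must apply the matching decompression):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class PayloadCompression {
    // Compress an event payload before handing it to the producer as a byte[]
    // (for example, with org.apache.kafka.common.serialization.ByteArraySerializer).
    static byte[] gzip(byte[] payload) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(payload);
        }
        return buffer.toByteArray();
    }
}

The consumer side would reverse this with java.util.zip.GZIPInputStream before decoding the payload.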

Log Compaction

Apache Kafka log compaction is a feature that allows evicting all but the last record for each key from a partition, which effectively turns an Apache Kafka topic into a key-value store where the last value added overrides the previous one. The key-value store pattern, even with frequent updates, is far better supported by database services like Azure Cosmos DB.

The log compaction feature is used by the Kafka Connect and Kafka Streams client frameworks.

Kafka Streams

Kafka Streams is a client library for stream analytics that is part of the Apache Kafka open-source project, but is separate from the Apache Kafka event stream broker.

The most common reason Azure Event Hubs customers ask for Kafka Streams support is that they're interested in Confluent's "ksqlDB" product. "ksqlDB" is a proprietary shared-source project that is licensed such that no vendor "offering software-as-a-service, platform-as-a-service, infrastructure-as-a-service or other similar online services that competes with Confluent products or services" is permitted to use or offer "ksqlDB" support. Practically, if you use ksqlDB, you must either operate Kafka yourself or use Confluent's cloud offerings. The licensing terms might also affect Azure customers who offer services for a purpose excluded by the license.

Standalone and without ksqlDB, Kafka Streams has fewer capabilities than many alternative frameworks and services, most of which have built-in streaming SQL interfaces and all of which integrate with Azure Event Hubs today (Azure Stream Analytics, for example).

These services and frameworks can generally acquire event streams and reference data directly from a diverse set of sources through adapters. Kafka Streams can only acquire data from Apache Kafka, so your analytics projects are locked into Apache Kafka. To use data from other sources, you're required to first import that data into Apache Kafka with the Kafka Connect framework.

If you must use the Kafka Streams framework on Azure, Apache Kafka on HDInsight provides that option. Apache Kafka on HDInsight gives you full control over all configuration aspects of Apache Kafka, while being fully integrated with various aspects of the Azure platform, from fault/update domain placement to network isolation to monitoring integration.

Next steps

This article provided an introduction to Event Hubs for Kafka. To learn more, see the Apache Kafka developer guide for Azure Event Hubs.