Migrate Apache Kafka workloads to Azure HDInsight 4.0

Azure HDInsight 4.0 offers the latest open-source components with significant enhancements in performance, connectivity, and security. This document explains how to migrate Apache Kafka workloads on HDInsight 3.6 to HDInsight 4.0. After migrating your workloads to HDInsight 4.0, you can use many new features that aren't available on HDInsight 3.6.

HDInsight 3.6 Kafka migration paths

HDInsight 3.6 supports two versions of Kafka: 1.0.0 and 1.1.0. HDInsight 4.0 supports versions 1.1.0 and 2.1.0. Depending on which version of Kafka and which version of HDInsight you want to run, there are multiple supported migration paths. These paths are explained below and illustrated in the following diagram.

  • Run both Kafka and HDInsight on the latest versions (recommended): Migrate an HDInsight 3.6 and Kafka 1.0.0 or 1.1.0 application to HDInsight 4.0 with Kafka 2.1.0 (paths D and E below).
  • Run HDInsight on the latest version, but Kafka only on a more recent version: Migrate an HDInsight 3.6 and Kafka 1.0.0 application to HDInsight 4.0 with Kafka 1.1.0 (path B below).
  • Run HDInsight on the latest version, retain the Kafka version: Migrate an HDInsight 3.6 and Kafka 1.1.0 application to HDInsight 4.0 with Kafka 1.1.0 (path C below).
  • Run Kafka on a more recent version, retain the HDInsight version: Migrate a Kafka 1.0.0 application to 1.1.0 and stay on HDInsight 3.6 (path A below). This option still requires deploying a new cluster, because upgrading the Kafka version on an existing cluster is not supported. After you create a cluster with the version you want, migrate your Kafka clients to use the new cluster.

Figure: Upgrade paths for Apache Kafka on HDInsight 3.6
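The paths listed above can be captured in a small lookup, which may help when scripting a migration checklist. This is only a sketch: the function name `migration_path` and the `SUPPORTED` table are illustrative names, not part of any HDInsight tooling, and because the text does not say which of D and E starts from Kafka 1.0.0, the sketch returns them together.

```python
# Kafka versions supported by each HDInsight version, as stated in this article.
SUPPORTED = {
    "3.6": {"1.0.0", "1.1.0"},
    "4.0": {"1.1.0", "2.1.0"},
}


def migration_path(src_kafka, dst_hdinsight, dst_kafka):
    """Return the path label for a migration that starts on HDInsight 3.6.

    Hypothetical helper for illustration only.
    """
    if src_kafka not in SUPPORTED["3.6"]:
        raise ValueError(f"Kafka {src_kafka} is not available on HDInsight 3.6")
    if dst_kafka not in SUPPORTED.get(dst_hdinsight, set()):
        raise ValueError(
            f"Kafka {dst_kafka} is not available on HDInsight {dst_hdinsight}"
        )

    if dst_hdinsight == "3.6":
        # The only path that stays on HDInsight 3.6 is 1.0.0 -> 1.1.0.
        if (src_kafka, dst_kafka) == ("1.0.0", "1.1.0"):
            return "A"
        raise ValueError("no supported path for this combination")

    # Target is HDInsight 4.0 from here on.
    if dst_kafka == "2.1.0":
        # The article names both paths without saying which source is which.
        return "D or E"
    return "B" if src_kafka == "1.0.0" else "C"


if __name__ == "__main__":
    print(migration_path("1.0.0", "4.0", "1.1.0"))
```

Whichever path applies, the target is always a newly deployed cluster, since in-place Kafka upgrades are not supported.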

Apache Kafka versions

Kafka 1.1.0

If you migrate from Kafka 1.0.0 to 1.1.0, you can take advantage of the following new features:

  • Improvements to the Kafka controller speed up controlled shutdown, so you can restart brokers and recover from issues faster.
  • Improvements in the FetchRequests logic enable you to have more partitions (and hence more topics) on the cluster.
  • Kafka Connect supports record headers and regular expressions for topics.

For a complete list of updates, see the Apache Kafka 1.1 release notes.

Apache Kafka 2.1.0

If you migrate to Kafka 2.1, you can take advantage of the following features:

  • Better broker resiliency due to an improved replication protocol.
  • New functionality in the KafkaAdminClient API.
  • Configurable quota management.
  • Support for Zstandard compression.
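As an illustration of the Zstandard item, a producer opts in through the standard `compression.type` producer setting. The sketch below only renders a properties file; the helper name `producer_properties` and the broker host names are hypothetical, and it assumes both the clients and the brokers run Kafka 2.1 or later.

```python
# Values accepted by the Kafka producer setting `compression.type`.
VALID_COMPRESSION = {"none", "gzip", "snappy", "lz4", "zstd"}


def producer_properties(bootstrap_servers, compression="zstd"):
    """Render producer settings as Java-style properties text.

    Hypothetical helper: it only builds text and never contacts a cluster.
    """
    if compression not in VALID_COMPRESSION:
        raise ValueError(f"unsupported compression type: {compression}")
    props = {
        "bootstrap.servers": ",".join(bootstrap_servers),
        "compression.type": compression,
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    # Placeholder host names; substitute the brokers of the new cluster.
    brokers = ["broker-0.example.internal:9092", "broker-1.example.internal:9092"]
    print(producer_properties(brokers), end="")
```

Keeping `compression` as a parameter makes it easy to stay on an older codec until every consumer has been moved to a client that understands Zstandard.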

For a complete list of updates, see the Apache Kafka 2.0 release notes and the Apache Kafka 2.1 release notes.

Kafka client compatibility

New Kafka brokers support older clients. KIP-35 (Retrieving protocol version) introduced a mechanism for dynamically determining the functionality of a Kafka broker, and KIP-97 (Improved Kafka Client RPC Compatibility Policy) introduced a new compatibility policy and guarantees for the Java client. Previously, a Kafka client had to interact with a broker of the same version or a newer version. Now, newer versions of the Java clients and other clients that support KIP-35, such as librdkafka, can fall back to older request types or throw appropriate errors if functionality isn't available.

Figure: Upgrade Kafka client compatibility

Note that this doesn't mean that the client supports older brokers. For more information, see the Compatibility Matrix.
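The fallback behavior described above amounts to picking, for each request type, the highest version that both sides support, and raising an error when the ranges don't overlap. The following is a simplified sketch of that idea rather than actual client code: the `negotiate` function, the exception name, and the version numbers in the example are all made up for illustration.

```python
class UnsupportedVersionError(Exception):
    """Raised when client and broker share no usable version of an API."""


def negotiate(client_ranges, broker_ranges):
    """Pick the highest mutually supported version for each API.

    Both arguments map an API name to an inclusive (min_version, max_version)
    range, similar in spirit to what a broker advertises under KIP-35.
    """
    chosen = {}
    for api, (client_min, client_max) in client_ranges.items():
        if api not in broker_ranges:
            raise UnsupportedVersionError(f"broker does not support {api}")
        broker_min, broker_max = broker_ranges[api]
        version = min(client_max, broker_max)
        if version < max(client_min, broker_min):
            raise UnsupportedVersionError(f"no common version for {api}")
        chosen[api] = version
    return chosen


if __name__ == "__main__":
    # Illustrative numbers only, not the real ranges of any Kafka release.
    client = {"Produce": (3, 7), "Fetch": (4, 10)}
    broker = {"Produce": (0, 5), "Fetch": (0, 7)}
    print(negotiate(client, broker))
```

In this example the newer client falls back to the broker's maximum for both request types; a client whose minimum exceeds the broker's maximum gets an error instead.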

General migration process

The following migration guidance assumes an Apache Kafka 1.0.0 or 1.1.0 cluster deployed on HDInsight 3.6 in a single virtual network. The existing broker has some topics and is being actively used by producers and consumers.

Figure: Assumed current Kafka environment

To complete the migration, do the following steps:

  1. Deploy a new HDInsight 4.0 cluster and clients for testing. Deploy a new HDInsight 4.0 Kafka cluster. If multiple Kafka versions can be selected, choose the latest one. After deployment, set parameters as needed and create topics with the same names as in your existing environment. Also set up TLS and bring-your-own-key (BYOK) encryption as needed. Then verify that the test clients work correctly with the new cluster.

    Figure: Deploy a new HDInsight 4.0 cluster

  2. Switch the cluster for the producer application, and wait until all the queued data is consumed by the current consumers. When the new HDInsight 4.0 Kafka cluster is ready, switch the existing producers' destination to the new cluster. Leave the existing consumer application connected to the old cluster until it has consumed all the data there.


  3. Switch the cluster on the consumer application. After confirming that the existing consumer application has finished consuming all data from the existing cluster, switch its connection to the new cluster.


  4. Remove the old cluster and the test applications as needed. Once the switch is complete and everything is working properly, remove the old HDInsight 3.6 Kafka cluster, and remove the producers and consumers used in the test as needed.
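Steps 2 and 3 hinge on knowing when the existing consumers have drained the old cluster. One way to express that check is to compare each partition's log end offset with the consumer group's committed offset; the group is drained when the lag is zero everywhere. The sketch below assumes you have already collected those offsets (for example, from the output of the `kafka-consumer-groups.sh --describe` tool); the function names and sample data are hypothetical.

```python
def partition_lag(end_offsets, committed_offsets):
    """Return the remaining lag per (topic, partition).

    A partition with no committed offset is treated as entirely unread,
    which assumes its log starts at offset 0 (a simplification).
    """
    lag = {}
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag


def is_drained(end_offsets, committed_offsets):
    """True when the consumer group has read everything on the old cluster."""
    return all(v == 0 for v in partition_lag(end_offsets, committed_offsets).values())


if __name__ == "__main__":
    # Sample offsets for a made-up topic named "orders".
    end = {("orders", 0): 120, ("orders", 1): 95}
    committed = {("orders", 0): 120, ("orders", 1): 90}
    print(partition_lag(end, committed))
    print(is_drained(end, committed))
```

This check is only meaningful after the producers have been switched to the new cluster, because otherwise the end offsets on the old cluster keep growing.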

Next steps