Frequently asked questions about Apache Kafka in Azure HDInsight

This article addresses some common questions about using Apache Kafka on Azure HDInsight.

What Kafka versions are supported by HDInsight?

Find more information about the component versions officially supported by HDInsight in What are the Apache Hadoop components and versions available with HDInsight?. We recommend always using the latest version to ensure the best possible performance and user experience.

What resources are provided in an HDInsight Kafka cluster and what resources am I charged for?

An HDInsight Kafka cluster includes the following resources:

  • Head nodes
  • Zookeeper nodes
  • Broker (worker) nodes
  • Azure Managed Disks attached to the broker nodes
  • Gateway nodes

All of these resources are charged based on the HDInsight pricing model, except for gateway nodes. You are not charged for gateway nodes.

For a more detailed description of the various node types, see Azure HDInsight virtual network architecture. Pricing is based on per-minute node usage. Prices vary depending on node size, number of nodes, type of managed disk used, and region.

Do Apache Kafka APIs work with HDInsight?

Yes, HDInsight uses native Kafka APIs. Your client application code doesn't need to change. See Tutorial: Use the Apache Kafka Producer and Consumer APIs to learn how to use Java-based producer/consumer APIs with your cluster.
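Because the APIs are the native ones, client configuration is also standard Kafka. As a minimal sketch, a producer or consumer client properties file might look like the following; the broker host names are placeholders, not values from this article, and in practice you would list your cluster's broker FQDNs or IPs:

```properties
# Client properties sketch (broker names are placeholders)
bootstrap.servers=<broker1-fqdn>:9092,<broker2-fqdn>:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
```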

Can I change cluster configurations?

Yes, through the Ambari portal. Each component in the portal has a Configs section, which can be used to change component configurations. Some changes may require broker restarts.

Is my data encrypted? Can I use my own keys?

All Kafka messages on the managed disks are encrypted with Azure Storage Service Encryption (SSE). Data in transit (for example, data transmitted between clients and brokers) isn't encrypted by default. You can encrypt such traffic by setting up SSL/TLS on your own. Additionally, HDInsight allows you to manage your own keys to encrypt the data at rest. For more information, see Customer-managed key disk encryption.
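If you do set up SSL yourself, the client side of that configuration is standard Kafka. A hedged sketch of the client properties is shown below; the file paths and passwords are placeholders, and the broker listeners must also be configured for SSL for this to work:

```properties
# Client-side SSL sketch (paths and passwords are placeholders)
security.protocol=SSL
ssl.truststore.location=/path/to/kafka.client.truststore.jks
ssl.truststore.password=<truststore-password>
# For mutual TLS (client authentication), additionally:
ssl.keystore.location=/path/to/kafka.client.keystore.jks
ssl.keystore.password=<keystore-password>
ssl.key.password=<key-password>
```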

How do I connect clients to my cluster?

For Kafka clients to communicate with Kafka brokers, they must be able to reach the brokers over the network. For HDInsight clusters, the virtual network (VNet) is the security boundary. Hence, the easiest way to connect clients to your HDInsight cluster is to create the clients within the same VNet as the cluster. Other scenarios include:

  • Connecting clients in a different Azure VNet – Peer the cluster VNet and the client VNet, and configure the cluster for IP advertising. When using IP advertising, Kafka clients must use broker IP addresses, instead of fully qualified domain names (FQDNs), to connect to the brokers.

  • Connecting on-premises clients – Use a VPN network and set up custom DNS servers, as described in Plan a virtual network for Azure HDInsight.

  • Creating a public endpoint for your Kafka service – If your enterprise security requirements allow it, you can deploy a public endpoint for your Kafka brokers, or a self-managed open-source REST endpoint with a public endpoint.
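At the Kafka level, IP advertising boils down to what each broker tells clients in its metadata responses. The sketch below shows the relevant broker-side settings; the IP address is a placeholder for that broker's own address, and on HDInsight these values are managed through Ambari rather than edited by hand:

```properties
# Broker-side sketch of IP advertising (per broker; IP is a placeholder)
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://10.0.0.11:9092
```

With `advertised.listeners` set to the broker IP, clients that bootstrap against any reachable broker receive IP addresses, not FQDNs, for all subsequent connections.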

Can I add more disk space on an existing cluster?

To increase the amount of space available for Kafka messages, you can increase the number of nodes. Currently, adding more disks to an existing cluster isn't supported.

How can I achieve maximum data durability?

Data durability minimizes the risk of message loss. To achieve maximum data durability, we recommend the following settings:

  • Use a minimum replication factor of 3 in most regions
  • Use a minimum replication factor of 4 in regions with only two fault domains
  • Disable unclean leader elections
  • Set min.insync.replicas to 2 or more – this changes the number of replicas that must be completely in sync with the leader before a write can proceed
  • Set the acks property to all – this property requires all replicas to acknowledge all messages

Note that configuring Kafka for higher data consistency affects the availability of brokers to serve produce requests.
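The recommendations above map onto standard Kafka settings. A hedged sketch, assuming a region with three or more fault domains (the first three lines are broker-side defaults, the last is a producer-side setting):

```properties
# Broker-side durability sketch (server.properties)
default.replication.factor=3          # use 4 in regions with only two fault domains
min.insync.replicas=2
unclean.leader.election.enable=false

# Producer-side setting
acks=all
```

With replication factor 3 and `min.insync.replicas=2`, writes continue even if one replica is offline, but a produce request with `acks=all` fails rather than risk data loss once a second replica drops out of sync.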

Can I replicate my data to multiple clusters?

Yes, data can be replicated to multiple clusters using Kafka MirrorMaker. Details on setting up MirrorMaker can be found in Mirror Apache Kafka topics. Additionally, there are other self-managed open-source technologies and vendors, such as Brooklin, that can help achieve replication to multiple clusters.
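As a rough sketch of what a legacy MirrorMaker invocation looks like: the consumer properties point at the source cluster, the producer properties at the destination, and the topic names here are placeholders (see Mirror Apache Kafka topics for the HDInsight-specific steps):

```sh
# MirrorMaker command sketch (topic names and property files are placeholders)
kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist "topic1|topic2"
```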

Can I upgrade my cluster? How should I upgrade my cluster?

We don't currently support in-place cluster version upgrades. To update your cluster to a higher Kafka version, create a new cluster with the version that you want, and then migrate your Kafka clients to use the new cluster.

Next steps