Tutorial: Create an Apache Kafka REST proxy enabled cluster in HDInsight using Azure CLI

In this tutorial, you learn how to create an Apache Kafka REST proxy enabled cluster in Azure HDInsight using the Azure command-line interface (CLI). Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. Apache Kafka is an open-source, distributed streaming platform. It's often used as a message broker because it provides functionality similar to a publish-subscribe message queue. The Kafka REST proxy lets you interact with your Kafka cluster through a REST API over HTTP. The Azure CLI is Microsoft's cross-platform command-line experience for managing Azure resources.

The Apache Kafka API can only be accessed by resources inside the same virtual network. You can access the cluster directly using SSH. To connect other services, networks, or virtual machines to Apache Kafka, you must first create a virtual network and then create the resources within that network. For more information, see Connect to Apache Kafka using a virtual network.

In this tutorial, you learn:

  • Prerequisites for Kafka REST proxy
  • Create an Apache Kafka cluster using Azure CLI

If you don't have an Azure subscription, create a trial account before you begin.

Prerequisites

Create an Apache Kafka cluster

  1. Sign in to your Azure subscription.

    az login
    
    # If you have multiple subscriptions, set the one to use
    # az account set --subscription "SUBSCRIPTIONID"
    
  2. Set environment variables. The variables in this tutorial use Bash syntax. Other environments need slight variations.

    Variable            Description
    resourceGroupName   Replace RESOURCEGROUPNAME with the name for your new resource group.
    location            Replace LOCATION with the region where the cluster will be created. For a list of valid locations, use the az account list-locations command.
    clusterName         Replace CLUSTERNAME with a globally unique name for your new cluster.
    storageAccount      Replace STORAGEACCOUNTNAME with a name for your new storage account.
    httpPassword        Replace PASSWORD with a password for the cluster login, admin.
    sshPassword         Replace PASSWORD with a password for the secure shell username, sshuser.
    securityGroupName   Replace SECURITYGROUPNAME with the client AAD security group name for the Kafka REST proxy. The variable is passed to the --kafka-client-group-name parameter of az hdinsight create.
    securityGroupID     Replace SECURITYGROUPID with the client AAD security group ID for the Kafka REST proxy. The variable is passed to the --kafka-client-group-id parameter of az hdinsight create.
    storageContainer    Storage container the cluster will use; leave as-is for this tutorial. This variable is set to the lowercased name of the cluster.
    workernodeCount     Number of worker nodes in the cluster; leave as-is for this tutorial. To guarantee high availability, Kafka requires a minimum of 3 worker nodes.
    clusterType         Type of HDInsight cluster; leave as-is for this tutorial.
    clusterVersion      HDInsight cluster version; leave as-is for this tutorial. The Kafka REST proxy requires a minimum cluster version of 4.0.
    componentVersion    Kafka version; leave as-is for this tutorial. The Kafka REST proxy requires a minimum component version of 2.1.

    Update the variables with the desired values. Then enter the CLI commands to set the environment variables.

    export resourceGroupName=RESOURCEGROUPNAME
    export location=LOCATION
    export clusterName=CLUSTERNAME
    export storageAccount=STORAGEACCOUNTNAME
    export httpPassword='PASSWORD'
    export sshPassword='PASSWORD'
    export securityGroupName=SECURITYGROUPNAME
    export securityGroupID=SECURITYGROUPID
    
    export storageContainer=$(echo $clusterName | tr "[:upper:]" "[:lower:]")
    export workernodeCount=3
    export clusterType=kafka
    export clusterVersion=4.0
    export componentVersion=kafka=2.1
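
Before any resources are created, it can help to check the chosen names locally. The following is a minimal Bash sketch, not part of the original tutorial: the helper names is_valid_storage_account_name and to_container_name are hypothetical, and the check assumes the usual Azure Storage rule that account names are 3 to 24 characters of lowercase letters and digits. The second helper only mirrors how storageContainer is derived from clusterName above.

```shell
# Sketch: local checks on names before creating Azure resources.
# Both function names are hypothetical helpers for this example.

# Storage account names: 3-24 characters, lowercase letters and digits only.
is_valid_storage_account_name() {
    printf '%s' "$1" | grep -Eq '^[a-z0-9]{3,24}$'
}

# Same transformation the tutorial applies to build the container name.
to_container_name() {
    echo "$1" | tr "[:upper:]" "[:lower:]"
}

# Example usage (uncomment after exporting the tutorial variables):
# is_valid_storage_account_name "$storageAccount" || echo "Invalid storage account name" >&2
# to_container_name "$clusterName"
```

Running the check before step 4 surfaces a naming problem immediately instead of partway through resource creation.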
    
  3. Create the resource group by entering the command below:

    az group create \
        --location $location \
        --name $resourceGroupName
    
  4. Create an Azure Storage account by entering the command below:

    # Note: a storage account of kind BlobStorage can't be used as the cluster's default storage account.
    az storage account create \
        --name $storageAccount \
        --resource-group $resourceGroupName \
        --https-only true \
        --kind StorageV2 \
        --location $location \
        --sku Standard_LRS
    
  5. Extract the primary key from the Azure Storage account and store it in a variable by entering the command below:

    export storageAccountKey=$(az storage account keys list \
        --account-name $storageAccount \
        --resource-group $resourceGroupName \
        --query '[0].value' -o tsv)
    
  6. Create an Azure Storage container by entering the command below:

    az storage container create \
        --name $storageContainer \
        --account-key $storageAccountKey \
        --account-name $storageAccount
    
  7. Create the HDInsight cluster. Before entering the command, note the following parameters:

    1. Required parameters for Kafka clusters:

      Parameter                          Description
      --type                             The value must be Kafka.
      --workernode-data-disks-per-node   The number of data disks to use per worker node. HDInsight Kafka is only supported with data disks. This tutorial uses a value of 2.
    2. Required parameters for Kafka REST proxy:

      Parameter                      Description
      --kafka-management-node-size   The size of the node. This tutorial uses the value Standard_D4_v2.
      --kafka-client-group-id        The client AAD security group ID for the Kafka REST proxy. The value is passed from the variable $securityGroupID.
      --kafka-client-group-name      The client AAD security group name for the Kafka REST proxy. The value is passed from the variable $securityGroupName.
      --version                      The HDInsight cluster version must be at least 4.0. The value is passed from the variable $clusterVersion.
      --component-version            The Kafka version must be at least 2.1. The value is passed from the variable $componentVersion.

      To create the cluster without the REST proxy, remove --kafka-management-node-size, --kafka-client-group-id, and --kafka-client-group-name from the az hdinsight create command.

    3. If you have an existing virtual network, add the parameters --vnet-name and --subnet, and their values.
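
The optional parameters described above (the three REST proxy parameters and the two virtual network parameters) can be collected in a Bash array so that one script covers every variant. This is a sketch under stated assumptions, not the tutorial's own method: the function build_optional_args and the variables enableRestProxy, vnetName, and subnetName are hypothetical names introduced only for this example. The parameter names themselves are the ones listed above.

```shell
# Sketch: collect optional `az hdinsight create` arguments in a Bash array.
# build_optional_args, enableRestProxy, vnetName and subnetName are
# hypothetical names used only in this example.
build_optional_args() {
    optionalArgs=()

    # REST proxy parameters: omit all three to create a cluster without the proxy.
    if [ "${enableRestProxy:-true}" = "true" ]; then
        optionalArgs+=(
            --kafka-management-node-size "Standard_D4_v2"
            --kafka-client-group-id "$securityGroupID"
            --kafka-client-group-name "$securityGroupName"
        )
    fi

    # Existing virtual network: add both parameters only when both values are set.
    if [ -n "${vnetName:-}" ] && [ -n "${subnetName:-}" ]; then
        optionalArgs+=(--vnet-name "$vnetName" --subnet "$subnetName")
    fi
}
```

After calling build_optional_args, the expansion "${optionalArgs[@]}" can be appended to the required arguments of az hdinsight create. Using an array rather than a plain string keeps a group name that contains spaces as a single argument.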

    Enter the following command to create the cluster:

    az hdinsight create \
        --name $clusterName \
        --resource-group $resourceGroupName \
        --type $clusterType \
        --component-version $componentVersion \
        --http-password "$httpPassword" \
        --http-user admin \
        --location $location \
        --ssh-password "$sshPassword" \
        --ssh-user sshuser \
        --storage-account $storageAccount \
        --storage-account-key $storageAccountKey \
        --storage-container $storageContainer \
        --version $clusterVersion \
        --workernode-count $workernodeCount \
        --workernode-data-disks-per-node 2 \
        --kafka-management-node-size "Standard_D4_v2" \
        --kafka-client-group-id $securityGroupID \
        --kafka-client-group-name "$securityGroupName"
    

    Cluster creation may take several minutes, usually around 15.
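
If a script needs to confirm that the cluster is ready before continuing, it can poll the cluster state. The sketch below is illustrative only and rests on assumptions: the function name wait_for_cluster_running is hypothetical, and it assumes that az hdinsight show reports the state at properties.clusterState with the values Running and Error. Verify both against the output of az hdinsight show for your CLI version before relying on it.

```shell
# Sketch: poll an HDInsight cluster until it reports the Running state.
# Assumptions: the state is exposed at properties.clusterState, and
# "Running" / "Error" are the terminal values of interest.
wait_for_cluster_running() {
    local name="$1" group="$2"
    local max_attempts="${3:-60}" delay="${4:-30}"
    local attempt=1 state

    while [ "$attempt" -le "$max_attempts" ]; do
        state=$(az hdinsight show \
            --name "$name" \
            --resource-group "$group" \
            --query 'properties.clusterState' -o tsv 2>/dev/null)
        case "$state" in
            Running) echo "Cluster $name is running."; return 0 ;;
            Error)   echo "Cluster $name reported state: $state" >&2; return 1 ;;
        esac
        sleep "$delay"
        attempt=$((attempt + 1))
    done

    echo "Timed out waiting for cluster $name." >&2
    return 1
}

# Example usage with the tutorial variables:
# wait_for_cluster_running "$clusterName" "$resourceGroupName"
```

With the defaults of 60 attempts and a 30-second delay, the function waits up to 30 minutes, which leaves headroom over the typical creation time mentioned above.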

Clean up resources

After you complete the article, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster even when it's not in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters when they aren't in use.

Enter all or some of the following commands to remove resources:

# Remove cluster
az hdinsight delete \
    --name $clusterName \
    --resource-group $resourceGroupName

# Remove storage container
az storage container delete \
    --account-name $storageAccount \
    --account-key $storageAccountKey \
    --name $storageContainer

# Remove storage account
az storage account delete \
    --name $storageAccount \
    --resource-group $resourceGroupName

# Remove resource group
az group delete \
    --name $resourceGroupName
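
Because every resource in this tutorial is created in the same resource group, deleting that group removes the cluster, the storage account, and the container in one operation. The following sketch wraps that single call with a guard against an empty variable. The function name delete_tutorial_resources is hypothetical; the --yes and --no-wait flags skip the confirmation prompt and return without waiting for the deletion to finish.

```shell
# Sketch: remove all tutorial resources by deleting their resource group.
# delete_tutorial_resources is a hypothetical helper name.
delete_tutorial_resources() {
    local group="$1"

    # Refuse to run with an empty name, for example when the variable was never exported.
    if [ -z "$group" ]; then
        echo "Resource group name is empty; nothing deleted." >&2
        return 1
    fi

    # --yes skips the confirmation prompt; --no-wait returns immediately.
    az group delete --name "$group" --yes --no-wait
}

# Example usage with the tutorial variable:
# delete_tutorial_resources "$resourceGroupName"
```

Only use this shortcut when the resource group contains nothing besides the tutorial resources, since the deletion applies to everything in the group.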

Next steps

Now that you've successfully created an Apache Kafka REST proxy enabled cluster in Azure HDInsight using Azure CLI, use Python code to interact with the REST proxy.