Quickstart: Create an Apache Hadoop cluster in Azure HDInsight using a Resource Manager template

In this quickstart, you use an Azure Resource Manager template to create an Apache Hadoop cluster in Azure HDInsight. Hadoop was the original open-source framework for distributed processing and analysis of big data sets on clusters. The Hadoop ecosystem includes related software and utilities such as Apache Hive, Apache HBase, Spark, Kafka, and many others.

A Resource Manager template is a JavaScript Object Notation (JSON) file that defines the infrastructure and configuration for your project. The template uses declarative syntax, which lets you state what you intend to deploy without writing the sequence of programming commands to create it. To learn more about developing Resource Manager templates, see the Resource Manager documentation.

Currently, HDInsight comes with six different cluster types. Each cluster type supports a different set of components. All cluster types support Hive. For a list of supported components in HDInsight, see What's new in the Hadoop cluster versions provided by HDInsight?

If you don't have an Azure subscription, create a trial account before you begin.

Create an Apache Hadoop cluster

Review the template

The template used in this quickstart is from Azure Quickstart templates.

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "clusterName": {
      "type": "string",
      "metadata": {
        "description": "The name of the HDInsight cluster to create."
      }
    },
    "clusterType": {
      "type": "string",
      "allowedValues": [
        "hadoop",
        "interactivehive",
        "hbase",
        "storm",
        "spark"
      ],
      "metadata": {
        "description": "The type of the HDInsight cluster to create."
      }
    },
    "clusterLoginUserName": {
      "type": "string",
      "metadata": {
        "description": "These credentials can be used to submit jobs to the cluster and to log into cluster dashboards."
      }
    },
    "clusterLoginPassword": {
      "type": "securestring",
      "metadata": {
        "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
      }
    },
    "sshUserName": {
      "type": "string",
      "metadata": {
        "description": "These credentials can be used to remotely access the cluster."
      }
    },
    "sshPassword": {
      "type": "securestring",
      "metadata": {
        "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
      }
    },
    "location": {
      "type": "string",
      "defaultValue": "[resourceGroup().location]",
      "metadata": {
        "description": "Location for all resources."
      }
    }
  },
  "variables": {
    "defaultStorageAccount": {
      "name": "[uniqueString(resourceGroup().id)]",
      "type": "Standard_LRS"
    }
  },
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[variables('defaultStorageAccount').name]",
      "location": "[parameters('location')]",
      "apiVersion": "2016-01-01",
      "sku": {
        "name": "[variables('defaultStorageAccount').type]"
      },
      "kind": "Storage",
      "properties": {}
    },
    {
      "type": "Microsoft.HDInsight/clusters",
      "name": "[parameters('clusterName')]",
      "location": "[parameters('location')]",
      "apiVersion": "2015-03-01-preview",
      "dependsOn": [
        "[concat('Microsoft.Storage/storageAccounts/',variables('defaultStorageAccount').name)]"
      ],
      "properties": {
        "clusterVersion": "3.6",
        "osType": "Linux",
        "clusterDefinition": {
          "kind": "[parameters('clusterType')]",
          "configurations": {
            "gateway": {
              "restAuthCredential.isEnabled": true,
              "restAuthCredential.username": "[parameters('clusterLoginUserName')]",
              "restAuthCredential.password": "[parameters('clusterLoginPassword')]"
            }
          }
        },
        "storageProfile": {
          "storageaccounts": [
            {
              "name": "[replace(replace(concat(reference(concat('Microsoft.Storage/storageAccounts/', variables('defaultStorageAccount').name), '2016-01-01').primaryEndpoints.blob),'https:',''),'/','')]",
              "isDefault": true,
              "container": "[parameters('clusterName')]",
              "key": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name), '2016-01-01').keys[0].value]"
            }
          ]
        },
        "computeProfile": {
          "roles": [
            {
              "name": "headnode",
              "targetInstanceCount": 2,
              "hardwareProfile": {
                "vmSize": "Standard_D3_v2"
              },
              "osProfile": {
                "linuxOperatingSystemProfile": {
                  "username": "[parameters('sshUserName')]",
                  "password": "[parameters('sshPassword')]"
                }
              }
            },
            {
              "name": "workernode",
              "targetInstanceCount": 2,
              "hardwareProfile": {
                "vmSize": "Standard_D3_v2"
              },
              "osProfile": {
                "linuxOperatingSystemProfile": {
                  "username": "[parameters('sshUserName')]",
                  "password": "[parameters('sshPassword')]"
                }
              }
            }
          ]
        }
      }
    }
  ],
  "outputs": {
    "storage": {
      "type": "object",
      "value": "[reference(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name))]"
    },
    "cluster": {
      "type": "object",
      "value": "[reference(resourceId('Microsoft.HDInsight/clusters',parameters('clusterName')))]"
    }
  }
}
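
Several values in the template above, such as the node counts, VM sizes, and cluster version, are written directly into the resource definition. As a rough sketch of how one of them could be exposed as a parameter instead, the excerpt below introduces a hypothetical `workerNodeCount` parameter (it is not part of the original template) and references it from the `workernode` role. Only the keys that change are shown; the excerpt would need to be merged into the full template rather than deployed on its own.

```json
{
  "parameters": {
    "workerNodeCount": {
      "type": "int",
      "defaultValue": 2,
      "minValue": 1,
      "metadata": {
        "description": "Hypothetical parameter added for illustration: number of worker nodes in the cluster."
      }
    }
  },
  "resources": [
    {
      "type": "Microsoft.HDInsight/clusters",
      "properties": {
        "computeProfile": {
          "roles": [
            {
              "name": "workernode",
              "targetInstanceCount": "[parameters('workerNodeCount')]"
            }
          ]
        }
      }
    }
  ]
}
```

The default of 2 matches the value hardcoded in the original template, so a deployment that doesn't set the parameter would behave as before.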

Two Azure resources are defined in the template: a storage account (Microsoft.Storage/storageAccounts) and an HDInsight cluster (Microsoft.HDInsight/clusters).

Deploy the template

  1. Select the Deploy to Azure button below to sign in to Azure and open the Resource Manager template.

    Deploy to Azure

  2. Enter or select the following values:

    | Property | Description |
    |---|---|
    | Subscription | From the drop-down list, select the Azure subscription to use for the cluster. |
    | Resource group | From the drop-down list, select an existing resource group, or select Create new. |
    | Location | This value is automatically populated with the location of the resource group. |
    | Cluster Name | Enter a globally unique name. For this template, use only lowercase letters and numbers. |
    | Cluster Type | Select hadoop. |
    | Cluster Login User Name | Provide the username. The default is admin. |
    | Cluster Login Password | Provide a password. The password must be at least 10 characters long and must contain at least one digit, one uppercase letter, one lowercase letter, and one non-alphanumeric character (except the characters ' " `). |
    | Ssh User Name | Provide the username. The default is sshuser. |
    | Ssh Password | Provide the password. |

    Some properties are hardcoded in the template. You can change these values by editing the template. For more explanation of these properties, see Create Apache Hadoop clusters in HDInsight.

    Note

    The values you provide must be unique and should follow the naming guidelines. The template doesn't perform validation checks. If the values you provide are already in use, or don't follow the guidelines, you get an error after you submit the template.

    [Image: HDInsight on Linux getting-started Resource Manager template in the portal]

  3. Review the TERMS AND CONDITIONS. Select I agree to the terms and conditions stated above, and then select Purchase. You'll receive a notification that your deployment is in progress. It takes about 20 minutes to create a cluster.
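
The values entered in step 2 map one-to-one onto the template's parameters, so they could also be captured in a Resource Manager parameters file for tools that accept one. The following is a minimal sketch, not part of the original quickstart: the cluster name is a hypothetical example and would need to be globally unique, and the `<...>` entries are placeholders to replace with passwords that meet the rules described above.

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "clusterName": {
      "value": "myhadoopcluster01"
    },
    "clusterType": {
      "value": "hadoop"
    },
    "clusterLoginUserName": {
      "value": "admin"
    },
    "clusterLoginPassword": {
      "value": "<cluster-login-password>"
    },
    "sshUserName": {
      "value": "sshuser"
    },
    "sshPassword": {
      "value": "<ssh-password>"
    }
  }
}
```

The `location` parameter is omitted here because the template defaults it to the resource group's location. Avoid committing a file that contains real passwords to source control.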

Review deployed resources

Once the cluster is created, you'll receive a Deployment succeeded notification with a Go to resource link. Your Resource group page lists your new HDInsight cluster and the default storage associated with the cluster. Each cluster has an Azure Storage account dependency, which is referred to as the default storage account. The HDInsight cluster and its default storage account must be colocated in the same Azure region. Deleting a cluster doesn't delete the storage account.

Note

For other cluster creation methods, and to understand the properties used in this quickstart, see Create HDInsight clusters.

Clean up resources

After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster even when it isn't in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters when they aren't in use.

Note

If you're proceeding immediately to the next tutorial to learn how to run ETL operations using Hadoop on HDInsight, you may want to keep the cluster running, because otherwise you have to create a Hadoop cluster again in that tutorial. However, if you aren't going through the next tutorial right away, you must delete the cluster now.

From the Azure portal, navigate to your cluster, and select Delete.

[Image: Deleting an HDInsight cluster from the portal]

You can also select the resource group name to open the resource group page, and then select Delete resource group. Deleting the resource group deletes both the HDInsight cluster and the default storage account.

Next steps

In this quickstart, you learned how to create an Apache Hadoop cluster in HDInsight using a Resource Manager template. In the next article, you learn how to perform an extract, transform, and load (ETL) operation using Hadoop on HDInsight.