Create a cluster with Data Lake Storage Gen2 using Azure CLI

To create an HDInsight cluster that uses Data Lake Storage Gen2 for storage, follow these steps.

Prerequisites

  • If you're unfamiliar with Azure Data Lake Storage Gen2, check out the overview section.

  • If you don't already have an Azure account, sign up for a trial account before continuing.

  • To run the CLI script examples:

    • Install the latest version of the Azure CLI (2.0.13 or later) if you prefer to use a local CLI console. Sign in to Azure with az login, using an account that is associated with the Azure subscription under which you want to deploy the user-assigned managed identity.

Warning

Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.

You can download a sample template file and a sample parameters file. Before using the template and the Azure CLI code snippets below, replace the following placeholders with their correct values:

Placeholder              Description
<SUBSCRIPTION_ID>        The ID of your Azure subscription.
<RESOURCEGROUPNAME>      The resource group where you want the new cluster and storage account created.
<MANAGEDIDENTITYNAME>    The name of the managed identity that will be given permissions on your storage account with Azure Data Lake Storage Gen2.
<STORAGEACCOUNTNAME>     The name of the new storage account with Azure Data Lake Storage Gen2 that will be created.
<FILESYSTEMNAME>         The name of the filesystem that this cluster should use in the storage account.
<CLUSTERNAME>            The name of your HDInsight cluster.
<PASSWORD>               Your chosen password for signing in to the cluster using SSH and the Ambari dashboard.

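Unreplaced placeholders are easy to miss in a parameters file. As a minimal sketch, the helper below scans a file for any of the placeholder tokens listed above. The name check_placeholders is hypothetical (it is not part of the Azure CLI), and parameters.json is assumed to be your local copy of the sample parameters file.

```shell
# Hypothetical helper (not part of the Azure CLI): report any placeholder
# from the table above that is still present in the given file.
check_placeholders() {
    file="$1"
    pattern='<(SUBSCRIPTION_ID|RESOURCEGROUPNAME|MANAGEDIDENTITYNAME|STORAGEACCOUNTNAME|FILESYSTEMNAME|CLUSTERNAME|PASSWORD)>'

    # Fail early if the file is missing or unreadable
    if [ ! -r "$file" ]; then
        echo "Cannot read $file" >&2
        return 2
    fi

    # grep -n prints each offending line with its line number
    if grep -nE "$pattern" "$file"; then
        echo "Replace the placeholders listed above before deploying." >&2
        return 1
    fi
    return 0
}
```

For example, `check_placeholders parameters.json && echo "No placeholders left"` prints the confirmation only when every placeholder has been replaced.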
The code snippet below does the following initial steps:

  1. Logs in to your Azure account.
  2. Sets the active subscription where the create operations will be done.
  3. Creates a new resource group for the new deployment activities.
  4. Creates a user-assigned managed identity.
  5. Adds an extension to the Azure CLI to use features for Data Lake Storage Gen2.
  6. Creates a new storage account with Data Lake Storage Gen2 by using the --hierarchical-namespace true flag.
az login
az account set --subscription <SUBSCRIPTION_ID>

# Create resource group
az group create --name <RESOURCEGROUPNAME> --location chinaeast

# Create managed identity
az identity create -g <RESOURCEGROUPNAME> -n <MANAGEDIDENTITYNAME>

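# Add the Azure CLI extension for Data Lake Storage Gen2 features
az extension add --name storage-preview

# Create a storage account with the hierarchical namespace (Data Lake Storage Gen2) enabled
az storage account create --name <STORAGEACCOUNTNAME> \
    --resource-group <RESOURCEGROUPNAME> \
    --location chinaeast --sku Standard_LRS \
    --kind StorageV2 --hierarchical-namespace true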
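
Before continuing, it can be worth confirming that the new account really has the hierarchical namespace enabled. The sketch below reads the isHnsEnabled property from az storage account show. The function name hns_enabled is hypothetical, and the check accepts either capitalization of the boolean because the tsv rendering may differ between CLI versions.

```shell
# Sketch: succeed only if the storage account reports the hierarchical
# namespace (Data Lake Storage Gen2) as enabled.
hns_enabled() {
    account="$1"
    group="$2"

    enabled=$(az storage account show \
        --name "$account" \
        --resource-group "$group" \
        --query isHnsEnabled \
        --output tsv) || return 2

    # Accept "true" or "True"; anything else counts as not enabled
    case "$enabled" in
        true|True) return 0 ;;
        *) return 1 ;;
    esac
}
```

A call such as `hns_enabled <STORAGEACCOUNTNAME> <RESOURCEGROUPNAME> && echo "Gen2 enabled"` uses the same placeholders as the snippet above; quote the values once you substitute them.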

Next, sign in to the portal. Add the new user-assigned managed identity to the Storage Blob Data Contributor role on the storage account. This step is described in step 3 under Using the Azure portal.

Important

Ensure that your storage account has the user-assigned identity with Storage Blob Data Contributor role permissions; otherwise, cluster creation will fail.
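
If you would rather stay in the CLI than switch to the portal, the same role assignment can probably be made with az role assignment create. The following is a sketch rather than the article's documented path: the function name assign_blob_contributor is hypothetical, and it assumes the signed-in account is allowed to assign roles on the storage account (for example, as Owner or User Access Administrator).

```shell
# Sketch: grant the user-assigned managed identity the
# "Storage Blob Data Contributor" role on the storage account.
assign_blob_contributor() {
    group="$1"
    identity="$2"
    account="$3"

    # Object (principal) ID of the managed identity
    principal_id=$(az identity show \
        --resource-group "$group" \
        --name "$identity" \
        --query principalId \
        --output tsv) || return 1

    # Full resource ID of the storage account, used as the role scope
    scope=$(az storage account show \
        --resource-group "$group" \
        --name "$account" \
        --query id \
        --output tsv) || return 1

    az role assignment create \
        --assignee-object-id "$principal_id" \
        --assignee-principal-type ServicePrincipal \
        --role "Storage Blob Data Contributor" \
        --scope "$scope"
}
```

Call it with the resource group, managed identity, and storage account names that you substituted for the placeholders. Role assignments can take a short while to propagate, so a deployment started immediately afterwards may still fail on permissions.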

# Deploy the template with the parameters file to create the HDInsight cluster
az deployment group create --name HDInsightADLSGen2Deployment \
    --resource-group <RESOURCEGROUPNAME> \
    --template-file hdinsight-adls-gen2-template.json \
    --parameters parameters.json
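
Cluster deployments take a while, so it can help to check the deployment state afterwards or from a second shell. This sketch queries properties.provisioningState with az deployment group show. The helper names deployment_state and deployment_succeeded are hypothetical.

```shell
# Sketch: print the provisioning state of a resource group deployment.
deployment_state() {
    az deployment group show \
        --name "$1" \
        --resource-group "$2" \
        --query properties.provisioningState \
        --output tsv
}

# Sketch: succeed only when the deployment reports "Succeeded".
deployment_succeeded() {
    state=$(deployment_state "$1" "$2") || return 2
    echo "Provisioning state: $state"
    [ "$state" = "Succeeded" ]
}
```

For the deployment above, the first argument is HDInsightADLSGen2Deployment and the second is the resource group name you used in place of the placeholder.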

Clean up resources

After you complete the article, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster even when it isn't in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters when they aren't in use.

Enter all or some of the following commands to remove resources:

# Remove cluster
az hdinsight delete \
    --name $clusterName \
    --resource-group $resourceGroupName

# Remove storage container
az storage container delete \
    --account-name $AZURE_STORAGE_ACCOUNT \
    --name $AZURE_STORAGE_CONTAINER

# Remove storage account
az storage account delete \
    --name $AZURE_STORAGE_ACCOUNT \
    --resource-group $resourceGroupName

# Remove resource group
az group delete \
    --name $resourceGroupName
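
The cleanup commands read shell variables that this article does not define. As a sketch, they could be set from the placeholder values used earlier. The mapping below is an assumption, in particular that the storage container to remove is the filesystem named by <FILESYSTEMNAME>. The guard function require_vars is a hypothetical helper that stops you from running a delete with an empty or unreplaced value.

```shell
# Assumed mapping from the placeholders used earlier to the variables
# that the cleanup commands read. Replace the values before running.
clusterName="<CLUSTERNAME>"
resourceGroupName="<RESOURCEGROUPNAME>"
AZURE_STORAGE_ACCOUNT="<STORAGEACCOUNTNAME>"
AZURE_STORAGE_CONTAINER="<FILESYSTEMNAME>"

# Hypothetical guard: fail if any named variable is empty or still
# holds a <PLACEHOLDER> value.
require_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        case "$value" in
            ""|"<"*">")
                echo "Set $name before running the cleanup commands." >&2
                return 1
                ;;
        esac
    done
    return 0
}
```

Running `require_vars clusterName resourceGroupName AZURE_STORAGE_ACCOUNT AZURE_STORAGE_CONTAINER` before the delete commands catches a forgotten value before anything is removed.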

Troubleshoot

If you run into issues with creating HDInsight clusters, see access control requirements.

Next steps

You've successfully created an HDInsight cluster. Now learn how to work with your cluster.

Apache Spark clusters

Apache Hadoop clusters

Apache HBase clusters