Create a cluster with Data Lake Storage Gen2 using Azure CLI

To create an HDInsight cluster that uses Data Lake Storage Gen2 for storage, follow these steps.

Prerequisites

  • If you're unfamiliar with Azure Data Lake Storage Gen2, check out the overview section.

  • If you don't already have an Azure account, sign up for a trial subscription before continuing.

  • To run the CLI script examples:

    • If you prefer to use a local CLI console, install the latest version of the Azure CLI (2.0.13 or later). Sign in to Azure with az login, using an account that's associated with the Azure subscription where you want to deploy the user-assigned managed identity.

Warning

Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.

You can download a sample template file and a sample parameters file. Before you use the template and the Azure CLI code snippets in this section, replace the following placeholders with your own values:

Placeholder            Description
<SUBSCRIPTION_ID>      The ID of your Azure subscription.
<RESOURCEGROUPNAME>    The resource group where you want the new cluster and storage account created.
<MANAGEDIDENTITYNAME>  The name of the managed identity that will be given permissions on your storage account with Azure Data Lake Storage Gen2.
<STORAGEACCOUNTNAME>   The name of the new storage account with Azure Data Lake Storage Gen2 to create.
<FILESYSTEMNAME>       The name of the file system that this cluster should use in the storage account.
<CLUSTERNAME>          The name of your HDInsight cluster.
<PASSWORD>             Your chosen password for signing in to the cluster by using SSH and the Ambari dashboard.
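
A leftover placeholder usually shows up only as a failed command or deployment, so it can be worth scanning your edited files before you run anything. The following is a minimal sketch, not part of the official procedure: the function name check_placeholders is hypothetical, and it assumes that every unreplaced placeholder still matches the <UPPER_CASE> pattern used in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any <UPPER_CASE> placeholders still present
# in the given files. Returns 0 when none remain, 1 otherwise.
check_placeholders() {
  local status=0 file
  for file in "$@"; do
    # A missing or unreadable file counts as a failure, not as "clean".
    if [ ! -r "$file" ]; then
      echo "Cannot read $file" >&2
      status=1
      continue
    fi
    # -n prints line numbers; -E enables the extended regular expression.
    if grep -nE '<[A-Z_]+>' "$file"; then
      echo "Unreplaced placeholders found in $file" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, check_placeholders hdinsight-adls-gen2-template.json parameters.json prints the offending lines and returns a nonzero status if anything is still left to replace.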

The following code snippet performs these initial steps:

  1. Signs in to your Azure account.
  2. Sets the active subscription where the create operations run.
  3. Creates a new resource group for the deployment.
  4. Creates a user-assigned managed identity.
  5. Adds an extension to the Azure CLI to use features for Data Lake Storage Gen2.
  6. Creates a new storage account with Data Lake Storage Gen2 by using the --hierarchical-namespace true flag.

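# Sign in. Because this example deploys to the chinaeast region,
# point the CLI at the Azure China cloud first.
az cloud set --name AzureChinaCloud
az login
az account set --subscription <SUBSCRIPTION_ID>

# Create resource group
az group create --name <RESOURCEGROUPNAME> --location chinaeast

# Create managed identity
az identity create -g <RESOURCEGROUPNAME> -n <MANAGEDIDENTITYNAME>

# Add the storage-preview extension used for the Data Lake Storage Gen2 options
az extension add --name storage-preview

# Create a storage account with a hierarchical namespace (Data Lake Storage Gen2)
az storage account create --name <STORAGEACCOUNTNAME> \
    --resource-group <RESOURCEGROUPNAME> \
    --location chinaeast --sku Standard_LRS \
    --kind StorageV2 --hierarchical-namespace true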
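
To confirm that the new account really has a hierarchical namespace before you go further, you can query its isHnsEnabled property. This is a minimal sketch: the function name verify_hns is hypothetical, and the case statement accepts either capitalization of true as a precaution about how the CLI renders booleans in TSV output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the storage account has a hierarchical
# namespace enabled (that is, it is a Data Lake Storage Gen2 account).
# Usage: verify_hns <RESOURCEGROUPNAME> <STORAGEACCOUNTNAME>
verify_hns() {
  local rg="$1" account="$2" hns
  hns=$(az storage account show \
    --name "$account" \
    --resource-group "$rg" \
    --query isHnsEnabled \
    --output tsv)
  case "$hns" in
    true|True)
      echo "Hierarchical namespace is enabled on $account" ;;
    *)
      # Covers "false", an empty result, and a failed az call.
      echo "Hierarchical namespace is not enabled on $account" >&2
      return 1 ;;
  esac
}
```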

Next, sign in to the Azure portal and add the new user-assigned managed identity to the Storage Blob Data Owner role on the storage account. This step is described in step 3 of the Using the Azure portal section.
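
If you'd rather stay in the CLI than switch to the portal, the same role assignment can be sketched with az role assignment create. This is a minimal, unofficial sketch: the function name assign_blob_data_owner is hypothetical, and passing --assignee-principal-type ServicePrincipal together with the identity's object ID is an assumption intended to avoid a directory lookup for a freshly created identity. Role assignments can take a few minutes to take effect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grant the user-assigned managed identity the
# Storage Blob Data Owner role, scoped to the storage account only.
# Usage: assign_blob_data_owner <RESOURCEGROUPNAME> <MANAGEDIDENTITYNAME> <STORAGEACCOUNTNAME>
assign_blob_data_owner() {
  local rg="$1" identity="$2" account="$3" principal_id scope
  # Object (principal) ID of the managed identity's service principal.
  principal_id=$(az identity show \
    --resource-group "$rg" --name "$identity" \
    --query principalId --output tsv) || return 1
  # Full resource ID of the storage account, used as the assignment scope.
  scope=$(az storage account show \
    --resource-group "$rg" --name "$account" \
    --query id --output tsv) || return 1
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Owner" \
    --scope "$scope"
}
```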

Important

Make sure that the user-assigned identity has the Storage Blob Data Owner role on your storage account. Otherwise, cluster creation fails.
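
Because a missing role assignment only surfaces as a failed cluster creation, a quick pre-deployment check can save a wait. A minimal sketch follows; the function name has_blob_data_owner is hypothetical, and it counts only assignments made directly on the storage account (add --include-inherited to az role assignment list if you granted the role at resource group or subscription scope).

```shell
#!/usr/bin/env bash
# Hypothetical pre-deployment check: succeed only if the managed identity
# already holds Storage Blob Data Owner directly on the storage account.
# Usage: has_blob_data_owner <RESOURCEGROUPNAME> <MANAGEDIDENTITYNAME> <STORAGEACCOUNTNAME>
has_blob_data_owner() {
  local rg="$1" identity="$2" account="$3" principal_id scope count
  principal_id=$(az identity show \
    --resource-group "$rg" --name "$identity" \
    --query principalId --output tsv) || return 1
  scope=$(az storage account show \
    --resource-group "$rg" --name "$account" \
    --query id --output tsv) || return 1
  # The JMESPath expression length(@) returns the number of matching assignments.
  count=$(az role assignment list \
    --assignee "$principal_id" \
    --role "Storage Blob Data Owner" \
    --scope "$scope" \
    --query "length(@)" --output tsv) || return 1
  # Non-numeric output makes the test fail quietly instead of erroring.
  if [ "${count:-0}" -gt 0 ] 2>/dev/null; then
    echo "Role assignment found; safe to deploy."
  else
    echo "Storage Blob Data Owner is not assigned; cluster creation would fail." >&2
    return 1
  fi
}
```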

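After the role assignment is in place, deploy the cluster from the template and parameters file:

# Deploy the HDInsight cluster from the template and parameters file
az deployment group create --name HDInsightADLSGen2Deployment \
    --resource-group <RESOURCEGROUPNAME> \
    --template-file hdinsight-adls-gen2-template.json \
    --parameters parameters.json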
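
Template or parameter mistakes are cheaper to catch before any billable resources exist. One option, sketched below, is to run az deployment group validate with the same arguments first and create the deployment only if validation passes. The function name deploy_cluster is hypothetical, and the file names default to the ones used in this section.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: validate the template, then deploy it.
# Usage: deploy_cluster <RESOURCEGROUPNAME> [template-file] [parameters-file]
deploy_cluster() {
  local rg="$1"
  local template="${2:-hdinsight-adls-gen2-template.json}"
  local params="${3:-parameters.json}"

  # Preflight validation: checks the template and parameters without creating resources.
  az deployment group validate \
    --resource-group "$rg" \
    --template-file "$template" \
    --parameters "$params" \
    --output none || return 1

  # Runs only if validation succeeded.
  az deployment group create --name HDInsightADLSGen2Deployment \
    --resource-group "$rg" \
    --template-file "$template" \
    --parameters "$params"
}
```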

Clean up resources

After you finish this article, you might want to delete the cluster. HDInsight stores your data in Azure Storage, so you can safely delete a cluster when it isn't in use. You're charged for an HDInsight cluster even when it isn't in use, and because cluster charges are many times higher than storage charges, it makes economic sense to delete clusters you aren't using.

Run all or some of the following commands to remove resources:

# Remove cluster
az hdinsight delete \
    --name <CLUSTERNAME> \
    --resource-group <RESOURCEGROUPNAME>

# Remove storage container (the Data Lake Storage Gen2 file system)
az storage container delete \
    --account-name <STORAGEACCOUNTNAME> \
    --name <FILESYSTEMNAME>

# Remove storage account
az storage account delete \
    --name <STORAGEACCOUNTNAME> \
    --resource-group <RESOURCEGROUPNAME>

# Remove resource group
az group delete \
    --name <RESOURCEGROUPNAME>

Troubleshoot

If you run into issues creating HDInsight clusters, see the access control requirements.

Next steps

You've successfully created an HDInsight cluster. Now learn how to work with your cluster.

Apache Spark clusters

Apache Hadoop clusters

Apache HBase clusters