Create a cluster with Data Lake Storage Gen2 using the Azure portal
The Azure portal is a web-based management tool for services and resources hosted in the Azure cloud. In this article, you learn how to create Linux-based Azure HDInsight clusters that use Data Lake Storage Gen2 by using the portal. For more details, see Create HDInsight clusters.
Warning
Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.
If you don't have an Azure subscription, create a trial account before you begin.
To create an HDInsight cluster that uses Data Lake Storage Gen2 for storage, follow these steps to configure a storage account that has a hierarchical namespace.
Create a user-assigned managed identity
Create a user-assigned managed identity, if you don't already have one.
- Sign in to the Azure portal.
- In the upper-left corner, click Create a resource.
- In the search box, type user assigned and click User Assigned Managed Identity.
- Click Create.
- Enter a name for your managed identity, and then select the correct subscription, resource group, and location.
- Click Create.
For more information on how managed identities work in Azure HDInsight, see Managed identities in Azure HDInsight.
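If you prefer to script this step, the portal flow above has a rough Azure CLI counterpart in `az identity create`. The following Python sketch only builds and prints that command; it doesn't run it. The helper name `identity_create_command` and the sample values are hypothetical and aren't part of this article.

```python
import shlex


def identity_create_command(name: str, resource_group: str, location: str) -> list[str]:
    """Build the Azure CLI arguments that create a user-assigned managed identity."""
    return [
        "az", "identity", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--location", location,
    ]


if __name__ == "__main__":
    # Placeholder values (assumptions); replace them with your own names and region.
    command = identity_create_command("my-hdi-identity", "my-resource-group", "eastus")
    print(shlex.join(command))
```

Printing the command instead of executing it lets you review the arguments before you run them in a shell where you've signed in with the Azure CLI.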
Create a storage account to use with Data Lake Storage Gen2
Create a storage account to use with Azure Data Lake Storage Gen2.
- Sign in to the Azure portal.
- In the upper-left corner, click Create a resource.
- In the search box, type storage and click Storage account.
- Click Create.
- On the Create storage account screen:
  - Select the correct subscription and resource group.
  - Enter a name for your storage account with Data Lake Storage Gen2.
  - Click the Advanced tab.
  - Click Enabled next to Hierarchical namespace under Data Lake Storage Gen2.
- Click Review + create.
- Click Create.
For more information on other options during storage account creation, see Quickstart: Create a storage account for Azure Data Lake Storage Gen2.
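The same storage account can be sketched as an Azure CLI call to `az storage account create` with the hierarchical namespace turned on. The Python helper below is a minimal sketch that validates the account name (3 to 24 characters, lowercase letters and digits only) and builds the command without running it. The helper name, the sample values, and the `Standard_LRS` SKU are assumptions chosen for illustration.

```python
import re
import shlex

# Storage account names must be 3-24 characters long and contain
# only lowercase letters and digits.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def storage_account_create_command(name: str, resource_group: str, location: str) -> list[str]:
    """Build the Azure CLI arguments for a storage account with a hierarchical namespace."""
    if not _ACCOUNT_NAME.fullmatch(name):
        raise ValueError(
            f"Invalid storage account name {name!r}: use 3-24 lowercase letters and digits."
        )
    return [
        "az", "storage", "account", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--location", location,
        "--kind", "StorageV2",
        "--sku", "Standard_LRS",  # assumption: pick the redundancy you need
        "--enable-hierarchical-namespace", "true",  # required for Data Lake Storage Gen2
    ]


if __name__ == "__main__":
    # Placeholder values (assumptions); replace them with your own.
    print(shlex.join(storage_account_create_command("myhdistorage01", "my-resource-group", "eastus")))
```

Validating the name up front gives a clearer error than a failed deployment.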
Set up permissions for the managed identity on the Data Lake Storage Gen2 account
Assign the managed identity to the Storage Blob Data Owner role on the storage account.
- In the Azure portal, go to your storage account.
- Select Access control (IAM).
- Select Add > Add role assignment.
- On the Role tab, select Storage Blob Data Owner.
- On the Members tab, select Managed identity, and then select Select members.
- Select your subscription, select User-assigned managed identity, and then select your user-assigned managed identity.
- On the Review + assign tab, select Review + assign to assign the role.
The user-assigned identity that you selected is now listed under the selected role.
For more information about role assignments, see Assign Azure roles using the Azure portal.
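The role assignment described in this section can also be expressed with `az role assignment create`, scoped to the storage account. The sketch below builds the scope string and the command arguments. The helper names and sample IDs are hypothetical, and the principal ID is assumed to be the object ID of your user-assigned managed identity.

```python
import shlex


def storage_account_scope(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Return the Azure resource ID of a storage account, used as the role scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_command(principal_id: str, scope: str) -> list[str]:
    """Build the Azure CLI arguments that grant Storage Blob Data Owner on the scope."""
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_id,
        # A managed identity is represented as a service principal.
        "--assignee-principal-type", "ServicePrincipal",
        "--role", "Storage Blob Data Owner",
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Placeholder IDs (assumptions); replace them with your own values.
    scope = storage_account_scope(
        "00000000-0000-0000-0000-000000000000", "my-resource-group", "myhdistorage01"
    )
    print(shlex.join(role_assignment_command("11111111-1111-1111-1111-111111111111", scope)))
```

Scoping the assignment to the single storage account, rather than to the resource group or subscription, keeps the identity's access as narrow as the steps above.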
After this initial setup is complete, you can create a cluster through the portal. The cluster must be in the same Azure region as the storage account. In the Storage tab of the cluster creation menu, select the following options:
- For Primary storage type, select Azure Data Lake Storage Gen2.
- Under Primary Storage account, search for and select the newly created storage account with Data Lake Storage Gen2 storage.
- Under Identity, select the newly created user-assigned managed identity.
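For comparison with the Storage tab selections above, the following sketch shows roughly how the same choices map to `az hdinsight create`. It assumes the `--storage-account` and `--storage-account-managed-identity` options behave as described in the Azure CLI reference. The helper name, the `spark` cluster type, and all sample values are assumptions, and a real deployment typically needs more options (for example, SSH credentials and node sizes).

```python
import os
import shlex


def hdinsight_create_command(
    cluster_name: str,
    resource_group: str,
    location: str,
    storage_account: str,
    managed_identity: str,
    http_password: str,
    cluster_type: str = "spark",  # assumption: any supported cluster type works here
) -> list[str]:
    """Build Azure CLI arguments for a cluster whose primary storage is Data Lake Storage Gen2."""
    return [
        "az", "hdinsight", "create",
        "--name", cluster_name,
        "--resource-group", resource_group,
        "--location", location,  # must match the storage account's region
        "--type", cluster_type,
        "--http-password", http_password,
        "--storage-account", storage_account,
        "--storage-account-managed-identity", managed_identity,
    ]


if __name__ == "__main__":
    # Read the password from the environment instead of hard-coding it.
    password = os.environ.get("HDI_HTTP_PASSWORD", "<http-password>")
    command = hdinsight_create_command(
        "my-hdi-cluster", "my-resource-group", "eastus",
        "myhdistorage01", "my-hdi-identity", password,
    )
    print(shlex.join(command))
```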
Note
- To add a secondary storage account with Data Lake Storage Gen2, assign the managed identity created earlier to the new Data Lake Storage Gen2 storage account at the storage account level. Adding a secondary storage account with Data Lake Storage Gen2 through the "Additional storage accounts" blade in HDInsight isn't supported.
- You can enable RA-GRS or RA-ZRS on the Azure Blob storage account that HDInsight uses. However, creating a cluster against the RA-GRS or RA-ZRS secondary endpoint isn't supported.
- HDInsight does not support setting Data Lake Storage Gen2 as read-access geo-zone-redundant storage (RA-GZRS) or geo-zone-redundant storage (GZRS).
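Two of the constraints in this section lend themselves to a quick check before you start cluster creation: the cluster and the storage account must be in the same region, and GZRS or RA-GZRS redundancy isn't supported for Data Lake Storage Gen2. The sketch below is a hypothetical pre-flight helper. The `StorageAccountInfo` type and the function names are illustrative, and the region comparison assumes that display names such as "East US" differ from short names such as "eastus" only by spaces and letter case.

```python
from dataclasses import dataclass

# SKU names that correspond to GZRS and RA-GZRS redundancy.
UNSUPPORTED_SKUS = {"standard_gzrs", "standard_ragzrs"}


@dataclass(frozen=True)
class StorageAccountInfo:
    name: str
    location: str
    sku: str


def _normalize_region(region: str) -> str:
    """Treat 'East US' and 'eastus' as the same region."""
    return region.replace(" ", "").lower()


def preflight_issues(cluster_location: str, account: StorageAccountInfo) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    issues = []
    if _normalize_region(cluster_location) != _normalize_region(account.location):
        issues.append(
            f"Cluster region {cluster_location!r} differs from storage account "
            f"region {account.location!r}."
        )
    if account.sku.lower() in UNSUPPORTED_SKUS:
        issues.append(
            f"Storage account {account.name!r} uses {account.sku}, which isn't supported."
        )
    return issues


if __name__ == "__main__":
    # Placeholder values (assumptions) that trigger both checks.
    account = StorageAccountInfo("myhdistorage01", "West US", "Standard_RAGZRS")
    for issue in preflight_issues("eastus", account):
        print(issue)
```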
Delete the cluster
See Delete an HDInsight cluster using your browser, PowerShell, or the Azure CLI.
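Because billing continues while the cluster exists, it can help to keep the delete command ready. The following minimal sketch builds an `az hdinsight delete` call. The helper name and sample values are hypothetical, and the `--yes` flag is included on the assumption that you want to skip the confirmation prompt.

```python
import shlex


def hdinsight_delete_command(cluster_name: str, resource_group: str) -> list[str]:
    """Build the Azure CLI arguments that delete an HDInsight cluster without prompting."""
    return [
        "az", "hdinsight", "delete",
        "--name", cluster_name,
        "--resource-group", resource_group,
        "--yes",  # skip the interactive confirmation
    ]


if __name__ == "__main__":
    # Placeholder values (assumptions); replace them with your own.
    print(shlex.join(hdinsight_delete_command("my-hdi-cluster", "my-resource-group")))
```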
Troubleshoot
If you run into issues with creating HDInsight clusters, see access control requirements.
Next steps
You've successfully created an HDInsight cluster. Now learn how to work with your cluster.
Apache Spark clusters
- Customize Linux-based HDInsight clusters by using script actions
- Create a standalone application using Scala
- Run jobs remotely on an Apache Spark cluster using Apache Livy
- Apache Spark with BI: Perform interactive data analysis using Spark in HDInsight with BI tools
- Apache Spark with Machine Learning: Use Spark in HDInsight to predict food inspection results