Add additional storage accounts to HDInsight
Learn how to use script actions to add extra Azure Storage accounts to HDInsight. The steps in this document add a storage account to an existing HDInsight cluster. This article applies to additional Azure Storage accounts (not the default cluster storage account), and not to other kinds of additional storage such as Azure Data Lake Storage Gen2.
Important
The information in this document is about adding additional storage account(s) to a cluster after it has been created. For information on adding storage accounts during cluster creation, see Set up clusters in HDInsight with Apache Hadoop, Apache Spark, Apache Kafka, and more.
Prerequisites
- A Hadoop cluster on HDInsight. See Get Started with HDInsight on Linux.
- Storage account name and key. See Manage storage account access keys.
- If using PowerShell, you need the Az module. See Overview of Azure PowerShell.
How it works
During processing, the script does the following actions:
If the storage account already exists in the core-site.xml configuration for the cluster, the script exits and takes no further action.
Verifies that the storage account exists and can be accessed using the key.
Encrypts the key using the cluster credential.
Adds the storage account to the core-site.xml file.
Stops and restarts the Apache Oozie, Apache Hadoop YARN, Apache Hadoop MapReduce2, and Apache Hadoop HDFS services. Stopping and starting these services allows them to use the new storage account.
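The control flow described above can be sketched in Python as follows. This is an illustration only, not the actual script: `add_storage_account`, the `verify` and `encrypt` callables, and the in-memory `core_site` dictionary are hypothetical stand-ins, and the property name simply follows the `fs.azure.account.key.*` pattern used elsewhere in this article.

```python
# Hypothetical model of the script's decision flow; not the real script action.
SUFFIX = "blob.core.chinacloudapi.cn"
SERVICES_TO_RESTART = ["OOZIE", "YARN", "MAPREDUCE2", "HDFS"]


def add_storage_account(core_site, name, key, verify, encrypt):
    """Mutate core_site and return the services to restart.

    Returns an empty list when the account is already configured,
    mirroring the early exit described above.
    """
    prop = f"fs.azure.account.key.{name}.{SUFFIX}"
    if prop in core_site:
        return []  # already present: exit without changes
    if not verify(name, key):
        raise ValueError(f"cannot access storage account {name!r} with this key")
    core_site[prop] = encrypt(key)  # store the encrypted key
    return list(SERVICES_TO_RESTART)


# Usage with stub callables (assumptions for the sake of the example).
core_site = {}
first = add_storage_account(
    core_site, "mystore", "key1",
    verify=lambda n, k: True, encrypt=lambda k: "enc:" + k)
second = add_storage_account(
    core_site, "mystore", "key2",
    verify=lambda n, k: True, encrypt=lambda k: "enc:" + k)
```

In this model the second call returns an empty list and leaves the stored key untouched, which matches the early-exit behavior in the first step.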
Warning
Using a storage account in a different location than the HDInsight cluster is not supported.
Add storage account
Use Script Action to apply the changes with the following considerations:
Property | Value |
---|---|
Bash script URI | https://hdiconfigactions.blob.core.chinacloudapi.cn/linuxaddstorageaccountv01/add-storage-account-v01.sh |
Node type(s) | Head |
Parameters | ACCOUNTNAME ACCOUNTKEY -p (optional) |
- ACCOUNTNAME is the name of the storage account to add to the HDInsight cluster.
- ACCOUNTKEY is the access key for ACCOUNTNAME.
- -p is optional. If specified, the key isn't encrypted and is stored in the core-site.xml file as plain text.
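As a small illustration of how the Parameters value is assembled, the sketch below builds the string from its three parts. The helper name `build_parameters` and the sample account values are hypothetical; only the order `ACCOUNTNAME ACCOUNTKEY -p` comes from the table above.

```python
def build_parameters(account_name, account_key, plain_text=False):
    """Return the script action parameter string: ACCOUNTNAME ACCOUNTKEY [-p]."""
    if not account_name or not account_key:
        raise ValueError("both the account name and the account key are required")
    parts = [account_name, account_key]
    if plain_text:
        # -p stores the key unencrypted in core-site.xml
        parts.append("-p")
    return " ".join(parts)


# Sample values are placeholders, not real credentials.
encrypted_params = build_parameters("mystore", "abc123==")
plain_params = build_parameters("mystore", "abc123==", plain_text=True)
```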
Verification
When you view the HDInsight cluster in the Azure portal, the Storage Accounts entry under Properties doesn't display storage accounts added through this script action. Azure PowerShell and Azure CLI don't display the additional storage account either. The storage information isn't displayed because the script only modifies the core-site.xml configuration for the cluster. This information isn't used when retrieving the cluster information using Azure management APIs.
To verify the additional storage, use one of the following methods:
PowerShell
The script returns the Storage Account name(s) associated with the given cluster. Replace CLUSTERNAME with the actual cluster name, and then run the script.
# Update values
$clusterName = "CLUSTERNAME"
$creds = Get-Credential -UserName "admin" -Message "Enter the cluster login credentials"
$clusterName = $clusterName.ToLower();
# getting service_config_version
$resp = Invoke-WebRequest -Uri "https://$clusterName.azurehdinsight.cn/api/v1/clusters/$clusterName`?fields=Clusters/desired_service_config_versions/HDFS" `
-Credential $creds -UseBasicParsing
$respObj = ConvertFrom-Json $resp.Content
$configVersion=$respObj.Clusters.desired_service_config_versions.HDFS.service_config_version
$resp = Invoke-WebRequest -Uri "https://$clusterName.azurehdinsight.cn/api/v1/clusters/$clusterName/configurations/service_config_versions?service_name=HDFS&service_config_version=$configVersion" `
-Credential $creds -UseBasicParsing
$respObj = ConvertFrom-Json $resp.Content
# extract account names
$value = ($respObj.items.configurations | Where type -EQ "core-site").properties | Get-Member -membertype properties | Where Name -Like "fs.azure.account.key.*"
foreach ($name in $value ) { $name.Name.Split(".")[4]}
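The extraction step at the end of the PowerShell script can also be sketched in Python. The example below works offline on a hand-written sample that assumes the same response shape the script reads (`items`, `configurations`, `type`, `properties`); the function name `extract_account_names` and the sample values are hypothetical, and no request is made to the cluster.

```python
import json


def extract_account_names(response_text):
    """Return storage account names found in core-site key properties."""
    body = json.loads(response_text)
    names = []
    for item in body.get("items", []):
        for config in item.get("configurations", []):
            if config.get("type") != "core-site":
                continue
            for prop in config.get("properties", {}):
                # fs.azure.account.key.<name>.<endpoint> -> <name> is field 4
                if prop.startswith("fs.azure.account.key."):
                    names.append(prop.split(".")[4])
    return names


# Hand-written sample; the values are placeholders.
sample = json.dumps({
    "items": [{
        "configurations": [
            {"type": "hdfs-site", "properties": {"dfs.replication": "3"}},
            {"type": "core-site", "properties": {
                "fs.azure.account.key.mystore.blob.core.chinacloudapi.cn": "...",
                "fs.azure.account.keyprovider.mystore.blob.core.chinacloudapi.cn": "...",
            }},
        ]
    }]
})
names = extract_account_names(sample)
```

The trailing dot in the prefix keeps the `keyprovider` properties out of the result, so each account is listed once.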
Apache Ambari
From a web browser, navigate to https://CLUSTERNAME.azurehdinsight.cn, where CLUSTERNAME is the name of your cluster.
Navigate to HDFS > Configs > Advanced > Custom core-site.
Observe the keys that begin with fs.azure.account.key. The account name is part of the key, as in fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.chinacloudapi.cn.
Remove storage account
From a web browser, navigate to https://CLUSTERNAME.azurehdinsight.cn, where CLUSTERNAME is the name of your cluster.
Navigate to HDFS > Configs > Advanced > Custom core-site.
Remove the following keys:
fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.chinacloudapi.cn
fs.azure.account.keyprovider.<STORAGE_ACCOUNT_NAME>.blob.core.chinacloudapi.cn
After removing these keys and saving the configuration, restart the Oozie, YARN, MapReduce2, HDFS, and Hive services one at a time.
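To make the two key names concrete for a given account, the following sketch derives them from the account name and removes them from a dictionary that stands in for the Custom core-site properties. The helpers `keys_to_remove` and `remove_storage_account`, the account name `mystore`, and the sample values are hypothetical; the actual change is made in the Ambari UI as described above.

```python
SUFFIX = "blob.core.chinacloudapi.cn"


def keys_to_remove(account_name):
    """Return the two core-site property names tied to a storage account."""
    return [
        f"fs.azure.account.key.{account_name}.{SUFFIX}",
        f"fs.azure.account.keyprovider.{account_name}.{SUFFIX}",
    ]


def remove_storage_account(core_site, account_name):
    """Return a copy of core_site without the account's two properties."""
    doomed = set(keys_to_remove(account_name))
    return {k: v for k, v in core_site.items() if k not in doomed}


# Placeholder configuration with one unrelated property.
before = {
    "fs.azure.account.key.mystore.blob.core.chinacloudapi.cn": "...",
    "fs.azure.account.keyprovider.mystore.blob.core.chinacloudapi.cn": "...",
    "fs.trash.interval": "360",
}
after = remove_storage_account(before, "mystore")
```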
Known issues
Storage firewall
If you choose to secure your storage account with the Firewalls and virtual networks restrictions on Selected networks, be sure to enable the exception Allow trusted Azure services so that HDInsight can access your storage account.
Unable to access storage after changing key
If you change the key for a storage account, HDInsight can no longer access the storage account. HDInsight uses a cached copy of the key in the core-site.xml for the cluster. This cached copy must be updated to match the new key.
Running the script action again doesn't update the key, as the script checks to see if an entry for the storage account already exists. If an entry already exists, it doesn't make any changes.
To work around this problem:
See Update storage account access keys on how to rotate the access keys.
Alternatively, remove the storage account and then add it back.
Next steps
You've learned how to add additional storage accounts to an existing HDInsight cluster. For more information on script actions, see Customize Linux-based HDInsight clusters using script action.