Add additional storage accounts to HDInsight

Learn how to use script actions to add additional Azure storage accounts to HDInsight. The steps in this document add a storage account to an existing HDInsight cluster. This article applies to additional storage accounts, not the default cluster storage account, and not to other additional storage such as Azure Data Lake Storage Gen2.

Important

The information in this document is about adding additional storage accounts to a cluster after it has been created. For information on adding storage accounts during cluster creation, see Set up clusters in HDInsight with Apache Hadoop, Apache Spark, Apache Kafka, and more.

Prerequisites

How it works

During processing, the script performs the following actions:

  • If the storage account already exists in the core-site.xml configuration for the cluster, the script exits and performs no further actions.

  • Verifies that the storage account exists and can be accessed using the key.

  • Encrypts the key using the cluster credential.

  • Adds the storage account to the core-site.xml file.

  • Stops and restarts the Apache Oozie, Apache Hadoop YARN, Apache Hadoop MapReduce2, and Apache Hadoop HDFS services. Restarting these services allows them to use the new storage account.
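The order of these actions can be sketched as follows. This is only an illustration of the flow described above, not the script's actual code: the helper names `account_key_property` and `plan_actions` are hypothetical, and the endpoint suffix is assumed from the key names listed under "Remove storage account".

```python
from typing import Dict, List

# Endpoint suffix as it appears in the core-site key names in this article
# (assumption: the cluster uses this endpoint).
ENDPOINT_SUFFIX = "blob.core.chinacloudapi.cn"


def account_key_property(account_name: str) -> str:
    """Return the core-site property name that holds the account key."""
    return f"fs.azure.account.key.{account_name}.{ENDPOINT_SUFFIX}"


def plan_actions(core_site: Dict[str, str], account_name: str,
                 plain_text: bool = False) -> List[str]:
    """List, in order, the steps described above (hypothetical helper)."""
    # An existing entry means the script exits without doing anything.
    if account_key_property(account_name) in core_site:
        return []
    steps = ["verify that the account exists and the key works"]
    # The optional -p parameter stores the key as plain text, so no encryption.
    if not plain_text:
        steps.append("encrypt the key with the cluster credential")
    steps.append("add the account to core-site.xml")
    steps.append("restart Oozie, YARN, MapReduce2, and HDFS")
    return steps
```

Because the existence check comes first, running the script again for an account that is already configured changes nothing, which matters for the key-rotation issue described under "Known issues".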

Warning

Using a storage account in a different location than the HDInsight cluster is not supported.

Add storage account

Use Script Action to apply the changes with the following considerations:

Property           Value
Bash script URI    https://hdiconfigactions.blob.core.windows.net/linuxaddstorageaccountv01/add-storage-account-v01.sh
Node type(s)       Head
Parameters         ACCOUNTNAME ACCOUNTKEY -p (optional)
  • ACCOUNTNAME is the name of the storage account to add to the HDInsight cluster.
  • ACCOUNTKEY is the access key for ACCOUNTNAME.
  • -p is optional. If specified, the key isn't encrypted and is stored in the core-site.xml file as plain text.
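As a small illustration of the Parameters value, the sketch below assembles the string in the order shown in the table. The function name `build_parameters` is hypothetical, and the whitespace check is an assumption made here because the parameters are space-separated; it is not a documented rule of the script.

```python
def build_parameters(account_name: str, account_key: str,
                     plain_text: bool = False) -> str:
    """Build the script action Parameters value: ACCOUNTNAME ACCOUNTKEY [-p]."""
    for value in (account_name, account_key):
        # Assumption: space-separated parameters cannot contain whitespace.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError("values must be non-empty and contain no whitespace")
    parts = [account_name, account_key]
    if plain_text:
        # -p stores the key unencrypted in core-site.xml; omit it by default.
        parts.append("-p")
    return " ".join(parts)
```

For example, `build_parameters("mystorage", "a2V5")` produces `mystorage a2V5`, where both values are placeholders.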

Verification

When viewing the HDInsight cluster in the Azure portal, selecting the Storage Accounts entry under Properties doesn't display storage accounts added through this script action. Azure PowerShell and Azure CLI don't display the additional storage account either. The storage information isn't displayed because the script only modifies the core-site.xml configuration for the cluster. This information isn't used when retrieving the cluster information using Azure management APIs.

To verify the additional storage, use one of the methods shown below:

PowerShell

The script returns the storage account names associated with the given cluster. Replace CLUSTERNAME with the actual cluster name, and then run the script.

# Update values
$clusterName = "CLUSTERNAME"

# Prompt for the cluster login credentials
$creds = Get-Credential -UserName "admin" -Message "Enter the cluster login credentials"

# Lowercase the cluster name for use in the Ambari REST URLs
$clusterName = $clusterName.ToLower()

# Get the current service_config_version for HDFS
$resp = Invoke-WebRequest -Uri "https://$clusterName.azurehdinsight.cn/api/v1/clusters/$clusterName`?fields=Clusters/desired_service_config_versions/HDFS" `
    -Credential $creds -UseBasicParsing
$respObj = ConvertFrom-Json $resp.Content

$configVersion = $respObj.Clusters.desired_service_config_versions.HDFS.service_config_version

# Get the HDFS configuration for that version
$resp = Invoke-WebRequest -Uri "https://$clusterName.azurehdinsight.cn/api/v1/clusters/$clusterName/configurations/service_config_versions?service_name=HDFS&service_config_version=$configVersion" `
    -Credential $creds -UseBasicParsing
$respObj = ConvertFrom-Json $resp.Content

# Extract the account names from the core-site properties named fs.azure.account.key.<ACCOUNT>...
$value = ($respObj.items.configurations | Where-Object type -EQ "core-site").properties | Get-Member -MemberType Properties | Where-Object Name -Like "fs.azure.account.key.*"
foreach ($name in $value) { $name.Name.Split(".")[4] }

Apache Ambari

  1. From a web browser, navigate to https://CLUSTERNAME.azurehdinsight.cn, where CLUSTERNAME is the name of your cluster.

  2. Navigate to HDFS > Configs > Advanced > Custom core-site.

  3. Observe the keys that begin with fs.azure.account.key. The account name is part of the key, as seen in this sample image:

    (Image: verification through Apache Ambari)
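To make the naming pattern concrete, the sketch below pulls the account name out of core-site property names. It mirrors the `Split(".")[4]` step in the PowerShell script; the function name `storage_account_names` and the sample values in the usage note are hypothetical.

```python
from typing import Iterable, List

# The trailing dot excludes fs.azure.account.keyprovider.* entries,
# matching the "fs.azure.account.key.*" filter in the PowerShell script.
KEY_PREFIX = "fs.azure.account.key."


def storage_account_names(property_names: Iterable[str]) -> List[str]:
    """Return the account name embedded in each matching property name."""
    names = []
    for prop in property_names:
        if prop.startswith(KEY_PREFIX):
            # fs . azure . account . key . <ACCOUNT> . blob ... -> index 4
            names.append(prop.split(".")[4])
    return names
```

Given the property name `fs.azure.account.key.mystorage.blob.core.chinacloudapi.cn`, the function yields `mystorage`.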

Remove storage account

  1. From a web browser, navigate to https://CLUSTERNAME.azurehdinsight.cn, where CLUSTERNAME is the name of your cluster.

  2. Navigate to HDFS > Configs > Advanced > Custom core-site.

  3. Remove the following keys:

    • fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.chinacloudapi.cn
    • fs.azure.account.keyprovider.<STORAGE_ACCOUNT_NAME>.blob.core.chinacloudapi.cn

After removing these keys and saving the configuration, restart Oozie, YARN, MapReduce2, HDFS, and Hive one by one.
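The two key names for a given account can be derived as in the sketch below, which also shows their removal from an in-memory copy of the core-site properties. It only illustrates the naming pattern and does not modify a cluster; the helper names `keys_to_remove` and `remove_account` are hypothetical.

```python
from typing import Dict, List

# Endpoint suffix as it appears in the key names listed above.
ENDPOINT_SUFFIX = "blob.core.chinacloudapi.cn"


def keys_to_remove(account_name: str) -> List[str]:
    """Return the two core-site keys that belong to the storage account."""
    return [
        f"fs.azure.account.key.{account_name}.{ENDPOINT_SUFFIX}",
        f"fs.azure.account.keyprovider.{account_name}.{ENDPOINT_SUFFIX}",
    ]


def remove_account(core_site: Dict[str, str], account_name: str) -> Dict[str, str]:
    """Return a copy of the properties without the account's two keys."""
    doomed = set(keys_to_remove(account_name))
    return {name: value for name, value in core_site.items() if name not in doomed}
```

Entries for other storage accounts are left untouched, so only the named account is detached.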

Known issues

Storage firewall

If you choose to secure your storage account with the Firewalls and virtual networks restrictions on Selected networks, be sure to enable the exception Allow trusted Microsoft services... so that HDInsight can access your storage account.

Unable to access storage after changing key

If you change the key for a storage account, HDInsight can no longer access the storage account. HDInsight uses a cached copy of the key in the core-site.xml file for the cluster. This cached copy must be updated to match the new key.

Running the script action again does not update the key, because the script checks whether an entry for the storage account already exists. If an entry already exists, the script makes no changes.

To work around this problem:

  1. Remove the storage account.
  2. Add the storage account.

Important

Rotating the storage key for the primary storage account attached to a cluster is not supported.

Poor performance

If the storage account is in a different region than the HDInsight cluster, you may experience poor performance. Accessing data in a different region sends network traffic outside the regional Azure data center and across the public internet, which can introduce latency.

Additional charges

If the storage account is in a different region than the HDInsight cluster, you may notice additional egress charges on your Azure bill. An egress charge is applied when data leaves a regional data center. This charge is applied even if the traffic is destined for another Azure data center in a different region.

Next steps

You have learned how to add additional storage accounts to an existing HDInsight cluster. For more information on script actions, see Customize Linux-based HDInsight clusters using script action.