Azure HDInsight double encryption for data at rest

This article discusses methods for encrypting data at rest in Azure HDInsight clusters. Data encryption at rest refers to encryption of the managed disks (data disks, OS disks, and temporary disks) attached to HDInsight cluster virtual machines.

This document doesn't address data stored in your Azure Storage account. Your clusters may have one or more attached Azure Storage accounts where the encryption keys could also be Microsoft-managed or customer-managed, but the encryption service is different. For more information about Azure Storage encryption, see Azure Storage encryption for data at rest.

Introduction

There are three main managed disk roles in Azure: the data disk, the OS disk, and the temporary disk. For more information about the different types of managed disks, see Introduction to Azure managed disks.

HDInsight supports multiple types of encryption in two different layers:

  • Server-side encryption (SSE) - SSE is performed by the storage service. In HDInsight, SSE is used to encrypt OS disks and data disks. It is enabled by default. SSE is a layer 1 encryption service.
  • Encryption at host using a platform-managed key (PMK) - Like SSE, this type of encryption is performed by the storage service. However, it applies only to temporary disks and is not enabled by default. Encryption at host is also a layer 1 encryption service.
  • Encryption at rest using a customer-managed key (CMK) - This type of encryption can be used on data disks and temporary disks. It is not enabled by default and requires you to provide your own key through Azure Key Vault. Encryption at rest is a layer 2 encryption service.

These types are summarized in the following table.

Cluster type | OS disk (managed disk) | Data disk (managed disk) | Temp data disk (local SSD)
Kafka, HBase with Accelerated Writes | Layer 1: SSE encryption by default | Layer 1: SSE encryption by default; Layer 2: optional encryption at rest using CMK | Layer 1: optional encryption at host using PMK; Layer 2: optional encryption at rest using CMK
All other clusters (Spark, Interactive, Hadoop, HBase without Accelerated Writes) | Layer 1: SSE encryption by default | N/A | Layer 1: optional encryption at host using PMK; Layer 2: optional encryption at rest using CMK

Encryption at rest using customer-managed keys

Customer-managed key encryption is a one-step process handled during cluster creation at no additional cost. All you need to do is authorize a managed identity with Azure Key Vault and add the encryption key when you create your cluster.

Both data disks and temporary disks on each node of the cluster are encrypted with a symmetric Data Encryption Key (DEK). The DEK is protected using the Key Encryption Key (KEK) from your key vault. The encryption and decryption processes are handled entirely by Azure HDInsight.
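The DEK/KEK relationship described above is a form of envelope encryption. The following is a local illustration only, assuming the openssl command-line tool is installed: it is not how HDInsight implements disk encryption, and the file names, sample text, and AES-256-CBC cipher are choices made for this sketch. It encrypts some data with a random symmetric DEK, wraps the DEK with an RSA key standing in for the KEK, and then reverses both steps.

```shell
# Local envelope-encryption sketch (illustration only, not the HDInsight implementation).
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Stand-in for the KEK: an RSA key pair. In the real service the KEK stays in Key Vault.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out "$work/kek.pem" 2>/dev/null
openssl pkey -in "$work/kek.pem" -pubout -out "$work/kek.pub"

# Symmetric DEK (256-bit) and IV (128-bit), both hex encoded.
dek=$(openssl rand -hex 32)
iv=$(openssl rand -hex 16)

# Encrypt sample data with the DEK.
plaintext="sample block written to a data disk"
printf '%s' "$plaintext" | openssl enc -aes-256-cbc -K "$dek" -iv "$iv" -out "$work/disk.enc"

# Wrap the DEK with the KEK's public half, then discard the plaintext DEK.
printf '%s' "$dek" | openssl pkeyutl -encrypt -pubin -inkey "$work/kek.pub" \
  -pkeyopt rsa_padding_mode:oaep -out "$work/dek.wrapped"
unset dek

# Unwrap the DEK with the KEK's private half and decrypt the data.
unwrapped=$(openssl pkeyutl -decrypt -inkey "$work/kek.pem" \
  -pkeyopt rsa_padding_mode:oaep -in "$work/dek.wrapped")
recovered=$(openssl enc -d -aes-256-cbc -K "$unwrapped" -iv "$iv" -in "$work/disk.enc")
echo "$recovered"
```

The wrap and unwrap steps correspond to the Wrap Key and Unwrap Key permissions that the managed identity is granted later in this article.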

For OS disks attached to the cluster VMs, only one layer of encryption (PMK) is available. If your scenario requires CMK encryption, avoid copying sensitive data to OS disks.

If the key vault firewall is enabled on the key vault where the disk encryption key is stored, you must add the HDInsight regional Resource Provider IP addresses for the region where the cluster will be deployed to the key vault firewall configuration. This is necessary because HDInsight is not a trusted Azure Key Vault service.
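As a minimal sketch of the firewall step, the helper below adds a list of IP addresses to a key vault's firewall with az keyvault network-rule add. The function name is hypothetical, and the addresses in the usage comment are documentation-range placeholders, not real HDInsight Resource Provider addresses; look up the actual addresses for your region before running it.

```shell
# Hypothetical helper: allow each given IP address through the key vault firewall.
# $1 = key vault name, remaining arguments = IP addresses or CIDR ranges.
allow_ips_on_vault() {
  vault_name=$1
  shift
  for ip in "$@"; do
    az keyvault network-rule add --name "$vault_name" --ip-address "$ip"
  done
}

# Usage (requires a signed-in Azure CLI; the addresses are placeholders):
#   allow_ips_on_vault MyKeyVault 203.0.113.10 203.0.113.11
```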

You can use the Azure portal or Azure CLI to safely rotate the keys in the key vault. When a key rotates, the HDInsight cluster starts using the new key within minutes. Enable the Soft Delete key protection feature to protect against ransomware scenarios and accidental deletion. Key vaults without this protection feature aren't supported.

Get started with customer-managed keys

To create an HDInsight cluster with customer-managed keys enabled, complete the following steps:

  1. Create managed identities for Azure resources
  2. Create Azure Key Vault
  3. Create key
  4. Create access policy
  5. Create HDInsight cluster with customer-managed key enabled
  6. Rotate the encryption key

Each step is explained in detail in one of the following sections.

Create managed identities for Azure resources

Create a user-assigned managed identity to authenticate to Key Vault.

See Create a user-assigned managed identity for specific steps. For more information on how managed identities work in Azure HDInsight, see Managed identities in Azure HDInsight. Be sure to save the managed identity resource ID; you need it when you add the identity to the Key Vault access policy.
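If you prefer the command line, the identity can also be created with az identity create. The sketch below wraps that command in a hypothetical helper that prints the new identity's resource ID so you can save it; the resource group and identity names in the usage comment are placeholders.

```shell
# Hypothetical helper: create a user-assigned managed identity and print its resource ID.
# $1 = resource group, $2 = identity name.
create_cluster_identity() {
  az identity create \
    --resource-group "$1" \
    --name "$2" \
    --query id \
    --output tsv
}

# Usage (requires a signed-in Azure CLI):
#   identity_id=$(create_cluster_identity MyResourceGroup MyMSI)
```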

Create Azure Key Vault

Create a key vault. See Create Azure Key Vault for specific steps.

HDInsight only supports Azure Key Vault. If you have your own key vault, you can import your keys into Azure Key Vault. Remember that the key vault must have Soft Delete enabled. For more information about importing existing keys, see About keys, secrets, and certificates.
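A minimal command-line sketch of this step follows. The helper names are hypothetical and the vault name, resource group, and location are placeholders. The second helper assumes the vault's properties.enableSoftDelete field reflects the Soft Delete setting and that tsv output renders it as the lowercase string true.

```shell
# Hypothetical helper: create a key vault.
# $1 = vault name, $2 = resource group, $3 = location.
create_cluster_vault() {
  az keyvault create --name "$1" --resource-group "$2" --location "$3"
}

# Hypothetical helper: succeed only if the vault reports Soft Delete as enabled.
# $1 = vault name.
vault_has_soft_delete() {
  enabled=$(az keyvault show --name "$1" --query properties.enableSoftDelete --output tsv)
  [ "$enabled" = "true" ]
}

# Usage (requires a signed-in Azure CLI):
#   create_cluster_vault MyKeyVault MyResourceGroup eastus
#   vault_has_soft_delete MyKeyVault || echo "Soft Delete is not enabled on MyKeyVault"
```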

Create key

  1. From your new key vault, navigate to Settings > Keys > + Generate/Import.

    Screenshot: Generate a new key in Azure Key Vault

  2. Provide a name, then select Create. Keep the default Key Type of RSA.

    Screenshot: Generate key name

  3. When you return to the Keys page, select the key you created.

    Screenshot: Key Vault key list

  4. Select the version to open the Key Version page. When you use your own key for HDInsight cluster encryption, you need to provide the key URI. Copy the Key identifier and save it somewhere until you're ready to create your cluster.

    Screenshot: Get key identifier
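The portal steps above can be approximated from the command line with az keyvault key create. This is a sketch rather than the documented procedure: the helper name is hypothetical, and the vault and key names in the usage comment are placeholders. The --query key.kid expression prints the key identifier, which includes the key version.

```shell
# Hypothetical helper: create an RSA key and print its key identifier (URI with version).
# $1 = key vault name, $2 = key name.
create_cluster_key() {
  az keyvault key create \
    --vault-name "$1" \
    --name "$2" \
    --kty RSA \
    --query key.kid \
    --output tsv
}

# Usage (requires a signed-in Azure CLI):
#   key_id=$(create_cluster_key MyKeyVault SparkClusterKey)
```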

Create access policy

  1. From your new key vault, navigate to Settings > Access policies > + Add Access Policy.

    Screenshot: Create a new Azure Key Vault access policy

  2. From the Add access policy page, provide the following information:

    Property | Description
    Key Permissions | Select Get, Unwrap Key, and Wrap Key.
    Secret Permissions | Select Get, Set, and Delete.
    Select principal | Select the user-assigned managed identity you created earlier.

    Screenshot: Set Select principal for the Azure Key Vault access policy

  3. Select Add.

  4. Select Save.

    Screenshot: Save the Azure Key Vault access policy
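The same permissions can be granted from the command line with az keyvault set-policy. The sketch below assumes the vault uses access policies (not Azure RBAC) for authorization; the helper name is hypothetical and the names in the usage comment are placeholders. It looks up the identity's principal ID first, because set-policy takes an object ID rather than a resource ID.

```shell
# Hypothetical helper: grant a user-assigned managed identity the key and secret
# permissions listed in the table above.
# $1 = key vault name, $2 = identity resource group, $3 = identity name.
grant_identity_key_access() {
  principal_id=$(az identity show \
    --resource-group "$2" \
    --name "$3" \
    --query principalId \
    --output tsv)
  az keyvault set-policy \
    --name "$1" \
    --object-id "$principal_id" \
    --key-permissions get unwrapKey wrapKey \
    --secret-permissions get set delete
}

# Usage (requires a signed-in Azure CLI):
#   grant_identity_key_access MyKeyVault MyResourceGroup MyMSI
```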

Create cluster with customer-managed key disk encryption

You're now ready to create a new HDInsight cluster. Customer-managed keys can only be applied to new clusters during cluster creation. Encryption can't be removed from customer-managed key clusters, and customer-managed keys can't be added to existing clusters.

Using the Azure portal

During cluster creation, provide the full Key identifier, including the key version. For example, https://contoso-kv.vault.azure.net/keys/myClusterKey/46ab702136bc4b229f8b10e8c2997fa4. You also need to assign the managed identity to the cluster and provide the key URI.

Screenshot: Create a new cluster

Using Azure CLI

The following example shows how to use Azure CLI to create a new Apache Spark cluster with disk encryption enabled. For more information, see Azure CLI az hdinsight create.

az hdinsight create -t spark -g MyResourceGroup -n MyCluster \
-p "HttpPassword1234!" --workernode-data-disks-per-node 2 \
--storage-account MyStorageAccount \
--encryption-key-name SparkClusterKey \
--encryption-key-version 00000000000000000000000000000000 \
--encryption-vault-uri https://MyKeyVault.vault.azure.net \
--assign-identity MyMSI
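The portal takes the full key identifier, while the CLI command above takes the vault URI, key name, and key version as separate arguments. As a small sketch, the following splits a key identifier into those three parts with POSIX parameter expansion; it uses the example identifier from the portal section, and the variable names are chosen for illustration.

```shell
# Split a Key Vault key identifier into the three values the CLI expects.
key_id="https://contoso-kv.vault.azure.net/keys/myClusterKey/46ab702136bc4b229f8b10e8c2997fa4"

key_version=${key_id##*/}             # text after the last slash
without_version=${key_id%/*}          # identifier with the version removed
key_name=${without_version##*/}       # last path segment of what remains
vault_uri=${without_version%/keys/*}  # everything before /keys/<name>

echo "--encryption-vault-uri $vault_uri"
echo "--encryption-key-name $key_name"
echo "--encryption-key-version $key_version"
```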

Using Azure Resource Manager templates

The following example shows how to use an Azure Resource Manager template to create a new Apache Spark cluster with disk encryption enabled. For more information, see What are ARM templates?.

This example uses PowerShell to call the template.

$templateFile = "azuredeploy.json"
$ResourceGroupName = "MyResourceGroup"
$clusterName = "MyCluster"
$password = ConvertTo-SecureString 'HttpPassword1234!' -AsPlainText -Force
$diskEncryptionVaultUri = "https://MyKeyVault.vault.azure.cn"
$diskEncryptionKeyName = "SparkClusterKey"
$diskEncryptionKeyVersion = "00000000000000000000000000000000"
$managedIdentityName = "MyMSI"

New-AzResourceGroupDeployment `
  -Name mySpark `
  -TemplateFile $templateFile `
  -ResourceGroupName $ResourceGroupName `
  -clusterName $clusterName `
  -clusterLoginPassword $password `
  -sshPassword $password `
  -diskEncryptionVaultUri $diskEncryptionVaultUri `
  -diskEncryptionKeyName $diskEncryptionKeyName `
  -diskEncryptionKeyVersion $diskEncryptionKeyVersion `
  -managedIdentityName $managedIdentityName

The contents of the Resource Manager template, azuredeploy.json:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "0.9.0.0",
  "parameters": {
    "clusterName": {
      "type": "string",
      "metadata": {
        "description": "The name of the HDInsight cluster to create."
      }
    },
    "clusterLoginUserName": {
      "type": "string",
      "defaultValue": "admin",
      "metadata": {
        "description": "These credentials can be used to submit jobs to the cluster and to log into cluster dashboards."
      }
    },
    "clusterLoginPassword": {
      "type": "securestring",
      "metadata": {
        "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
      }
    },
    "location": {
      "type": "string",
      "defaultValue": "[resourceGroup().location]",
      "metadata": {
        "description": "The location where all azure resources will be deployed."
      }
    },
    "sshUserName": {
      "type": "string",
      "defaultValue": "sshuser",
      "metadata": {
        "description": "These credentials can be used to remotely access the cluster."
      }
    },
    "sshPassword": {
      "type": "securestring",
      "metadata": {
        "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
      }
    },
    "headNodeSize": {
      "type": "string",
      "defaultValue": "Standard_D12_v2",
      "metadata": {
        "description": "The VM size of the head nodes."
      }
    },
    "workerNodeSize": {
      "type": "string",
      "defaultValue": "Standard_D13_v2",
      "metadata": {
        "description": "The VM size of the worker nodes."
      }
    },
    "diskEncryptionVaultUri": {
      "type": "string",
      "metadata": {
        "description": "The Key Vault DNSname."
      }
    },
    "diskEncryptionKeyName": {
      "type": "string",
      "metadata": {
        "description": "The Key Vault key name."
      }
    },
    "diskEncryptionKeyVersion": {
      "type": "string",
      "metadata": {
        "description": "The Key Vault key version for the selected key."
      }
    },
    "managedIdentityName": {
      "type": "string",
      "metadata": {
        "description": "The user-assigned managed identity."
      }
    }
  },
  "variables": {
    "defaultStorageAccount": {
      "name": "[uniqueString(resourceGroup().id)]",
      "type": "Standard_LRS"
    }
  },
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[variables('defaultStorageAccount').name]",
      "location": "[parameters('location')]",
      "apiVersion": "2019-06-01",
      "sku": {
        "name": "[variables('defaultStorageAccount').type]"
      },
      "kind": "Storage",
      "properties": {}
    },
    {
      "apiVersion": "2018-06-01-preview",
      "name": "[parameters('clusterName')]",
      "type": "Microsoft.HDInsight/clusters",
      "location": "[parameters('location')]",
      "properties": {
        "clusterVersion": "3.6",
        "osType": "Linux",
        "tier": "standard",
        "clusterDefinition": {
          "kind": "spark",
          "componentVersion": {
            "Spark": "2.3"
          },
          "configurations": {
            "gateway": {
              "restAuthCredential.isEnabled": true,
              "restAuthCredential.username": "[parameters('clusterLoginUserName')]",
              "restAuthCredential.password": "[parameters('clusterLoginPassword')]"
            }
          }
        },
        "storageProfile": {
          "storageaccounts": [
            {
              "name": "[replace(replace(reference(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name), '2019-06-01').primaryEndpoints.blob,'https://',''),'/','')]",
              "isDefault": true,
              "container": "[parameters('clusterName')]",
              "key": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name), '2019-06-01').keys[0].value]"
            }
          ]
        },
        "computeProfile": {
          "roles": [
            {
              "name": "headnode",
              "minInstanceCount": 1,
              "targetInstanceCount": 2,
              "hardwareProfile": {
                "vmSize": "[parameters('headNodeSize')]"
              },
              "osProfile": {
                "linuxOperatingSystemProfile": {
                  "username": "[parameters('sshUserName')]",
                  "password": "[parameters('sshPassword')]"
                }
              }
            },
            {
              "name": "workernode",
              "targetInstanceCount": 1,
              "hardwareProfile": {
                "vmSize": "[parameters('workerNodeSize')]"
              },
              "osProfile": {
                "linuxOperatingSystemProfile": {
                  "username": "[parameters('sshUserName')]",
                  "password": "[parameters('sshPassword')]"
                }
              }
            }
          ]
        },
        "minSupportedTlsVersion": "1.2",
        "diskEncryptionProperties": {
          "vaultUri": "[parameters('diskEncryptionVaultUri')]",
          "keyName": "[parameters('diskEncryptionKeyName')]",
          "keyVersion": "[parameters('diskEncryptionKeyVersion')]",
          "msiResourceId": "[resourceID('Microsoft.ManagedIdentity/userAssignedIdentities/', parameters('managedIdentityName'))]"
        }
      },
      "identity": {
        "type": "UserAssigned",
        "userAssignedIdentities": {
          "[resourceID('Microsoft.ManagedIdentity/userAssignedIdentities/', parameters('managedIdentityName'))]": {}
        }
      }
    }
  ]
}

Rotating the encryption key

You might want to change the encryption keys used by the HDInsight cluster after it has been created. This can be done easily through the portal. For this operation, the cluster must have access to both the current key and the intended new key; otherwise, the rotate key operation will fail.

Using the Azure portal

To rotate the key, you need the base key vault URI. Once you have it, go to the HDInsight cluster properties section in the portal and select Change Key under Disk Encryption Key URL. Enter the new key URL and submit to rotate the key.

Screenshot: Rotate the disk encryption key

Using Azure CLI

The following example shows how to rotate the disk encryption key for an existing HDInsight cluster. For more information, see Azure CLI az hdinsight rotate-disk-encryption-key.

az hdinsight rotate-disk-encryption-key \
--encryption-key-name SparkClusterKey \
--encryption-key-version 00000000000000000000000000000000 \
--encryption-vault-uri https://MyKeyVault.vault.azure.cn \
--name MyCluster \
--resource-group MyResourceGroup
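When the new key is simply a newer version of the same Key Vault key, the version does not have to be copied by hand. The sketch below is one possible approach, with a hypothetical helper name and placeholder arguments: it reads the identifier of the key's current version with az keyvault key show, derives the vault URI and version from it, and passes them to the rotate command shown above.

```shell
# Hypothetical helper: rotate a cluster to the latest version of a Key Vault key.
# $1 = cluster name, $2 = resource group, $3 = key vault name, $4 = key name.
rotate_to_latest_key_version() {
  # Without --version, "key show" returns the current (latest) version of the key.
  key_id=$(az keyvault key show \
    --vault-name "$3" \
    --name "$4" \
    --query key.kid \
    --output tsv)
  without_version=${key_id%/*}
  az hdinsight rotate-disk-encryption-key \
    --encryption-key-name "$4" \
    --encryption-key-version "${key_id##*/}" \
    --encryption-vault-uri "${without_version%/keys/*}" \
    --name "$1" \
    --resource-group "$2"
}

# Usage (requires a signed-in Azure CLI):
#   rotate_to_latest_key_version MyCluster MyResourceGroup MyKeyVault SparkClusterKey
```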

FAQ for customer-managed key encryption

How does the HDInsight cluster access my key vault?

HDInsight accesses your Azure Key Vault instance using the managed identity that you associate with the HDInsight cluster. This managed identity can be created before or during cluster creation. You also need to grant the managed identity access to the key vault where the key is stored.

Is this feature available for all clusters on HDInsight?

Customer-managed key encryption is available for all cluster types except Spark 2.1 and 2.2.

Can I use multiple keys to encrypt different disks or folders?

No, all managed disks and resource disks are encrypted by the same key.

What happens if the cluster loses access to the key vault or the key?

If the cluster loses access to the key, warnings are shown in the Apache Ambari portal. In this state, the Change Key operation fails. Once key access is restored, the Ambari warnings go away and operations such as key rotation can be performed successfully.

Screenshot: Key access Ambari alert

How can I recover the cluster if the keys are deleted?

Since only Soft Delete enabled keys are supported, the cluster should regain access to the keys once they are recovered in the key vault. To recover an Azure Key Vault key, see Undo-AzKeyVaultKeyRemoval or az keyvault key recover.
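As a brief sketch of the CLI route, the helpers below list the soft-deleted keys in a vault and recover one by name. The helper names are hypothetical and the vault and key names in the usage comment are placeholders.

```shell
# Hypothetical helper: list soft-deleted keys in a vault.
# $1 = key vault name.
list_deleted_keys() {
  az keyvault key list-deleted --vault-name "$1"
}

# Hypothetical helper: recover a soft-deleted key.
# $1 = key vault name, $2 = key name.
recover_cluster_key() {
  az keyvault key recover --vault-name "$1" --name "$2"
}

# Usage (requires a signed-in Azure CLI):
#   list_deleted_keys MyKeyVault
#   recover_cluster_key MyKeyVault SparkClusterKey
```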

If a cluster is scaled up, will the new nodes support customer-managed keys seamlessly?

Yes. The cluster needs access to the key in the key vault during scale up. The same key is used to encrypt both managed disks and resource disks in the cluster.

Are customer-managed keys available in my location?

HDInsight customer-managed keys are available in all public clouds and national clouds.

Next steps