Data encryption with Azure Machine Learning

Azure Machine Learning uses a variety of Azure data storage services and compute resources when training models and performing inference. Each of these has its own approach to encrypting data at rest and in transit. In this article, learn about each one and which is best for your scenario.

Important

For production-grade encryption during training, Microsoft recommends using an Azure Machine Learning compute cluster. For production-grade encryption during inference, Microsoft recommends using Azure Kubernetes Service.

An Azure Machine Learning compute instance is a dev/test environment. When using it, we recommend that you store your files, such as notebooks and scripts, in a file share. Store your data in a datastore.

Encryption at rest

Important

If your workspace contains sensitive data, we recommend setting the hbi_workspace flag when you create the workspace. The hbi_workspace flag can only be set when a workspace is created. It cannot be changed for an existing workspace.

The hbi_workspace flag controls the amount of data Microsoft collects for diagnostic purposes and enables additional encryption in Microsoft-managed environments. In addition, it enables the following actions:

  • Starts encrypting the local scratch disk in your Azure Machine Learning compute cluster, provided you have not previously created any clusters in that subscription. Otherwise, you need to raise a support ticket to enable encryption of the scratch disk of your compute clusters.
  • Cleans up your local scratch disk between runs.
  • Securely passes credentials for your storage account, container registry, and SSH account from the execution layer to your compute clusters using your key vault.
  • Enables IP filtering to ensure the underlying batch pools cannot be called by any external services other than AzureMachineLearningService.

Note that compute instances are not supported in an HBI workspace.
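Because the flag can only be set at creation time, it helps to see where it goes. The following is a minimal sketch, assuming the Azure Machine Learning Python SDK (azureml-core) and its Workspace.create method, which accepts an hbi_workspace argument. The helper function names and the workspace name, subscription ID, resource group, and region are hypothetical placeholders.

```python
def hbi_workspace_arguments(name, subscription_id, resource_group, location):
    """Build keyword arguments for Workspace.create with the HBI flag set."""
    return {
        "name": name,
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "location": location,
        # The flag can only be set when the workspace is created.
        "hbi_workspace": True,
    }


def create_hbi_workspace(**arguments):
    """Create the workspace. Requires azureml-core and Azure credentials."""
    # Imported lazily so the argument helper works without the SDK installed.
    from azureml.core import Workspace

    return Workspace.create(**arguments)


# Hypothetical placeholder values, for illustration only.
arguments = hbi_workspace_arguments(
    name="my-hbi-workspace",
    subscription_id="00000000-0000-0000-0000-000000000000",
    resource_group="my-resource-group",
    location="eastus2",
)
print(arguments["hbi_workspace"])
```

Calling create_hbi_workspace(**arguments) would then provision the workspace; that call is left out of the sketch because it needs a real subscription.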

Azure Blob storage

Azure Machine Learning stores snapshots, output, and logs in the Azure Blob storage account that's tied to the Azure Machine Learning workspace and your subscription. All the data stored in Azure Blob storage is encrypted at rest with Microsoft-managed keys.

For information on how to use your own keys for data stored in Azure Blob storage, see Azure Storage encryption with customer-managed keys in Azure Key Vault.

Training data is typically also stored in Azure Blob storage so that it's accessible to training compute targets. This storage isn't managed by Azure Machine Learning; it is mounted to compute targets as a remote file system.

If you need to rotate or revoke your key, you can do so at any time. When you rotate a key, the storage account starts using the new key (latest version) to encrypt data at rest. When you revoke (disable) a key, the storage account takes care of failing requests. It usually takes an hour for the rotation or revocation to take effect.

For information on regenerating the access keys, see Regenerate storage access keys.

Azure Cosmos DB

Azure Machine Learning stores metadata in an Azure Cosmos DB instance. This instance is associated with a Microsoft subscription managed by Azure Machine Learning. All the data stored in Azure Cosmos DB is encrypted at rest with Microsoft-managed keys.

To use your own (customer-managed) keys to encrypt the Azure Cosmos DB instance, you can create a dedicated Cosmos DB instance for use with your workspace. We recommend this approach if you want to store your data, such as run history information, outside of the multi-tenant Cosmos DB instance hosted in our Microsoft subscription.

To enable provisioning a Cosmos DB instance in your subscription with customer-managed keys, perform the following actions:

  • Register the Microsoft.MachineLearning and Microsoft.DocumentDB resource providers in your subscription, if not done already.

  • Use the following parameters when creating the Azure Machine Learning workspace. Both parameters are mandatory and are supported in the SDK, CLI, REST APIs, and Resource Manager templates.

    • resource_cmk_uri: This parameter is the full resource URI of the customer-managed key in your key vault, including the version information for the key.

    • cmk_keyvault: This parameter is the resource ID of the key vault in your subscription. This key vault needs to be in the same region and subscription that you will use for the Azure Machine Learning workspace.

      Note

      This key vault instance can be different from the key vault that Azure Machine Learning creates when you provision the workspace. If you want to use the same key vault instance for the workspace, pass the same key vault while provisioning the workspace by using the key_vault parameter.
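The two parameters are easy to mix up: resource_cmk_uri points at a versioned key, while cmk_keyvault is the resource ID of the vault itself. The sketch below checks the shape of both values before they are passed to workspace creation. It assumes the usual Key Vault key identifier layout (https://<vault>/keys/<name>/<version>) and the usual Azure resource ID layout for a vault. The function names and sample values are hypothetical, and the checks are illustrative rather than what the service itself enforces.

```python
import re
from urllib.parse import urlparse

# Assumed shape of a key vault resource ID (illustrative check only).
_VAULT_RESOURCE_ID = re.compile(
    r"^/subscriptions/[^/]+/resourceGroups/[^/]+"
    r"/providers/Microsoft\.KeyVault/vaults/[^/]+$",
    re.IGNORECASE,
)


def key_uri_has_version(resource_cmk_uri):
    """Return True if the URI looks like https://<vault>/keys/<name>/<version>."""
    parsed = urlparse(resource_cmk_uri)
    parts = [part for part in parsed.path.split("/") if part]
    return (
        parsed.scheme == "https"
        and bool(parsed.netloc)
        and len(parts) == 3
        and parts[0] == "keys"
    )


def cmk_arguments(resource_cmk_uri, cmk_keyvault):
    """Validate both values and return them as workspace creation arguments."""
    if not key_uri_has_version(resource_cmk_uri):
        raise ValueError("resource_cmk_uri must include the key version")
    if not _VAULT_RESOURCE_ID.match(cmk_keyvault):
        raise ValueError("cmk_keyvault must be a key vault resource ID")
    return {"resource_cmk_uri": resource_cmk_uri, "cmk_keyvault": cmk_keyvault}


# Hypothetical placeholder values, for illustration only.
key_uri = "https://my-vault.vault.azure.net/keys/my-key/0123456789abcdef0123456789abcdef"
vault_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/my-resource-group"
    "/providers/Microsoft.KeyVault/vaults/my-vault"
)
arguments = cmk_arguments(key_uri, vault_id)
```

With the Python SDK, the resulting dictionary could be passed alongside the other workspace arguments, for example Workspace.create(name=..., **arguments).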

Important

The Cosmos DB instance is created in a Microsoft-managed resource group in your subscription, along with any resources it needs. This means that you are charged for this Cosmos DB instance. The managed resource group is named in the format <AML Workspace Resource Group Name><GUID>. If your Azure Machine Learning workspace uses a private endpoint, a virtual network is also created for the Cosmos DB instance. This VNet is used to secure communication between Cosmos DB and Azure Machine Learning.

  • Do not delete the resource group that contains this Cosmos DB instance, or any of the resources automatically created in this group. If you need to delete the resource group, the Cosmos DB instance, or related resources, you must delete the Azure Machine Learning workspace that uses them. The resource group, Cosmos DB instance, and other automatically created resources are deleted when the associated workspace is deleted.
  • The default Request Units value for this Cosmos DB account is 8000.
  • You cannot provide your own VNet for use with the Cosmos DB instance that is created. You also cannot modify the virtual network. For example, you cannot change the IP address range that it uses.

If you need to rotate or revoke your key, you can do so at any time. When you rotate a key, Cosmos DB starts using the new key (latest version) to encrypt data at rest. When you revoke (disable) a key, Cosmos DB takes care of failing requests. It usually takes an hour for the rotation or revocation to take effect.

For more information on customer-managed keys with Cosmos DB, see Configure customer-managed keys for your Azure Cosmos DB account.

Azure Container Registry

All container images in your registry (Azure Container Registry) are encrypted at rest. Azure automatically encrypts an image before storing it and decrypts it when Azure Machine Learning pulls the image.

To use your own (customer-managed) keys to encrypt your Azure Container Registry, you need to either create your own ACR and attach it while provisioning the workspace, or encrypt the default instance that is created when the workspace is provisioned.

Important

Azure Machine Learning requires the admin account to be enabled on your Azure Container Registry. By default, this setting is disabled when you create a container registry. For information on enabling the admin account, see Admin account.

Once an Azure Container Registry has been created for a workspace, do not delete it. Doing so will break your Azure Machine Learning workspace.

For an example of creating a workspace using an existing Azure Container Registry, see the following articles:

Azure Container Instance

You can encrypt a deployed Azure Container Instance (ACI) resource using customer-managed keys. The customer-managed key used for ACI can be stored in the Azure Key Vault for your workspace. For information on generating a key, see Encrypt data with a customer-managed key.

To use the key when deploying a model to Azure Container Instance, create a new deployment configuration using AciWebservice.deploy_configuration(). Provide the key information using the following parameters:

  • cmk_vault_base_url: The URL of the key vault that contains the key.
  • cmk_key_name: The name of the key.
  • cmk_key_version: The version of the key.
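The three parameters above correspond to the parts of a versioned Key Vault key identifier. As a rough sketch, the helper below splits such an identifier into the three values and passes them to AciWebservice.deploy_configuration(). It assumes the identifier has the form https://<vault>/keys/<key-name>/<key-version> and that the vault URL is expected without a trailing slash. The helper names, the sample identifier, and the cpu_cores and memory_gb values are hypothetical.

```python
from urllib.parse import urlparse


def aci_cmk_arguments(key_identifier):
    """Split a versioned key identifier into the three ACI key parameters."""
    parsed = urlparse(key_identifier)
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) != 3 or parts[0] != "keys":
        raise ValueError("expected https://<vault>/keys/<key-name>/<key-version>")
    return {
        "cmk_vault_base_url": f"{parsed.scheme}://{parsed.netloc}",
        "cmk_key_name": parts[1],
        "cmk_key_version": parts[2],
    }


def build_deployment_configuration(key_identifier, cpu_cores=1, memory_gb=1):
    """Create an ACI deployment configuration. Requires azureml-core."""
    # Imported lazily so the parsing helper works without the SDK installed.
    from azureml.core.webservice import AciWebservice

    return AciWebservice.deploy_configuration(
        cpu_cores=cpu_cores,
        memory_gb=memory_gb,
        **aci_cmk_arguments(key_identifier),
    )


# Hypothetical placeholder identifier, for illustration only.
key_arguments = aci_cmk_arguments(
    "https://my-vault.vault.azure.net/keys/my-key/0123456789abcdef"
)
print(key_arguments)
```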

For more information on creating and using a deployment configuration, see the following articles:

For more information on using a customer-managed key with ACI, see Encrypt data with a customer-managed key.

Azure Kubernetes Service

You can encrypt a deployed Azure Kubernetes Service resource using customer-managed keys at any time. For more information, see Bring your own keys with Azure Kubernetes Service.

This process allows you to encrypt both the data disk and the OS disk of the deployed virtual machines in the Kubernetes cluster.

Important

This process only works with AKS Kubernetes version 1.17 or higher. Azure Machine Learning added support for AKS 1.17 on Jan 13, 2020.
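When scripting a check for the version requirement, compare the version numerically rather than as text, since a plain string comparison would rank "1.9" above "1.17". A minimal sketch follows; the function name is hypothetical, and it assumes the version string has the form major.minor or major.minor.patch.

```python
def supports_customer_managed_disk_keys(kubernetes_version):
    """Return True if the version is 1.17 or higher."""
    major, minor = (int(part) for part in kubernetes_version.split(".")[:2])
    # Tuples compare element by element, so (1, 9) sorts below (1, 17).
    return (major, minor) >= (1, 17)


print(supports_customer_managed_disk_keys("1.17.9"))
print(supports_customer_managed_disk_keys("1.9.11"))
```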

Machine Learning Compute

The OS disk for each compute node is stored in Azure Storage and is encrypted with Microsoft-managed keys in Azure Machine Learning storage accounts. This compute target is ephemeral, and clusters are typically scaled down when no runs are queued. The underlying virtual machine is de-provisioned, and the OS disk is deleted. Azure Disk Encryption isn't supported for the OS disk.

Each virtual machine also has a local temporary disk for OS operations. If you want, you can use the disk to stage training data. The disk is encrypted by default for workspaces with the hbi_workspace parameter set to TRUE. This environment lasts only for the duration of your run, and encryption support is limited to system-managed keys.

Azure Databricks

Azure Databricks can be used in Azure Machine Learning pipelines. By default, the Databricks File System (DBFS) used by Azure Databricks is encrypted using a Microsoft-managed key. To configure Azure Databricks to use customer-managed keys, see Configure customer-managed keys on default (root) DBFS.

Microsoft-generated data

When using services such as Automated Machine Learning, Microsoft may generate transient, pre-processed data for training multiple models. This data is stored in a datastore in your workspace, which allows you to enforce access controls and encryption appropriately.

You may also want to encrypt diagnostic information logged from your deployed endpoint into your Azure Application Insights instance.

Encryption in transit

Azure Machine Learning uses TLS to secure internal communication between various Azure Machine Learning microservices. All Azure Storage access also occurs over a secure channel.

To secure external calls made to the scoring endpoint, Azure Machine Learning uses TLS. For more information, see Use TLS to secure a web service through Azure Machine Learning.

Data collection and handling

Microsoft-collected data

Microsoft may collect information that doesn't identify users, such as resource names (for example, the dataset name or the machine learning experiment name) or job environment variables, for diagnostic purposes. All such data is stored using Microsoft-managed keys in storage hosted in Microsoft-owned subscriptions and follows Microsoft's standard privacy policy and data handling standards.

Microsoft also recommends not storing sensitive information (such as account key secrets) in environment variables. Environment variables are logged, encrypted, and stored by us. Similarly, when naming run_id, avoid including sensitive information such as user names or secret project names. This information may appear in telemetry logs accessible to Microsoft Support engineers.
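One way to act on this recommendation is to screen variable names before they are attached to a job. The sketch below is a simple name-based heuristic, not a feature of Azure Machine Learning: the function name, the marker list, and the sample variables are all hypothetical, and a name-based check will miss secrets stored under innocuous names.

```python
# Substrings that suggest a variable holds a secret (hypothetical list).
SENSITIVE_MARKERS = ("KEY", "SECRET", "PASSWORD", "TOKEN", "CONNECTION_STRING")


def split_environment_variables(variables):
    """Separate variables that look safe from names that look sensitive."""
    safe, flagged = {}, []
    for name, value in variables.items():
        if any(marker in name.upper() for marker in SENSITIVE_MARKERS):
            flagged.append(name)
        else:
            safe[name] = value
    return safe, sorted(flagged)


# Hypothetical job variables, for illustration only.
safe, flagged = split_environment_variables(
    {
        "BATCH_SIZE": "64",
        "LOG_LEVEL": "info",
        "STORAGE_ACCOUNT_KEY": "<redacted>",
    }
)
print(safe)
print(flagged)
```

Only the variables in safe would be passed to the job; the flagged values are better kept in a secret store, such as the key vault associated with the workspace, than in the job environment.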

You can opt out of diagnostic data collection by setting the hbi_workspace parameter to TRUE while provisioning the workspace. This functionality is supported when using the AzureML Python SDK, CLI, REST APIs, or Azure Resource Manager templates.

Using Azure Key Vault

Azure Machine Learning uses the Azure Key Vault instance associated with the workspace to store credentials of various kinds:

  • The associated storage account connection string
  • Passwords to Azure Container Registry instances
  • Connection strings to data stores

SSH passwords and keys to compute targets like Azure HDInsight and VMs are stored in a separate key vault that's associated with the Microsoft subscription. Azure Machine Learning doesn't store any passwords or keys provided by users. Instead, it generates, authorizes, and stores its own SSH keys to connect to VMs and HDInsight to run the experiments.

Each workspace has an associated system-assigned managed identity that has the same name as the workspace. This managed identity has access to all keys, secrets, and certificates in the key vault.

Next steps