Use Azure Machine Learning studio in an Azure virtual network

This article explains how to use Azure Machine Learning studio in a virtual network. The studio includes features such as AutoML, the designer, and data labeling. To use those features in a virtual network, you must follow the steps in this article.

In this article, you learn how to:

  • Give the studio access to data stored inside a virtual network.
  • Access the studio from a resource inside a virtual network.
  • Understand how the studio affects storage security.

This article is part four of a four-part series that walks you through securing an Azure Machine Learning workflow.

See the other articles in this series:

1. Secure the workspace > 2. Secure the training environment > 3. Secure the inferencing environment > 4. Enable studio functionality

Important

If your workspace is in a sovereign cloud, such as Azure China 21Vianet, integrated notebooks do not support storage that is in a virtual network. Instead, you can use Jupyter Notebooks from a compute instance. For more information, see the Access data in a Compute Instance notebook section.

Prerequisites

Configure data access in the studio

Some of the studio's features are disabled by default in a virtual network. To re-enable them, you must enable managed identity for the storage accounts you intend to use in the studio.

The following operations are disabled by default in a virtual network:

The studio supports reading data from the following datastore types in a virtual network:

  • Azure Blob
  • Azure Data Lake Storage Gen1
  • Azure Data Lake Storage Gen2
  • Azure SQL Database

Configure datastores to use workspace-managed identity

After you add an Azure storage account to your virtual network with either a service endpoint or a private endpoint, you must configure your datastore to use managed identity authentication. Doing so lets the studio access data in your storage account.

Azure Machine Learning uses datastores to connect to storage accounts. Use the following steps to configure a datastore to use managed identity:

  1. In the studio, select Datastores.

  2. To update an existing datastore, select the datastore and select Update credentials.

    To create a new datastore, select + New datastore.

  3. In the datastore settings, select Yes for Use workspace managed identity for data preview and profiling in Azure Machine Learning studio.

    Screenshot: enabling the workspace-managed identity in the datastore settings.

These steps use Azure RBAC to add the workspace-managed identity as a Reader on the storage service. Reader access lets the workspace retrieve firewall settings to ensure that data doesn't leave the virtual network. Changes may take up to 10 minutes to take effect.
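If you create datastores from a script rather than in the studio, the same setting can be sketched in Python. This is a minimal sketch, assuming the v1 Python SDK (`azureml-core`), which this article does not cover; there, `Datastore.register_azure_blob_container` accepts a `grant_workspace_access` flag. The helper functions and every name or ID passed to them are hypothetical placeholders, and credentials are omitted for brevity.

```python
from typing import Any, Dict


def blob_datastore_settings(datastore_name: str, container_name: str,
                            account_name: str, subscription_id: str,
                            resource_group: str) -> Dict[str, Any]:
    """Collect keyword arguments for a blob datastore the studio can read."""
    return {
        "datastore_name": datastore_name,
        "container_name": container_name,
        "account_name": account_name,
        # Assumed scripted counterpart of selecting "Yes" in step 3 above.
        "grant_workspace_access": True,
        # Location of the storage account the Reader role is assigned on.
        "subscription_id": subscription_id,
        "resource_group": resource_group,
    }


def register_blob_datastore(workspace, settings: Dict[str, Any]):
    """Register the datastore; requires azureml-core and a Workspace object."""
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from azureml.core import Datastore
    return Datastore.register_azure_blob_container(workspace=workspace, **settings)


# Placeholder values only; replace them with your own resources.
settings = blob_datastore_settings(
    "studio_blob_datastore", "my-container", "mystorageaccount",
    "00000000-0000-0000-0000-000000000000", "my-resource-group")
print(sorted(settings))
```

Calling `register_blob_datastore(ws, settings)` with a real `Workspace` object would perform the registration; as with the studio steps, allow several minutes for the role assignment to take effect.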

Enable managed identity authentication for default storage accounts

Each Azure Machine Learning workspace has two default storage accounts, a default blob storage account and a default file store account, which are defined when you create the workspace. You can also set new defaults on the Datastore management page.

Screenshot: where to find the default datastores.

The following table describes why you must enable managed identity authentication for your workspace default storage accounts.

| Storage account | Notes |
| --- | --- |
| Workspace default blob storage | Stores model assets from the designer. You must enable managed identity authentication on this storage account to deploy models in the designer. You can visualize and run a designer pipeline if it uses a non-default datastore that has been configured to use managed identity. However, if you try to deploy a trained model without managed identity enabled on the default datastore, deployment fails regardless of any other datastores in use. |
| Workspace default file store | Stores AutoML experiment assets. You must enable managed identity authentication on this storage account to submit AutoML experiments. |

Warning

There's a known issue where the default file store does not automatically create the azureml-filestore folder, which is required to submit AutoML experiments. This occurs when you bring an existing filestore and set it as the default filestore during workspace creation.

To avoid this issue, you have two options: 1) Use the default filestore that is automatically created for you during workspace creation. 2) To bring your own filestore, make sure the filestore is outside the VNet during workspace creation. After the workspace is created, add the storage account to the virtual network.

To resolve this issue if it has already occurred, remove the filestore account from the virtual network and then add it back.

If your Azure storage account uses a private endpoint, you must grant the workspace-managed identity Reader access to the private link. For more information, see the Reader built-in role.

If your storage account uses a service endpoint, you can skip this step.
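The Reader assignment can also be scripted. The following is a minimal sketch, not a step prescribed by this article: it composes an Azure CLI `az role assignment create` call, assuming that the private link is represented by the storage account's private endpoint resource. The helper names, the endpoint name, and all IDs are hypothetical placeholders.

```python
from typing import List


def private_endpoint_scope(subscription_id: str, resource_group: str,
                           endpoint_name: str) -> str:
    """Build the Azure resource ID of a private endpoint."""
    return (f"/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Network/privateEndpoints/{endpoint_name}")


def reader_assignment_command(principal_id: str, scope: str) -> List[str]:
    """Compose the CLI arguments that grant the Reader role on a scope."""
    return ["az", "role", "assignment", "create",
            "--assignee", principal_id,  # object ID of the workspace-managed identity
            "--role", "Reader",
            "--scope", scope]


# Placeholder values only; replace them with your own resources.
scope = private_endpoint_scope(
    "00000000-0000-0000-0000-000000000000", "my-resource-group",
    "my-storage-private-endpoint")
command = reader_assignment_command(
    "11111111-1111-1111-1111-111111111111", scope)
print(" ".join(command))
```

The list form can be passed to `subprocess.run(command, check=True)` on a machine where the Azure CLI is installed and signed in.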

Access the studio from a resource inside the VNet

If you access the studio from a resource inside a virtual network (for example, a compute instance or virtual machine), you must allow outbound traffic from the virtual network to the studio.

For example, if you use network security groups (NSGs) to restrict outbound traffic, add an outbound rule whose destination is the service tag AzureFrontDoor.Frontend.
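As an illustration, the rule described above can be written as a security rule definition. This is a minimal sketch that builds the rule as a Python dictionary using the property names of an NSG security rule resource. The rule name, the priority, the `VirtualNetwork` source, and TCP port 443 are assumptions chosen for the example, not values given in this article.

```python
import json
from typing import Any, Dict


def studio_outbound_rule(priority: int = 200) -> Dict[str, Any]:
    """Describe an outbound allow rule to the AzureFrontDoor.Frontend tag."""
    if not 100 <= priority <= 4096:
        raise ValueError("NSG rule priority must be between 100 and 4096")
    return {
        "name": "AllowStudioOutbound",  # hypothetical rule name
        "properties": {
            "priority": priority,
            "direction": "Outbound",
            "access": "Allow",
            "protocol": "Tcp",
            "sourceAddressPrefix": "VirtualNetwork",  # assumed source
            "sourcePortRange": "*",
            "destinationAddressPrefix": "AzureFrontDoor.Frontend",
            "destinationPortRange": "443",  # assumed: HTTPS traffic
        },
    }


rule = studio_outbound_rule()
print(json.dumps(rule, indent=2))
```

Choose a priority lower in number than any rule that denies outbound traffic, because NSG rules are evaluated in priority order.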

Technical notes for managed identity

Using managed identity to access storage services affects your security considerations. This section describes the changes for each storage account type.

These considerations are specific to the type of storage account you are accessing.

Azure Blob storage

For Azure Blob storage, the workspace-managed identity is also added as a Blob Data Reader so that it can read data from blob storage.

Azure Data Lake Storage Gen2 access control

You can use both Azure RBAC and POSIX-style access control lists (ACLs) to control data access inside a virtual network.

To use Azure RBAC, add the workspace-managed identity to the Blob Data Reader role. For more information, see Azure role-based access control.

To use ACLs, assign access to the workspace-managed identity just as you would to any other security principal. For more information, see Access control lists on files and directories.
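To make the ACL option concrete, the following is a minimal sketch of how a POSIX-style ACL entry for the workspace-managed identity could be composed in the short form `user:<object-id>:<permissions>`. The helper name and the object ID are hypothetical placeholders, and which permissions you need depends on your data layout. Reading a file generally requires read permission on the file and execute permission on each parent directory.

```python
def acl_entry(object_id: str, read: bool = False, write: bool = False,
              execute: bool = False, default: bool = False) -> str:
    """Compose a short-form ACL entry for a user or managed identity."""
    permissions = (("r" if read else "-")
                   + ("w" if write else "-")
                   + ("x" if execute else "-"))
    # A default entry on a directory is inherited by newly created children.
    scope = "default:" if default else ""
    return f"{scope}user:{object_id}:{permissions}"


# Placeholder object ID of the workspace-managed identity.
identity_id = "11111111-1111-1111-1111-111111111111"

file_entry = acl_entry(identity_id, read=True)                     # on the file
directory_entry = acl_entry(identity_id, read=True, execute=True)  # on parent directories
print(file_entry)
print(directory_entry)
```

The resulting strings can be supplied to whichever tool you use to set ACLs, such as Azure Storage Explorer or the Azure CLI.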

Azure SQL Database contained user

To access data stored in an Azure SQL Database using managed identity, you must create a SQL contained user that maps to the managed identity. For more information on creating a user from an external provider, see Create contained users mapped to Azure AD identities.

After you create a SQL contained user, grant permissions to it by using the GRANT T-SQL command.
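The two T-SQL steps above can be sketched as follows. This is a minimal sketch that only generates the statements; it assumes the contained user is named after the workspace-managed identity and that read access to a single schema is sufficient. The helper names, the identity name `my-ml-workspace`, and the `dbo` schema are hypothetical placeholders.

```python
from typing import List


def quote_identifier(name: str) -> str:
    """Quote a T-SQL identifier with brackets, escaping closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


def contained_user_statements(identity_name: str, schema: str = "dbo") -> List[str]:
    """Return T-SQL that creates the contained user and grants read access."""
    user = quote_identifier(identity_name)
    return [
        # Map a contained user to the identity in the external directory.
        f"CREATE USER {user} FROM EXTERNAL PROVIDER;",
        # Grant read access on one schema (assumed scope for this example).
        f"GRANT SELECT ON SCHEMA::{quote_identifier(schema)} TO {user};",
    ]


statements = contained_user_statements("my-ml-workspace")
for statement in statements:
    print(statement)
```

Run the generated statements in the target database while connected as an Azure AD administrator, and widen or narrow the GRANT to match the data the studio needs to read.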

Azure Machine Learning designer intermediate module output

You can specify the output location for any module in the designer. Use this capability to store intermediate datasets in a separate location for security, logging, or auditing purposes. To specify output:

  1. Select the module whose output you'd like to specify.
  2. In the module settings pane that appears to the right, select Output settings.
  3. Specify the datastore you want to use for each module output.

Make sure that you have access to the intermediate storage accounts in your virtual network. Otherwise, the pipeline will fail.

To visualize output data, you should also enable managed identity authentication for the intermediate storage accounts.

Next steps

This article is an optional part of a four-part virtual network series. See the rest of the articles to learn how to secure a virtual network: