Use Azure Machine Learning studio in an Azure virtual network

This article explains how to use Azure Machine Learning studio in a virtual network. You learn how to:

  • Access the studio from a resource inside a virtual network.
  • Give the studio access to data stored inside a virtual network.
  • Understand how the studio affects storage security.

This article is part four of a four-part series that walks you through securing an Azure Machine Learning workflow.

See the other articles in this series:

1. Secure the workspace > 2. Secure the training environment > 3. Secure the inferencing environment > 4. Enable studio functionality

Important

Most of the studio works with data stored in a virtual network, but integrated notebooks do not: they do not support storage that is in a virtual network. Instead, you can use Jupyter Notebooks from a compute instance. For more information, see the Access data in a Compute Instance notebook section.

Prerequisites

Access the studio from a resource inside the VNet

If you access the studio from a resource inside a virtual network (for example, a compute instance or virtual machine), you must allow outbound traffic from the virtual network to the studio.

For example, if you use network security groups (NSGs) to restrict outbound traffic, add an outbound rule whose destination is the service tag AzureFrontDoor.Frontend.
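As a minimal sketch of such a rule, the following Python builds an NSG security rule using the Azure Resource Manager property names. The rule name, the priority, the VirtualNetwork source tag, and the choice of TCP port 443 are assumptions for illustration, not values given in this article. Adjust them to fit your own NSG.

```python
import json


def studio_outbound_rule(name="AllowStudioOutbound", priority=100, port="443"):
    """Build an NSG rule that allows outbound traffic to AzureFrontDoor.Frontend.

    The name, priority, and port defaults are illustrative assumptions.
    """
    return {
        "name": name,
        "properties": {
            "direction": "Outbound",
            "access": "Allow",
            "protocol": "Tcp",
            "priority": priority,  # assumed; choose a value that fits your NSG
            "sourceAddressPrefix": "VirtualNetwork",  # assumed source service tag
            "sourcePortRange": "*",
            "destinationAddressPrefix": "AzureFrontDoor.Frontend",  # from the text
            "destinationPortRange": port,  # assumed: HTTPS
        },
    }


rule = studio_outbound_rule()
print(json.dumps(rule, indent=2))
```

The resulting dictionary can be embedded in an ARM template or passed to whichever tooling you use to manage the NSG.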

Access data using the studio

After you add an Azure storage account to your virtual network, you must configure the storage account to use managed identity so that the studio can access your data. The studio supports storage accounts configured to use service endpoints or private endpoints. Storage accounts use service endpoints by default.

If you do not enable managed identity, you receive the following error: Error: Unable to profile this dataset. This might be because your data is stored behind a virtual network or your data does not support profile. In addition, the following operations are disabled:

  • Preview data in the studio.
  • Visualize data in the designer.
  • Submit an AutoML experiment.
  • Start a labeling project.

The studio supports reading data from the following datastore types in a virtual network:

  • Azure Blob
  • Azure Data Lake Storage Gen1
  • Azure Data Lake Storage Gen2
  • Azure SQL Database

Configure datastores to use managed identity

Azure Machine Learning uses datastores to connect to storage accounts. Use the following steps to configure your datastores to use managed identity.

  1. In the studio, select Datastores.

  2. To create a new datastore, select + New datastore.

    To update an existing datastore, select the datastore and select Update credentials.

  3. In the datastore settings, select Yes for Allow Azure Machine Learning service to access the storage using workspace-managed identity.

These steps add the workspace-managed identity as a Reader to the storage service using Azure role-based access control (RBAC). Reader access lets the workspace retrieve firewall settings and ensure that data doesn't leave the virtual network.

Note

These changes may take up to 10 minutes to take effect.
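The same setting can likely be applied from code instead of the studio. The sketch below assumes the azureml-core (SDK v1) method Datastore.register_azure_blob_container and its grant_workspace_access parameter. Neither appears in this article, so verify both against the SDK version you use. The helper name and all argument values are hypothetical.

```python
def blob_datastore_kwargs(workspace, datastore_name, container_name,
                          account_name, account_key):
    """Collect keyword arguments for registering a blob datastore.

    Assumption: these names match azureml-core's
    Datastore.register_azure_blob_container; check your SDK version.
    """
    return {
        "workspace": workspace,
        "datastore_name": datastore_name,
        "container_name": container_name,
        "account_name": account_name,
        "account_key": account_key,
        # Assumed SDK equivalent of selecting Yes for the workspace-managed
        # identity option in the datastore settings.
        "grant_workspace_access": True,
    }


# Placeholder values only; a real call needs a Workspace object and real credentials.
kwargs = blob_datastore_kwargs(
    workspace=None,
    datastore_name="example_datastore",
    container_name="example-container",
    account_name="examplestorageaccount",
    account_key="<account-key>",
)
print(sorted(kwargs))
```

With the SDK installed and a real workspace, the dictionary would be passed as Datastore.register_azure_blob_container(**kwargs).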

Technical notes for managed identity

Using managed identity to access storage services changes some security considerations. These considerations depend on the type of storage account you access. This section describes the changes for each storage account type.

Azure Blob storage

For Azure Blob storage, the workspace-managed identity is also added as a Storage Blob Data Reader so that it can read data from blob storage.

Azure Data Lake Storage Gen2 access control

You can use both RBAC and POSIX-style access control lists (ACLs) to control data access inside a virtual network.

To use RBAC, add the workspace-managed identity to the Storage Blob Data Reader role. For more information, see Role-based access control.
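One way to make this role assignment is with the Azure CLI command az role assignment create. The sketch below only assembles the command line and does not run it. It assumes the built-in role is named Storage Blob Data Reader. The principal ID and storage account resource ID are placeholders, not real values.

```python
import shlex


def role_assignment_command(principal_id, storage_account_id,
                            role="Storage Blob Data Reader"):
    """Return an Azure CLI command that assigns a role at storage account scope."""
    args = [
        "az", "role", "assignment", "create",
        "--assignee", principal_id,     # object ID of the workspace-managed identity
        "--role", role,                 # assumed built-in role name
        "--scope", storage_account_id,  # full resource ID of the storage account
    ]
    return shlex.join(args)  # quotes arguments that contain spaces


command = role_assignment_command(
    "00000000-0000-0000-0000-000000000000",
    "/subscriptions/SUB_ID/resourceGroups/RG/providers/"
    "Microsoft.Storage/storageAccounts/ACCOUNT",
)
print(command)
```

Printing the command rather than executing it lets you review the scope and role before you run it in a shell.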

To use ACLs, assign access to the workspace-managed identity just as you would to any other security principal. For more information, see Access control lists on files and directories.
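For illustration, an ACL entry for the managed identity can be written in the short POSIX-style form user:<object-id>:<permissions>. The helper below is a hypothetical sketch that formats such an entry. The object ID is a placeholder, and which permissions you need depends on the files and directories involved.

```python
def acl_entry(object_id, read=True, write=False, execute=False, default=False):
    """Format a POSIX-style ACL entry for a security principal.

    object_id is the principal's object ID; the permission flags map to r, w, x.
    """
    perms = (
        ("r" if read else "-")
        + ("w" if write else "-")
        + ("x" if execute else "-")
    )
    prefix = "default:" if default else ""  # default ACLs apply to new child items
    return f"{prefix}user:{object_id}:{perms}"


# Placeholder object ID for the workspace-managed identity.
entry = acl_entry("00000000-0000-0000-0000-000000000000", execute=True)
print(entry)
```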

Azure SQL Database contained user

To access data stored in an Azure SQL Database using managed identity, you must create a SQL contained user that maps to the managed identity. For more information on creating a user from an external provider, see Create contained users mapped to Azure AD identities.

After you create a SQL contained user, grant permissions to it by using the GRANT T-SQL command.
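The two steps above can be sketched as a pair of T-SQL statements. The Python below only generates the statements. The identity name, the dbo schema, and the SELECT permission are assumptions chosen for illustration, so grant whatever permissions your scenario actually requires.

```python
def quote_identifier(name):
    """Bracket-quote a T-SQL identifier, escaping any closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


def contained_user_statements(identity_name, permission="SELECT", schema="dbo"):
    """Return T-SQL that creates a contained user and grants it a permission.

    identity_name is the display name of the managed identity (assumed here).
    """
    # Reject anything other than plain permission keywords such as SELECT.
    if not permission.replace(" ", "").isalpha():
        raise ValueError("permission must contain only letters and spaces")
    user = quote_identifier(identity_name)
    return [
        f"CREATE USER {user} FROM EXTERNAL PROVIDER;",
        f"GRANT {permission} ON SCHEMA::{quote_identifier(schema)} TO {user};",
    ]


statements = contained_user_statements("example-workspace")
for statement in statements:
    print(statement)
```

Run the generated statements against the target database while connected as an Azure AD administrator, since creating users from an external provider requires an Azure AD identity.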

Azure Machine Learning designer default datastore

By default, the designer stores output in the storage account attached to your workspace. However, you can configure it to store output in any datastore that you have access to. If your environment uses virtual networks, you can use these controls to ensure your data remains secure and accessible.

To set a new default storage for a pipeline:

  1. In a pipeline draft, select the Settings gear icon near the title of your pipeline.
  2. Choose Select default datastore.
  3. Specify a new datastore.

You can also override the default datastore on a per-module basis. This gives you control over the storage location for each individual module.

  1. Select the module whose output you want to specify.
  2. Expand the Output settings section.
  3. Select Override default output settings.
  4. Select Set output settings.
  5. Specify a new datastore.

Next steps

This article is an optional part of a four-part virtual network series. See the rest of the articles to learn how to secure a virtual network: