Create and manage an Azure Machine Learning compute instance

Learn how to create and manage a compute instance in your Azure Machine Learning workspace.

Use a compute instance as your fully configured and managed development environment in the cloud. For development and testing, you can also use the instance as a training compute target or as an inference target. A compute instance can run multiple jobs in parallel and has a job queue. As a development environment, a compute instance cannot be shared with other users in your workspace.

In this article, you learn how to:

  • Create a compute instance
  • Manage (start, stop, restart, delete) a compute instance
  • Access the terminal window
  • Install R or Python packages
  • Create new environments or Jupyter kernels

Compute instances can run jobs securely in a virtual network environment, without requiring enterprises to open SSH ports. Each job executes in a containerized environment, with your model dependencies packaged in a Docker container.

Prerequisites

Create

Time estimate: Approximately 5 minutes.

Creating a compute instance is a one-time process for your workspace. You can reuse this compute as a development workstation or as a compute target for training. You can have multiple compute instances attached to your workspace.

The dedicated cores per region per VM family quota and the total regional quota, both of which apply to compute instance creation, are unified and shared with the Azure Machine Learning training compute cluster quota. Stopping the compute instance does not release quota, which ensures that you can restart the compute instance. You cannot change the virtual machine size of a compute instance after it is created.

The following example demonstrates how to create a compute instance:

from azureml.core import Workspace
from azureml.core.compute import ComputeInstance
from azureml.core.compute_target import ComputeTargetException

# Load the workspace. This assumes a config.json file for the workspace
# is available locally; adjust this if you obtain `ws` another way.
ws = Workspace.from_config()

# Choose a name for your instance.
# A compute instance name must be unique within the Azure region.
compute_name = "ci{}".format(ws._workspace_id)[:10]

# Reuse the instance if it already exists; otherwise create it.
try:
    instance = ComputeInstance(workspace=ws, name=compute_name)
    print('Found existing instance, use it.')
except ComputeTargetException:
    compute_config = ComputeInstance.provisioning_configuration(
        vm_size='STANDARD_D3_V2',
        ssh_public_access=False,
        # Optional settings for a virtual network and SSH access:
        # vnet_resourcegroup_name='<my-resource-group>',
        # vnet_name='<my-vnet-name>',
        # subnet_name='default',
        # admin_user_ssh_public_key='<my-sshkey>'
    )
    instance = ComputeInstance.create(ws, compute_name, compute_config)
    # Block until provisioning finishes, printing progress as it goes.
    instance.wait_for_completion(show_output=True)

For more information on the classes, methods, and parameters used in this example, see the reference documentation for the ComputeInstance class in the Azure Machine Learning SDK for Python.

You can also create a compute instance with an Azure Resource Manager template.

Create on behalf of (preview)

As an administrator, you can create a compute instance on behalf of a data scientist and assign the instance to them.

The data scientist you create the compute instance for needs the following Azure role-based access control (Azure RBAC) permissions:

  • Microsoft.MachineLearningServices/workspaces/computes/start/action
  • Microsoft.MachineLearningServices/workspaces/computes/stop/action
  • Microsoft.MachineLearningServices/workspaces/computes/restart/action
  • Microsoft.MachineLearningServices/workspaces/computes/applicationaccess/action
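
One way to grant exactly these permissions is through an Azure custom role. The following is a minimal sketch, not an official template: it builds a role definition in the standard Azure custom role JSON shape. The role name, description, and assignable scope are hypothetical placeholders; only the four action strings come from the list above.

```python
import json

# The four actions listed above, copied verbatim.
DATA_SCIENTIST_ACTIONS = [
    "Microsoft.MachineLearningServices/workspaces/computes/start/action",
    "Microsoft.MachineLearningServices/workspaces/computes/stop/action",
    "Microsoft.MachineLearningServices/workspaces/computes/restart/action",
    "Microsoft.MachineLearningServices/workspaces/computes/applicationaccess/action",
]


def build_custom_role(name, assignable_scope):
    """Return an Azure custom role definition as a plain dict.

    `name` and `assignable_scope` are caller-supplied placeholders,
    not values taken from this article.
    """
    return {
        "Name": name,
        "IsCustom": True,
        "Description": "Start, stop, restart, and use applications on a compute instance.",
        "Actions": list(DATA_SCIENTIST_ACTIONS),
        "NotActions": [],
        "AssignableScopes": [assignable_scope],
    }


# Hypothetical name and scope, for illustration only.
role = build_custom_role(
    "Compute Instance User (example)",
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>",
)
print(json.dumps(role, indent=2))
```

The printed JSON can be saved to a file and, assuming you use the Azure CLI, passed to `az role definition create --role-definition <file>`. Replace the placeholder scope before doing so.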

The data scientist can start, stop, and restart the compute instance. They can use the compute instance for:

  • Jupyter
  • JupyterLab
  • RStudio
  • Integrated notebooks

Manage

Start, stop, restart, and delete a compute instance. A compute instance does not automatically scale down, so make sure to stop the resource to prevent ongoing charges.

In the examples below, the compute instance is referenced by the variable instance.

  • Get status

    # get_status() gets the latest status of the ComputeInstance target
    instance.get_status()
    
  • Stop

    # stop() is used to stop the ComputeInstance
    # Stopping ComputeInstance will stop the billing meter and persist the state on the disk.
    # Available Quota will not be changed with this operation.
    instance.stop(wait_for_completion=True, show_output=True)
    
  • Start

    # start() is used to start the ComputeInstance if it is in stopped state
    instance.start(wait_for_completion=True, show_output=True)
    
  • Restart

    # restart() is used to restart the ComputeInstance
    instance.restart(wait_for_completion=True, show_output=True)
    
  • Delete

    # delete() is used to delete the ComputeInstance target. Useful if you want to re-use the compute name 
    instance.delete(wait_for_completion=True, show_output=True)
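
Because a compute instance does not scale down on its own, the status and stop calls above can be combined into a small guard. The sketch below is illustrative only: `stop_if_running` is a hypothetical helper, and it assumes that the object returned by `get_status()` exposes a `state` attribute whose value is the string "Running" while the instance is up. Verify both assumptions against your SDK version.

```python
def stop_if_running(instance, running_state="Running"):
    """Stop `instance` only when its reported state equals `running_state`.

    `instance` is any object with get_status() and stop() methods, such as
    the compute instance used in the examples above.
    Returns True if a stop was requested, False otherwise.
    """
    status = instance.get_status()
    # Assumption: the status object has a `state` attribute. If it does
    # not, getattr() yields None and no stop is requested.
    if getattr(status, "state", None) != running_state:
        return False
    instance.stop(wait_for_completion=True, show_output=False)
    return True
```

For example, calling `stop_if_running(instance)` at the end of a notebook session requests a stop only when the instance reports that it is running.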
    

Azure RBAC allows you to control which users in the workspace can create, delete, start, stop, and restart a compute instance. All users in the workspace contributor and owner roles can create, delete, start, stop, and restart compute instances across the workspace. However, only the creator of a specific compute instance, or the user assigned to it if it was created on their behalf, is allowed to access Jupyter, JupyterLab, and RStudio on that compute instance. A compute instance is dedicated to a single user, who has root access and can open a terminal through Jupyter, JupyterLab, or RStudio. A compute instance has single-user sign-in, and all actions use that user's identity for Azure RBAC and for attribution of experiment runs. SSH access is controlled through a public/private key mechanism.

These actions can be controlled by Azure RBAC:

  • Microsoft.MachineLearningServices/workspaces/computes/read
  • Microsoft.MachineLearningServices/workspaces/computes/write
  • Microsoft.MachineLearningServices/workspaces/computes/delete
  • Microsoft.MachineLearningServices/workspaces/computes/start/action
  • Microsoft.MachineLearningServices/workspaces/computes/stop/action
  • Microsoft.MachineLearningServices/workspaces/computes/restart/action
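
To reason about which operations a given role allows, it can help to pair each management operation with the action that authorizes it. The sketch below is a simplified illustration under stated assumptions: the operation-to-action mapping is inferred from the action names (for example, that creating an instance requires `write` and reading its status requires `read`), and the check is an exact string match that ignores the wildcard patterns Azure RBAC also supports. `missing_actions` is a hypothetical helper.

```python
PREFIX = "Microsoft.MachineLearningServices/workspaces/computes/"

# Assumed mapping from a management operation to the RBAC action it needs.
REQUIRED_ACTION = {
    "get_status": PREFIX + "read",
    "create": PREFIX + "write",
    "delete": PREFIX + "delete",
    "start": PREFIX + "start/action",
    "stop": PREFIX + "stop/action",
    "restart": PREFIX + "restart/action",
}


def missing_actions(granted_actions, operations):
    """Return the actions needed for `operations` that are not granted.

    Uses exact string matching only; wildcard entries such as "*" in
    `granted_actions` are not expanded.
    """
    granted = set(granted_actions)
    return [
        REQUIRED_ACTION[op]
        for op in operations
        if REQUIRED_ACTION[op] not in granted
    ]


# A role that can only start and stop instances cannot delete them.
granted = [PREFIX + "start/action", PREFIX + "stop/action"]
print(missing_actions(granted, ["start", "stop", "delete"]))
```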

Next steps