What is an Azure Machine Learning compute instance?

An Azure Machine Learning compute instance is a managed cloud-based workstation for data scientists.

Compute instances make it easy to get started with Azure Machine Learning development and provide management and enterprise readiness capabilities for IT administrators.

Use a compute instance as your fully configured and managed development environment in the cloud for machine learning. Compute instances can also be used as a compute target for training and inferencing for development and testing purposes.

For production-grade model training, use an Azure Machine Learning compute cluster with multi-node scaling capabilities. For production-grade model deployment, use an Azure Kubernetes Service cluster.

For compute instance Jupyter functionality to work, ensure that web socket communication is not disabled. Make sure your network allows websocket connections to *.instances.azureml.net and *.instances.azureml.ms.
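When reviewing firewall or proxy rules, it can help to check which hostnames fall under the two wildcard patterns above. The following is a minimal sketch that only compares names against the patterns; it does not test real connectivity. The function name and the example hostnames are hypothetical.

```python
from fnmatch import fnmatch

# Wildcard patterns taken from the networking note above.
REQUIRED_WEBSOCKET_PATTERNS = (
    "*.instances.azureml.net",
    "*.instances.azureml.ms",
)


def needs_websocket_access(hostname: str) -> bool:
    """Return True if hostname matches a pattern that must allow websockets."""
    # Normalize case and drop a trailing dot before matching.
    host = hostname.lower().rstrip(".")
    return any(fnmatch(host, pattern) for pattern in REQUIRED_WEBSOCKET_PATTERNS)


# Hypothetical hostnames, used only for illustration.
print(needs_websocket_access("myinstance.eastus.instances.azureml.ms"))
print(needs_websocket_access("example.com"))
```

A check like this is only a convenience for auditing an allowlist; the network itself must still permit the websocket connections.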

Why use a compute instance?

A compute instance is a fully managed cloud-based workstation optimized for your machine learning development environment. It provides the following benefits:

Key benefit: Description

Productivity: You can build and deploy models using integrated notebooks and the following tools in Azure Machine Learning studio:
- Jupyter
- JupyterLab
- RStudio (preview)
Compute instance is fully integrated with Azure Machine Learning workspace and studio. You can share notebooks and data with other data scientists in the workspace. You can also use VS Code with compute instances.

Managed & secure: Reduce your security footprint and add compliance with enterprise security requirements. Compute instances provide robust management policies and secure networking configurations such as:
- Autoprovisioning from Resource Manager templates or Azure Machine Learning SDK
- Azure role-based access control (Azure RBAC)
- Virtual network support
- SSH policy to enable/disable SSH access
TLS 1.2 is enabled.

Preconfigured for ML: Save time on setup tasks with preconfigured and up-to-date ML packages, deep learning frameworks, and GPU drivers.

Fully customizable: Broad support for Azure VM types, including GPUs, and persisted low-level customization such as installing packages and drivers make advanced scenarios easier.

You can create a compute instance yourself, or an administrator can create a compute instance for you.

Tools and environments

Important

Items marked (preview) in this article are currently in public preview. The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.

Azure Machine Learning compute instance enables you to author, train, and deploy models in a fully integrated notebook experience in your workspace.

You can run Jupyter notebooks in VS Code using a compute instance as the remote server, with no SSH needed. You can also enable VS Code integration through the remote SSH extension.

You can install packages and add kernels to your compute instance.

The following tools and environments are already installed on the compute instance:

General tools & environments:
- Drivers: CUDA, cuDNN, NVIDIA, Blob FUSE
- Intel MPI library
- Azure CLI
- Azure Machine Learning samples
- Docker
- Nginx
- NCCL 2.0
- Protobuf

R tools & environments:
- RStudio Server Open Source Edition (preview)
- R kernel
- Azure Machine Learning SDK for R (azuremlsdk)
- SDK samples

Python tools & environments:
- Anaconda Python
- Jupyter and extensions
- JupyterLab and extensions
- Azure Machine Learning SDK for Python from PyPI. Includes most of the azureml extra packages. To see the full list, open a terminal window on your compute instance and run:
  conda list -n azureml_py36 azureml*
- Other PyPI packages: jupytext, tensorboard, nbconvert, notebook, Pillow
- Conda packages: cython, numpy, ipykernel, scikit-learn, matplotlib, tqdm, joblib, nodejs, nb_conda_kernels
- Deep learning packages: PyTorch, TensorFlow, Keras, Horovod, MLflow, pandas-ml, scrapbook
- ONNX packages: keras2onnx, onnx, onnxconverter-common, skl2onnx, onnxmltools
- Azure Machine Learning Python & R SDK samples

Python packages are all installed in the Python 3.6 - AzureML environment.

Accessing files

Notebooks and R scripts are stored in the default storage account of your workspace in an Azure file share. These files are located under your "User files" directory. This storage makes it easy to share notebooks between compute instances. The storage account also keeps your notebooks safely preserved when you stop or delete a compute instance.

The Azure file share account of your workspace is mounted as a drive on the compute instance. This drive is the default working directory for Jupyter, JupyterLab, and RStudio. This means that the notebooks and other files you create in Jupyter, JupyterLab, or RStudio are automatically stored on the file share and are available to use in other compute instances as well.

The files in the file share are accessible from all compute instances in the same workspace. Any changes to these files on the compute instance are reliably persisted back to the file share.

You can also clone the latest Azure Machine Learning samples to your folder under the "User files" directory in the workspace file share.

Writing small files can be slower on network drives than writing to the local disk of the compute instance. If you are writing many small files, try using a directory directly on the compute instance, such as /tmp. Note that these files are not accessible from other compute instances.

You can use the /tmp directory on the compute instance for your temporary data. However, do not write large data files to the OS disk of the compute instance. Use datastores instead. If you have installed the JupyterLab git extension, it can also slow down compute instance performance.
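The guidance on small files can be sketched as follows: write the many small files to a local temporary directory, then store a single archive on the shared drive. This is one possible pattern under stated assumptions, not a prescribed one. The function name, the archive name, and the `shared_dir` argument (standing in for a folder on the mounted file share) are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def write_small_files_then_archive(records, shared_dir, archive_name="outputs.tar.gz"):
    """Write each record to local temp storage, then save one archive in shared_dir.

    records maps a file name to its text content.
    """
    shared_dir = Path(shared_dir)
    shared_dir.mkdir(parents=True, exist_ok=True)
    archive_path = shared_dir / archive_name

    # TemporaryDirectory uses the local temp location (typically /tmp on Linux).
    with tempfile.TemporaryDirectory() as local_dir:
        local_root = Path(local_dir)
        for name, text in records.items():
            (local_root / name).write_text(text, encoding="utf-8")

        # One archive file on the network drive replaces many small writes.
        with tarfile.open(archive_path, "w:gz") as archive:
            for path in sorted(local_root.iterdir()):
                archive.add(path, arcname=path.name)

    return archive_path
```

The local files are deleted when the temporary directory is closed, so only the archive remains visible to other compute instances.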

Managing a compute instance

In your workspace in Azure Machine Learning studio, select Compute, then select Compute Instance at the top.

You can perform the following actions:

  • Create a compute instance.
  • Refresh the compute instances tab.
  • Start, stop, and restart a compute instance. You pay for the instance whenever it is running. Stop the compute instance when you are not using it to reduce cost. Stopping a compute instance deallocates it. Start it again when you need it. Note that stopping the compute instance stops billing for compute hours, but you are still billed for disk, public IP, and standard load balancer.
  • Delete a compute instance.
  • Filter the list of compute instances to show only those you have created.

For each compute instance in your workspace that you can use, you can:

  • Access Jupyter, JupyterLab, and RStudio on the compute instance.
  • SSH into the compute instance. SSH access is disabled by default but can be enabled when the compute instance is created. SSH access uses a public/private key mechanism. The tab gives you details for the SSH connection, such as IP address, username, and port number.
  • Get details about a specific compute instance, such as IP address and region.
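The SSH connection details listed on the tab (IP address, username, port number) can be combined with your private key into a standard `ssh` command. The helper below is a small sketch; the function name and all example values are hypothetical placeholders, not values your instance will report.

```python
import shlex


def build_ssh_command(ip_address, username, port, private_key_path):
    """Build an ssh command line from the connection details shown in studio."""
    args = [
        "ssh",
        "-i", private_key_path,   # private key matching the public key given at creation
        "-p", str(port),          # port number from the connection details
        f"{username}@{ip_address}",
    ]
    # Quote each argument so the result is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


# Placeholder values for illustration only.
print(build_ssh_command("203.0.113.10", "azureuser", 50000, "/home/me/.ssh/id_rsa"))
# prints: ssh -i /home/me/.ssh/id_rsa -p 50000 azureuser@203.0.113.10
```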

Azure RBAC allows you to control which users in the workspace can create, delete, start, stop, and restart a compute instance. All users in the workspace contributor and owner roles can create, delete, start, stop, and restart compute instances across the workspace. However, only the creator of a specific compute instance, or the user assigned if it was created on their behalf, is allowed to access Jupyter, JupyterLab, and RStudio on that compute instance. A compute instance is dedicated to a single user who has root access and can open a terminal through Jupyter, JupyterLab, or RStudio. The compute instance has single-user login, and all actions use that user's identity for RBAC and attribution of experiment runs. SSH access is controlled through a public/private key mechanism.

These actions can be controlled by Azure RBAC:

  • Microsoft.MachineLearningServices/workspaces/computes/read
  • Microsoft.MachineLearningServices/workspaces/computes/write
  • Microsoft.MachineLearningServices/workspaces/computes/delete
  • Microsoft.MachineLearningServices/workspaces/computes/start/action
  • Microsoft.MachineLearningServices/workspaces/computes/stop/action
  • Microsoft.MachineLearningServices/workspaces/computes/restart/action

Create a compute instance

In your workspace in Azure Machine Learning studio, create a new compute instance from either the Compute section or the Notebooks section when you are ready to run one of your notebooks.

You can also create an instance in other ways, for example from a Resource Manager template or with the Azure Machine Learning SDK.
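As one illustration of creating an instance outside studio, the sketch below uses the Azure Machine Learning SDK for Python (the `azureml-core` package). It assumes the SDK is installed and that a `config.json` file describing your workspace is available to `Workspace.from_config()`. The function name, the instance name you pass in, and the default VM size are illustrative choices, not recommendations.

```python
def create_compute_instance(name, vm_size="STANDARD_DS3_V2"):
    """Return the compute instance called name, creating it if it does not exist."""
    # Imported inside the function so this module loads even without the SDK.
    from azureml.core import Workspace
    from azureml.core.compute import ComputeInstance
    from azureml.core.compute_target import ComputeTargetException

    # Assumption: config.json for the workspace is discoverable from the working directory.
    ws = Workspace.from_config()

    try:
        # Reuse the instance if one with this name already exists.
        return ComputeInstance(workspace=ws, name=name)
    except ComputeTargetException:
        config = ComputeInstance.provisioning_configuration(
            vm_size=vm_size,
            ssh_public_access=False,  # SSH access stays disabled, matching the default
        )
        instance = ComputeInstance.create(ws, name, config)
        instance.wait_for_completion(show_output=True)
        return instance
```

Calling `create_compute_instance("my-instance")` (a hypothetical name) provisions the instance against your quota, so the call can fail if the regional or VM-family quota is exhausted.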

The dedicated cores per region per VM family quota and the total regional quota, which apply to compute instance creation, are unified and shared with the Azure Machine Learning training compute cluster quota. Stopping the compute instance does not release quota, which ensures that you will be able to restart the compute instance.
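The shared-quota rule can be pictured with a simplified model for a single VM family: cores held by compute instances (including stopped ones, since stopping does not release quota) and cores allocated to compute clusters draw from the same limit. This sketch is only an illustration of that arithmetic; the function name and the numbers are hypothetical, and the actual accounting is done by Azure.

```python
def fits_in_quota(requested_cores, quota_cores, instance_cores, cluster_cores):
    """Check whether a new compute instance fits in the shared core quota.

    instance_cores: cores of every existing compute instance, running or stopped.
    cluster_cores: cores currently allocated to training compute clusters.
    """
    used = sum(instance_cores) + sum(cluster_cores)
    return used + requested_cores <= quota_cores


# Hypothetical numbers: a 24-core quota with 20 cores already in use.
print(fits_in_quota(4, 24, instance_cores=[4, 8], cluster_cores=[8]))  # True
print(fits_in_quota(8, 24, instance_cores=[4, 8], cluster_cores=[8]))  # False
```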

Create on behalf of (preview)

As an administrator, you can create a compute instance on behalf of a data scientist and assign the instance to them.

The data scientist you create the compute instance for needs the following Azure RBAC permissions:

  • Microsoft.MachineLearningServices/workspaces/computes/start/action
  • Microsoft.MachineLearningServices/workspaces/computes/stop/action
  • Microsoft.MachineLearningServices/workspaces/computes/restart/action
  • Microsoft.MachineLearningServices/workspaces/computes/applicationaccess/action
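One way to grant exactly these permissions is an Azure custom role. The sketch below builds a role definition document in the general JSON shape used for Azure custom roles. The role name, description, and subscription ID are hypothetical placeholders, and whether these four actions alone are sufficient for a given workflow is an assumption to verify in your environment.

```python
import json

# The four actions listed above for the assigned data scientist.
ASSIGNED_USER_ACTIONS = [
    "Microsoft.MachineLearningServices/workspaces/computes/start/action",
    "Microsoft.MachineLearningServices/workspaces/computes/stop/action",
    "Microsoft.MachineLearningServices/workspaces/computes/restart/action",
    "Microsoft.MachineLearningServices/workspaces/computes/applicationaccess/action",
]


def build_role_definition(role_name, subscription_id):
    """Return a custom role definition as a plain dictionary."""
    return {
        "Name": role_name,
        "IsCustom": True,
        "Description": "Start, stop, restart, and use an assigned compute instance.",
        "Actions": list(ASSIGNED_USER_ACTIONS),
        "NotActions": [],
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


# Placeholder subscription ID for illustration only.
role = build_role_definition(
    "Compute Instance User (example)",
    "00000000-0000-0000-0000-000000000000",
)
print(json.dumps(role, indent=2))
```

The printed JSON can be saved to a file and reviewed before it is used to create the role with your preferred Azure tooling.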

The data scientist can start, stop, and restart the compute instance. They can use the compute instance for:

  • Jupyter
  • JupyterLab
  • RStudio
  • Integrated notebooks

Compute target

Compute instances can be used as a training compute target, similar to Azure Machine Learning compute training clusters.

A compute instance:

  • Has a job queue.
  • Runs jobs securely in a virtual network environment, without requiring enterprises to open up an SSH port. The job executes in a containerized environment and packages your model dependencies in a Docker container.
  • Can run multiple small jobs in parallel (preview). Two jobs per core can run in parallel while the rest of the jobs are queued.
  • Supports single-node multi-GPU distributed training jobs.
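The parallel-jobs rule above (two jobs per core run at once, the rest wait in the queue) comes down to simple arithmetic. The helper below is an illustrative sketch of that rule only; the function name is hypothetical and it does not model how the service actually schedules jobs.

```python
def split_jobs(num_jobs, num_cores, jobs_per_core=2):
    """Return (running, queued) job counts under the two-jobs-per-core rule."""
    capacity = num_cores * jobs_per_core
    running = min(num_jobs, capacity)
    return running, num_jobs - running


# Ten small jobs submitted to a 4-core compute instance.
print(split_jobs(10, 4))  # (8, 2)
```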

You can use a compute instance as a local inferencing deployment target for test/debug scenarios.

What happened to Notebook VM?

Compute instances are replacing the Notebook VM.

Any notebook files stored in the workspace file share and data in workspace datastores are accessible from a compute instance. However, any custom packages previously installed on a Notebook VM need to be reinstalled on the compute instance. Quota limitations that apply to compute cluster creation also apply to compute instance creation.

New Notebook VMs cannot be created. However, you can still access and use Notebook VMs you have created, with full functionality. Compute instances can be created in the same workspace as the existing Notebook VMs.

Next steps