Set up a Python development environment for Azure Machine Learning

Learn how to configure a Python development environment for Azure Machine Learning.

The following table shows each development environment covered in this article, along with its pros and cons.

Environment | Pros | Cons
Local environment | Full control of your development environment and dependencies. Run with any build tool, environment, or IDE of your choice. | Takes longer to get started. Necessary SDK packages must be installed, and an environment must also be installed if you don't already have one.
The Data Science Virtual Machine (DSVM) | Similar to the cloud-based compute instance (Python and the SDK are pre-installed), but with additional popular data science and machine learning tools pre-installed. Easy to scale and combine with other custom tools and workflows. | A slower getting started experience compared to the cloud-based compute instance.
Azure Machine Learning compute instance | Easiest way to get started. The entire SDK is already installed in your workspace VM, and notebook tutorials are pre-cloned and ready to run. | Lack of control over your development environment and dependencies. Additional cost incurred for the Linux VM (the VM can be stopped when not in use to avoid charges). See pricing details.
Azure Databricks | Ideal for running large-scale intensive machine learning workflows on the scalable Apache Spark platform. | Overkill for experimental machine learning, or smaller-scale experiments and workflows. Additional cost incurred for Azure Databricks. See pricing details.

This article also provides additional usage tips for the following tools:

  • Jupyter Notebooks: If you're already using Jupyter Notebooks, the SDK has some extras that you should install.

  • Visual Studio Code: If you use Visual Studio Code, the Azure Machine Learning extension includes extensive language support for Python as well as features that make working with Azure Machine Learning much more convenient and productive.

Prerequisites

(Local and DSVM only) Create a workspace configuration file

The workspace configuration file is a JSON file that tells the SDK how to communicate with your Azure Machine Learning workspace. The file is named config.json, and it has the following format:

{
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>"
}

This JSON file must be in the directory structure that contains your Python scripts or Jupyter Notebooks. It can be in the same directory, in a subdirectory named .azureml, or in a parent directory.
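To make the allowed locations concrete, the following is a minimal sketch of such a lookup using only the Python standard library. The helper name find_workspace_config and the order in which candidates are checked inside each directory are assumptions made for illustration; this is not the SDK's own search code.

```python
from pathlib import Path
from typing import Optional


def find_workspace_config(start_dir: str, file_name: str = "config.json") -> Optional[Path]:
    """Return the nearest config file at or above start_dir, or None (hypothetical helper)."""
    current = Path(start_dir).resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for directory in [current, *current.parents]:
        # In each directory, look for the file itself and for .azureml/<file_name>.
        for candidate in (directory / file_name, directory / ".azureml" / file_name):
            if candidate.is_file():
                return candidate
    return None
```

Calling find_workspace_config(".") from a notebook folder would return the closest matching file, which can help when it is unclear which config.json a script is picking up.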

To use this file from your code, use the Workspace.from_config method. This method loads the information from the file and connects to your workspace.
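A minimal sketch of that call, assuming the azureml-core package is installed and a config.json is reachable from the working directory. The wrapper name connect_to_workspace is hypothetical; only Workspace.from_config comes from the SDK.

```python
def connect_to_workspace(config_path=None):
    """Load config.json and return the matching Workspace object (hypothetical wrapper)."""
    # Imported lazily so this module still loads where azureml-core is not installed.
    from azureml.core import Workspace

    # With path=None, the SDK starts its search in the current working directory.
    return Workspace.from_config(path=config_path)
```

The returned object exposes attributes such as name, resource_group, and location, so printing them is a quick way to confirm that the expected workspace was loaded.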

Create a workspace configuration file using one of the following methods:

  • Azure portal

    Download the file: In the Azure portal, select Download config.json from the Overview section of your workspace.

  • Azure Machine Learning Python SDK

    Create a script that connects to your Azure Machine Learning workspace and uses the write_config method to generate the file and save it as .azureml/config.json. Make sure to replace subscription_id, resource_group, and workspace_name with your own values.

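    from azureml.core import Workspace
    
    # Replace these placeholders with your own values.
    subscription_id = '<subscription-id>'
    resource_group  = '<resource-group>'
    workspace_name  = '<workspace-name>'
    
    try:
        # Connect to the existing workspace.
        ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)
        # Write the connection details to .azureml/config.json.
        ws.write_config()
        print('Library configuration succeeded')
    except Exception as e:
        # Lookup, authentication, and file errors all end up here.
        print('Workspace configuration failed:', e)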
    

Local computer or remote VM environment

You can set up an environment on a local computer or remote virtual machine, such as an Azure Machine Learning compute instance or Data Science VM.

To configure a local development environment or remote VM:

  1. Create a Python virtual environment (virtualenv, conda).

    Note

    Although not required, it's recommended that you use Anaconda or Miniconda to manage Python virtual environments and install packages.

    Important

    If you're on Linux or macOS and use a shell other than bash (for example, zsh), you might receive errors when you run some commands. To work around this problem, use the bash command to start a new bash shell and run the commands there.

  2. Activate your newly created Python virtual environment.

  3. Install the Azure Machine Learning Python SDK.

  4. To configure your local environment to use your Azure Machine Learning workspace, create a workspace configuration file or use an existing one.

Now that you have your local environment set up, you're ready to start working with Azure Machine Learning. See the Azure Machine Learning Python getting started guide to get started.
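For readers who prefer to script the first three steps, here is a rough sketch that uses Python's built-in venv module in place of virtualenv or conda. The function names are hypothetical, and the pip package name azureml-core is an assumption about which SDK package you want to install.

```python
import os
import venv
from pathlib import Path


def create_virtual_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    # Windows places executables in Scripts, other platforms in bin.
    if os.name == "nt":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def sdk_install_command(python_path: Path) -> list:
    """Build the pip command that would install the SDK into that environment."""
    # Assumption: azureml-core is the SDK package to install.
    return [str(python_path), "-m", "pip", "install", "azureml-core"]
```

Running the environment's own interpreter with -m pip installs into that environment without activating it first. Passing the returned list to subprocess.run would perform the installation.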

Jupyter Notebooks

When running a local Jupyter Notebook server, it's recommended that you create an IPython kernel for your Python virtual environment. This helps ensure the expected kernel and package import behavior.

  1. Enable environment-specific IPython kernels:

    conda install notebook ipykernel
    
  2. Create a kernel for your Python virtual environment. Make sure to replace <myenv> with the name of your Python virtual environment.

    ipython kernel install --user --name <myenv> --display-name "Python (myenv)"
    
  3. Launch the Jupyter Notebook server.
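Once a notebook is open, one way to check that the kernel really runs inside the intended environment is to inspect the interpreter from a cell. The helper below is a hypothetical convenience built on the standard sys module, not part of the SDK or Jupyter.

```python
import sys


def describe_interpreter() -> dict:
    """Report which Python installation the current kernel is running on."""
    return {
        "executable": sys.executable,          # full path to the running Python binary
        "prefix": sys.prefix,                  # root directory of the active environment
        "version": sys.version.split()[0],     # for example "3.8.10"
    }
```

If the reported prefix does not point at your virtual environment's directory, the notebook is probably using a different kernel than the one created in step 2.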

See the Azure Machine Learning notebooks repository to get started with Azure Machine Learning and Jupyter Notebooks.

Note

A community-driven repository of examples can be found at https://github.com/Azure/azureml-examples.

Visual Studio Code

To use Visual Studio Code for development:

  1. Install Visual Studio Code.
  2. Install the Azure Machine Learning Visual Studio Code extension (preview).

Once you have the Visual Studio Code extension installed, you can manage your Azure Machine Learning resources, run and debug experiments, and deploy trained models.

Azure Machine Learning compute instance

The Azure Machine Learning compute instance is a secure, cloud-based Azure workstation that provides data scientists with a Jupyter Notebook server, JupyterLab, and a fully managed machine learning environment.

There is nothing to install or configure for a compute instance.

Create one anytime from within your Azure Machine Learning workspace. Provide just a name and specify an Azure VM type. Try it now with Tutorial: Setup environment and workspace.

To learn more about compute instances, including how to install packages, see Create and manage an Azure Machine Learning compute instance.

Tip

To prevent incurring charges for an unused compute instance, stop the compute instance.
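Stopping an instance can also be scripted. The following is a hedged sketch that assumes the azureml-core package is installed, a workspace configuration file is available, and the instance name you pass already exists in the workspace. The wrapper name stop_compute_instance is hypothetical.

```python
def stop_compute_instance(instance_name, config_path=None):
    """Stop a named compute instance in the configured workspace (hypothetical wrapper)."""
    # Imported lazily so this module still loads where azureml-core is not installed.
    from azureml.core import Workspace
    from azureml.core.compute import ComputeInstance

    ws = Workspace.from_config(path=config_path)
    # Look up the existing compute instance by name.
    instance = ComputeInstance(workspace=ws, name=instance_name)
    # Block until the stop operation finishes and print progress.
    instance.stop(wait_for_completion=True, show_output=True)
    return instance
```

A script like this could be run at the end of a work session so that the instance does not keep accruing charges.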

In addition to a Jupyter Notebook server and JupyterLab, you can use compute instances in the integrated notebook feature inside of Azure Machine Learning studio.

You can also use the Azure Machine Learning Visual Studio Code extension to configure an Azure Machine Learning compute instance as a remote Jupyter Notebook server.

Data Science Virtual Machine

The Data Science VM is a customized virtual machine (VM) image you can use as a development environment. It's designed for data science work and comes preconfigured with tools and software such as:

  • Packages such as TensorFlow, PyTorch, Scikit-learn, XGBoost, and the Azure Machine Learning SDK
  • Popular data science tools such as Spark Standalone and Drill
  • Azure tools such as the Azure CLI, AzCopy, and Storage Explorer
  • Integrated development environments (IDEs) such as Visual Studio Code and PyCharm
  • Jupyter Notebook Server

For a more comprehensive list of the tools, see the Data Science VM tools guide.

Important

If you plan to use the Data Science VM as a compute target for your training or inferencing jobs, only Ubuntu is supported.

To use the Data Science VM as a development environment:

  1. Create a Data Science VM using one of the following methods:

    • Use the Azure portal to create an Ubuntu or Windows DSVM.

    • Create a Data Science VM using ARM templates.

    • Use the Azure CLI.

      To create an Ubuntu Data Science VM, use the following command:

      # create a Ubuntu Data Science VM in your resource group
      # note you need to be at least a contributor to the resource group in order to execute this command successfully
      # If you need to create a new resource group use: "az group create --name YOUR-RESOURCE-GROUP-NAME --location YOUR-REGION"
      az vm create --resource-group YOUR-RESOURCE-GROUP-NAME --name YOUR-VM-NAME --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username YOUR-USERNAME --admin-password YOUR-PASSWORD --generate-ssh-keys --authentication-type password
      

      To create a Windows DSVM, use the following command:

      # create a Windows Server 2016 DSVM in your resource group
      # note you need to be at least a contributor to the resource group in order to execute this command successfully
      az vm create --resource-group YOUR-RESOURCE-GROUP-NAME --name YOUR-VM-NAME --image microsoft-dsvm:dsvm-windows:server-2016:latest --admin-username YOUR-USERNAME --admin-password YOUR-PASSWORD --authentication-type password
      
  2. Activate the conda environment containing the Azure Machine Learning SDK.

    • For Ubuntu Data Science VM:

      conda activate py36
      
    • For Windows Data Science VM:

      conda activate AzureML
      
  3. To configure the Data Science VM to use your Azure Machine Learning workspace, create a workspace configuration file or use an existing one.
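After activating the conda environment in step 2, a quick way to confirm that the SDK is visible to the active interpreter is to query the installed package metadata. This sketch uses the standard importlib.metadata module (Python 3.8 or later); the helper name and the default package name azureml-core are assumptions for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_sdk_version(package: str = "azureml-core") -> Optional[str]:
    """Return the installed version of the package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not visible to the currently active environment.
        return None
```

A result of None usually means that the wrong environment is active or that the SDK still needs to be installed in it.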

Similar to local environments, you can use Visual Studio Code and the Azure Machine Learning Visual Studio Code extension to interact with Azure Machine Learning.

For more information, see Data Science Virtual Machines.

Next steps