Configure a development environment for Azure Machine Learning

APPLIES TO: Basic edition, Enterprise edition (Upgrade to Enterprise edition)

In this article, you learn how to configure a development environment to work with Azure Machine Learning. Azure Machine Learning is platform agnostic. The only hard requirement for your development environment is Python 3. An isolated environment, such as Anaconda or Virtualenv, is also recommended.
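As a quick sanity check, the following minimal sketch reports whether the running interpreter meets the Python requirement and whether it appears to be isolated. The function name `check_environment` is hypothetical, the default minimum of 3.5 is taken from the recommendation in the Local computer section, and the isolation test is a heuristic (it looks for a virtual environment prefix or Conda markers), not an official check.

```python
import os
import sys


def check_environment(min_version=(3, 5)):
    """Return a small report about the interpreter that runs this code."""
    # venv and recent virtualenv set base_prefix; older virtualenv sets real_prefix.
    in_venv = hasattr(sys, "real_prefix") or (
        sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    )
    # Conda exports CONDA_DEFAULT_ENV and keeps a conda-meta folder in each environment.
    in_conda = "CONDA_DEFAULT_ENV" in os.environ or os.path.isdir(
        os.path.join(sys.prefix, "conda-meta")
    )
    return {
        "python_ok": sys.version_info >= min_version,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "isolated": in_venv or in_conda,
    }


print(check_environment())
```

If `isolated` is False, the interpreter is probably the system-wide Python, and packages you install will not be separated from other projects.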

The following table shows each development environment covered in this article, along with its pros and cons.

Environment | Pros | Cons
Cloud-based Azure Machine Learning compute instance (preview) | Easiest way to get started. The entire SDK is already installed in your workspace VM, and notebook tutorials are pre-cloned and ready to run. | Lack of control over your development environment and dependencies. Additional cost incurred for the Linux VM (the VM can be stopped when not in use to avoid charges). See pricing details.
Local environment | Full control of your development environment and dependencies. Run with any build tool, environment, or IDE of your choice. | Takes longer to get started. Necessary SDK packages must be installed, and an environment must also be installed if you don't already have one.
The Data Science Virtual Machine (DSVM) | Similar to the cloud-based compute instance (Python and the SDK are pre-installed), but with additional popular data science and machine learning tools pre-installed. Easy to scale and combine with other custom tools and workflows. | A slower getting-started experience compared to the cloud-based compute instance.

This article also provides additional usage tips for the following tools:

  • Jupyter Notebooks: If you're already using Jupyter Notebook, the SDK has some extras that you should install.

  • Visual Studio Code: If you use Visual Studio Code, the Azure Machine Learning extension includes extensive language support for Python, as well as features that make working with Azure Machine Learning more convenient and productive.

Prerequisites

An Azure Machine Learning workspace. To create the workspace, see Create an Azure Machine Learning workspace. A workspace is all you need to get started with your own cloud-based notebook server or a DSVM.

To install the SDK environment for your local computer, Jupyter Notebook server, or Visual Studio Code, you also need:

  • Either the Anaconda or Miniconda package manager.

  • On Linux or macOS, you need the bash shell.

    Tip

    If you're on Linux or macOS and use a shell other than bash (for example, zsh), you might receive errors when you run some commands. To work around this problem, use the bash command to start a new bash shell and run the commands there.

  • On Windows, you need the command prompt or Anaconda prompt (installed by Anaconda and Miniconda).
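The prerequisites above can be checked from Python before you start. This is a minimal sketch that only looks for the `conda` and `bash` executables on your PATH; the helper name `find_tools` is hypothetical, and finding an executable does not guarantee that it is configured correctly.

```python
import shutil
import sys


def find_tools(names=("conda", "bash")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")

if sys.platform.startswith("win"):
    # bash is not required on Windows; the article asks for a prompt instead.
    print("On Windows, run the commands from the command prompt or Anaconda prompt.")
```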

Your own cloud-based compute instance

The Azure Machine Learning compute instance (preview) is a secure, cloud-based Azure workstation that provides data scientists with a Jupyter Notebook server, JupyterLab, and a fully prepared ML environment.

There is nothing to install or configure for a compute instance. Create one anytime from within your Azure Machine Learning workspace. Provide just a name and specify an Azure VM type. Try it now with this Tutorial: Setup environment and workspace.

Learn more about compute instances.

To stop incurring compute charges, stop the compute instance.

Data Science Virtual Machine

The DSVM is a customized virtual machine (VM) image. It's designed for data science work and comes pre-configured with:

  • Packages such as TensorFlow, PyTorch, Scikit-learn, XGBoost, and the Azure Machine Learning SDK
  • Popular data science tools such as Spark Standalone and Drill
  • Azure tools such as the Azure CLI, AzCopy, and Storage Explorer
  • Integrated development environments (IDEs) such as Visual Studio Code and PyCharm
  • Jupyter Notebook Server

The Azure Machine Learning SDK works on either the Ubuntu or Windows version of the DSVM. However, if you plan to use the DSVM as a compute target as well, only Ubuntu is supported.

To use the DSVM as a development environment:

  1. Create a DSVM in either of the following environments:

    • The Azure portal

    • The Azure CLI:

      Important

      • When you use the Azure CLI, you must first sign in to your Azure subscription by using the az login command.

      • When you use the commands in this step, you must provide a resource group name, a name for the VM, a username, and a password.

      • To create an Ubuntu Data Science Virtual Machine, use the following command:

        # create a Ubuntu DSVM in your resource group
        # note you need to be at least a contributor to the resource group in order to execute this command successfully
        # If you need to create a new resource group use: "az group create --name YOUR-RESOURCE-GROUP-NAME --location YOUR-REGION"
        az vm create --resource-group YOUR-RESOURCE-GROUP-NAME --name YOUR-VM-NAME --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username YOUR-USERNAME --admin-password YOUR-PASSWORD --generate-ssh-keys --authentication-type password
        
      • To create a Windows Data Science Virtual Machine, use the following command:

        # create a Windows Server 2016 DSVM in your resource group
        # note you need to be at least a contributor to the resource group in order to execute this command successfully
        az vm create --resource-group YOUR-RESOURCE-GROUP-NAME --name YOUR-VM-NAME --image microsoft-dsvm:dsvm-windows:server-2016:latest --admin-username YOUR-USERNAME --admin-password YOUR-PASSWORD --authentication-type password
        
  2. The Azure Machine Learning SDK is already installed on the DSVM. To use the Conda environment that contains the SDK, use one of the following commands:

    • For Ubuntu DSVM:

      conda activate py36
      
    • For Windows DSVM:

      conda activate AzureML
      
  3. To verify that you can access the SDK and check the version, use the following Python code:

    import azureml.core
    print(azureml.core.VERSION)
    
  4. To configure the DSVM to use your Azure Machine Learning workspace, see the Create a workspace configuration file section.

For more information, see Data Science Virtual Machines.
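The verification in step 3 fails with an ImportError if the wrong Conda environment is active. The following sketch wraps the same check so that it reports the problem instead of raising. The helper name `sdk_version` is hypothetical; it only relies on `azureml.core.VERSION`, which is the attribute used in step 3.

```python
import importlib


def sdk_version(module_name="azureml.core"):
    """Return the module's VERSION string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is covered too.
        return None
    return getattr(module, "VERSION", None)


version = sdk_version()
if version is None:
    print("azureml.core is not importable; activate the Conda environment that contains the SDK.")
else:
    print("Azure Machine Learning SDK version:", version)
```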

Local computer

When you're using a local computer (which might also be a remote virtual machine), create an Anaconda environment and install the SDK. Here's an example:

  1. Download and install Anaconda (Python 3.7 version) if you don't already have it.

  2. Open an Anaconda prompt and create an environment.

    Run the following command to create the environment.

    conda create -n myenv python=3.6.5
    

    Then activate the environment.

    conda activate myenv
    

    This example creates an environment using Python 3.6.5, but you can choose any specific subversion. SDK compatibility might not be guaranteed with certain Python versions (3.5 or later is recommended). If you run into errors, try a different version or subversion in your Anaconda environment. It takes several minutes to create the environment while components and packages are downloaded.

  3. Run the following commands in your new environment to enable environment-specific IPython kernels. This ensures the expected kernel and package import behavior when you work with Jupyter Notebooks within Anaconda environments:

    conda install notebook ipykernel
    

    Then run the following command to create the kernel:

    ipython kernel install --user --name myenv --display-name "Python (myenv)"
    
  4. Use the following commands to install packages:

    This command installs the base Azure Machine Learning SDK with the notebooks and automl extras. The automl extra is a large install, and you can remove it from the brackets if you don't intend to run automated machine learning experiments. The automl extra also includes the Azure Machine Learning Data Prep SDK by default as a dependency.

    pip install azureml-sdk[notebooks,automl]
    

    Note

    • If you get a message that PyYAML can't be uninstalled, use the following command instead:

      pip install --upgrade azureml-sdk[notebooks,automl] --ignore-installed PyYAML

    • Starting with macOS Catalina, zsh (Z shell) is the default login shell and interactive shell. In zsh, use the following command, which escapes the brackets with a backslash (\):

      pip install --upgrade azureml-sdk\[notebooks,automl\]

    It takes several minutes to install the SDK. For more information on installation options, see the install guide.

  5. Install other packages for your machine learning experimentation.

    Use either of the following commands, replacing <new package> with the package you want to install. Installing packages via conda install requires that the package is part of the current channels (new channels can be added in Anaconda Cloud).

    conda install <new package>
    

    Alternatively, you can install packages via pip.

    pip install <new package>
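The bracket problem in zsh comes from the shell treating `[` and `]` as pattern characters. Quoting the requirement is an alternative to the backslash escapes shown in step 4. The following sketch builds a quoted install command for the active interpreter; the helper name `pip_install_command` is hypothetical, and `shlex.quote` produces POSIX shell quoting, so the output is meant for bash or zsh rather than the Windows command prompt.

```python
import shlex
import sys


def pip_install_command(package="azureml-sdk", extras=("notebooks", "automl"), upgrade=False):
    """Build a shell-safe 'python -m pip install' command line as a string."""
    requirement = package
    if extras:
        requirement += "[" + ",".join(extras) + "]"
    # Using sys.executable installs into the environment that runs this code.
    args = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        args.append("--upgrade")
    args.append(requirement)
    return " ".join(shlex.quote(arg) for arg in args)


# Full install, then a lighter one without the large automl extra.
print(pip_install_command())
print(pip_install_command(extras=("notebooks",)))
```

The function only prints the command; paste the output into your shell to run it.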
    

Jupyter Notebooks

Jupyter Notebooks are part of the Jupyter Project. They provide an interactive coding experience where you create documents that mix live code with narrative text and graphics. Jupyter Notebooks are also a great way to share your results with others, because you can save the output of your code sections in the document. You can install Jupyter Notebooks on a variety of platforms.

The procedure in the Local computer section installs the necessary components for running Jupyter Notebooks in an Anaconda environment.

To enable these components in your Jupyter Notebook environment:

  1. Open an Anaconda prompt and activate your environment.

    conda activate myenv
    
  2. Clone the GitHub repository for a set of sample notebooks.

    git clone https://github.com/Azure/MachineLearningNotebooks.git
    
  3. Launch the Jupyter Notebook server with the following command:

    jupyter notebook
    
  4. To verify that Jupyter Notebook can use the SDK, create a new notebook, select Python 3 as your kernel, and then run the following code in a notebook cell:

    import azureml.core
    azureml.core.VERSION
    
  5. If you encounter issues importing modules and receive a ModuleNotFoundError, run the following code in a notebook cell to confirm that your Jupyter kernel is connected to the correct path for your environment.

    import sys
    sys.path
    
  6. To configure the Jupyter Notebook to use your Azure Machine Learning workspace, go to the Create a workspace configuration file section.
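When the check in step 5 is not enough to explain a ModuleNotFoundError, it can help to see which interpreter the kernel runs and whether the package is visible to it. The following is a minimal diagnostic sketch; the function name `diagnose` is hypothetical. It looks up only the top-level `azureml` package, and `module_location` can be None for some package layouts even when the package is found.

```python
import importlib.util
import sys


def diagnose(module_name="azureml"):
    """Report the running interpreter and whether a top-level module can be found."""
    # find_spec returns None for a missing top-level module instead of raising.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


for key, value in diagnose().items():
    print(f"{key}: {value}")
```

If `prefix` does not point at your Anaconda environment (myenv in this article's example), the notebook is likely using a different kernel than the one created in the Local computer section.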

Visual Studio Code

Visual Studio Code is a very popular cross-platform code editor that supports an extensive set of programming languages and tools through extensions available in the Visual Studio marketplace. The Azure Machine Learning extension installs the Python extension for coding in all types of Python environments (virtual, Anaconda, etc.). In addition, it provides convenience features for working with Azure Machine Learning resources and running Azure Machine Learning experiments, all without leaving Visual Studio Code.

To use Visual Studio Code for development:

  1. To install the Azure Machine Learning extension for Visual Studio Code, see Azure Machine Learning.

    For more information, see Use Azure Machine Learning for Visual Studio Code.

  2. To learn how to use Visual Studio Code for any type of Python development, see Get started with Python in VSCode.

    • To select the Python environment that contains the SDK, open VS Code, and then press Ctrl+Shift+P (Linux and Windows) or Command+Shift+P (Mac).

      • The Command Palette opens.
    • Enter Python: Select Interpreter, and then select the appropriate environment.

  3. To validate that you can use the SDK, create a new Python file (.py) that contains the following code:

    #%%
    import azureml.core
    azureml.core.VERSION
    

    Run this code by clicking the "Run cell" CodeLens or by pressing Shift+Enter.

Create a workspace configuration file

The workspace configuration file is a JSON file that tells the SDK how to communicate with your Azure Machine Learning workspace. The file is named config.json, and it has the following format:

{
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>"
}

This JSON file must be in the directory structure that contains your Python scripts or Jupyter Notebooks. It can be in the same directory, in a subdirectory named .azureml, or in a parent directory.
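To see which file would be picked up from a given script or notebook directory, the lookup described above can be approximated as follows. This is a sketch of the documented locations only, not the SDK's own implementation; the function name `find_config` and the search order (the .azureml subdirectory before the directory itself, nearest directory first) are assumptions. The demo builds a throwaway directory tree so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def find_config(start=".", filename="config.json"):
    """Search start and each parent directory for the workspace configuration file."""
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        for candidate in (directory / ".azureml" / filename, directory / filename):
            if candidate.is_file():
                return candidate
    return None


# Demo: a notebooks folder whose parent project folder holds .azureml/config.json.
with tempfile.TemporaryDirectory() as root:
    project = Path(root).resolve() / "project"
    notebooks = project / "notebooks"
    notebooks.mkdir(parents=True)
    (project / ".azureml").mkdir()
    expected = project / ".azureml" / "config.json"
    expected.write_text("{}")
    found = find_config(notebooks)
    print("found the project-level file:", found == expected)
```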

To use this file from your code, use ws=Workspace.from_config(). This code loads the information from the file and connects to your workspace.

You can create the configuration file in three ways:

  • Use ws.write_config: Call this method to write a config.json file. The file contains the configuration information for your workspace. You can download or copy the config.json file to other development environments.

  • Download the file: In the Azure portal, select Download config.json from the Overview section of your workspace.

  • Create the file programmatically: In the following code snippet, you connect to a workspace by providing the subscription ID, resource group, and workspace name. It then saves the workspace configuration to the file:

    from azureml.core import Workspace
    
    subscription_id = '<subscription-id>'
    resource_group  = '<resource-group>'
    workspace_name  = '<workspace-name>'
    
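    try:
        ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)
        ws.write_config()
        print('Library configuration succeeded')
    except Exception as err:
        # Authentication or permission failures also land here, so show the underlying error.
        print('Workspace not found or not accessible:', err)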
    

    This code writes the configuration file to the .azureml/config.json file.
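If the SDK is not installed on the machine where you prepare the file, the same three-key format can be produced with the standard library alone. This is a minimal sketch under that assumption: the helper names `write_workspace_config` and `load_workspace_config` are hypothetical, and unlike ws.write_config() this code does not contact Azure or confirm that the workspace exists. The demo writes to a temporary directory with the placeholder values from this article.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write <directory>/.azureml/config.json in the format shown in this article."""
    config_dir = Path(directory) / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path.write_text(json.dumps(config, indent=4))
    return config_path


def load_workspace_config(path):
    """Load the file and fail early if a required key is missing or empty."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("config.json is missing: " + ", ".join(missing))
    return config


with tempfile.TemporaryDirectory() as tmp:
    path = write_workspace_config(tmp, "<subscription-id>", "<resource-group>", "<workspace-name>")
    loaded = load_workspace_config(path)
    print(sorted(loaded))
```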

Next steps