What are Azure Machine Learning environments?

Azure Machine Learning environments are an encapsulation of the environment where your machine learning training happens. They specify the Python packages, environment variables, and software settings around your training and scoring scripts. They also specify runtimes (Python, Spark, or Docker). The environments are managed and versioned entities within your Machine Learning workspace that enable reproducible, auditable, and portable machine learning workflows across a variety of compute targets.

You can use an Environment object on your local compute to:

  • Develop your training script.
  • Reuse the same environment on Azure Machine Learning Compute for model training at scale.
  • Deploy your model with that same environment.
  • Revisit the environment in which an existing model was trained.

The following diagram illustrates how you can use a single Environment object in both your run configuration (for training) and your inference and deployment configuration (for web service deployments).

Diagram of an environment in a machine learning workflow

The environment, compute target, and training script together form the run configuration: the full specification of a training run.

Types of environments

Environments can broadly be divided into three categories: curated, user-managed, and system-managed.

Curated environments are provided by Azure Machine Learning and are available in your workspace by default. Intended to be used as is, they contain collections of Python packages and settings to help you get started with various machine learning frameworks. These pre-created environments also allow for faster deployment time. For a full list, see the curated environments article.

In user-managed environments, you're responsible for setting up your environment and installing every package that your training script needs on the compute target. Conda doesn't check your environment or install anything for you. If you're defining your own environment, you must list azureml-defaults with version >= 1.0.45 as a pip dependency. This package contains the functionality that's needed to host the model as a web service.
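As a rough illustration of the azureml-defaults requirement, the sketch below represents a dependency specification as a plain Python dictionary laid out like a conda environment file, and checks that azureml-defaults appears in its pip section. The helper names `pip_requirements` and `lists_azureml_defaults`, the environment name, and the other packages are hypothetical and aren't part of the Azure Machine Learning SDK. The check looks only for the package's presence and doesn't evaluate the version constraint.

```python
import re

# Hypothetical dependency specification, laid out like a conda environment file.
conda_spec = {
    "name": "my-training-env",  # illustrative name
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.8",
        "scikit-learn",
        {"pip": ["azureml-defaults>=1.0.45", "joblib"]},
    ],
}


def pip_requirements(spec):
    """Return the requirement strings listed under the nested pip section."""
    requirements = []
    for entry in spec.get("dependencies", []):
        if isinstance(entry, dict):
            requirements.extend(entry.get("pip", []))
    return requirements


def lists_azureml_defaults(spec):
    """Check whether azureml-defaults is listed as a pip dependency."""
    for requirement in pip_requirements(spec):
        # Strip any version specifier (>=, ==, ~=, and so on) to get the package name.
        name = re.split(r"[<>=!~\s\[]", requirement.strip(), maxsplit=1)[0]
        if name.lower() == "azureml-defaults":
            return True
    return False
```

A check like this could run before an environment is registered, so that a missing azureml-defaults entry is caught before a web service deployment is attempted.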

You use system-managed environments when you want Conda to manage the Python environment and the script dependencies for you. A new conda environment is built based on the conda dependencies object. The Azure Machine Learning service assumes this type of environment by default, because of its usefulness on remote compute targets that aren't manually configurable.

Create and manage environments

You can create environments by:

  • Defining new Environment objects, either by using a curated environment or by defining your own dependencies.
  • Using existing Environment objects from your workspace. This approach allows for consistency and reproducibility with your dependencies.
  • Importing from an existing Anaconda environment definition.
  • Using the Azure Machine Learning CLI.
  • Using the VS Code extension.

For specific code samples, see the "Create an environment" section of How to use environments. Environments are also easily managed through your workspace. They include the following functionality:

  • Environments are automatically registered to your workspace when you submit an experiment. They can also be manually registered.
  • You can fetch environments from your workspace to use for training or deployment, or to make edits to the environment definition.
  • With versioning, you can see changes to your environments over time, which ensures reproducibility.
  • You can build Docker images automatically from your environments.

For code samples, see the "Manage environments" section of How to use environments.

Environment building, caching, and reuse

The Azure Machine Learning service builds environment definitions into Docker images and conda environments. It also caches the environments so they can be reused in subsequent training runs and service endpoint deployments. Running a training script remotely requires the creation of a Docker image, whereas a local run can use a Conda environment directly.

Submitting a run using an environment

When you first submit a remote run using an environment, the Azure Machine Learning service invokes an ACR Build Task on the Azure Container Registry (ACR) associated with the workspace. The built Docker image is then cached in the workspace ACR. Curated environments are backed by Docker images that are cached in the Global ACR. At the start of the run execution, the compute target retrieves the image from the relevant ACR.

For local runs, a Docker or Conda environment is created based on the environment definition. The scripts are then executed on the target compute: either a local runtime environment or a local Docker engine.

Building environments as Docker images

If the environment definition doesn't already exist in the workspace ACR, a new image will be built. The image build consists of two steps:

  1. Downloading a base image, and executing any Docker steps.
  2. Building a conda environment according to the conda dependencies specified in the environment definition.

The second step is omitted if you specify user-managed dependencies. In this case, you're responsible for installing any Python packages, either by including them in your base image or by specifying custom Docker steps within the first step. You're also responsible for specifying the correct location for the Python executable. It is also possible to use a custom Docker base image.
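The two-step build and the user-managed exception can be modeled roughly as follows. This is only an illustrative sketch of the order of operations described above, not the service's build logic; the function name `plan_image_build`, its parameters, and the image name are hypothetical.

```python
def plan_image_build(base_image, docker_steps, conda_packages, user_managed):
    """Return an ordered, human-readable list of build steps (illustrative only)."""
    # Step 1: download the base image and run any custom Docker steps on top of it.
    plan = [f"pull base image {base_image}"]
    plan.extend(f"run Docker step: {step}" for step in docker_steps)
    # Step 2: build the conda environment, unless dependencies are user-managed.
    if not user_managed:
        plan.append("create conda environment with: " + ", ".join(conda_packages))
    return plan


# System-managed: both steps appear in the plan.
system_plan = plan_image_build("my-base-image:latest", [], ["numpy"], user_managed=False)

# User-managed: the conda step is skipped, so packages come from a Docker step.
user_plan = plan_image_build(
    "my-base-image:latest", ["RUN pip install numpy"], ["numpy"], user_managed=True
)
```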

Image caching and reuse

If you use the same environment definition for another run, the Azure Machine Learning service reuses the cached image from the workspace ACR.

To view the details of a cached image, use the Environment.get_image_details method.

To determine whether to reuse a cached image or build a new one, the service computes a hash value from the environment definition and compares it to the hashes of existing environments. The hash is based on:

  • Base image property value
  • Custom Docker steps property value
  • List of Python packages in Conda definition
  • List of packages in Spark definition

The hash doesn't depend on the environment name or version. If you rename your environment, or create a new environment with exactly the same properties and packages as an existing one, the hash value remains the same. However, environment definition changes, such as adding or removing a Python package or changing the package version, cause the hash value to change. Changing the order of dependencies or channels in an environment will result in a new environment and thus require a new image build. It is important to note that any change to a curated environment will invalidate the hash and result in a new "non-curated" environment.
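The properties of this hash can be illustrated with a small sketch. The text doesn't describe the service's actual hashing algorithm, so the SHA-256 digest over a JSON serialization used here is an assumption, as are the function name `definition_hash`, the dictionary field names, and the example environments.

```python
import hashlib
import json


def definition_hash(env):
    """Hash only the fields that the text lists as inputs to the hash."""
    relevant = {
        "base_image": env.get("base_image"),
        "docker_steps": env.get("docker_steps"),
        "conda_packages": env.get("conda_packages", []),  # list order is preserved
        "spark_packages": env.get("spark_packages", []),
    }
    # Name and version are deliberately left out of the payload.
    payload = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


env_a = {
    "name": "env-a",
    "version": "1",
    "base_image": "my-base-image:latest",  # hypothetical image name
    "conda_packages": ["numpy==1.18.1", "pandas"],
}
# Same properties and packages under a different name and version.
env_b = dict(env_a, name="env-b", version="7")
# Same packages in a different order.
env_c = dict(env_a, conda_packages=["pandas", "numpy==1.18.1"])
# A changed package version.
env_d = dict(env_a, conda_packages=["numpy==1.19.0", "pandas"])
```

In this model, `env_a` and `env_b` share a hash and would map to the same cached image, while `env_c` and `env_d` each produce a different hash and would need a new image build.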

The computed hash value is compared to those in the workspace ACR and the Global ACR (or on the compute target for local runs). If there is a match, the cached image is pulled; otherwise, an image build is triggered. The time to obtain a cached image consists only of the download time, whereas the time to obtain a newly built image includes both the build time and the download time.
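The pull-or-build decision can be sketched as a lookup against two registries. This is a simplified model, assuming each registry can be represented as a mapping from hash to image reference; the function name `resolve_image`, the lookup order, and the example hashes and image names are hypothetical.

```python
def resolve_image(env_hash, workspace_acr, global_acr):
    """Return (action, image) for a hash, given two hash-to-image mappings."""
    # Assumed order: the workspace registry is checked before the global one.
    for registry in (workspace_acr, global_acr):
        if env_hash in registry:
            # Match found: the cached image is pulled, costing only download time.
            return "pull", registry[env_hash]
    # No match: an image build is triggered, costing build time plus download time.
    return "build", None


workspace_acr = {"hash-1": "workspace/custom-env"}
global_acr = {"hash-2": "global/curated-env"}

cached_custom = resolve_image("hash-1", workspace_acr, global_acr)
cached_curated = resolve_image("hash-2", workspace_acr, global_acr)
not_cached = resolve_image("hash-3", workspace_acr, global_acr)
```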

The following diagram shows three environment definitions. Two of them have different names and versions, but an identical base image and identical Python packages. They therefore have the same hash and correspond to the same cached image. The third environment has different Python packages and versions, and therefore corresponds to a different cached image.

Diagram of environments cached as Docker images

Important

If you create an environment with an unpinned package dependency, for example numpy, that environment will keep using the package version installed at the time of environment creation. Also, any future environment with a matching definition will keep using the old version.

To update the package, specify a version number to force an image rebuild, for example numpy==1.18.1. New dependencies, including nested ones, will be installed, which might break a previously working scenario.

Warning

The Environment.build method will rebuild the cached image, with the possible side effect of updating unpinned packages and breaking reproducibility for all environment definitions corresponding to that cached image.

Next steps