Build scikit-learn models at scale with Azure Machine Learning

APPLIES TO: Basic edition, Enterprise edition (Upgrade to Enterprise edition)

In this article, learn how to run your scikit-learn training scripts with Azure Machine Learning.

The example scripts in this article are used to classify iris flowers to build a machine learning model based on scikit-learn's iris dataset.

Whether you're training a machine learning scikit-learn model from the ground up or you're bringing an existing model into the cloud, you can use Azure Machine Learning to scale out open-source training jobs using elastic cloud compute resources. You can build, deploy, version, and monitor production-grade models with Azure Machine Learning.


Run this code on either of these environments:

  • Azure Machine Learning compute instance - no downloads or installation necessary

    • Complete the Tutorial: Setup environment and workspace to create a dedicated notebook server pre-loaded with the SDK and the sample repository.
    • In the samples training folder on the notebook server, find a completed and expanded notebook by navigating to this directory: how-to-use-azureml > ml-frameworks > scikit-learn > training > train-hyperparameter-tune-deploy-with-sklearn.
  • Your own Jupyter Notebook server

Set up the experiment

This section sets up the training experiment by loading the required Python packages, initializing a workspace, creating an experiment, and uploading the training data and training scripts.

Initialize a workspace

The Azure Machine Learning workspace is the top-level resource for the service. It provides you with a centralized place to work with all the artifacts you create. In the Python SDK, you can access the workspace artifacts by creating a Workspace object.

Create a workspace object from the config.json file created in the prerequisites section.

from azureml.core import Workspace

ws = Workspace.from_config()

Prepare scripts

In this tutorial, the training script train_iris.py is already provided for you here. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.


  • The provided training script shows how to log some metrics to your Azure ML run using the Run object within the script.
  • The provided training script uses example data from the iris = datasets.load_iris() function. For your own data, you may need to use steps such as Upload dataset and scripts to make data available during training.
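
To make the two points above concrete, the core of a train_iris.py-style script might look like the following sketch. This is an illustration of the pattern, not the exact sample file; the Run logging call is shown as a comment because it only takes effect inside a submitted Azure ML run.

```python
# Minimal sketch of a train_iris.py-style training script (illustrative,
# not the exact script from the sample repository).
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the example data referenced in the article.
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Train a linear SVM and measure accuracy on the held-out split.
svm_model_linear = SVC(kernel='linear', C=1.0).fit(X_train, y_train)
accuracy = svm_model_linear.score(X_test, y_test)

# Inside a submitted run, the script would log the metric like this:
#   from azureml.core.run import Run
#   run = Run.get_context()
#   run.log('Accuracy', accuracy)
print('accuracy:', accuracy)
```
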

Define your environment

Create a custom environment

Author your conda environment (sklearn-env.yml). To write the conda environment from a notebook, you can add the line %%writefile sklearn-env.yml at the top of the cell.

name: sklearn-training-env
dependencies:
  - python=3.6.2
  - scikit-learn
  - numpy
  - pip:
    - azureml-defaults

Create an Azure ML environment from this Conda environment specification. The environment will be packaged into a Docker container at runtime.

from azureml.core import Environment

myenv = Environment.from_conda_specification(name = "myenv", file_path = "sklearn-env.yml")
myenv.docker.enabled = True

Use a curated environment

Azure ML provides prebuilt, curated container environments if you don't want to build your own image. For more info, see here. If you want to use a curated environment, you can run the following command instead:

env = Environment.get(workspace=ws, name="AzureML-Tutorial")

Create a ScriptRunConfig

This ScriptRunConfig will submit your job for execution on the local compute target.

from azureml.core import ScriptRunConfig

sklearnconfig = ScriptRunConfig(source_directory='.', script='train_iris.py')
sklearnconfig.run_config.environment = myenv

If you want to submit against a remote cluster, you can change run_config.target to the desired compute target.

Submit your run

from azureml.core import Experiment

run = Experiment(ws,'train-sklearn').submit(config=sklearnconfig)


Azure Machine Learning runs training scripts by copying the entire source directory. If you have sensitive data that you don't want to upload, use a .ignore file or don't include it in the source directory. Instead, access your data using a datastore.

For more information on customizing your Python environment, see Create and manage environments for training and deployment.

What happens during run execution

As the run is executed, it goes through the following stages:

  • Preparing: A Docker image is created according to the environment you defined. The image is uploaded to the workspace's container registry and cached for later runs. Logs are also streamed to the run history and can be viewed to monitor progress.

  • Scaling: The cluster attempts to scale up if the Batch AI cluster requires more nodes to execute the run than are currently available.

  • Running: All scripts in the script folder are uploaded to the compute target, data stores are mounted or copied, and the entry_script is executed. Outputs from stdout and the ./logs folder are streamed to the run history and can be used to monitor the run.

  • Post-Processing: The ./outputs folder of the run is copied over to the run history.

Save and register the model

Once you've trained the model, you can save and register it to your workspace. Model registration lets you store and version your models in your workspace to simplify model management and deployment.

Add the following code to your training script, train_iris.py, to save the model.

import joblib

joblib.dump(svm_model_linear, 'model.joblib')
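
Before registering the artifact, it can help to verify locally that the dumped file reloads correctly. The following is a minimal round-trip sketch; the temporary directory and the small locally trained model are illustrative, not part of the sample script.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn import datasets
from sklearn.svm import SVC

# Train a small model analogous to the one in train_iris.py.
X, y = datasets.load_iris(return_X_y=True)
svm_model_linear = SVC(kernel='linear', C=1.0).fit(X, y)

# Save the model, then reload it to confirm the serialized artifact works
# before registering it with the workspace.
model_path = Path(tempfile.mkdtemp()) / 'model.joblib'
joblib.dump(svm_model_linear, model_path)
restored = joblib.load(model_path)

# The reloaded model should produce identical predictions.
assert (restored.predict(X) == svm_model_linear.predict(X)).all()
```
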

Register the model to your workspace with the following code. By specifying the parameters model_framework, model_framework_version, and resource_configuration, no-code model deployment becomes available. No-code model deployment allows you to directly deploy your model as a web service from the registered model, and the ResourceConfiguration object defines the compute resource for the web service.

from azureml.core import Model
from azureml.core.resource_configuration import ResourceConfiguration

model = run.register_model(model_name='sklearn-iris',
                           model_path='model.joblib',
                           model_framework=Model.Framework.SCIKITLEARN,
                           model_framework_version='0.19.1',  # the scikit-learn version used for training
                           resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5))


The model you just registered can be deployed the exact same way as any other registered model in Azure Machine Learning, regardless of which estimator you used for training. The deployment how-to contains a section on registering models, but you can skip directly to creating a compute target for deployment, since you already have a registered model.

(Preview) No-code model deployment

Instead of the traditional deployment route, you can also use the no-code deployment feature (preview) for scikit-learn. No-code model deployment is supported for all built-in scikit-learn model types. By registering your model as shown above with the model_framework, model_framework_version, and resource_configuration parameters, you can simply use the deploy() static function to deploy your model.

web_service = Model.deploy(ws, "scikit-learn-service", [model])

NOTE: These dependencies are included in the pre-built scikit-learn inference container.

    - azureml-defaults
    - inference-schema[numpy-support]
    - scikit-learn
    - numpy

The full how-to covers deployment in Azure Machine Learning in greater depth.

Next steps

In this article, you trained and registered a scikit-learn model, and learned about deployment options. See these other articles to learn more about Azure Machine Learning.