Tutorial: Train your first ML model

APPLIES TO: Basic edition, Enterprise edition (Upgrade to Enterprise edition)

This tutorial is part two of a two-part series. In the previous tutorial, you created a workspace and chose a development environment. In this tutorial, you learn the foundational design patterns in Azure Machine Learning and train a simple scikit-learn model on the diabetes data set. After completing this tutorial, you will have the practical knowledge of the SDK needed to develop more complex experiments and workflows.

In this tutorial, you learn the following tasks:

  • Connect your workspace and create an experiment
  • Load data and train scikit-learn models
  • View training results in the studio
  • Retrieve the best model

Prerequisites

The only prerequisite is to run part one of this tutorial, Setup environment and workspace.

In this part of the tutorial, you run the code in the sample Jupyter notebook tutorials/create-first-ml-experiment/tutorial-1st-experiment-sdk-train.ipynb, which you opened at the end of part one. This article walks through the same code that is in the notebook.

Open the notebook

  1. Sign in to Azure Machine Learning studio.

  2. Open tutorial-1st-experiment-sdk-train.ipynb in your folder, as shown in part one.

Do not create a new notebook in the Jupyter interface! The notebook tutorials/create-first-ml-experiment/tutorial-1st-experiment-sdk-train.ipynb includes all the code and data needed for this tutorial.

Connect workspace and create experiment

Tip

The rest of this article shows the contents of tutorial-1st-experiment-sdk-train.ipynb. Switch to the Jupyter notebook now if you want to read along as you run the code. To run a single code cell in a notebook, click the code cell and press Shift+Enter. Or, run the entire notebook by choosing Run all from the top toolbar.

Import the Workspace class, and load your subscription information from the file config.json using the function from_config(). By default, this looks for the JSON file in the current directory, but you can also pass a path parameter to point to the file using from_config(path="your/file/path"). If you are running this notebook in a cloud notebook server in your workspace, the file is automatically in the root directory.

If the following code asks for additional authentication, paste the link in a browser and enter the authentication token. In addition, if you have more than one tenant linked to your user, you need to add the following lines:

from azureml.core.authentication import InteractiveLoginAuthentication
interactive_auth = InteractiveLoginAuthentication(tenant_id="your-tenant-id")
# Additional details on authentication can be found here: https://aka.ms/aml-notebook-auth
from azureml.core import Workspace
ws = Workspace.from_config()
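
The config.json lookup described above can be illustrated without the SDK. The following is a minimal sketch, not the SDK's own implementation: it assumes the file carries the keys subscription_id, resource_group, and workspace_name, and the helper load_workspace_config is a hypothetical name introduced only for this example.

```python
import json
import tempfile
from pathlib import Path

# Assumption: these are the keys a workspace config.json is expected to carry.
REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}


def load_workspace_config(path):
    """Read a config.json file and check that the expected keys are present.

    Hypothetical helper for illustration; the SDK does its own loading.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"config.json is missing keys: {sorted(missing)}")
    return config


# Write a placeholder file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    placeholder = {
        "subscription_id": "your-subscription-id",
        "resource_group": "your-resource-group",
        "workspace_name": "your-workspace-name",
    }
    config_path.write_text(json.dumps(placeholder), encoding="utf-8")
    config = load_workspace_config(config_path)

print(config["workspace_name"])
```

If you created an InteractiveLoginAuthentication object for a specific tenant, it is normally passed to the workspace loader through the auth parameter, for example Workspace.from_config(auth=interactive_auth).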

Now create an experiment in your workspace. An experiment is another foundational cloud resource that represents a collection of trials (individual model runs). In this tutorial, you use the experiment to create runs and track your model training in the Azure Machine Learning studio. Parameters include your workspace reference and a string name for the experiment.

from azureml.core import Experiment
experiment = Experiment(workspace=ws, name="diabetes-experiment")

Load data and prepare for training

For this tutorial, you use the diabetes data set, which uses features like age, gender, and BMI to predict diabetes disease progression. Load the data from the Azure Open Datasets class, and split it into training and test sets using train_test_split(). This function separates the data so that the model has unseen data to use for testing after training.

from azureml.opendatasets import Diabetes
from sklearn.model_selection import train_test_split

x_df = Diabetes.get_tabular_dataset().to_pandas_dataframe().dropna()
y_df = x_df.pop("Y")

X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=66)
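
To make the effect of test_size=0.2 and random_state concrete, here is a simplified pure-Python sketch of a holdout split. It is an illustration of the idea, not scikit-learn's implementation; the function name holdout_split and the toy rows are assumptions made for this example.

```python
import math
import random


def holdout_split(rows, test_size=0.2, seed=66):
    """Shuffle row indices with a fixed seed, then hold out a test fraction."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)      # same seed -> same shuffle
    n_test = math.ceil(len(rows) * test_size)  # number of held-out rows
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    return train, test


rows = list(range(10))  # toy stand-in for ten data rows
train, test = holdout_split(rows)
print(len(train), len(test))  # 8 2
```

Because the seed is fixed, rerunning the split gives the same partition, which is what makes results comparable across the runs in an experiment.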

Train a model

Training a simple scikit-learn model can easily be done locally for small-scale training, but when you train many iterations with dozens of different feature permutations and hyperparameter settings, it is easy to lose track of which models you have trained and how you trained them. The following design pattern shows how to use the SDK to keep track of your training in the cloud.

Build a script that trains ridge models in a loop through different hyperparameter alpha values.

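from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
import joblib  # sklearn.externals.joblib was removed in newer scikit-learn releases
import math
import os

# Make sure the local output folder exists before saving models into it.
os.makedirs("outputs", exist_ok=True)

alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]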

for alpha in alphas:
    run = experiment.start_logging()
    run.log("alpha_value", alpha)

    model = Ridge(alpha=alpha)
    model.fit(X=X_train, y=y_train)
    y_pred = model.predict(X=X_test)
    rmse = math.sqrt(mean_squared_error(y_true=y_test, y_pred=y_pred))
    run.log("rmse", rmse)

    model_name = "model_alpha_" + str(alpha) + ".pkl"
    filename = "outputs/" + model_name

    joblib.dump(value=model, filename=filename)
    run.upload_file(name=model_name, path_or_stream=filename)
    run.complete()

The above code accomplishes the following:

  1. For each alpha hyperparameter value in the alphas array, a new run is created within the experiment. The alpha value is logged to differentiate between runs.
  2. In each run, a Ridge model is instantiated, trained, and used to run predictions. The root-mean-squared error is calculated for the actual versus predicted values and then logged to the run. At this point, the run has metadata attached for both the alpha value and the rmse error metric.
  3. Next, the model for each run is serialized and uploaded to the run. This allows you to download the model file from the run in the studio.
  4. At the end of each iteration, the run is completed by calling run.complete().
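
The rmse value logged in step 2 is the square root of the mean of the squared prediction errors. As a small worked example, using made-up numbers rather than the diabetes data, the same quantity can be computed by hand:

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared difference between two sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

# Squared errors are 1, 0, and 4, so the mean is 5/3.
rmse = root_mean_squared_error(actual, predicted)
print(round(rmse, 3))  # 1.291
```

A lower rmse means the predictions are closer to the actual values, which is why the best run is later chosen as the one with the smallest rmse.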

After the training has completed, call the experiment variable to fetch a link to the experiment in the studio.

experiment
Name                 Workspace            Report Page                            Docs Page
diabetes-experiment  your-workspace-name  Link to Azure Machine Learning studio  Link to Documentation

View training results in studio

Following the Link to Azure Machine Learning studio takes you to the main experiment page. Here you see all the individual runs in the experiment. Any custom-logged values (alpha_value and rmse, in this case) become fields for each run and are also available for the charts. To plot a new chart with a logged metric, click 'Add chart' and select the metric you would like to plot.

When you train models at scale over hundreds or thousands of separate runs, this page makes it easy to see every model you trained, specifically how each was trained and how your metrics have changed over time.

Figure: Main experiment page in the studio.

Select a run number link in the RUN NUMBER column to see the page for an individual run. The default tab, Details, shows more detailed information on each run. Navigate to the Outputs + logs tab to see the .pkl file for the model that was uploaded to the run during each training iteration. Here you can download the model file, rather than having to retrain it manually.

Figure: Run details page in the studio.

Get the best model

In addition to downloading model files from the experiment in the studio, you can also download them programmatically. The following code iterates through each run in the experiment and accesses both the logged run metrics and the run details (which contain the run_id). It keeps track of the best run, which in this case is the run with the lowest root-mean-squared error.

minimum_rmse_runid = None
minimum_rmse = None

for run in experiment.get_runs():
    run_metrics = run.get_metrics()
    run_details = run.get_details()
    # each logged metric becomes a key in this returned dict
    run_rmse = run_metrics["rmse"]
    run_id = run_details["runId"]

    if minimum_rmse is None:
        minimum_rmse = run_rmse
        minimum_rmse_runid = run_id
    else:
        if run_rmse < minimum_rmse:
            minimum_rmse = run_rmse
            minimum_rmse_runid = run_id

print("Best run_id: " + minimum_rmse_runid)
print("Best run_id rmse: " + str(minimum_rmse))    
Best run_id: 864f5ce7-6729-405d-b457-83250da99c80
Best run_id rmse: 57.234760283951765
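
The same selection can be written more compactly with min() and a key function. The sketch below uses plain dictionaries with made-up IDs and values as a hypothetical stand-in for what run.get_details() and run.get_metrics() return; it also skips runs that never logged rmse, which the loop above would fail on with a KeyError.

```python
# Hypothetical stand-in data: one dict per run, mimicking the "runId" detail
# and the logged metrics. The IDs and values are invented for illustration.
runs = [
    {"runId": "run-a", "metrics": {"alpha_value": 0.1, "rmse": 57.2}},
    {"runId": "run-b", "metrics": {"alpha_value": 0.2, "rmse": 57.4}},
    {"runId": "run-c", "metrics": {"alpha_value": 0.3}},  # never logged rmse
]


def best_run_by_rmse(runs):
    """Return the run record with the lowest logged rmse."""
    scored = [r for r in runs if "rmse" in r["metrics"]]
    if not scored:
        raise ValueError("no run has a logged rmse")
    return min(scored, key=lambda r: r["metrics"]["rmse"])


best = best_run_by_rmse(runs)
print("Best run_id: " + best["runId"])
print("Best run_id rmse: " + str(best["metrics"]["rmse"]))
```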

Use the best run ID to fetch the individual run using the Run constructor along with the experiment object. Then call get_file_names() to see all the files available for download from this run. In this case, you uploaded only one file for each run during training.

from azureml.core import Run
best_run = Run(experiment=experiment, run_id=minimum_rmse_runid)
print(best_run.get_file_names())
['model_alpha_0.1.pkl']

Call download_file() on the run object, specifying the model file name to download. By default, this function downloads to the current directory.

best_run.download_file(name="model_alpha_0.1.pkl")

Clean up resources

Do not complete this section if you plan on running other Azure Machine Learning tutorials.

Stop the compute instance

If you used a compute instance or Notebook VM, stop the VM when you are not using it to reduce cost.

  1. In your workspace, select Compute.

  2. From the list, select the VM.

  3. Select Stop.

  4. When you're ready to use the server again, select Start.

Delete everything

Important

The resources you created can be used as prerequisites to other Azure Machine Learning tutorials and how-to articles.

If you don't plan to use the resources you created, delete them so you don't incur any charges:

  1. In the Azure portal, select Resource groups on the far left.

    Figure: Delete in the Azure portal

  2. From the list, select the resource group you created.

  3. Select Delete resource group.

  4. Enter the resource group name. Then select Delete.

You can also keep the resource group but delete a single workspace. Display the workspace properties and select Delete.

Next steps

In this tutorial, you did the following tasks:

  • Connected your workspace and created an experiment
  • Loaded data and trained scikit-learn models
  • Viewed training results in the studio and retrieved models

Deploy your model with Azure Machine Learning. Learn how to develop automated machine learning experiments.