Train and track ML models with MLflow and Azure Machine Learning (preview)
In this article, learn how to enable MLflow's tracking URI and logging API, collectively known as MLflow Tracking, to connect Azure Machine Learning as the backend of your MLflow experiments.
Supported capabilities include:
- Track and log experiment metrics and artifacts in your Azure Machine Learning workspace. If you already use MLflow Tracking for your experiments, the workspace provides a centralized, secure, and scalable location to store training metrics and models.
- Submit training jobs with MLflow Projects with Azure Machine Learning backend support (preview). You can submit jobs locally with Azure Machine Learning tracking, or move your runs to the cloud, for example to Azure Machine Learning Compute.
- Track and manage models in MLflow and the Azure Machine Learning model registry.
MLflow is an open-source library for managing the life cycle of your machine learning experiments. MLflow Tracking is a component of MLflow that logs and tracks your training run metrics and model artifacts, regardless of your experiment's environment: locally on your computer, on a remote compute target, on a virtual machine, or on an Azure Databricks cluster.
Note
As an open-source library, MLflow changes frequently. As such, the functionality made available via the Azure Machine Learning and MLflow integration should be considered a preview, and it is not fully supported by Microsoft.
The following diagram illustrates that with MLflow Tracking, you track an experiment's run metrics and store model artifacts in your Azure Machine Learning workspace.
Tip
The information in this document is primarily for data scientists and developers who want to monitor the model training process. If you are an administrator interested in monitoring resource usage and events from Azure Machine Learning, such as quotas, completed training runs, or completed model deployments, see Monitoring Azure Machine Learning.
Compare MLflow and Azure Machine Learning clients
The following table summarizes the different clients that can use Azure Machine Learning, and their respective capabilities.
MLflow Tracking offers metric logging and artifact storage functionality that is otherwise available only via the Azure Machine Learning Python SDK.
Capability | MLflow Tracking & Deployment | Azure Machine Learning Python SDK | Azure Machine Learning CLI | Azure Machine Learning studio |
---|---|---|---|---|
Manage workspace | | ✓ | ✓ | ✓ |
Use data stores | | ✓ | ✓ | |
Log metrics | ✓ | ✓ | | |
Upload artifacts | ✓ | ✓ | | |
View metrics | ✓ | ✓ | ✓ | ✓ |
Manage compute | | ✓ | ✓ | ✓ |
Deploy models | ✓ | ✓ | ✓ | ✓ |
Monitor model performance | | ✓ | | |
Detect data drift | | ✓ | | ✓ |
Prerequisites
- Install the `azureml-mlflow` package.
  - This package automatically brings in `azureml-core` of the Azure Machine Learning Python SDK, which provides the connectivity for MLflow to access your workspace.
- Create an Azure Machine Learning Workspace.
Track local runs
MLflow Tracking with Azure Machine Learning lets you store the logged metrics and artifacts from your local runs in your Azure Machine Learning workspace.
Import the `mlflow` module and the `Workspace` class to access MLflow's tracking URI and configure your workspace.
In the following code, the `get_mlflow_tracking_uri()` method retrieves the unique tracking URI address of the workspace, `ws`, and `set_tracking_uri()` points the MLflow tracking URI to that address.
import mlflow
from azureml.core import Workspace
ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
Note
The tracking URI is valid for up to an hour. If you restart your script after some idle time, use the `get_mlflow_tracking_uri` API to get a new URI.
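Because the URI expires, a long-running script may want to re-fetch it periodically. The following is a minimal sketch of one way to do that, not part of either SDK: the `TrackingUriRefresher` class and the 45-minute default are hypothetical choices made for illustration. The two callables are passed in, so in practice you would supply `ws.get_mlflow_tracking_uri` and `mlflow.set_tracking_uri`.

```python
import time


class TrackingUriRefresher:
    """Re-fetch a short-lived tracking URI once it is older than max_age_seconds.

    Hypothetical helper: get_uri and set_uri are any callables, for example
    ws.get_mlflow_tracking_uri and mlflow.set_tracking_uri.
    """

    def __init__(self, get_uri, set_uri, max_age_seconds=45 * 60, clock=time.monotonic):
        self._get_uri = get_uri
        self._set_uri = set_uri
        self._max_age = max_age_seconds  # assumed safety margin below one hour
        self._clock = clock
        self._fetched_at = None

    def ensure_fresh(self):
        """Fetch and set a new URI if none was fetched yet or the last one is too old.

        Returns True when a refresh happened, False otherwise.
        """
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            self._set_uri(self._get_uri())
            self._fetched_at = now
            return True
        return False


# Possible usage, assuming `ws` and `mlflow` are set up as shown earlier:
# refresher = TrackingUriRefresher(ws.get_mlflow_tracking_uri, mlflow.set_tracking_uri)
# refresher.ensure_fresh()  # call before each batch of logging calls
```

Calling `ensure_fresh()` before each group of logging calls keeps the check cheap, since the URI is only requested again after the configured age has passed.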
Set the MLflow experiment name with `set_experiment()` and start your training run with `start_run()`. Then use `log_metric()` to activate the MLflow logging API and begin logging your training run metrics.
experiment_name = 'experiment_with_mlflow'
mlflow.set_experiment(experiment_name)
with mlflow.start_run():
    mlflow.log_metric('alpha', 0.03)
Track remote runs
Remote runs let you train your models on more powerful compute, such as GPU-enabled virtual machines or Machine Learning Compute clusters. See Use compute targets for model training to learn about different compute options.
MLflow Tracking with Azure Machine Learning lets you store the logged metrics and artifacts from your remote runs in your Azure Machine Learning workspace. Any run that contains MLflow Tracking code has its metrics logged automatically to the workspace.
The following example conda environment includes `mlflow` and `azureml-mlflow` as pip packages.
name: sklearn-example
dependencies:
  - python=3.6.2
  - scikit-learn
  - matplotlib
  - numpy
  - pip:
    - azureml-mlflow
    - mlflow
In your script, configure your compute and training run environment with the `Environment` class. Then, construct `ScriptRunConfig` with your remote compute as the compute target.
import mlflow
with mlflow.start_run():
    mlflow.log_metric('example', 1.23)
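The snippet above is the logging code that runs inside the training script. As a rough sketch of the configuration step described before it, the helper below builds a `ScriptRunConfig` from a conda file and a compute target. The function name `build_run_config`, the environment name `mlflow-env`, and the file names `train.py` and `conda.yaml` are assumptions for illustration; adapt them to your project.

```python
def build_run_config(compute_target, source_directory=".", script="train.py",
                     conda_file="conda.yaml", env_name="mlflow-env"):
    """Build a ScriptRunConfig that runs `script` on `compute_target`.

    All default values are hypothetical placeholders.
    """
    # Imported inside the function so this module loads even when the
    # Azure Machine Learning SDK is not installed.
    from azureml.core import Environment, ScriptRunConfig

    # Create the run environment from a conda specification file that lists
    # mlflow and azureml-mlflow as pip dependencies.
    env = Environment.from_conda_specification(name=env_name, file_path=conda_file)

    return ScriptRunConfig(
        source_directory=source_directory,
        script=script,
        compute_target=compute_target,
        environment=env,
    )


# Possible usage, assuming a workspace with a compute target named "cpu-cluster":
# from azureml.core import Experiment, Workspace
# ws = Workspace.from_config()
# src = build_run_config(ws.compute_targets["cpu-cluster"])
# exp = Experiment(workspace=ws, name="experiment_with_mlflow")
```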
With this compute and training run configuration, use the `Experiment.submit()` method to submit a run. This method automatically sets the MLflow tracking URI and directs the logging from MLflow to your workspace.
run = exp.submit(src)
Train with MLflow Projects
MLflow Projects let you organize and describe your code so that other data scientists (or automated tools) can run it. MLflow Projects with Azure Machine Learning enable you to track and manage your training runs in your workspace.
This example shows how to submit MLflow projects locally with Azure Machine Learning tracking.
Install the `azureml-mlflow` package to use MLflow Tracking with Azure Machine Learning on your local experiments. Your experiments can run via a Jupyter notebook or code editor.
pip install azureml-mlflow
Import the `mlflow` module and the `Workspace` class to access MLflow's tracking URI and configure your workspace.
import mlflow
from azureml.core import Workspace
ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
Set the MLflow experiment name with `set_experiment()` and start your training run with `start_run()`. Then, use `log_metric()` to activate the MLflow logging API and begin logging your training run metrics.
experiment_name = 'experiment-with-mlflow-projects'
mlflow.set_experiment(experiment_name)
Create the backend configuration object to store the information the integration needs, such as the compute target and which type of managed environment to use.
backend_config = {"USE_CONDA": False}
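The dictionary above covers a local run. As a minimal sketch of how a compute target might be added for a remote run, the helper below builds the same dictionary with an optional `"COMPUTE"` entry. The helper name `make_backend_config`, the `"COMPUTE"` key, and the `cpu-cluster` target name are assumptions for illustration; check them against the version of `azureml-mlflow` you have installed.

```python
def make_backend_config(compute_name=None, use_conda=False):
    """Build a backend configuration dictionary for mlflow.projects.run().

    compute_name: name of a compute target in the workspace, stored under the
    assumed key "COMPUTE". Leave it as None for a local run.
    """
    config = {"USE_CONDA": use_conda}
    if compute_name is not None:
        config["COMPUTE"] = compute_name
    return config


# Local run: same content as the dictionary shown above.
local_config = make_backend_config()

# Remote run on a hypothetical compute target named "cpu-cluster".
remote_config = make_backend_config(compute_name="cpu-cluster")
```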
Add the `azureml-mlflow` package as a pip dependency to your environment configuration file in order to track metrics and key artifacts in your workspace.
name: mlflow-example
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.6
  - scikit-learn=0.19.1
  - pip
  - pip:
    - mlflow
    - azureml-mlflow
Submit the local run and make sure you set the parameter `backend="azureml"`. With this setting, you can submit runs locally and get the added support of automatic output tracking, log files, snapshots, and printed errors in your workspace.
View your runs and metrics in Azure Machine Learning studio.
local_env_run = mlflow.projects.run(
    uri=".",
    parameters={"alpha": 0.3},
    backend="azureml",
    use_conda=False,
    backend_config=backend_config,
)
View metrics and artifacts in your workspace
The metrics and artifacts from MLflow logging are kept in your workspace. To view them at any time, open your workspace in Azure Machine Learning studio and find the experiment by name. Alternatively, run the following code.
run.get_metrics()
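The returned dictionary can be summarized in a few lines. The sketch below assumes that each value is either a single number or a list of numbers in logging order, as happens when a metric is logged more than once. The helper name `latest_metric_values` and the sample data are hypothetical.

```python
def latest_metric_values(metrics):
    """Return {name: latest value} from a metrics dictionary.

    Assumes each value is a single number, or a list of numbers in the
    order they were logged.
    """
    latest = {}
    for name, value in metrics.items():
        if isinstance(value, (list, tuple)):
            if value:  # skip metrics with no recorded values
                latest[name] = value[-1]
        else:
            latest[name] = value
    return latest


# Hypothetical stand-in for the result of run.get_metrics().
example_metrics = {"alpha": 0.03, "loss": [0.9, 0.5, 0.2]}

for name, value in sorted(latest_metric_values(example_metrics).items()):
    print(f"{name}: {value}")
```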
Manage models
Register and track your models with the Azure Machine Learning model registry, which supports the MLflow model registry. Azure Machine Learning models are aligned with the MLflow model schema, making it easy to export and import these models across different workflows. MLflow-related metadata, such as the run ID, is also tagged with the registered model for traceability. Users can submit training runs, and register and deploy models produced from MLflow runs.
If you want to deploy and register your production-ready model in one step, see Deploy and register MLflow models.
To register and view a model from a run, use the following steps:
1. Once the run is complete, call the `register_model()` method.
    # The model folder produced from the run is registered. This includes the MLmodel file, model.pkl, and conda.yaml.
    run.register_model(model_name='my-model', model_path='model')
2. View the registered model in your workspace with Azure Machine Learning studio.
    In the following example, the registered model, `my-model`, has MLflow tracking metadata tagged.
3. Select the Artifacts tab to see all the model files that align with the MLflow model schema (conda.yaml, MLmodel, model.pkl).
4. Select MLmodel to see the MLmodel file generated by the run.
Clean up resources
Deleting logged metrics and artifacts individually is currently not supported. If you don't plan to use the metrics and artifacts logged in your workspace, delete the resource group that contains the storage account and workspace instead, so you don't incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group you created.
3. Select Delete resource group.
4. Enter the resource group name. Then select Delete.
Example notebooks
The MLflow with Azure ML notebooks demonstrate and expand upon the concepts presented in this article.
Note
A community-driven repository of examples using MLflow can be found at https://github.com/Azure/azureml-examples.
Next steps
- Deploy models with MLflow.
- Monitor your production models for data drift.
- Track Azure Databricks runs with MLflow.
- Manage your models.