使用 MLflow 和 Azure Machine Learning 跟踪 ML 模型

适用于：Azure Machine Learning SDK v1 for Python

重要

本文提供有关使用 Azure Machine Learning SDK v1 的信息。 SDK v1 自 2025 年 3 月 31 日起弃用。对它的支持将于 2026 年 6 月 30 日结束。可以在该日期之前安装和使用 SDK v1。使用 SDK v1 的现有工作流将在支持结束日期后继续运行。但是，在产品发生体系结构更改时，可能会面临安全风险或中断性变更。

建议在 2026 年 6 月 30 日之前过渡到 SDK v2。有关 SDK v2 的详细信息，请参阅什么是 Azure Machine Learning CLI 和 Python SDK v2？和 SDK v2 参考。

备注

本文使用 Azure Machine Learning SDK v1。有关使用 SDK v2 和 MLflow 的当前方法，请参阅使用 MLflow 的试验和模型和配置 Azure Machine Learning 的 MLflow。

本文介绍如何启用 MLflow 跟踪以将Azure Machine Learning作为 MLflow 试验的后端进行连接。

MLflow 是一个开放源代码库，用于管理机器学习试验的生命周期。 MLflow 跟踪是 MLflow 的一个组件，它记录和跟踪训练运行指标和模型项目，无论试验的环境如何-本地在计算机上、远程计算目标、虚拟机或 Azure Databricks 群集。

请参阅 MLflow 和 Azure Machine Learning，了解所有受支持的 MLflow 和Azure Machine Learning功能，包括 MLflow Project 支持（预览版）和模型部署。

提示

若要跟踪在 Azure Databricks 或 Azure Synapse Analytics 上进行的实验，请参阅专门介绍 MLflow 和 Azure 机器学习的文章跟踪 Azure Databricks ML 实验。

备注

本文档中的信息主要面向需要监视模型训练过程的数据科学家与开发人员。如果你是一名管理员，希望监视来自Azure Machine Learning的资源使用情况和事件，例如配额、已完成的训练作业或已完成的模型部署，请参阅 Monitoring Azure Machine Learning。

先决条件

安装 mlflow 包。
- 使用 MLflow 瘦身，这是一个轻量级 MLflow 包，无需 SQL 存储、服务器、UI 或数据科学依赖项。如果主要需要跟踪和日志记录功能，而无需导入完整的 MLflow 功能套件（包括部署），请使用此包。
安装 azureml-mlflow 包。
创建Azure Machine Learning工作区。
- 查看使用工作区执行 MLflow 操作所需的访问权限。
安装和设置 Azure Machine Learning CLI （v1）并确保安装 ml 扩展。

重要

本文中的某些Azure CLI命令使用 Azure Machine Learning azure-cli-ml 或 v1 扩展。对 CLI v1 的支持于 2025 年 9 月 30 日结束。 Microsoft将不再为此服务提供技术支持或更新。使用 CLI v1 的现有工作流将继续在支持终止日期后运行。但是，在产品发生体系结构更改时，可能会面临安全风险或中断性变更。

建议尽快过渡到 mlv2 扩展。有关 v2 扩展的详细信息，请参阅 Azure Machine Learning CLI 扩展和 Python SDK v2。
安装和设置 Azure Machine Learning SDK for Python。

本地或远程计算机的运行跟踪

通过将 MLflow 与 Azure Machine Learning 配合使用，您可以在 Azure Machine Learning 工作区中存储您在本地计算机上执行的运行的记录指标和项目。

设置跟踪环境

若要跟踪未在 Azure Machine Learning 计算上运行的运行（称为 local compute），需要将本地计算指向 Azure Machine Learning MLflow 跟踪 URI。

备注

在 Azure 计算平台上运行（Azure 笔记本、Jupyter 笔记本在 Azure 计算实例或计算群集上托管）时，无需配置跟踪 URI。它会自动配置好。

适用于：Azure Machine Learning SDK v1 for Python

您可以使用适用于 Python 的 Azure Machine Learning SDK v1 获取 Azure Machine Learning MLflow 跟踪 URI。请确保您在使用的群集中安装 azureml-sdk 库。以下示例获取与工作区关联的唯一 MLflow 追踪 URI。然后，set_tracking_uri() 方法将 MLflow 跟踪 URI 指向该 URI。

使用工作区配置文件：
```
from azureml.core import Workspace
import mlflow

ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
```
提示

可以通过以下方式下载工作区配置文件：
1. 导航到 Azure Machine Learning studio
2. 选择页面右上角 ->下载配置文件。
3. 将文件 config.json 保存在正在使用的同一目录中。

使用订阅 ID、资源组名称和工作区名称：

from azureml.core import Workspace
import mlflow

#Enter details of your Azure Machine Learning workspace
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace_name = '<AZUREML_WORKSPACE_NAME>'

ws = Workspace.get(name=workspace_name,
                   subscription_id=subscription_id,
                   resource_group=resource_group)

mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

适用于：Azure CLI ml extension v1

另一种选择是直接在终端中设置一个 MLflow 环境变量 MLFLOW_TRACKING_URI。

export MLFLOW_TRACKING_URI=$(az ml workspace show --query mlflow_tracking_uri | sed 's/"//g')

重要

请确保在本地计算机上登录到Azure帐户。否则，跟踪 URI 将返回一个空字符串。如果使用任何Azure Machine Learning计算，则已配置跟踪环境和试验名称。

可以使用订阅 ID、部署资源的区域、资源组名称和工作区名称来生成 Azure Machine Learning 跟踪 URI。以下代码示例演示如何生成 URI：

import mlflow

region = ""
subscription_id = ""
resource_group = ""
workspace_name = ""

azureml_mlflow_uri = f"azureml://{region}.api.ml.azure.cn/mlflow/v1.0/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/Microsoft.MachineLearningServices/workspaces/{workspace_name}"
mlflow.set_tracking_uri(azureml_mlflow_uri)

备注

还可以通过以下方式获取此 URL：

导航到 Azure Machine Learning studio
选择页面的右上角 -> 查看 Azure 门户中的所有属性 ->MLflow 跟踪 URI。
复制 URI 并将其与方法 mlflow.set_tracking_uri一起使用。

设置试验名称

所有 MLflow 运行都记录到当前的实验中。默认情况下，运行日志会被记录到由 Azure Machine Learning 自动创建的名为 Default 的试验中。若要配置要使用的试验，请使用 MLflow 命令 mlflow.set_experiment()。

experiment_name = 'experiment_with_mlflow'
mlflow.set_experiment(experiment_name)

提示

使用 Azure Machine Learning SDK 提交作业时，请在提交作业时使用属性 experiment_name 设置试验名称。不必在训练脚本上进行配置。

启动训练任务

设置 MLflow 试验名称后，使用 start_run() 启动训练运行。然后使用 log_metric() 激活 MLflow 记录 API 并开始记录训练运行指标。

import os
from random import random

with mlflow.start_run() as mlflow_run:
    mlflow.log_param("hello_param", "world")
    mlflow.log_metric("hello_metric", random())
    os.system(f"echo 'hello world' > helloworld.txt")
    mlflow.log_artifact("helloworld.txt")

有关如何使用 MLflow 记录运行中的指标、参数和项目的详细信息，请参阅如何记录和查看指标。

跟踪在 Azure Machine Learning 上的任务运行

适用于：Azure Machine Learning SDK v1 for Python

远程运行（作业）允许你以更可靠且可重复的方式训练模型。它们还可以利用更强大的计算资源，例如Machine Learning计算群集。若要了解不同的计算选项，请参阅使用计算目标进行模型训练。

提交运行时，Azure Machine Learning会自动配置 MLflow 以处理运行中的工作区。此配置意味着无需设置 MLflow 跟踪 URI。此外，Azure Machine Learning根据试验提交的详细信息自动命名试验。

重要

将训练作业提交到Azure Machine Learning时，无需在训练逻辑中配置 MLflow 跟踪 URI 或试验名称。 Azure Machine Learning为你处理这些配置。

创建训练例程

首先，创建子 src 目录并使用训练代码添加 train.py 文件。所有训练代码都进入 src 子目录，包括 train.py。

训练代码来自Azure Machine Learning示例存储库中的此 MLflow 示例。

将以下代码复制到文件中：

# imports
import os
import mlflow

from random import random

# define functions
def main():
    mlflow.log_param("hello_param", "world")
    mlflow.log_metric("hello_metric", random())
    os.system(f"echo 'hello world' > helloworld.txt")
    mlflow.log_artifact("helloworld.txt")


# run functions
if __name__ == "__main__":
    # run main function
    main()

配置实验

使用Python将试验提交到Azure Machine Learning。在笔记本或Python文件中，使用 Environment 类配置计算和训练运行环境。

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

env = Environment(name="mlflow-env")

# Specify conda dependencies with scikit-learn and temporary pointers to mlflow extensions
cd = CondaDependencies.create(
    conda_packages=["scikit-learn", "matplotlib"],
    pip_packages=["azureml-mlflow", "pandas", "numpy"]
    )

env.python.conda_dependencies = cd

然后，使用远程计算创建一个 ScriptRunConfig 作为计算目标。

from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory="src",
                      script=training_script,
                      compute_target="<COMPUTE_NAME>",
                      environment=env)

使用此计算和训练运行配置，调用 Experiment.submit() 方法以提交运行。此方法自动设置 MLflow 跟踪 URI，并将日志记录从 MLflow 定向到工作区。

from azureml.core import Experiment
from azureml.core import Workspace
ws = Workspace.from_config()

experiment_name = "experiment_with_mlflow"
exp = Experiment(workspace=ws, name=experiment_name)

run = exp.submit(src)

查看工作区中的指标和工件

工作区跟踪 MLflow 日志记录中的指标和项目。若要随时查看它们，请转到工作区，并在 Azure Machine Learning studio 中按名称查找试验。或运行以下代码。

使用 MLflow get_run（）检索运行指标。

from mlflow.tracking import MlflowClient

# Use MlFlow to retrieve the run that was just completed
client = MlflowClient()
run_id = mlflow_run.info.run_id
finished_mlflow_run = MlflowClient().get_run(run_id)

metrics = finished_mlflow_run.data.metrics
tags = finished_mlflow_run.data.tags
params = finished_mlflow_run.data.params

print(metrics,tags,params)

若要查看运行的项目，请使用 MlFlowClient.list_artifacts()。

client.list_artifacts(run_id)

若要将项目下载到当前目录，请使用 mlflow.artifacts.download_artifacts（）。

file_path = mlflow.artifacts.download_artifacts(run_id=run_id, artifact_path="helloworld.txt")

备注

在旧版 MLflow （< 2.0）中，请改用 MlflowClient.download_artifacts() 。

有关如何使用 MLflow 在 Azure Machine Learning 中检索试验和运行信息的详细信息，请参阅使用 MLflow 管理试验和运行。

比较和查询执行情况

请使用以下代码来比较和查询您 Azure Machine Learning 工作区中的所有 MLflow 运行。若要详细了解如何使用 MLflow 查询运行，请参阅 mlflow。

from mlflow.entities import ViewType

all_experiments = [exp.experiment_id for exp in MlflowClient().list_experiments()]
query = "metrics.hello_metric > 0"
runs = mlflow.search_runs(experiment_ids=all_experiments, filter_string=query, run_view_type=ViewType.ALL)

runs.head(10)

自动日志记录

通过使用 Azure Machine Learning 和 MLflow，可以在训练模型时自动记录指标、模型参数和模型项目。支持各种常用的机器学习库。

若要启用自动日志记录，请在训练代码之前插入以下代码：

mlflow.autolog()

了解有关使用 MLflow 实现自动日志记录的详细信息。

管理模型

使用支持 MLflow 模型注册表的 Azure Machine Learning 模型注册表注册和跟踪模型。 Azure Machine Learning模型与 MLflow 模型架构保持一致，因此可以轻松地跨不同的工作流导出和导入这些模型。已注册的模型还跟踪与 MLflow 相关的元数据，例如运行 ID，以便进行跟踪。可以提交训练运行、注册模型和部署从 MLflow 运行生成的模型。

若要在一个步骤中部署和注册生产就绪模型，请参阅部署和注册 MLflow 模型。

若要注册并查看运行中的模型，请执行以下步骤：

运行完成后，调用 register_model() 该方法。

# the model folder produced from a run is registered. This includes the MLmodel file, model.pkl and the conda.yaml.
model_path = "model"
model_uri = 'runs:/{}/{}'.format(run_id, model_path) 
mlflow.register_model(model_uri,"registered_model_name")

使用 Azure Machine Learning studio 在工作区中查看已注册的模型。

在以下示例中，已注册的模型 my-model 标记了 MLflow 跟踪元数据。
选择“项目”选项卡以查看与 MLflow 模型架构（conda.yaml、MLmodel 和 model.pkl）一致的所有模型文件。
选择 MLmodel 以查看运行生成的 MLmodel 文件。

清理资源

如果不打算在工作区中使用记录的指标和项目，则当前无法单独删除它们。请删除包含存储帐户和工作区的资源组，以免产生任何费用：

在Azure门户中，选择最左侧的资源组。
从列表中选择已创建的资源组。
选择“删除资源组”。
输入资源组名称。然后选择“删除”。

示例笔记本

MLflow 和 Azure Machine Learning 笔记本演示并扩展本文中介绍的概念。另请参阅社区驱动的存储库，AzureML-Examples。

后续步骤

Last updated on 2026-04-22

使用 MLflow 和 Azure Machine Learning 跟踪 ML 模型

先决条件

本地或远程计算机的运行跟踪

设置跟踪环境

设置试验名称

启动训练任务

跟踪在 Azure Machine Learning 上的任务运行

创建训练例程

配置实验

查看工作区中的指标和工件

比较和查询执行情况

自动日志记录

管理模型

清理资源

示例笔记本

后续步骤

Recursos adicionales