Track Azure Databricks ML experiments with MLflow and Azure Machine Learning (preview)

In this article, learn how to enable MLflow's tracking URI and logging API, collectively known as MLflow Tracking, to connect your Azure Databricks (ADB) experiments, MLflow, and Azure Machine Learning.

MLflow is an open-source library for managing the life cycle of your machine learning experiments. MLflow Tracking is a component of MLflow that logs and tracks your training run metrics and model artifacts. Learn more about Azure Databricks and MLflow.

For additional MLflow and Azure Machine Learning functionality integrations, see Track experiment runs and create endpoints with MLflow and Azure Machine Learning.

Note

As an open-source library, MLflow changes frequently. As such, the functionality made available via the Azure Machine Learning and MLflow integration should be considered a preview and is not fully supported by Microsoft.

Tip

The information in this document is primarily for data scientists and developers who want to monitor the model training process. If you are an administrator interested in monitoring resource usage and events from Azure Machine Learning, such as quotas, completed training runs, or completed model deployments, see Monitoring Azure Machine Learning.

Prerequisites

Track Azure Databricks runs

MLflow Tracking with Azure Machine Learning lets you store the logged metrics and artifacts from your Azure Databricks runs in both your:

  • Azure Databricks workspace
  • Azure Machine Learning workspace

After you create your Azure Databricks workspace and cluster:

  1. Install the azureml-mlflow library from PyPI to ensure that your cluster has access to the necessary functions and classes.

  2. Set up your experiment notebook.

  3. Connect your Azure Databricks workspace and Azure Machine Learning workspace.

The following sections give additional details for these steps so you can successfully run your MLflow experiments with Azure Databricks.

Install libraries

To install libraries on your cluster, navigate to the Libraries tab and select Install New.

In the Package field, type azureml-mlflow and then select Install. Repeat this step as necessary to install any other packages your experiment needs on the cluster.

Set up your notebook

Once your ADB cluster is set up:

  1. Select Workspaces on the left navigation pane.
  2. Expand the workspaces drop-down menu and select Import.
  3. Drag and drop, or browse to find, your experiment notebook to import it into your ADB workspace.
  4. Select Import. Your experiment notebook opens automatically.
  5. Under the notebook title on the top left, select the cluster you want to attach to your experiment notebook.

Connect your Azure Databricks and Azure Machine Learning workspaces

Linking your ADB workspace to your Azure Machine Learning workspace enables you to track your experiment data in the Azure Machine Learning workspace.

To link your ADB workspace to a new or existing Azure Machine Learning workspace:

  1. Sign in to the Azure portal.
  2. Navigate to your ADB workspace's Overview page.
  3. Select the Link Azure Machine Learning workspace button on the bottom right.

MLflow Tracking in your workspaces

After you instantiate your workspace, MLflow Tracking is automatically set to track in both of the following places:

  • The linked Azure Machine Learning workspace.
  • Your original ADB workspace.

All your experiments land in the managed Azure Machine Learning tracking service.

Include the following code in your experiment notebook to get your linked Azure Machine Learning workspace.

This code:

  • Gets the details of your Azure subscription to instantiate your Azure Machine Learning workspace.

  • Assumes you have an existing resource group and Azure Machine Learning workspace; otherwise, you can create them.

  • Sets the experiment name. The user_name here is consistent with the user_name associated with the Azure Databricks workspace.

import mlflow
import mlflow.azureml
import azureml.mlflow
import azureml.core

from azureml.core import Workspace

subscription_id = 'subscription_id'

# Azure Machine Learning resource group NOT the managed resource group
resource_group = 'resource_group_name' 

#Azure Machine Learning workspace name, NOT Azure Databricks workspace
workspace_name = 'workspace_name'  

# Instantiate Azure Machine Learning workspace
ws = Workspace.get(name=workspace_name,
                   subscription_id=subscription_id,
                   resource_group=resource_group)

#Set MLflow experiment. 
experimentName = "/Users/{user_name}/{experiment_folder}/{experiment_name}" 
mlflow.set_experiment(experimentName) 
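
The experiment name above follows the pattern /Users/{user_name}/{experiment_folder}/{experiment_name}. As a minimal sketch, a small helper can assemble that path from its parts. The function name build_experiment_path and the sample values are hypothetical and are not part of MLflow or Azure Machine Learning.

```python
import posixpath


def build_experiment_path(user_name, experiment_folder, experiment_name):
    """Return '/Users/<user_name>/<experiment_folder>/<experiment_name>'.

    Hypothetical helper: it only joins strings and does not contact any service.
    """
    parts = [user_name, experiment_folder, experiment_name]
    # Reject empty parts early so a missing value does not yield a malformed path.
    if any(not part or not part.strip("/") for part in parts):
        raise ValueError("user_name, experiment_folder and experiment_name must be non-empty")
    # Strip stray slashes so the parts join with exactly one separator each.
    return posixpath.join("/Users", *(part.strip("/") for part in parts))


# Sample values are illustrative only.
experiment_path = build_experiment_path("someone@example.com", "experiments", "demo")
print(experiment_path)
```

The resulting string could then be passed to mlflow.set_experiment in place of the hand-written experimentName.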

Set MLflow Tracking to only track in your Azure Machine Learning workspace

If you prefer to manage your tracked experiments in a centralized location, you can set MLflow Tracking to only track in your Azure Machine Learning workspace.

Include the following code in your script:

uri = ws.get_mlflow_tracking_uri()
mlflow.set_tracking_uri(uri)

In your training script, import mlflow to use the MLflow logging APIs, and start logging your run metrics. The following example logs the epoch loss metric.

import mlflow 
mlflow.log_metric('epoch_loss', loss.item()) 
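
The one-line example above logs a single value. As a rough sketch of how this might look across a whole training loop, the helper below logs one epoch_loss value per epoch and passes the epoch index as the step. The function log_epoch_losses and the sample loss values are assumptions made for illustration. The logger is injectable so the sketch can be tried without a tracking server, and it falls back to mlflow.log_metric when none is given.

```python
def log_epoch_losses(epoch_losses, log_metric=None):
    """Log one 'epoch_loss' value per epoch, using the epoch index as the step.

    Hypothetical helper: `log_metric` is any callable taking (key, value, step=...).
    """
    if log_metric is None:
        # Imported lazily so the function can be tried with a stand-in logger.
        import mlflow
        log_metric = mlflow.log_metric
    for epoch, loss in enumerate(epoch_losses):
        log_metric("epoch_loss", float(loss), step=epoch)


# Stand-in logger that records calls instead of contacting a tracking server.
recorded = []
log_epoch_losses(
    [0.9, 0.5, 0.25],  # illustrative loss values, not real results
    log_metric=lambda key, value, step: recorded.append((key, value, step)),
)
print(recorded)
```

In a notebook attached to the cluster, calling log_epoch_losses(losses) without the second argument would send the values through mlflow.log_metric to whichever tracking URI is currently set.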

Register models with MLflow

After your model is trained, you can log and register it to the backend tracking server with the mlflow.<model_flavor>.log_model() method. <model_flavor> refers to the framework associated with the model. Learn what model flavors are supported.

By default, the backend tracking server is the Azure Databricks workspace. If you chose to set MLflow Tracking to only track in your Azure Machine Learning workspace, the backend tracking server is the Azure Machine Learning workspace.

  • If a registered model with the name doesn't exist, the method registers a new model, creates version 1, and returns a ModelVersion MLflow object.

  • If a registered model with the name already exists, the method creates a new model version and returns the version object.

mlflow.spark.log_model(model, artifact_path = "model", 
                       registered_model_name = 'model_name')  

mlflow.sklearn.log_model(model, artifact_path = "model", 
                         registered_model_name = 'model_name') 

Create endpoints for MLflow models

When you are ready to create an endpoint for your ML models, you can deploy them as:

  • An Azure Machine Learning Request-Response web service for interactive scoring. This deployment lets you apply the Azure Machine Learning model management and data drift detection capabilities to your production models.

  • MLflow model objects, which can be used in streaming or batch pipelines as Python functions or Pandas UDFs in your Azure Databricks workspace.

Deploy models to Azure Machine Learning endpoints

You can use the mlflow.azureml.deploy API to deploy a model to your Azure Machine Learning workspace. If you only registered the model to the Azure Databricks workspace, as described in the Register models with MLflow section, specify the model_name parameter to register the model into the Azure Machine Learning workspace.
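
As a hedged sketch of the call described above, the wrapper below forwards a model URI, a workspace, and a model_name to mlflow.azureml.deploy. It assumes an MLflow release that still ships the mlflow.azureml.deploy function. The wrapper name deploy_to_azureml, the sample model and service names, and the stand-in deploy function are hypothetical. The stand-in only records its arguments so the example can be read and run without an Azure subscription.

```python
def deploy_to_azureml(model_uri, workspace, model_name, service_name, deploy_fn=None):
    """Forward the deployment arguments to mlflow.azureml.deploy.

    Hypothetical wrapper: `deploy_fn` lets a stand-in replace the real API.
    """
    if deploy_fn is None:
        # Imported lazily; requires an MLflow version that includes mlflow.azureml.
        import mlflow.azureml
        deploy_fn = mlflow.azureml.deploy
    return deploy_fn(
        model_uri=model_uri,
        workspace=workspace,
        model_name=model_name,      # registers the model in the Azure ML workspace
        service_name=service_name,
    )


def fake_deploy(**kwargs):
    """Stand-in for mlflow.azureml.deploy that just echoes its arguments."""
    return kwargs


# All values below are placeholders for illustration.
call = deploy_to_azureml(
    model_uri="runs:/<run_id>/model",
    workspace="<workspace object from Workspace.get>",
    model_name="model_name",
    service_name="my-scoring-service",
    deploy_fn=fake_deploy,
)
print(sorted(call))
```

In a real notebook, workspace would be the ws object obtained from Workspace.get, and deploy_fn would be left unset so the real API is used.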

Azure Databricks runs can be deployed to the following endpoints:

Deploy models to ADB endpoints for batch scoring

You can choose Azure Databricks clusters for batch scoring. The MLflow model is loaded and used as a Spark Pandas UDF to score new data.

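import mlflow.pyfunc
from pyspark.sql.types import ArrayType, FloatType

# last_run_id and model_path are placeholders for the training run ID and the
# artifact path of the logged model (assumed here to have no leading slash)
model_uri = f"runs:/{last_run_id}/{model_path}"

# Create a Spark UDF for the MLflow model
pyfunc_udf = mlflow.pyfunc.spark_udf(spark, model_uri)

# Load scoring data into a Spark DataFrame;
# table_name and required_conditions are placeholders
scoreDf = spark.table(table_name).where(required_conditions)

# Make predictions
preds = scoreDf.withColumn(
    'target_column_name',
    # … list every input column the model expects
    pyfunc_udf('Input_column1', 'Input_column2', 'Input_column3'),
)

display(preds)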
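
The model URI in the batch scoring code uses MLflow's runs:/<run_id>/<artifact_path> form. A minimal sketch of assembling it defensively is shown below. The helper build_runs_uri and the sample run ID are hypothetical, and the helper assumes the artifact path is the one given to log_model (for example "model").

```python
def build_runs_uri(run_id, artifact_path):
    """Return a 'runs:/<run_id>/<artifact_path>' model URI.

    Hypothetical helper: it only formats a string and does not check that the run exists.
    """
    run_id = run_id.strip()
    # Drop leading/trailing slashes so the URI never contains '//'.
    artifact_path = artifact_path.strip("/")
    if not run_id or not artifact_path:
        raise ValueError("run_id and artifact_path must be non-empty")
    return f"runs:/{run_id}/{artifact_path}"


# "abc123" is an illustrative run ID, not a real one.
model_uri = build_runs_uri("abc123", "/model")
print(model_uri)
```

The returned string could be passed to mlflow.pyfunc.spark_udf as the model_uri argument.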

Clean up resources

Deleting logged metrics and artifacts individually is currently not supported. If you don't plan to use the logged metrics and artifacts in your workspace, delete the resource group that contains the storage account and workspace instead, so you don't incur any charges:

  1. In the Azure portal, select Resource groups on the far left.

  2. From the list, select the resource group you created.

  3. Select Delete resource group.

  4. Enter the resource group name. Then select Delete.

Example notebooks

The MLflow with Azure Machine Learning notebooks demonstrate and expand upon concepts presented in this article.

Next steps