Quick start Python

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. MLflow provides simple APIs for logging metrics (for example, model loss), parameters (for example, learning rate), and fitted models, making it easy to analyze training results or deploy models later on.

In this section:

Install MLflow

If you’re using Databricks Runtime for Machine Learning, MLflow is already installed. Otherwise, install the MLflow package from PyPI.
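
Before installing, it can help to check whether MLflow is already present in the current environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of MLflow.

```python
from importlib import metadata


def installed_version(package: str = "mlflow"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    # The PyPI package name is `mlflow`.
    print("MLflow is not installed; run `pip install mlflow` first.")
else:
    print(f"MLflow {version} is available.")
```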

Automatically log training runs to MLflow

MLflow provides mlflow.<framework>.autolog() APIs to automatically log training code written in many ML frameworks. You can call this API before running training code to log model-specific metrics, parameters, and model artifacts.

TensorFlow

# Also autoinstruments tf.keras
import mlflow.tensorflow
mlflow.tensorflow.autolog()

Keras

# Use import mlflow.tensorflow and mlflow.tensorflow.autolog() if using tf.keras
import mlflow.keras
mlflow.keras.autolog()

XGBoost

import mlflow.xgboost
mlflow.xgboost.autolog()

LightGBM

import mlflow.lightgbm
mlflow.lightgbm.autolog()

Scikit-learn

import mlflow.sklearn
mlflow.sklearn.autolog()

PySpark

If you perform tuning with pyspark.ml, metrics and models are automatically logged to MLflow. See Apache Spark MLlib and automated MLflow tracking.

View results

After executing your machine learning code, you can view results in the Experiment Runs sidebar:

  1. Click the Experiment icon in the notebook context bar. The Experiment Runs sidebar displays. In the sidebar, you can view the run’s parameters and metrics:

    Runs

  2. Click the External Link icon in the Experiment Runs context bar to view the experiment:

    View experiment

  3. In the experiment, click a date:

    Select run

    The run details display:

    Run details

  4. In the experiment, click a source:

    Experiment source

    The notebook revision used in the run displays:

    Notebook revision

Track additional metrics, parameters, and models

You can log additional information by directly invoking the MLflow Tracking logging APIs.

  • Numerical metrics:

    import mlflow
    mlflow.log_metric("accuracy", 0.9)
    
  • Training parameters:

    import mlflow
    mlflow.log_param("learning_rate", 0.001)
    
  • Models:

    Scikit-learn

    import mlflow.sklearn
    mlflow.sklearn.log_model(model, "myModel")
    

    PySpark

    import mlflow.spark
    mlflow.spark.log_model(model, "myModel")
    

    XGBoost

    import mlflow.xgboost
    mlflow.xgboost.log_model(model, "myModel")
    

    TensorFlow

    # This example logs a tf.keras model through the mlflow.keras flavor
    import mlflow.keras
    mlflow.keras.log_model(model, "myModel")
    

    Keras

    import mlflow.keras
    mlflow.keras.log_model(model, "myModel")
    

    PyTorch

    import mlflow.pytorch
    mlflow.pytorch.log_model(model, "myModel")
    

    spaCy

    import mlflow.spacy
    mlflow.spacy.log_model(model, "myModel")
    
  • Other artifacts (files):

    import mlflow
    mlflow.log_artifact("/tmp/my-file", "myArtifactPath")
    

Example notebooks

Requirements

Databricks Runtime 6.4 or above or Databricks Runtime 6.4 ML or above.

Notebooks

The recommended way to get started using MLflow tracking with Python is to use the MLflow autolog() API. With MLflow’s autologging capabilities, a single line of code automatically logs the resulting model, the parameters used to create the model, and a model score. The following notebook shows you how to set up a run using autologging.

MLflow Autologging Quick Start Python notebook


If you need more control over the metrics logged for each training run, or want to log additional artifacts such as tables or plots, you can use the MLflow logging API functions demonstrated in the following notebook.

MLflow Logging API Quick Start Python notebook


Learn more