Track ML experiments and models with MLflow

In this article, you learn how to use MLflow for tracking your experiments and runs in Azure Machine Learning workspaces.

Tracking is the process of saving relevant information about experiments that you run. The saved information (metadata) varies based on your project, and it can include:

  • Code
  • Environment details (such as OS version, Python packages)
  • Input data
  • Parameter configurations
  • Models
  • Evaluation metrics
  • Evaluation visualizations (such as confusion matrices, importance plots)
  • Evaluation results (including some evaluation predictions)

When you work with jobs in Azure Machine Learning, the service automatically tracks some information about your experiments, such as code, environment, and input and output data. However, models, parameters, and metrics are specific to each scenario, so the model builder must configure their tracking.

Note

If you want to track experiments that are running on Azure Databricks, see Track Azure Databricks ML experiments with MLflow and Azure Machine Learning.

Benefits of tracking experiments

We strongly recommend that machine learning practitioners track experiments, whether you're training with jobs in Azure Machine Learning or training interactively in notebooks. Experiment tracking allows you to:

  • Organize all of your machine learning experiments in a single place. You can then search and filter experiments and drill down to see details about the experiments you ran before.
  • Compare experiments, analyze results, and debug model training with little extra work.
  • Reproduce or rerun experiments to validate results.
  • Improve collaboration, since you can see what other teammates are doing, share experiment results, and access experiment data programmatically.

Why use MLflow for tracking experiments?

Azure Machine Learning workspaces are MLflow-compatible, which means you can use MLflow to track runs, metrics, parameters, and artifacts within your Azure Machine Learning workspaces. A major advantage of using MLflow for tracking is that you don't need to change your training routines to work with Azure Machine Learning or inject any cloud-specific syntax.

For more information about all supported MLflow and Azure Machine Learning functionalities, see MLflow and Azure Machine Learning.

Limitations

Some methods available in the MLflow API might not be available when connected to Azure Machine Learning. For details about supported and unsupported operations, see Support matrix for querying runs and experiments.

Prerequisites

  • An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
  • Install the MLflow SDK package mlflow and the Azure Machine Learning azureml-mlflow plug-in for MLflow:

    pip install mlflow azureml-mlflow
    

    Tip

    You can use the mlflow-skinny package, which is a lightweight MLflow package without SQL storage, server, UI, or data science dependencies. mlflow-skinny is recommended for users who primarily need the MLflow tracking and logging capabilities without importing the full suite of features, including deployments.

  • An Azure Machine Learning workspace. To create a workspace, see the Create machine learning resources tutorial. Review the access permissions you need to perform your MLflow operations in your workspace.

  • If you perform remote tracking (that is, track experiments running outside Azure Machine Learning), configure MLflow to point to the tracking URI of your Azure Machine Learning workspace. For more information on how to connect MLflow to your workspace, see Configure MLflow for Azure Machine Learning.
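    For example, the following minimal sketch points MLflow at a workspace tracking URI. It assumes that the azure-ai-ml and azure-identity packages are installed, and that the subscription, resource group, and workspace values are placeholders for your own.

    import mlflow
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    # Connect to the workspace.
    ml_client = MLClient(
        credential=DefaultAzureCredential(),
        subscription_id="<SUBSCRIPTION_ID>",
        resource_group_name="<RESOURCE_GROUP>",
        workspace_name="<WORKSPACE_NAME>",
    )

    # Retrieve the workspace's MLflow tracking URI and hand it to MLflow so that
    # subsequent logging calls are recorded in the workspace.
    workspace = ml_client.workspaces.get(ml_client.workspace_name)
    mlflow.set_tracking_uri(workspace.mlflow_tracking_uri)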

Configure the experiment

MLflow organizes information in experiments and runs (runs are called jobs in Azure Machine Learning). By default, runs are logged to an experiment named Default that is automatically created for you. You can configure the experiment where tracking is happening.

For interactive training, such as in a Jupyter notebook, use the MLflow command mlflow.set_experiment(). For example, the following code snippet configures an experiment:

import mlflow

experiment_name = 'hello-world-example'
mlflow.set_experiment(experiment_name)

Configure the run

Azure Machine Learning tracks any training job in what MLflow calls a run. Use runs to capture all the processing that your job performs.

When you're working interactively, MLflow starts tracking your training routine as soon as you try to log information that requires an active run. For instance, MLflow tracking starts when you log a metric or a parameter, or when you start a training cycle while MLflow's autologging functionality is enabled. However, it's usually helpful to start the run explicitly, especially if you want to capture the total time for your experiment in the Duration field. To start the run explicitly, use mlflow.start_run().

Whether you start the run manually or not, you eventually need to stop the run, so that MLflow knows that your experiment run is done and can mark the run's status as Completed. To stop a run, use mlflow.end_run().

We strongly recommend starting runs manually, so that you don't forget to end them when you're working in notebooks.

  • To start a run manually and end it when you're done working in the notebook:

    mlflow.start_run()
    
    # Your code
    
    mlflow.end_run()
    
  • It's usually helpful to use the context manager paradigm to help you remember to end the run:

    with mlflow.start_run() as run:
        # Your code
    
  • When you start a new run with mlflow.start_run(), it can be useful to specify the run_name parameter, which later translates to the name of the run in the Azure Machine Learning user interface and helps you identify the run more quickly:

    with mlflow.start_run(run_name="hello-world-example") as run:
        # Your code
    
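Putting these together, the following minimal sketch starts a named run and logs values inside it; the parameter and metric names and values are illustrative.

import mlflow

# Start a named run; the context manager ends the run automatically,
# even if the code inside it raises an exception.
with mlflow.start_run(run_name="hello-world-example") as run:
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.91)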

Enable MLflow autologging

You can log metrics, parameters, and files with MLflow manually. However, you can also rely on MLflow's automatic logging capability. Each machine learning framework supported by MLflow decides what to track automatically for you.

To enable automatic logging, insert the following code before your training code:

mlflow.autolog()
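For example, with scikit-learn, autologging typically captures parameters, training metrics, and the fitted model when you call fit. The following minimal sketch assumes scikit-learn is installed:

import mlflow
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

# Enable autologging before any training code runs.
mlflow.autolog()

X, y = load_diabetes(return_X_y=True)

with mlflow.start_run(run_name="autolog-example"):
    # Parameters, training metrics, and the fitted model are logged automatically.
    Ridge(alpha=0.5).fit(X, y)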

View metrics and artifacts in your workspace

The metrics and artifacts from MLflow logging are tracked in your workspace. You can view and access them in the studio at any time, or access them programmatically via the MLflow SDK.

To view metrics and artifacts in the studio:

  1. Go to Azure Machine Learning studio.

  2. Navigate to your workspace.

  3. Find the experiment by name in your workspace.

  4. Select the logged metrics to render charts on the right side. You can customize the charts by applying smoothing, changing the color, or plotting multiple metrics on a single graph. You can also resize and rearrange the layout as you wish.

  5. Once you've created your desired view, save it for future use and share it with your teammates by using a direct link.

    Screenshot of the metrics view.

To access or query metrics, parameters, and artifacts programmatically via the MLflow SDK, use mlflow.get_run().

import mlflow

run = mlflow.get_run("<RUN_ID>")

metrics = run.data.metrics
params = run.data.params
tags = run.data.tags

print(metrics, params, tags)

Tip

For metrics, the previous example code returns only the last value of a given metric. If you want to retrieve all the values of a given metric, use the MlflowClient.get_metric_history method. For more information on retrieving values of a metric, see Getting params and metrics from a run.
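For example, the following minimal sketch retrieves every logged value of a metric; it assumes a metric named "loss" was logged during the run.

import mlflow

client = mlflow.tracking.MlflowClient()

# Retrieve the full history of a metric, one entry per logged value.
history = client.get_metric_history(run_id="<RUN_ID>", key="loss")

for measurement in history:
    print(measurement.step, measurement.value, measurement.timestamp)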

To download artifacts you've logged, such as files and models, use mlflow.artifacts.download_artifacts().

mlflow.artifacts.download_artifacts(run_id="<RUN_ID>", artifact_path="helloworld.txt")

For more information about how to retrieve or compare information from experiments and runs in Azure Machine Learning by using MLflow, see Query & compare experiments and runs with MLflow.