Track model development using MLflow

06/16/2025

MLflow tracking lets you log notebooks and training datasets, parameters, metrics, tags, and artifacts related to training a machine learning or deep learning model. For an example notebook to get started with MLflow, see Tutorial: End-to-end classic ML models on Azure Databricks.

MLflow tracking with experiments and runs

The model development process is iterative, and it can be challenging to keep track of your work as you develop and optimize a model. In Azure Databricks, you can use MLflow tracking to help you keep track of the model development process, including parameter settings or combinations you have tried and how they affected the model's performance.

MLflow tracking uses experiments and runs to log and track your ML and deep learning model development. A run is a single execution of model code. During an MLflow run, you can log model parameters and results. An experiment is a collection of related runs. In an experiment, you can compare and filter runs to understand how your model performs and how its performance depends on the parameter settings, input data, and so on.

Note

Starting March 27, 2024, MLflow imposes a quota limit on the number of total parameters, tags, and metric steps for all existing and new runs, and the number of total runs for all existing and new experiments, see Resource limits. If you hit the runs per experiment quota, Databricks recommends you delete runs that you no longer need using the delete runs API in Python. If you hit other quota limits, Databricks recommends adjusting your logging strategy to keep under the limit. If you require an increase to this limit, reach out to your Databricks account team with a brief explanation of your use case, why the suggested mitigation approaches do not work, and the new limit you request.

MLflow tracking API

The MLflow Tracking API logs parameters, metrics, tags, and artifacts from a model run. The Tracking API communicates with an MLflow tracking server. When you use Databricks, a Databricks-hosted tracking server logs the data. The hosted MLflow tracking server has Python, Java, and R APIs.

MLflow is pre-installed on Databricks Runtime ML clusters. To use MLflow on a Databricks Runtime cluster, you must install the mlflow library. For instructions on installing a library onto a cluster, see Install a library on a cluster.

Where MLflow runs are logged

All MLflow runs are logged to the active experiment, which can be set using any of the following ways:

Use the mlflow.set_experiment() command.
Use the experiment_id parameter in the mlflow.start_run() command.
Set one of the MLflow environment variables MLFLOW_EXPERIMENT_NAME or MLFLOW_EXPERIMENT_ID.

If no active experiment is set, runs are logged to the notebook experiment.

To log your experiment results to a remotely hosted MLflow Tracking server in a workspace other than the one in which you are running your experiment, set the tracking URI to reference the remote workspace with mlflow.set_tracking_uri(), and set the path to your experiment in the remote workspace by using mlflow.set_experiment().

mlflow.set_tracking_uri(<uri-of-remote-workspace>)
mlflow.set_experiment("path to experiment in remote workspace")

If you are running experiments locally and want to log experiment results to the Databricks MLflow Tracking server, provide your Databricks workspace instance (DATABRICKS_HOST) and Databricks personal access token (DATABRICKS_TOKEN). Next, you can set the tracking URI to reference the workspace with mlflow.set_tracking_uri(), and set the path to your experiment by using mlflow.set_experiment(). See Perform Azure Databricks personal access token authentication for details on where to find values for the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables.

The following code example demonstrates setting these values:


os.environ["DATABRICKS_HOST"] = "https://dbc-1234567890123456.cloud.databricks.com" # set to your server URI
os.environ["DATABRICKS_TOKEN"] = "dapixxxxxxxxxxxxx"

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/your-experiment")

Log runs to an experiment

MLflow can automatically log training code written in many machine learning and deep learning frameworks. This is the easiest way to get started using MLflow tracking. See the example notebook.

For more control over which parameters and metrics are logged, or to log additional artifacts such as CSV files or plots, use the MLflow logging API. See the example notebook.

Use autologging to track model development

This example notebook shows how to use autologging with scikit-learn. For information about autologging with other Python libraries, see the MLflow autologging documentation.

MLflow autologging Python notebook

Get notebook

Use the logging API to track model development

This example notebook shows how to use the Python logging API. MLflow also has REST, R, and Java APIs.

MLflow logging API Python notebook

Get notebook

Log runs to a workspace experiment

By default, when you train a model in a Databricks notebook, runs are logged to the notebook experiment. Only MLflow runs initiated within a notebook can be logged to the notebook experiment.

MLflow runs launched from any notebook or from the APIs can be logged to a workspace experiment. To log runs to a workspace experiment, use code similar to the following in your notebook or API call:

experiment_name = "/Shared/name_of_experiment/"
mlflow.set_experiment(experiment_name)

For instructions on creating a workspace experiment, see Create workspace experiment. For information about viewing logged runs, see View notebook experiment and View workspace experiment.

Analyze MLflow runs programmatically

You can access MLflow run data programmatically using the following two DataFrame APIs:

The MLflow Python client search_runs API returns a pandas DataFrame.
The MLflow experiment data source returns an Apache Spark DataFrame.

This example demonstrates how to use the MLflow Python client to build a dashboard that visualizes changes in evaluation metrics over time, tracks the number of runs started by a specific user, and measures the total number of runs across all users:

Build dashboards with the MLflow Search API

Why model training metrics and outputs may vary

Many of the algorithms used in ML have a random element, such as sampling or random initial conditions within the algorithm itself. When you train a model using one of these algorithms, the results might not be the same with each run, even if you start the run with the same conditions. Many libraries offer a seeding mechanism to fix the initial conditions for these stochastic elements. However, there may be other sources of variation that are not controlled by seeds. Some algorithms are sensitive to the order of the data, and distributed ML algorithms may also be affected by how the data is partitioned. Generally this variation is not significant and not important in the model development process.

To control variation caused by differences in ordering and partitioning, use the PySpark functions repartition and sortWithinPartitions.

MLflow tracking examples

The following notebooks demonstrate how to track model development using MLflow.