Hyperparameter tuning with Optuna

07/12/2025

Optuna is an open-source Python library for hyperparameter tuning that can be scaled horizontally across multiple compute resources.

MLflow 3.0 integrates with Optuna for hyperparameter optimization through two classes:

- The MlflowStorage class lets Optuna use the MLflow Tracking Server as its storage backend.
- The MlflowSparkStudy class launches parallel Optuna studies on PySpark executors.

Install Optuna

MLflow 3.0 is pre-installed in Databricks Runtime 17.0 ML and above. On older runtimes, use the following commands to install the latest versions of Optuna and MLflow.

%pip install mlflow --upgrade
%pip install optuna
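
After installation, it can help to confirm which versions the notebook actually picked up. The following is a minimal sketch that uses only the standard library; the helper name `installed_major_version` is hypothetical and is not part of MLflow or Optuna.

```python
from importlib import metadata


def installed_major_version(package):
    """Return the major version of an installed package, or None if unavailable."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    # Take the leading numeric component, e.g. "3.1.0" -> 3.
    head = version.split(".")[0]
    return int(head) if head.isdigit() else None


for name in ("mlflow", "optuna"):
    print(name, installed_major_version(name))
```

A result of `None` means the package is not installed in the current environment (or its version string does not start with a number).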

Run Optuna optimization in parallel

Here are the steps in an Optuna workflow:

Define an objective function to optimize. Within the objective function, define the hyperparameter search space. For more details, see the Optuna documentation.

The following example performs model selection and hyperparameter tuning with scikit-learn. It defines the objective function objective, which calls suggest_categorical to choose between two regressors and then calls suggest_float or suggest_int to define the search space for the chosen model's hyperparameter.

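# Import each scikit-learn submodule explicitly; a bare `import sklearn` does not reliably load them.
import sklearn.datasets
import sklearn.ensemble
import sklearn.metrics
import sklearn.model_selection
import sklearn.svm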

def objective(trial):
    # Invoke suggest methods of a Trial object to generate hyperparameters.
    regressor_name = trial.suggest_categorical('regressor', ['SVR', 'RandomForest'])
    if regressor_name == 'SVR':
        svr_c = trial.suggest_float('svr_c', 1e-10, 1e10, log=True)
        regressor_obj = sklearn.svm.SVR(C=svr_c)
    else:
        rf_max_depth = trial.suggest_int('rf_max_depth', 2, 32)
        regressor_obj = sklearn.ensemble.RandomForestRegressor(max_depth=rf_max_depth)

    X, y = sklearn.datasets.fetch_california_housing(return_X_y=True)
    X_train, X_val, y_train, y_val = sklearn.model_selection.train_test_split(X, y, random_state=0)

    regressor_obj.fit(X_train, y_train)
    y_pred = regressor_obj.predict(X_val)

    error = sklearn.metrics.mean_squared_error(y_val, y_pred)

    return error  # An objective value linked with the Trial object
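
The `log=True` argument on `suggest_float` asks Optuna to sample the value in the log domain, which spreads candidates across orders of magnitude instead of concentrating them near the upper bound. The sketch below illustrates that idea with the standard library only. `sample_log_uniform` is a hypothetical helper and is not Optuna's sampler, which by default is adaptive and not purely random.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
draws = [sample_log_uniform(1e-10, 1e10, rng) for _ in range(1000)]

# About half of the draws fall below 1.0, the geometric midpoint of the range.
below_one = sum(d < 1.0 for d in draws)
print(f"{below_one} of {len(draws)} draws are below 1.0")
```

With a linear scale over the same range, almost every draw would be larger than 1e8, so small values of `C` would rarely be tried.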

Create shared storage for distributed optimization. With MlflowStorage, you can use the MLflow Tracking Server as the storage backend.

import mlflow
from mlflow.optuna.storage import MlflowStorage

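# Look up the MLflow experiment associated with the current notebook path.
notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
experiment_id = mlflow.get_experiment_by_name(notebook_path).experiment_id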

mlflow_storage = MlflowStorage(experiment_id=experiment_id)
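
`mlflow.get_experiment_by_name` returns `None` when no experiment has the given name, in which case the chained `.experiment_id` access above raises an `AttributeError`. One defensive option, sketched here under the assumption that creating the experiment on demand is acceptable, is to fall back to `mlflow.create_experiment`. The helper name `get_or_create_experiment_id` and its `tracking` parameter are hypothetical; the parameter exists only so the function can be exercised with a stub instead of a tracking server.

```python
def get_or_create_experiment_id(name, tracking=None):
    """Return the ID of the experiment called `name`, creating it if needed."""
    if tracking is None:
        # Deferred import so the helper can be tested without MLflow installed.
        import mlflow as tracking

    experiment = tracking.get_experiment_by_name(name)
    if experiment is not None:
        return experiment.experiment_id
    # create_experiment returns the ID of the newly created experiment.
    return tracking.create_experiment(name)
```

In a notebook, the function would be called with the notebook path obtained as in the snippet above, and the returned ID passed to `MlflowStorage(experiment_id=...)`.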

Create an Optuna Study object, and run the tuning algorithm by calling the optimize function of the Study object. MlflowSparkStudy can launch parallel Optuna studies using PySpark executors.

The example below, adapted from the Optuna documentation, does the following:
- Creates a study and optimizes the objective function with 8 trials (8 calls of the objective function with different hyperparameter values), using 4 parallel jobs.
- Gets the best parameters of the study.
```
from mlflow.pyspark.optuna.study import MlflowSparkStudy

mlflow_study = MlflowSparkStudy(
    study_name="spark-mlflow-tuning",
    storage=mlflow_storage,
)

mlflow_study.optimize(objective, n_trials=8, n_jobs=4)

best_params = mlflow_study.best_params
```
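
To make the roles of `n_trials` and `n_jobs` concrete: the first sets how many times the objective is evaluated in total, and the second sets how many evaluations may run at once. The sketch below mimics that scheduling on a single machine with a thread pool and plain random search. It is an analogy only and does not reflect how `MlflowSparkStudy` is implemented; `run_parallel_trials` and `toy_objective` are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def toy_objective(x):
    """Stand-in for a real training run: minimized at x = 2."""
    return (x - 2.0) ** 2


def run_parallel_trials(objective, n_trials, n_jobs, seed=0):
    """Evaluate n_trials random candidates, at most n_jobs at a time."""
    rng = random.Random(seed)
    candidates = [rng.uniform(-10.0, 10.0) for _ in range(n_trials)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        values = list(pool.map(objective, candidates))
    # Keep the candidate with the lowest objective value.
    best_value, best_x = min(zip(values, candidates))
    return {"x": best_x}, best_value


best_params, best_value = run_parallel_trials(toy_objective, n_trials=8, n_jobs=4)
print(best_params, best_value)
```

Unlike this random search, an Optuna sampler uses the results of completed trials to choose later candidates, so raising the parallelism can reduce how much each new trial benefits from earlier ones.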

Notebook example

This notebook provides an example of using Optuna to select a scikit-learn model and a set of hyperparameters for the Iris dataset.

Scaling up hyperparameter tuning with Optuna and MLflow


MLflow Optuna integration API

MlflowStorage

MlflowStorage is an MLflow-based storage class for Optuna that batches writes to avoid REST API throttling.

Parameter name	Type	Description
`experiment_id`	`str`	MLflow experiment ID for the storage
`name`	`str`	Name of the storage
`batch_flush_interval`	`float`	Time in seconds between automatic batch flushes (default: 1.0)
`batch_size_threshold`	`int`	Maximum number of items in a batch before a flush is triggered (default: 100)
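
The two batching parameters describe a common buffering pattern: writes accumulate in memory and are sent together once the batch is large enough or enough time has passed. The following sketch illustrates that pattern in isolation. It is not MLflow's implementation; the `BatchBuffer` class and its `flush_fn` callback are hypothetical, and only the parameter names and defaults are taken from the table above.

```python
import time


class BatchBuffer:
    """Collect items and flush them in groups, by size or by elapsed time."""

    def __init__(self, flush_fn, batch_flush_interval=1.0, batch_size_threshold=100):
        self.flush_fn = flush_fn
        self.batch_flush_interval = batch_flush_interval
        self.batch_size_threshold = batch_size_threshold
        self._items = []
        self._last_flush = time.monotonic()

    def add(self, item):
        self._items.append(item)
        too_big = len(self._items) >= self.batch_size_threshold
        too_old = time.monotonic() - self._last_flush >= self.batch_flush_interval
        if too_big or too_old:
            self.flush()

    def flush(self):
        # Send the pending items as one batch, then reset the timer.
        if self._items:
            self.flush_fn(list(self._items))
            self._items.clear()
        self._last_flush = time.monotonic()


sent = []
buffer = BatchBuffer(sent.append, batch_flush_interval=60.0, batch_size_threshold=3)
for i in range(7):
    buffer.add(i)
buffer.flush()  # send whatever is left
print(sent)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Seven writes become three calls to `flush_fn`, which is the effect that helps stay under a REST API rate limit. This simplified version checks the elapsed time only when an item is added; it has no background timer.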

MlflowSparkStudy

MlflowSparkStudy is a wrapper around the optuna.study.Study class that integrates Optuna with Spark through an MLflow experiment.

Parameter name	Type	Description
`study_name`	`str`	Name of the study
`storage`	`mlflow.optuna.storage.MlflowStorage`	MLflow-based storage class
`sampler`	`samplers.BaseSampler`	A sampler object that implements the background algorithm for value suggestion. `optuna.samplers.TPESampler` is used as the default.
`pruner`	`pruners.BasePruner`	A pruner object that decides early stopping of unpromising trials. `optuna.pruners.MedianPruner` is used as the default.
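
As a rough illustration of what a median-based pruner decides: a trial is stopped when its intermediate value at some step is worse than the median of the values that earlier trials reported at that same step. The sketch below shows that rule for a minimization problem using only the standard library. It is a simplification and not Optuna's `MedianPruner`, which also has startup and warmup settings; `should_prune` and its `min_trials` parameter are hypothetical.

```python
from statistics import median


def should_prune(current_value, previous_values_at_step, min_trials=5):
    """Return True if a minimizing trial is worse than the median so far."""
    # Too little history to judge: let the trial continue.
    if len(previous_values_at_step) < min_trials:
        return False
    return current_value > median(previous_values_at_step)


# Validation errors that earlier trials reported at the same step.
history = [0.90, 0.75, 0.80, 0.70, 0.85]

print(should_prune(0.95, history))  # True: worse than the median (0.80)
print(should_prune(0.72, history))  # False: better than the median
```

In Optuna, pruning takes effect only when the objective reports intermediate values with `trial.report` and checks `trial.should_prune`; an objective that returns a single final value gives the pruner nothing to act on.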