Hyperparameter tuning with Optuna
Optuna is an open-source Python library for hyperparameter tuning that can be scaled horizontally across multiple compute resources. Optuna also integrates with MLflow for model and trial tracking and monitoring.
Install Optuna
Use the following commands to install Optuna and its integration module.
%pip install optuna
%pip install optuna-integration # Integration with MLflow
Define search space and run Optuna optimization
Here are the steps in an Optuna workflow:
- Define an objective function to optimize. Within the objective function, define the hyperparameter search space.
- Create an Optuna Study object, and run the tuning algorithm by calling the optimize function of the Study object.
Below is a minimal example from the Optuna documentation.
- Define the objective function objective, and call the suggest_float function to define the search space for the parameter x.
- Create a Study, and optimize the objective function with 100 trials, i.e., 100 calls of the objective function with different values of x.
- Get the best parameters of the Study.
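import optuna

def objective(trial):
    # Search space: a single float parameter x in [-10, 10]
    x = trial.suggest_float("x", -10, 10)
    # Value to minimize; the minimum is at x = 2
    return (x - 2) ** 2

study = optuna.create_study()  # minimizes by default
study.optimize(objective, n_trials=100)
best_params = study.best_params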
Parallelize Optuna trials to multiple machines
You can distribute Optuna trials across multiple machines in an Azure Databricks cluster with the Joblib Apache Spark backend.
import joblib
from joblibspark import register_spark
register_spark() # register Spark backend for Joblib
with joblib.parallel_backend("spark", n_jobs=-1):
    study.optimize(objective, n_trials=100)
Integrate with MLflow
To track hyperparameters and metrics of all the Optuna trials, use the MLflowCallback from the Optuna integration modules when you call the optimize function.
import mlflow
from optuna.integration.mlflow import MLflowCallback

# experiment_id must already be set to the ID of an existing MLflow experiment
mlflow_callback = MLflowCallback(
    tracking_uri="databricks",
    metric_name="accuracy",
    create_experiment=False,
    mlflow_kwargs={
        "experiment_id": experiment_id
    }
)

study.optimize(objective, n_trials=100, callbacks=[mlflow_callback])
Notebook example
This notebook provides an example of using Optuna to select a scikit-learn model and a set of hyperparameters for the Iris dataset.
In addition to a single-machine Optuna workflow, the notebook shows how to:
- Parallelize Optuna trials to multiple machines via Joblib
- Track trial runs with MLflow