Use scikit-learn on Azure Databricks

This page provides examples of how you can use the scikit-learn package to train machine learning models in Azure Databricks. scikit-learn is one of the most popular Python libraries for single-node machine learning and is included in Databricks Runtime and Databricks Runtime ML. See Databricks Runtime release notes for the scikit-learn library version included with your cluster's runtime.
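As a quick alternative to looking up the release notes, the installed version can be checked from a notebook cell. This is a minimal sketch that assumes only that scikit-learn is importable in the attached cluster's Python environment:

```python
import sklearn

# Print the scikit-learn version installed in the current Python environment.
print(sklearn.__version__)
```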

You can import these notebooks and run them in your Azure Databricks workspace.

For additional example notebooks to get started quickly on Azure Databricks, see Tutorials: Get started with ML.

Basic example using scikit-learn

This notebook provides a quick overview of machine learning model training on Azure Databricks. It uses the scikit-learn package to train a simple classification model. It also illustrates the use of MLflow to track the model development process, and Optuna to automate hyperparameter tuning.
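The training step described above might look roughly like the following sketch. It is not the notebook's code: the wine dataset bundled with scikit-learn, the random forest model, and the split parameters are all assumptions chosen for illustration, and the MLflow tracking and Optuna tuning that the notebook demonstrates are left out to keep the example self-contained.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small dataset that ships with scikit-learn
# (assumption: the notebook itself may use different data).
X, y = load_wine(return_X_y=True)

# Hold out 25% of the rows for evaluation, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Train a simple classifier (assumption: model choice is illustrative only).
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out rows.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

Fixing `random_state` makes the split and the model reproducible across runs, which helps when comparing tracked experiments.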

If your workspace is enabled for Unity Catalog, use this version of the notebook:

scikit-learn classification notebook (Unity Catalog)


If your workspace is not enabled for Unity Catalog, use this version of the notebook:

scikit-learn classification notebook


End-to-end example using scikit-learn on Azure Databricks

This notebook uses scikit-learn to walk through a complete workflow: loading data, training a model, running distributed hyperparameter tuning, and performing model inference. It also shows how to manage the model lifecycle by logging and registering your model with MLflow Model Registry.
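The overall shape of such a workflow can be sketched on a single node as follows. This is an illustration under stated assumptions, not the notebook's code: the breast cancer dataset, the logistic regression pipeline, and the parameter grid are hypothetical choices, `GridSearchCV` stands in for the distributed tuning that the notebook performs, and MLflow logging and registration are omitted.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Load data (assumption: a dataset bundled with scikit-learn).
features, labels = load_breast_cancer(return_X_y=True)
f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.2, random_state=0, stratify=labels
)

# 2. Define the model as a pipeline so scaling is fit only on training folds.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# 3. Tune hyperparameters with a local grid search
#    (assumption: a single-node stand-in for distributed tuning).
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(f_train, l_train)

# 4. Run inference with the best model found by the search.
best_model = search.best_estimator_
predictions = best_model.predict(f_test)
test_accuracy = best_model.score(f_test, l_test)

print("Best parameters:", search.best_params_)
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

Wrapping preprocessing and the classifier in one `Pipeline` means the tuned object can be used for inference directly, without repeating the scaling step by hand.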

If your workspace is enabled for Unity Catalog, use this version of the notebook:

Use scikit-learn with MLflow integration on Databricks (Unity Catalog)


If your workspace is not enabled for Unity Catalog, use this version of the notebook:

Use scikit-learn with MLflow integration on Databricks
