Library utility (dbutils.library) (legacy)

Note

The dbutils.library.install and dbutils.library.installPyPI APIs are removed in Databricks Runtime 11.0 and above. Most library utility commands are deprecated, and most are not available on Databricks Runtime ML. For information on dbutils.library.restartPython, see Restart the Python process on Azure Databricks.

This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported.

Databricks strongly recommends using %pip magic commands to install notebook-scoped libraries. See Notebook-scoped Python libraries.

For full documentation for Databricks utilities functionality, see Databricks Utilities (dbutils) reference.

Commands: install, installPyPI, list, restartPython, updateCondaEnv

The library utility allows you to install Python libraries and create an environment scoped to a notebook session. The libraries are available both on the driver and on the executors, so you can reference them in user-defined functions. This enables:

  • Library dependencies of a notebook to be organized within the notebook itself.
  • Notebook users with different library dependencies to share a cluster without interference.

Detaching a notebook destroys this environment. However, you can recreate it by re-running the library install API commands in the notebook. See the restartPython API for how you can reset your notebook state without losing your environment.

Library utilities are enabled by default. As a result, the Python environment for each notebook is isolated by default: each notebook uses a separate Python executable that is created when the notebook is attached to a cluster and that inherits the default Python environment on the cluster. Libraries installed through an init script into the Azure Databricks Python environment are still available. You can disable this feature by setting spark.databricks.libraryIsolation.enabled to false.

This API is compatible with the existing cluster-wide library installation through the UI and Libraries API. Libraries installed through this API have higher priority than cluster-wide libraries.

To list the available commands, run dbutils.library.help().

install(path: String): boolean -> Install the library within the current notebook session
installPyPI(pypiPackage: String, version: String = "", repo: String = "", extras: String = ""): boolean -> Install the PyPI library within the current notebook session
list: List -> List the isolated libraries added for the current notebook session via dbutils
restartPython: void -> Restart python process for the current notebook session
updateCondaEnv(envYmlContent: String): boolean -> Update the current notebook's Conda environment based on the specification (content of environment.yml)

install command (dbutils.library.install)

Given a path to a library, installs that library within the current notebook session. Libraries installed by calling this command are available only to the current notebook.

To display help for this command, run dbutils.library.help("install").

This example installs a .egg or .whl library within a notebook.

Important

dbutils.library.install is removed in Databricks Runtime 11.0 and above.

Databricks recommends that you put all your library install commands in the first cell of your notebook and call restartPython at the end of that cell. Running restartPython resets the Python notebook state: the notebook loses all state, including but not limited to local variables, imported libraries, and other ephemeral state. Installing libraries and resetting the state in the first cell means that no later work is discarded by the restart.

The accepted library sources are dbfs, abfss, adl, and wasbs.

dbutils.library.install("abfss:/path/to/your/library.egg")
dbutils.library.restartPython() # Removes Python state, but some libraries might not work without calling this command.
dbutils.library.install("abfss:/path/to/your/library.whl")
dbutils.library.restartPython() # Removes Python state, but some libraries might not work without calling this command.
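
The source and file-type constraints described above can be checked before calling the install command. The following is a minimal sketch and not part of dbutils: is_accepted_library_path and the two constants are hypothetical names introduced for illustration, and the check only inspects the path string. It does not confirm that the file exists or that the cluster can read it.

```python
from urllib.parse import urlparse

# Sources and file types that this article names for dbutils.library.install.
ACCEPTED_SOURCES = {"dbfs", "abfss", "adl", "wasbs"}
ACCEPTED_EXTENSIONS = (".egg", ".whl")


def is_accepted_library_path(path: str) -> bool:
    """Hypothetical helper: True if the path uses an accepted source and file type."""
    scheme = urlparse(path).scheme.lower()
    return scheme in ACCEPTED_SOURCES and path.lower().endswith(ACCEPTED_EXTENSIONS)


print(is_accepted_library_path("abfss:/path/to/your/library.whl"))  # True
print(is_accepted_library_path("s3:/path/to/your/library.whl"))     # False
```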

Note

You can install custom wheel files directly by using %pip. The following example assumes that you have uploaded your library wheel file to DBFS:

%pip install /dbfs/path/to/your/library.whl

Egg files are not supported by pip, and wheel files are considered the standard for build and binary packaging for Python. However, if you want to use an egg file in a way that's compatible with %pip, you can use the following workaround:

# This step is only needed if no %pip commands have been run yet.
# It will trigger setting up the isolated notebook environment
%pip install <any-lib>  # This doesn't need to be a real library; for example "%pip install any-lib" would work
import sys
# Assuming the preceding step was completed, the following command
# adds the egg file to the current notebook environment
sys.path.append("/local/path/to/library.egg")
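
The sys.path step of this workaround can be wrapped so that a missing file fails early and repeated cell runs do not add duplicate entries. This is a sketch under those assumptions; add_egg_to_sys_path is a hypothetical name, not a Databricks API.

```python
import os
import sys


def add_egg_to_sys_path(egg_path: str) -> bool:
    """Hypothetical helper: append egg_path to sys.path once; return True if it was added."""
    if not os.path.exists(egg_path):
        raise FileNotFoundError(f"No egg file at {egg_path}")
    if egg_path in sys.path:
        return False  # Already on the path; avoid a duplicate entry.
    sys.path.append(egg_path)
    return True
```

Calling add_egg_to_sys_path("/local/path/to/library.egg") in place of the bare sys.path.append call would keep the rest of the workaround unchanged.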

installPyPI command (dbutils.library.installPyPI)

Given a Python Package Index (PyPI) package, installs that package within the current notebook session. Libraries installed by calling this command are isolated from other notebooks.

To display help for this command, run dbutils.library.help("installPyPI").

This example installs a PyPI package in a notebook. version, repo, and extras are optional. Use the extras argument to specify the Extras feature (extra requirements).

dbutils.library.installPyPI("pypipackage", version="version", repo="repo", extras="extras")
dbutils.library.restartPython()  # Removes Python state, but some libraries might not work without calling this command.

Important

dbutils.library.installPyPI is removed in Databricks Runtime 11.0 and above.

The version and extras keys cannot be part of the PyPI package string. For example: dbutils.library.installPyPI("azureml-sdk[databricks]==1.19.0") is not valid. Use the version and extras arguments to specify the version and extras information as follows:

dbutils.library.installPyPI("azureml-sdk", version="1.19.0", extras="databricks")
dbutils.library.restartPython()  # Removes Python state, but some libraries might not work without calling this command.
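
If you already have requirement strings in the combined form, they can be split into the separate values that installPyPI expects. The following is a minimal sketch: split_requirement is a hypothetical helper, and it deliberately handles only the name, name[extras], name==version, and name[extras]==version forms, rejecting anything else.

```python
import re

# Matches name, name[extras], name==version, or name[extras]==version.
_REQUIREMENT = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)"
    r"(?:\[(?P<extras>[^\]]*)\])?"
    r"(?:\s*==\s*(?P<version>[^\s;]+))?\s*$"
)


def split_requirement(requirement: str) -> dict:
    """Hypothetical helper: split a requirement string into installPyPI-style values."""
    match = _REQUIREMENT.match(requirement)
    if match is None:
        raise ValueError(f"Unsupported requirement string: {requirement!r}")
    return {
        "pypiPackage": match.group("name"),
        "version": match.group("version") or "",
        "extras": match.group("extras") or "",
    }


print(split_requirement("azureml-sdk[databricks]==1.19.0"))
# {'pypiPackage': 'azureml-sdk', 'version': '1.19.0', 'extras': 'databricks'}
```

The resulting values can then be passed individually, for example as the package name plus the version and extras arguments.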

Note

When replacing dbutils.library.installPyPI commands with %pip commands, the Python interpreter is automatically restarted. You can run the install command as follows:

%pip install azureml-sdk[databricks]==1.19.0
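
When migrating many calls, the %pip line can be generated from the same values that were passed to installPyPI. This sketch is illustrative only: to_pip_command is a hypothetical helper, and mapping the repo argument to pip's --index-url option is an assumption that holds only if repo is a package index URL.

```python
def to_pip_command(package: str, version: str = "", repo: str = "", extras: str = "") -> str:
    """Hypothetical helper: render installPyPI-style arguments as a %pip install line."""
    requirement = package
    if extras:
        requirement += f"[{extras}]"
    if version:
        requirement += f"=={version}"
    command = f"%pip install {requirement}"
    if repo:
        # Assumption: repo is a package index URL, which pip accepts as --index-url.
        command += f" --index-url {repo}"
    return command


print(to_pip_command("azureml-sdk", version="1.19.0", extras="databricks"))
# %pip install azureml-sdk[databricks]==1.19.0
```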

This example specifies library requirements in one notebook and installs them by using %run in the other. To do this, first define the libraries to install in a notebook. This example uses a notebook named InstallDependencies.

dbutils.library.installPyPI("torch")
dbutils.library.installPyPI("scikit-learn", version="1.19.1")
dbutils.library.installPyPI("azureml-sdk", extras="databricks")
dbutils.library.restartPython() # Removes Python state, but some libraries might not work without calling this command.

Then install them in the notebook that needs those dependencies.

%run /path/to/InstallDependencies # Install the dependencies in the first cell.
import torch
from sklearn.linear_model import LinearRegression
import azureml
...

This example resets the Python notebook state while maintaining the environment. This technique is available only in Python notebooks. For example, you can use this technique to reload a library that Azure Databricks preinstalls, using a different version:

dbutils.library.installPyPI("numpy", version="1.15.4")
dbutils.library.restartPython()
# Make sure you start using the library in another cell.
import numpy

You can also use this technique to install libraries such as tensorflow that need to be loaded on process start up:

dbutils.library.installPyPI("tensorflow")
dbutils.library.restartPython()
# Use the library in another cell.
import tensorflow
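
After the restart, it can be useful to confirm which version of a library the notebook's Python process actually sees. The following is a minimal sketch using the standard library's importlib.metadata; installed_version and is_pinned_version_active are hypothetical helper names, and the check reports only what is visible to the current process.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Hypothetical helper: return the installed version of package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_pinned_version_active(package: str, expected: str) -> bool:
    """True when the environment reports exactly the expected version."""
    return installed_version(package) == expected


# Prints True only if numpy 1.15.4 is the version visible to this Python process.
print(is_pinned_version_active("numpy", "1.15.4"))
```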

list command (dbutils.library.list)

Lists the isolated libraries added for the current notebook session through the library utility. This does not include libraries that are attached to the cluster.

To display help for this command, run dbutils.library.help("list").

This example lists the libraries installed in a notebook.

dbutils.library.list()

Note

The equivalent of this command using %pip is:

%pip freeze

updateCondaEnv command (dbutils.library.updateCondaEnv)

Updates the current notebook's Conda environment based on the contents of environment.yml. This method is supported only for Databricks Runtime on Conda.

To display help for this command, run dbutils.library.help("updateCondaEnv").

This example updates the current notebook's Conda environment based on the contents of the provided specification.

dbutils.library.updateCondaEnv(
"""
channels:
  - anaconda
dependencies:
  - gensim=3.4
  - nltk=3.4
""")