Databricks Asset Bundles library dependencies
This article describes the syntax for declaring Databricks Asset Bundles library dependencies. Bundles enable programmatic management of Azure Databricks workflows. See What are Databricks Asset Bundles?.
In addition to notebooks, your Azure Databricks jobs will likely depend on libraries in order to work as expected. Databricks Asset Bundles dependencies for local development are specified in the requirements*.txt file at the root of the bundle project, but job task library dependencies are declared in your bundle configuration files and are often necessary as part of the job task type specification.
Bundles provide support for the following library dependencies for Azure Databricks jobs:
- Python wheel file
- JAR file (Java or Scala)
- PyPI, Maven, or CRAN packages
Note
Whether or not a library is supported depends on the cluster configuration for the job and the library source. For complete library support information, see Libraries.
To add a Python wheel file to a job task, in libraries, specify a whl mapping for each library to be installed. You can install a wheel file from workspace files, Unity Catalog volumes, cloud object storage, or a local file path.
Important
Libraries can be installed from DBFS when using Databricks Runtime 14.3 LTS and below. However, any workspace user can modify library files stored in DBFS. To improve the security of libraries in an Azure Databricks workspace, storing library files in the DBFS root is deprecated and disabled by default in Databricks Runtime 15.1 and above. See Storing libraries in DBFS root is deprecated and disabled by default.
Instead, Databricks recommends uploading all libraries, including Python libraries, JAR files, and Spark connectors, to workspace files or Unity Catalog volumes, or using library package repositories. If your workload does not support these patterns, you can also use libraries stored in cloud object storage.
The following example shows how to install three Python wheel files for a job task.
- The first Python wheel file was either previously uploaded to the Azure Databricks workspace or added as an include item in the sync mapping, and is in the same local folder as the bundle configuration file.
- The second Python wheel file is in the specified workspace files location in the Azure Databricks workspace.
- The third Python wheel file was previously uploaded to the volume named my-volume in the Azure Databricks workspace.
resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            - whl: ./my-wheel-0.1.0.whl
            - whl: /Workspace/Shared/Libraries/my-wheel-0.0.1-py3-none-any.whl
            - whl: /Volumes/main/default/my-volume/my-wheel-0.1.0.whl
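For the first wheel in the example above, the file has to be among the files that the bundle uploads to the workspace. The following is a minimal sketch of a top-level sync mapping that includes the wheel explicitly. It assumes the wheel sits in the same folder as the bundle configuration file and that the pattern is resolved relative to the bundle root; adjust the pattern to match your project layout.

```yaml
# Sketch only: explicitly include a local wheel in the files synced with the bundle.
# Assumes my-wheel-0.1.0.whl is in the same folder as the bundle configuration file.
sync:
  include:
    - my-wheel-0.1.0.whl
```

An explicit include entry like this is mainly useful when the wheel would otherwise be skipped, for example because build outputs are listed in .gitignore.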
To add a JAR file to a job task, in libraries, specify a jar mapping for each library to be installed. You can install a JAR from workspace files, Unity Catalog volumes, cloud object storage, or a local file path.
Important
Libraries can be installed from DBFS when using Databricks Runtime 14.3 LTS and below. However, any workspace user can modify library files stored in DBFS. To improve the security of libraries in an Azure Databricks workspace, storing library files in the DBFS root is deprecated and disabled by default in Databricks Runtime 15.1 and above. See Storing libraries in DBFS root is deprecated and disabled by default.
Instead, Databricks recommends uploading all libraries, including Python libraries, JAR files, and Spark connectors, to workspace files or Unity Catalog volumes, or using library package repositories. If your workload does not support these patterns, you can also use libraries stored in cloud object storage.
The following example shows how to install a JAR file that was previously uploaded to the volume named my-volume in the Azure Databricks workspace.
resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            - jar: /Volumes/main/default/my-volume/my-java-library-1.0.jar
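Cloud object storage is also listed as a supported source. As a rough sketch, a JAR stored in Azure Data Lake Storage could be referenced by its abfss URI. The container name, storage account name, and path below are hypothetical placeholders, and the sketch assumes the job's compute is already configured with access to that storage location.

```yaml
resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            # Hypothetical container (my-container) and storage account (mystorageaccount).
            - jar: abfss://my-container@mystorageaccount.dfs.core.windows.net/libraries/my-java-library-1.0.jar
```

The same URI form should apply to a whl mapping, but workspace files or Unity Catalog volumes remain the recommended locations.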
To add a PyPI package to a job task definition, in libraries, specify a pypi mapping for each PyPI package to be installed. For each mapping, specify the following:
- For package, specify the name of the PyPI package to install. An optional exact version specification is also supported.
- Optionally, for repo, specify the repository where the PyPI package can be found. If not specified, the default pip index is used (https://pypi.org/simple/).
The following example shows how to install two PyPI packages.
- The first PyPI package uses the specified package version and the default pip index.
- The second PyPI package uses the specified package version and the explicitly specified pip index.
resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            - pypi:
                package: wheel==0.41.2
            - pypi:
                package: numpy==1.25.2
                repo: https://pypi.org/simple/
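Because the version specification is optional and repo can point to any index, two further variations may be useful. The sketch below shows a package without a version pin and a package from a private index. The package name my-internal-package and the index URL https://pypi.example.com/simple/ are hypothetical placeholders, not real endpoints.

```yaml
resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            # No version pin: the version is resolved when the library is installed.
            - pypi:
                package: requests
            # Hypothetical internal package served from a hypothetical private index.
            - pypi:
                package: my-internal-package==1.2.3
                repo: https://pypi.example.com/simple/
```

Pinning an exact version, as in the earlier example, generally makes job runs more reproducible than leaving the version open.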
To add a Maven package to a job task definition, in libraries, specify a maven mapping for each Maven package to be installed. For each mapping, specify the following:
- For coordinates, specify the Gradle-style Maven coordinates for the package.
- Optionally, for repo, specify the Maven repo to install the Maven package from. If omitted, both the Maven Central Repository and the Spark Packages Repository are searched.
- Optionally, for exclusions, specify any dependencies to explicitly exclude. See Maven dependency exclusions.
The following example shows how to install two Maven packages.
- The first Maven package uses the specified package coordinates and searches for this package in both the Maven Central Repository and the Spark Packages Repository.
- The second Maven package uses the specified package coordinates, searches for this package only in the Maven Central Repository, and does not include any of this package's dependencies that match the specified pattern.
resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            - maven:
                coordinates: com.databricks:databricks-sdk-java:0.8.1
            - maven:
                coordinates: com.databricks:databricks-dbutils-scala_2.13:0.1.4
                repo: https://mvnrepository.com/
                exclusions:
                  - org.scala-lang:scala-library:2.13.0-RC*
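CRAN packages are listed among the supported library dependencies, but no example is given for them. The following sketch assumes that the cran mapping mirrors the Jobs API library specification, with a required package field and an optional repo field. The package name and repository URL are illustrative choices; confirm the field names against the Jobs API reference before relying on them.

```yaml
resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            # Assumed shape, modeled on the Jobs API library specification.
            - cran:
                package: data.table
                repo: https://cran.r-project.org
```

As with the other library types, whether a CRAN package can be installed depends on the cluster configuration for the job.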