Install libraries from a package repository

Azure Databricks provides tools to install libraries from PyPI, Maven, and CRAN package repositories. See Cluster-scoped libraries for full library compatibility details.

Important

Libraries can be installed from DBFS when using Databricks Runtime 14.3 LTS and below. However, any workspace user can modify library files stored in DBFS. To improve the security of libraries in an Azure Databricks workspace, storing library files in the DBFS root is deprecated and disabled by default in Databricks Runtime 15.1 and above. See Storing libraries in DBFS root is deprecated and disabled by default.

Instead, Databricks recommends uploading all libraries, including Python libraries, JAR files, and Spark connectors, to workspace files or Unity Catalog volumes, or using library package repositories. If your workload does not support these patterns, you can also use libraries stored in cloud object storage.

PyPI package

  1. In the Library Source button list, select PyPI.

  2. Enter a PyPI package name. To install a specific version of a library, use this format for the library: <library>==<version>. For example, scikit-learn==0.19.1.

    Note

    For jobs, Databricks recommends that you specify a library version to ensure a reproducible environment. If the library version is not fully specified, Databricks uses the latest matching version. This means that different runs of the same job might use different library versions as new versions are published. Specifying the library version prevents new, breaking changes in libraries from breaking your jobs.

  3. (Optional) In the Index URL field, enter a PyPI index URL.

  4. Click Install.
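The same PyPI choices can also be expressed outside the UI. The following is a minimal sketch, assuming the Libraries API install endpoint (`POST /api/2.0/libraries/install`) and its `pypi` library object with `package` and `repo` fields; check the API reference for your workspace before relying on it. The helper names and the cluster ID are hypothetical, and the code only builds the request body without sending it.

```python
import json
import re


def pypi_install_payload(cluster_id, package, version=None, index_url=None):
    """Build a request body for installing one PyPI library (hypothetical helper)."""
    # Pin the version with the <library>==<version> format when one is given.
    spec = f"{package}=={version}" if version else package
    library = {"package": spec}
    if index_url:
        # Assumed counterpart of the optional Index URL field in the UI.
        library["repo"] = index_url
    return {"cluster_id": cluster_id, "libraries": [{"pypi": library}]}


def is_pinned(spec):
    """Return True if the spec has the exact form <library>==<version>."""
    return re.fullmatch(r"[A-Za-z0-9._-]+==[A-Za-z0-9.!+_-]+", spec) is not None


# The cluster ID below is a placeholder, not a real cluster.
payload = pypi_install_payload("1234-567890-abcde123", "scikit-learn", "0.19.1")
print(json.dumps(payload, indent=2))
```

A check such as `is_pinned` could be run over job library specifications before deployment to catch entries that would otherwise resolve to the latest matching version.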

Maven or Spark package

Important

To install Maven libraries on compute configured with shared access mode, you must add the coordinates to the allowlist. See Allowlist libraries and init scripts on shared compute.

Important

For Databricks Runtime 14.3 LTS and below, Databricks uses Apache Ivy 2.4.0 to resolve Maven packages. For Databricks Runtime 15.0 and above, Databricks uses Ivy 2.5.1 or greater; the specific Ivy version is listed in Databricks Runtime release notes versions and compatibility.

The installation order of Maven packages may affect the final dependency tree, which can impact the order in which libraries are loaded.

  1. In the Library Source button list, select Maven.

  2. Specify a Maven coordinate. Do one of the following:

    • In the Coordinate field, enter the Maven coordinate of the library to install. Maven coordinates are in the form groupId:artifactId:version; for example, com.databricks:spark-avro_2.10:1.0.0.
    • If you don't know the exact coordinate, enter the library name and click Search Packages. A list of matching packages displays. To display details about a package, click its name. You can sort packages by name, organization, and rating. You can also filter the results by writing a query in the search bar. The results refresh automatically.
      1. Select Maven Central or Spark Packages in the drop-down list at the top left.
      2. Optionally select the package version in the Releases column.
      3. Click + Select next to a package. The Coordinate field is filled in with the selected package and version.
  3. (Optional) In the Repository field, you can enter a Maven repository URL.

    Note

    Internal Maven repositories are not supported.

  4. (Optional) In the Exclusions field, provide the groupId and the artifactId of the dependencies that you want to exclude (for example, log4j:log4j).

    Note

    Maven resolves version conflicts by using the version closest to the root of the dependency tree. When two packages require different versions of the same dependency, the installation order matters, and resolution can fail if the package with the older dependency is loaded first.

    To work around this, exclude the conflicting library. For example, when installing the package with the coordinate com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22, set the Exclusions field to com.nimbusds:oauth2-oidc-sdk:RELEASE so the latest version of eventhubs from MSAL4J is loaded and the eventhubs dependency is satisfied.

  5. Click Install.
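The Maven fields above map naturally onto a structured request. The following is a minimal sketch, assuming the Libraries API install endpoint (`POST /api/2.0/libraries/install`) and its `maven` library object with `coordinates`, `repo`, and `exclusions` fields; verify these against the API reference. The helper names and the cluster ID are hypothetical, and nothing is sent over the network.

```python
import json


def parse_maven_coordinate(coordinate):
    """Split a groupId:artifactId:version coordinate into its three parts."""
    parts = coordinate.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected groupId:artifactId:version, got {coordinate!r}"
        )
    group_id, artifact_id, version = parts
    return {"groupId": group_id, "artifactId": artifact_id, "version": version}


def maven_install_payload(cluster_id, coordinate, repo=None, exclusions=None):
    """Build a request body for installing one Maven library (hypothetical helper)."""
    parse_maven_coordinate(coordinate)  # Validate the format before building.
    library = {"coordinates": coordinate}
    if repo:
        # Assumed counterpart of the optional Repository field in the UI.
        library["repo"] = repo
    if exclusions:
        # Dependencies to exclude, for example "log4j:log4j".
        library["exclusions"] = list(exclusions)
    return {"cluster_id": cluster_id, "libraries": [{"maven": library}]}


# The cluster ID below is a placeholder, not a real cluster.
payload = maven_install_payload(
    "1234-567890-abcde123",
    "com.databricks:spark-avro_2.10:1.0.0",
    exclusions=["log4j:log4j"],
)
print(json.dumps(payload, indent=2))
```

Validating the coordinate up front surfaces a malformed value, such as a bare library name, before any install is attempted.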

CRAN package

  1. In the Library Source button list, select CRAN.

  2. In the Package field, enter the name of the package.

  3. (Optional) In the Repository field, you can enter the CRAN repository URL.

  4. Click Install.

    Note

    CRAN mirrors serve the latest version of a library. As a result, you may end up with different versions of an R package if you attach the library to different clusters at different times. To learn how to manage and pin R package versions on Databricks, see the Knowledge Base.
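A CRAN install can be described in the same way. The following is a minimal sketch, assuming the Libraries API install endpoint (`POST /api/2.0/libraries/install`) and its `cran` library object with `package` and `repo` fields; confirm the field names in the API reference. The helper name, the package name, the repository URL, and the cluster ID are illustrative placeholders.

```python
import json


def cran_install_payload(cluster_id, package, repo=None):
    """Build a request body for installing one CRAN package (hypothetical helper)."""
    library = {"package": package}
    if repo:
        # Assumed counterpart of the optional Repository field in the UI.
        library["repo"] = repo
    return {"cluster_id": cluster_id, "libraries": [{"cran": library}]}


# Placeholder cluster ID, package name, and repository URL.
payload = cran_install_payload(
    "1234-567890-abcde123",
    "data.table",
    repo="https://cran.example.org",
)
print(json.dumps(payload, indent=2))
```

Unlike the PyPI format, this body carries no version field, which is consistent with the note above: the version installed depends on what the repository serves at install time.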