What are Databricks Asset Bundles?

Databricks Asset Bundles (DABs) are a tool to facilitate the adoption of software engineering best practices, including source control, code review, testing, and continuous integration and delivery (CI/CD), for your data and AI projects. Bundles make it possible to describe Databricks resources such as jobs, pipelines, and notebooks as source files. These source files provide an end-to-end definition of a project, including how it should be structured, tested, and deployed, which makes it easier to collaborate on projects during active development.

Bundles provide a way to include metadata alongside your project's source files. When you deploy a project using bundles, this metadata is used to provision infrastructure and other resources. Your project's collection of source files and metadata is then deployed as a single bundle to your target environment. A bundle includes the following parts:

  • Required cloud infrastructure and workspace configurations
  • Source files, such as notebooks and Python files, that include the business logic
  • Definitions and settings for Databricks resources, such as Azure Databricks jobs, Delta Live Tables pipelines, Model Serving endpoints, MLflow Experiments, and MLflow registered models
  • Unit tests and integration tests

The following diagram provides a high-level view of a development and CI/CD pipeline with bundles:

[Diagram: Databricks Asset Bundles overview]

When should I use Databricks Asset Bundles?

Databricks Asset Bundles are an infrastructure-as-code (IaC) approach to managing your Databricks projects. Use them when you want to manage complex projects where multiple contributors and automation are essential, and continuous integration and delivery (CI/CD) is a requirement. Because bundles are defined and managed through YAML templates and files that you create and maintain alongside source code, they map well to scenarios where IaC is an appropriate approach.

Some ideal scenarios for bundles include:

  • Develop data, analytics, and ML projects in a team-based environment. Bundles keep a project's source files and resource definitions together under source control, so contributors can review, test, and deploy changes through the same process.
  • Iterate on ML problems faster. Manage ML pipeline resources (such as training and batch inference jobs) by using ML projects that follow production best practices from the beginning.
  • Set organizational standards for new projects by authoring custom bundle templates that include default permissions, service principals, and CI/CD configurations.
  • Support regulatory compliance. In industries where regulatory compliance is a significant concern, bundles can help maintain a versioned history of code and infrastructure work, which assists in governance and in demonstrating that compliance standards are met.

How do Databricks Asset Bundles work?

Bundle metadata is defined using YAML files that specify the artifacts, resources, and configuration of a Databricks project. You can create this YAML file manually or generate one using a bundle template. The Databricks CLI can then be used to validate, deploy, and run bundles using these bundle YAML files. You can run bundle projects from IDEs, terminals, or within Databricks directly. This article uses the Databricks CLI.
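The validate, deploy, and run steps described above can be sketched as a small shell helper. This is an illustrative sketch, not an official workflow: the function names `run` and `bundle_lifecycle`, the `DRY_RUN` variable, the target name `dev`, and the job key `hello_job` are all assumptions made for this example. The `databricks bundle validate`, `deploy`, and `run` subcommands and the `-t` (target) flag come from the Databricks CLI.

```shell
# Print each command; execute it only when DRY_RUN is not set to 1.
# (Helper name and DRY_RUN variable are hypothetical, for illustration only.)
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Validate, deploy, and run a bundle from the directory that holds databricks.yml.
#   $1: target name defined in the bundle configuration (for example, dev)
#   $2: key of a job or pipeline defined under "resources" in the bundle
bundle_lifecycle() {
  target="$1"
  resource_key="$2"
  run databricks bundle validate -t "$target" &&
    run databricks bundle deploy -t "$target" &&
    run databricks bundle run -t "$target" "$resource_key"
}
```

Calling `DRY_RUN=1 bundle_lifecycle dev hello_job` prints the three commands without contacting a workspace, which is a convenient way to check the sequence before running it for real.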

The Databricks CLI provides default templates for simple use cases. For more specific or complex jobs, you can create custom bundle templates to implement your team's best practices and keep common configurations consistent.

For more details on the configuration YAML used to express Databricks Asset Bundles, see Databricks Asset Bundle configuration.

Configure your environment to use bundles

Use the Databricks CLI to easily deploy bundles from the command line. To install the Databricks CLI, see Install or update the Databricks CLI.

Databricks Asset Bundles are available in Databricks CLI version 0.218.0 or above. To find the version of the Databricks CLI that is installed, run the following command:

databricks --version
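If you want to automate the version check, for example in a CI script, the comparison against 0.218.0 might look like the following sketch. It assumes that the output of `databricks --version` contains a version string of the form x.y.z and that your `sort` supports the `-V` (version sort) option. The function names are hypothetical.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the first x.y.z version string from arbitrary text.
extract_version() {
  printf '%s\n' "$1" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Report whether the given "databricks --version" output meets the minimum.
check_cli_version() {
  installed="$(extract_version "$1")"
  if version_ge "$installed" "0.218.0"; then
    echo "OK: Databricks CLI $installed supports bundles"
  else
    echo "Unsupported: found '$installed', need 0.218.0 or above"
    return 1
  fi
}

# Usage (requires the Databricks CLI to be installed):
#   check_cli_version "$(databricks --version)"
```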

After installing the Databricks CLI, verify that your remote Databricks workspaces are configured correctly. Bundles require the workspace files feature to be enabled, because this feature supports working with files other than Databricks notebooks, such as .py and .yml files. If you're using Databricks Runtime version 11.3 LTS or above, this feature is enabled by default.

Authentication

Azure Databricks provides several authentication methods. For more information, see authentication type.

Develop your first Databricks Asset Bundle

The fastest way to start bundle development is by using a bundle project template. Create your first bundle project using the Databricks CLI bundle init command. This command presents a choice of Databricks-provided default bundle templates and asks a series of questions to initialize project variables.

databricks bundle init

Creating your bundle is the first step in the lifecycle of a bundle. The second step is developing your bundle, a key element of which is defining bundle settings and resources in the databricks.yml and resource configuration files. For information about bundle configuration, see Databricks Asset Bundle configuration.
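To give a sense of what a databricks.yml file can contain, the following sketch writes a minimal configuration by hand instead of generating it from a template. Everything specific in it is a placeholder chosen for illustration: the directory and bundle name `my_first_bundle`, the job key `hello_job`, the notebook path, and the workspace URL. Compute settings are omitted, so whether the job runs as written depends on how your workspace is configured. Treat it as a starting point and see Databricks Asset Bundle configuration for the supported settings.

```shell
# Hypothetical project directory; override it by setting PROJECT_DIR.
project_dir="${PROJECT_DIR:-my_first_bundle}"
mkdir -p "$project_dir"

# Write a minimal bundle configuration (all names below are placeholders).
cat > "$project_dir/databricks.yml" <<'EOF'
# Top-level bundle settings.
bundle:
  name: my_first_bundle

# Databricks resources that the bundle deploys: here, one job with one task.
resources:
  jobs:
    hello_job:
      name: hello_job
      tasks:
        - task_key: hello_task
          notebook_task:
            # Placeholder path, relative to this file.
            notebook_path: ./src/hello.ipynb

# Deployment targets. Replace the host with your own workspace URL.
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<your-workspace-url>
EOF

echo "Wrote $project_dir/databricks.yml"
```

After replacing the placeholders, running `databricks bundle validate` from the project directory checks whether the configuration is well formed.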

Tip

Bundle configuration examples can be found in Bundle configuration examples and the Bundle examples repository in GitHub.

Next steps