bundle command group
Note
This information applies to Databricks CLI versions 0.205 and above. The Databricks CLI is in Public Preview.
Databricks CLI use is subject to the Databricks License and Databricks Privacy Notice, including any Usage Data provisions.
The bundle command group within the Databricks CLI enables you to programmatically validate, deploy, and run Azure Databricks workflows such as Azure Databricks jobs, Delta Live Tables pipelines, and MLOps Stacks. See What are Databricks Asset Bundles?.
You run bundle commands by appending them to databricks bundle. To display help for the bundle command, run databricks bundle -h.
Create a bundle from a project template
To create a Databricks Asset Bundle using the default Databricks Asset Bundle template for Python, run the bundle init command as follows, and then answer the on-screen prompts:
databricks bundle init
To create a Databricks Asset Bundle using a custom Databricks Asset Bundle template, run the bundle init command as follows:
databricks bundle init <project-template-local-path-or-url> \
--project-dir="</local/path/to/project/template/output>"
See also:
- Databricks Asset Bundle project templates
- Develop a job on Azure Databricks using Databricks Asset Bundles
- Develop Delta Live Tables pipelines with Databricks Asset Bundles
- Databricks Asset Bundles for MLOps Stacks
Display the bundle configuration schema
To display the bundle configuration schema, run the bundle schema command as follows:
databricks bundle schema
To output the Databricks Asset Bundle configuration schema as a JSON file, run the bundle schema command and redirect the output to a JSON file. For example, you can generate a file named bundle_config_schema.json within the current directory as follows:
databricks bundle schema > bundle_config_schema.json
Validate a bundle
To validate that your bundle configuration files are syntactically correct, run the bundle validate command from the bundle project root, as follows:
databricks bundle validate
By default this command returns a summary of the bundle identity:
Name: MyBundle
Target: dev
Workspace:
  Host: https://my-host.cloud.databricks.com
  User: someone@example.com
  Path: /Users/someone@example.com/.bundle/MyBundle/dev
Validation OK!
Note
The bundle validate command outputs warnings if the bundle configuration files define resource properties that are not found in the corresponding object's schema.
If you only want to output a summary of the bundle's identity and resources, use bundle summary.
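In an automated check, the exit status of bundle validate can gate later steps. The following is a minimal sketch, not an official recipe. It assumes that the command exits with a nonzero status when validation fails and that bundle validate accepts the -t (--target) option in the same way as bundle deploy. The function name validate_target is hypothetical.

```shell
# Hypothetical helper: validate the bundle for one target and report the result.
# Assumes `databricks bundle validate` exits nonzero when validation fails.
validate_target() {
  target="$1"
  if databricks bundle validate -t "$target"; then
    echo "validation passed for target: $target"
  else
    echo "validation failed for target: $target" >&2
    return 1
  fi
}
```

Run it from the bundle project root, for example validate_target dev, and stop the pipeline if it returns a nonzero status.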
Sync a bundle's tree to a workspace
Use the bundle sync command to do one-way synchronization of a bundle's file changes from a local filesystem directory to a directory within a remote Azure Databricks workspace.
Note
bundle sync commands cannot synchronize file changes from a directory within a remote Azure Databricks workspace back to a directory within a local filesystem.
databricks bundle sync commands work in the same way as databricks sync commands and are provided as a productivity convenience. For command usage information, see sync command group.
Generate a bundle configuration file
You can use the bundle generate command to generate resource configuration for a job, pipeline, or dashboard that already exists in your Databricks workspace. This command generates a *.yml file for the job, pipeline, or dashboard in the resources folder of the bundle project and also downloads any files, such as notebooks, referenced in the configuration.
Generate job or pipeline configuration
Important
The bundle generate command is provided as a convenience to autogenerate resource configuration. However, when this job or pipeline configuration is included in the bundle and deployed, it creates a new resource and does not update the existing resource unless you have first run bundle deployment bind. See Bind bundle resources.
To generate configuration for a job or pipeline, run the bundle generate command as follows:
databricks bundle generate [job|pipeline] --existing-[job|pipeline]-id [job-id|pipeline-id]
Note
Currently, only jobs with notebook tasks are supported by this command.
For example, the following command generates a new hello_job.yml file in the resources bundle project folder containing the YAML below, and downloads simple_notebook.py to the src project folder.
databricks bundle generate job --existing-job-id 6565621249
# This is the contents of the resulting hello_job.yml file.
resources:
  jobs:
    6565621249:
      name: Hello Job
      format: MULTI_TASK
      tasks:
        - task_key: run_notebook
          existing_cluster_id: 0704-xxxxxx-yyyyyyy
          notebook_task:
            notebook_path: ./src/simple_notebook.py
            source: WORKSPACE
          run_if: ALL_SUCCESS
      max_concurrent_runs: 1
Generate dashboard configuration
To generate configuration for an existing dashboard in the workspace, run bundle generate, specifying either the ID or workspace path for the dashboard:
databricks bundle generate dashboard --existing-id [dashboard-id]
databricks bundle generate dashboard --existing-path [dashboard-workspace-path]
You can copy the workspace path for a dashboard from the workspace UI.
For example, the following command generates a new baby_gender_by_county.dashboard.yml file in the resources bundle project folder containing the YAML below, and downloads the baby_gender_by_county.lvdash.json file to the src project folder.
databricks bundle generate dashboard --existing-path "/Workspace/Users/someone@example.com/baby_gender_by_county.lvdash.json"
# This is the contents of the resulting baby_gender_by_county.dashboard.yml file.
resources:
  dashboards:
    baby_gender_by_county:
      display_name: "Baby gender by county"
      warehouse_id: aae11o8e6fe9zz79
      file_path: ../src/baby_gender_by_county.lvdash.json
Tip
To update the .lvdash.json file after you have already deployed a dashboard, use the --resource option when you run bundle generate dashboard to generate that file for the existing dashboard resource. To continuously poll and retrieve updates to a dashboard, use the --force and --watch options.
Bind bundle resources
The bundle deployment bind command allows you to link bundle-defined jobs and pipelines to existing jobs and pipelines in the Azure Databricks workspace so that they become managed by Databricks Asset Bundles. After you bind a resource, the next bundle deploy updates the existing Azure Databricks resource in the workspace based on the configuration defined in the bundle.
Tip
Before you run bind, it's a good idea to confirm that the bundle is configured for the intended workspace.
databricks bundle deployment bind [resource-key] [resource-id]
For example, the following command binds the resource hello_job to its remote counterpart in the workspace. The command outputs a diff and allows you to deny the resource binding, but if confirmed, any updates to the job definition in the bundle are applied to the corresponding remote job when the bundle is next deployed.
databricks bundle deployment bind hello_job 6565621249
Use bundle deployment unbind if you want to remove the link between the job or pipeline in a bundle and its remote counterpart in a workspace.
databricks bundle deployment unbind [resource-key]
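The bind workflow described in this section (bind first, then deploy so that the existing resource is updated rather than duplicated) can be sketched as a small shell function. This is an illustrative sketch only. The function name bind_and_deploy is hypothetical, and it assumes that the bundle already contains configuration for the given resource key, for example configuration produced by bundle generate and reviewed by hand.

```shell
# Hypothetical helper: bind a bundle resource to an existing job, then deploy.
# $1 = resource key declared in the bundle, $2 = ID of the existing job.
bind_and_deploy() {
  resource_key="$1"
  resource_id="$2"
  # Link the bundle resource to its remote counterpart.
  # The command shows a diff and waits for confirmation.
  databricks bundle deployment bind "$resource_key" "$resource_id" || return 1
  # Apply the bundle configuration to the now-bound resource.
  databricks bundle deploy
}
```

For example, bind_and_deploy hello_job 6565621249 runs the two commands in order and stops if the binding is denied or fails.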
Output a bundle summary
The bundle summary command outputs a summary of a bundle's identity and resources, including deep links for resources so that you can easily navigate to the resource in the Databricks workspace.
databricks bundle summary
The following example output is the summary of a bundle named my_pipeline_bundle that defines a job and a pipeline:
Name: my_pipeline_bundle
Target: dev
Workspace:
  Host: https://myworkspace.cloud.databricks.com
  User: someone@example.com
  Path: /Users/someone@example.com/.bundle/my_pipeline/dev
Resources:
  Jobs:
    my_project_job:
      Name: [dev someone] my_project_job
      URL: https://myworkspace.cloud.databricks.com/jobs/206000809187888?o=6051000018419999
  Pipelines:
    my_project_pipeline:
      Name: [dev someone] my_project_pipeline
      URL: https://myworkspace.cloud.databricks.com/pipelines/7f559fd5-zztz-47fa-aa5c-c6bf034b4f58?o=6051000018419999
Tip
You can also use bundle open to navigate to a resource in the Databricks workspace. See Open a bundle resource.
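If you want only the deep links from the summary, you can filter the text output. The sketch below assumes that the output keeps the "URL: <link>" line shape shown in the example above, which may differ between CLI versions. The function name summary_urls is hypothetical.

```shell
# Hypothetical helper: read bundle summary text on stdin and print each URL value.
# Matches lines such as "URL: https://..." with optional leading whitespace.
summary_urls() {
  sed -n 's/^[[:space:]]*URL:[[:space:]]*//p'
}
```

A possible invocation is databricks bundle summary | summary_urls, which prints one link per line.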
Deploy a bundle
To deploy a bundle to the remote workspace, run the bundle deploy command from the bundle project root. If no command options are specified, the default target as declared within the bundle configuration files is used.
databricks bundle deploy
To deploy the bundle to a specific target, set the -t (or --target) option along with the target's name as declared within the bundle configuration files. For example, for a target declared with the name dev:
databricks bundle deploy -t dev
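When a bundle declares several targets, the same -t option can be applied in a loop. The following is a rough sketch rather than a recommended release process. The function name deploy_targets is hypothetical, and any target names you pass must be declared in your own bundle configuration files.

```shell
# Hypothetical helper: deploy the bundle to each named target in order.
# Stops at the first deployment that fails.
deploy_targets() {
  for target in "$@"; do
    echo "deploying target: $target"
    databricks bundle deploy -t "$target" || return 1
  done
}
```

For example, deploy_targets dev staging deploys to dev first and continues to staging only if the dev deployment succeeds.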
A bundle can be deployed to multiple workspaces, such as development, staging, and production workspaces. Fundamentally, the root_path property determines a bundle's unique identity. It defaults to ~/.bundle/${bundle.name}/${bundle.target}. Therefore, by default, a bundle's identity is composed of the identity of the deployer, the bundle's name, and the bundle's target name. If these are identical across different bundles, deployments of these bundles will interfere with one another.
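To make the identity rule concrete, the sketch below composes a default root path from its three parts and compares two deployments. It only mirrors the layout of the Path value shown in the example output in this article (/Users/<user>/.bundle/<bundle name>/<target>) and does not query the CLI. Both function names are hypothetical.

```shell
# Hypothetical helper: compose the default root path from deployer, bundle name, and target.
default_root_path() {
  printf '/Users/%s/.bundle/%s/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: succeed when two deployments would share the same default root path.
# Arguments: user1 name1 target1 user2 name2 target2
same_identity() {
  [ "$(default_root_path "$1" "$2" "$3")" = "$(default_root_path "$4" "$5" "$6")" ]
}
```

Two bundles that both resolve to the same path, such as the same user deploying the same bundle name to the same target, would share deployment state, which is the interference described above.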
Furthermore, a bundle deployment tracks the resources it creates in the target workspace by their IDs as a state that is stored in the workspace file system. Resource names are not used to correlate between a bundle deployment and a resource instance, so:
- If a resource in the bundle configuration does not exist in the target workspace, it is created.
- If a resource in the bundle configuration exists in the target workspace, it is updated in the workspace.
- If a resource is removed from the bundle configuration, it is removed from the target workspace if it was previously deployed.
- A resource's association with a bundle can only be forgotten if you change the bundle name, the bundle target, or the workspace. You can run bundle validate to output a summary containing these values.
Run a job or pipeline
To run a specific job or pipeline, use the bundle run command. You must specify the resource key of the job or pipeline declared within the bundle configuration files. By default, the default target declared within the bundle configuration files is used. For example, to run a job hello_job in the default target, run the following command:
databricks bundle run hello_job
To run a job with a key hello_job within the context of a target declared with the name dev:
databricks bundle run -t dev hello_job
If you want to do a pipeline validation run, use the --validate-only option, as shown in the following example:
databricks bundle run --validate-only my_pipeline
To pass job parameters, use the --params option, followed by comma-separated key-value pairs, where the key is the parameter name. For example, the following command sets the parameter with the name message to HelloWorld for the job hello_job:
databricks bundle run --params message=HelloWorld hello_job
Note
You can pass parameters to job tasks using the job task options, but the --params option is the recommended method for passing job parameters. An error occurs if job parameters are specified for a job that doesn't have job parameters defined or if task parameters are specified for a job that has job parameters defined.
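Because --params takes comma-separated key-value pairs, a script that collects parameters one at a time needs to join them. The following is a small sketch of that step. The function name join_params is hypothetical, it does not handle values that themselves contain commas, and the parameter retries in the usage note is made up for illustration.

```shell
# Hypothetical helper: join key=value arguments with commas for the --params option.
join_params() {
  joined=""
  for pair in "$@"; do
    # Prepend a comma only when something has already been collected.
    joined="${joined:+$joined,}$pair"
  done
  printf '%s\n' "$joined"
}
```

A possible use is databricks bundle run --params "$(join_params message=HelloWorld retries=3)" hello_job, assuming the job defines both job parameters.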
To cancel and restart an existing job run or pipeline update, use the --restart option:
databricks bundle run --restart hello_job
Open a bundle resource
To navigate to a bundle resource in the workspace, run the bundle open command from the bundle project root, specifying the resource to open. If a resource key is not specified, this command outputs a list of the bundle's resources from which to choose.
databricks bundle open [resource-key]
For example, the following command launches a browser and opens the bundle's baby_gender_by_county dashboard in the Databricks workspace that is configured for the bundle:
databricks bundle open baby_gender_by_county
Destroy a bundle
Warning
Destroying a bundle permanently deletes a bundle's previously deployed jobs, pipelines, and artifacts. This action cannot be undone.
To delete jobs, pipelines, and artifacts that were previously deployed, run the bundle destroy command. The following command deletes all previously deployed jobs, pipelines, and artifacts that are defined in the bundle configuration files:
databricks bundle destroy
Note
A bundle's identity is composed of the bundle name, the bundle target, and the workspace. If you have changed any of these and then attempt to destroy a bundle before deploying, an error occurs.
By default, you are prompted to confirm permanent deletion of the previously deployed jobs, pipelines, and artifacts. To skip these prompts and perform automatic permanent deletion, add the --auto-approve option to the bundle destroy command.
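Because --auto-approve removes the interactive safety check, a script may want its own guard before calling it. The sketch below is one possible approach under stated assumptions, not a built-in feature. The function name destroy_target and the confirmation phrase are hypothetical, and it assumes that bundle destroy accepts the -t option in the same way as bundle deploy.

```shell
# Hypothetical helper: destroy a target without prompts only after an explicit opt-in.
# $1 = target name, $2 = must equal "yes-destroy-<target>" to proceed.
destroy_target() {
  target="$1"
  confirmation="$2"
  if [ "$confirmation" != "yes-destroy-$target" ]; then
    echo "refusing to destroy target '$target' without confirmation" >&2
    return 1
  fi
  # Permanently deletes the previously deployed resources for this target.
  databricks bundle destroy -t "$target" --auto-approve
}
```

For example, destroy_target dev yes-destroy-dev proceeds, while any other second argument leaves the deployment untouched.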