Migrate from dbx to bundles
Important
Databricks recommends that you use Databricks Asset Bundles instead of dbx by Databricks Labs. Related articles about dbx have been retired and might not be updated.
This article describes how to migrate projects for dbx by Databricks Labs over to Databricks Asset Bundles. See Introduction to dbx by Databricks Labs and What are Databricks Asset Bundles?.
Before you migrate, note the following limitations and feature comparisons between dbx by Databricks Labs and Databricks Asset Bundles.
Limitations
The following functionality supported in dbx by Databricks Labs is limited, does not exist, or requires workarounds in Databricks Asset Bundles.
- Building JAR artifacts is not supported in bundles.
- FUSE notation for workspace paths is not supported in bundles (for example, /Workspace/<path>/<filename>). However, you can instruct bundles to generate FUSE-style workspace paths during deployments by using notation such as /Workspace/${bundle.file_path}/<filename>.
Feature comparisons
Before you migrate, note how the following features for dbx by Databricks Labs are implemented in Databricks Asset Bundles.
Templates and projects
dbx provides support for Jinja templating. You can include Jinja templates in the deployment configuration and pass environment variables either inline or through a variables file. Although not recommended, dbx also provides experimental support for custom user functions.
Bundles provide support for Go templates for configuration reuse. Users can create bundles based on prebuilt templates. Templating in bundles is close to parity with dbx; the exception is custom user functions, which bundles do not support.
Build management
dbx provides build support through pip wheel, Poetry, and Flit. Users can specify the build option in the build section of a project's deployment.yml file.
Bundles enable users to build, deploy, and run Python wheel files. Users can leverage the built-in whl entry in a bundle's databricks.yml file.
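As a rough sketch of how a dbx build option might carry over, the following Python helper maps the three dbx build tools named above to an artifacts mapping for databricks.yml. The helper itself, the artifact key default, the type, build, and path field names, and the exact build commands are assumptions for illustration and do not come from this article; check them against the bundle configuration reference before relying on them.

```python
# Hypothetical helper: translate a dbx "build.python" option into an
# "artifacts" mapping for databricks.yml. Field names and build commands
# are assumptions to verify against the bundle configuration reference.

# Assumed wheel-build command for each dbx build option.
BUILD_COMMANDS = {
    "pip": "pip wheel --no-deps -w dist .",
    "poetry": "poetry build -f wheel",
    "flit": "flit build --format wheel",
}


def artifacts_for(dbx_build_option: str, artifact_key: str = "default") -> dict:
    """Return an 'artifacts' mapping that builds a Python wheel."""
    try:
        command = BUILD_COMMANDS[dbx_build_option]
    except KeyError:
        raise ValueError(
            f"unsupported dbx build option: {dbx_build_option!r}"
        ) from None
    return {
        "artifacts": {
            artifact_key: {
                "type": "whl",     # the wheel artifact type
                "build": command,  # command that produces the wheel
                "path": ".",       # folder that contains the Python project
            }
        }
    }
```

The returned dictionary can be merged into the rest of the bundle configuration before it is written out as YAML.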
Sync, deploy, and run code
dbx enables uploading code separately from generating workspace resources such as Azure Databricks jobs.
Bundles always upload code and create or update workspace resources at the same time. This simplifies deployments and avoids blocking conditions for jobs that are already in progress.
Migrate a dbx project to a bundle
After you note the preceding limitations and feature comparisons between dbx by Databricks Labs and Databricks Asset Bundles, you are ready to migrate from dbx to bundles.
To begin a dbx project migration, Databricks recommends that you keep your dbx project in its original folder and copy its contents into a separate, blank folder. This separate folder will be your new bundle. If you instead convert your dbx project to a bundle in its original folder, you could encounter unexpected issues if you make mistakes or want to start over from the beginning.
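The copy step described above could be scripted along these lines. This is a minimal sketch: the function name copy_project and both folder arguments are hypothetical, and it assumes Python 3.8 or above for dirs_exist_ok.

```python
import shutil
from pathlib import Path


def copy_project(dbx_project, bundle_dir) -> Path:
    """Copy a dbx project into a separate, empty folder and return that folder."""
    dbx_project = Path(dbx_project)
    bundle_dir = Path(bundle_dir)
    if not dbx_project.is_dir():
        raise FileNotFoundError(f"no dbx project folder at {dbx_project}")
    # The new folder must not live inside the original project, or the
    # copy would try to include itself.
    if dbx_project.resolve() in bundle_dir.resolve().parents:
        raise ValueError("bundle folder must be outside the dbx project")
    # Refuse to copy into a folder that already has content, so the new
    # bundle starts from a clean copy of the original project.
    if bundle_dir.exists() and any(bundle_dir.iterdir()):
        raise FileExistsError(f"{bundle_dir} is not empty")
    # dirs_exist_ok allows copying into an existing, empty folder.
    shutil.copytree(dbx_project, bundle_dir, dirs_exist_ok=True)
    return bundle_dir
```

Because the original folder is never modified, you can delete the copy and run the function again if you want to start over.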
Step 1: Install and set up the Databricks CLI
Databricks Asset Bundles are generally available in Databricks CLI version 0.218.0 and above. If you have already installed and set up Databricks CLI version 0.218.0 or above, skip ahead to Step 2.
Note
Bundles are not compatible with Databricks CLI versions 0.18 and below.
- Install or update to Databricks CLI version 0.218.0 or above. See Install or update the Databricks CLI.
- Set up the Databricks CLI for authentication with your target Azure Databricks workspaces, for example by using Azure Databricks personal access token authentication. For other Azure Databricks authentication types, see Authentication for the Databricks CLI.
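To check the version requirement in a script, you could compare the version number that the CLI reports against 0.218.0. The sketch below only parses text; it assumes that the version output contains a number in major.minor.patch form (for example, a line such as "Databricks CLI v0.218.0", which is an assumed format), and the function names are hypothetical.

```python
import re

# Minimum Databricks CLI version for bundles, as stated in this step.
MINIMUM_VERSION = (0, 218, 0)


def parse_cli_version(version_output: str) -> tuple:
    """Extract (major, minor, patch) from the text that the CLI prints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return tuple(int(part) for part in match.groups())


def supports_bundles(version_output: str) -> bool:
    """Return True if the reported version is 0.218.0 or above."""
    # Tuples compare element by element, so (0, 18, 0) < (0, 218, 0).
    return parse_cli_version(version_output) >= MINIMUM_VERSION
```

Comparing integer tuples rather than strings avoids mistakes such as treating 0.18 as newer than 0.218.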
Step 2: Create the bundle configuration file
If you are using an IDE that supports YAML files and JSON schema files, such as Visual Studio Code, PyCharm Professional, or IntelliJ IDEA Ultimate, you can use your IDE not only to create the bundle configuration file but also to check the file's syntax and formatting and to provide code completion hints, as follows.
Visual Studio Code
Add YAML language server support to Visual Studio Code, for example by installing the YAML extension from the Visual Studio Code Marketplace.
Generate the Databricks Asset Bundle configuration JSON schema file by using the Databricks CLI to run the bundle schema command and redirect the output to a JSON file. For example, generate a file named bundle_config_schema.json within the current directory, as follows:
databricks bundle schema > bundle_config_schema.json
Use Visual Studio Code to create or open a bundle configuration file within the current directory. By convention, this file is named databricks.yml.
Add the following comment to the beginning of your bundle configuration file:
# yaml-language-server: $schema=bundle_config_schema.json
Note
In the preceding comment, if your Databricks Asset Bundle configuration JSON schema file is in a different path, replace bundle_config_schema.json with the full path to your schema file.
Use the YAML language server features that you added earlier. For more information, see your YAML language server's documentation.
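If you maintain several bundle configuration files, adding the schema comment can be automated. The following sketch works on the file's text only; the function name with_schema_comment is hypothetical, and the default schema path is the file name used in the example above.

```python
# Prefix of the comment that the YAML language server reads.
SCHEMA_COMMENT_PREFIX = "# yaml-language-server: $schema="


def with_schema_comment(config_text: str,
                        schema_path: str = "bundle_config_schema.json") -> str:
    """Return config_text with the schema comment as its first line."""
    lines = config_text.splitlines()
    # Drop any existing schema comment so the result has exactly one.
    lines = [line for line in lines
             if not line.startswith(SCHEMA_COMMENT_PREFIX)]
    return "\n".join([SCHEMA_COMMENT_PREFIX + schema_path, *lines]) + "\n"
```

Running the function on text that already has the comment leaves the text unchanged, so it is safe to apply more than once.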
PyCharm Professional
Generate the Databricks Asset Bundle configuration JSON schema file by using the Databricks CLI to run the bundle schema command and redirect the output to a JSON file. For example, generate a file named bundle_config_schema.json within the current directory, as follows:
databricks bundle schema > bundle_config_schema.json
Configure PyCharm to recognize the bundle configuration JSON schema file, and then complete the JSON schema mapping, by following the instructions in Configure a custom JSON schema.
Use PyCharm to create or open a bundle configuration file. By convention, this file is named databricks.yml. As you type, PyCharm checks for JSON schema syntax and formatting and provides code completion hints.
IntelliJ IDEA Ultimate
Generate the Databricks Asset Bundle configuration JSON schema file by using the Databricks CLI to run the bundle schema command and redirect the output to a JSON file. For example, generate a file named bundle_config_schema.json within the current directory, as follows:
databricks bundle schema > bundle_config_schema.json
Configure IntelliJ IDEA to recognize the bundle configuration JSON schema file, and then complete the JSON schema mapping, by following the instructions in Configure a custom JSON schema.
Use IntelliJ IDEA to create or open a bundle configuration file. By convention, this file is named databricks.yml. As you type, IntelliJ IDEA checks for JSON schema syntax and formatting and provides code completion hints.
Step 3: Convert dbx project settings to databricks.yml
Convert the settings in your dbx project's .dbx/project.json file to the equivalent settings in your bundle's databricks.yml file. For details, see Converting dbx project settings to databricks.yml.
Step 4: Convert dbx deployment settings to databricks.yml
Convert the settings in your dbx project's conf folder to the equivalent settings in your bundle's databricks.yml file. For details, see Converting dbx deployment settings to databricks.yml.
Step 5: Validate the bundle
Before you deploy artifacts or run an Azure Databricks job, a Delta Live Tables pipeline, or an MLOps pipeline, you should make sure that your bundle configuration file is syntactically correct. To do this, run the bundle validate command from the bundle root:
databricks bundle validate
For information about bundle validate, see Validate a bundle.
Step 6: Deploy the bundle
To deploy any specified local artifacts to the remote workspace, run the bundle deploy command from the bundle root. If no command options are specified, the default target declared in the bundle configuration file is used:
databricks bundle deploy
To deploy the artifacts within the context of a specific target, specify the -t (or --target) option along with the target's name as declared within the bundle configuration file. For example, for a target declared with the name development:
databricks bundle deploy -t development
For information about bundle deploy, see Deploy a bundle.
Tip
You can link bundle-defined jobs and pipelines to existing jobs and pipelines in the Azure Databricks workspace to keep them in sync. See Bind bundle resources.
Step 7: Run the bundle
To run a specific job or pipeline, run the bundle run command from the bundle root. You must specify the job or pipeline declared within the bundle configuration file. If the -t option is not specified, the default target as declared within the bundle configuration file is used. For example, to run a job named hello_job within the context of the default target:
databricks bundle run hello_job
To run a job named hello_job within the context of a target declared with the name development:
databricks bundle run -t development hello_job
For information about bundle run, see Run a bundle.
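Steps 5 through 7 can be chained from a script. The following sketch only builds the argument lists for the three commands shown above and does not run them. The function names are hypothetical, and passing -t to bundle validate is an assumption, because this article shows that option only with bundle deploy and bundle run.

```python
from typing import List, Optional

ACTIONS = ("validate", "deploy", "run")


def bundle_command(action: str,
                   target: Optional[str] = None,
                   resource_key: Optional[str] = None) -> List[str]:
    """Build the argument list for 'databricks bundle <action>'."""
    if action not in ACTIONS:
        raise ValueError(f"unknown bundle action: {action!r}")
    command = ["databricks", "bundle", action]
    if target is not None:
        # Without -t, the CLI uses the bundle's default target.
        command += ["-t", target]
    if action == "run":
        if resource_key is None:
            raise ValueError("bundle run needs the job or pipeline to run")
        command.append(resource_key)
    return command


def migration_commands(resource_key: str,
                       target: Optional[str] = None) -> List[List[str]]:
    """Return the validate, deploy, and run commands in order."""
    return [
        bundle_command("validate", target),
        bundle_command("deploy", target),
        bundle_command("run", target, resource_key),
    ]
```

To execute the commands, you could pass each list to subprocess.run(command, check=True) with the bundle root as the working directory, so that a failed validation stops the sequence before anything is deployed.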
(Optional) Step 8: Configure the bundle for CI/CD with GitHub
If you use GitHub for CI/CD, you can use GitHub Actions to run the databricks bundle deploy and databricks bundle run commands automatically, based on specific GitHub workflow events and other criteria. See Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions.
Converting dbx project settings to databricks.yml
For dbx, project settings are by default in a file named project.json in the project's .dbx folder. See Project file reference.
For bundles, bundle configurations are by default in a file named databricks.yml within the bundle's root folder. See Databricks Asset Bundle configuration.
For a .dbx/project.json file with the following example content:
{
  "environments": {
    "default": {
      "profile": "charming-aurora",
      "storage_type": "mlflow",
      "properties": {
        "workspace_directory": "/Shared/dbx/charming_aurora",
        "artifact_location": "/Shared/dbx/projects/charming_aurora"
      }
    }
  },
  "inplace_jinja_support": true
}
The corresponding databricks.yml file is as follows:
bundle:
  name: <some-unique-bundle-name>
targets:
  default:
    workspace:
      profile: charming-aurora
      root_path: /Shared/dbx/charming_aurora
      artifact_path: /Shared/dbx/projects/charming_aurora
    resources:
      # See an example "resources" mapping in the following section.
The following objects in this example's preceding .dbx/project.json file are not supported in databricks.yml files and have no workarounds:
- inplace_jinja_support
- storage_type
The following additional allowed objects in .dbx/project.json files are not supported in databricks.yml files and have no workarounds:
- enable-context-based-upload-for-execute
- enable-failsafe-cluster-reuse-with-assets
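The project settings mapping shown in this section can be expressed as a small conversion function. This is a sketch based only on the example above: the function name convert_project is hypothetical, it handles just the profile, workspace_directory, and artifact_location settings, and it reports every other key as dropped instead of trying to translate it.

```python
def convert_project(project: dict, bundle_name: str) -> tuple:
    """Return (bundle_config, dropped_keys) for a parsed project.json."""
    targets = {}
    dropped = []
    for env_name, env in project.get("environments", {}).items():
        properties = env.get("properties", {})
        workspace = {}
        if "profile" in env:
            workspace["profile"] = env["profile"]
        if "workspace_directory" in properties:
            workspace["root_path"] = properties["workspace_directory"]
        if "artifact_location" in properties:
            workspace["artifact_path"] = properties["artifact_location"]
        targets[env_name] = {"workspace": workspace}
        # Environment keys other than profile and properties
        # (for example, storage_type) are reported, not converted.
        dropped += [f"environments.{env_name}.{key}"
                    for key in env if key not in ("profile", "properties")]
    # Top-level keys other than environments
    # (for example, inplace_jinja_support) are reported too.
    dropped += [key for key in project if key != "environments"]
    return {"bundle": {"name": bundle_name}, "targets": targets}, dropped
```

Load the project file with json.load, call the function, and review the list of dropped keys before you write the result to databricks.yml.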
Converting dbx deployment settings to databricks.yml
For dbx, deployment settings are by default in a file within the project's conf folder. See Deployment file reference. The deployment settings file by default has one of the following file names:
- deployment.yml
- deployment.yaml
- deployment.json
- deployment.yml.j2
- deployment.yaml.j2
- deployment.json.j2
For bundles, deployment settings are by default in a file named databricks.yml within the bundle's root folder. See Databricks Asset Bundle configuration.
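When you migrate several projects, it can help to locate each project's deployment file automatically. The following sketch checks the conf folder for the default file names listed above. The function names are hypothetical, and the search order is simply the order of that list, which is an assumption and might not match the order that dbx uses.

```python
from pathlib import Path
from typing import Optional

# Default dbx deployment file names, in the order listed in this article.
DEPLOYMENT_FILE_NAMES = (
    "deployment.yml",
    "deployment.yaml",
    "deployment.json",
    "deployment.yml.j2",
    "deployment.yaml.j2",
    "deployment.json.j2",
)


def find_deployment_file(project_root) -> Optional[Path]:
    """Return the first default deployment file in conf, or None."""
    conf = Path(project_root) / "conf"
    for name in DEPLOYMENT_FILE_NAMES:
        candidate = conf / name
        if candidate.is_file():
            return candidate
    return None


def uses_jinja(deployment_file: Path) -> bool:
    """Return True for .j2 files, whose Jinja syntax bundles do not read."""
    return Path(deployment_file).suffix == ".j2"
```

A file for which uses_jinja returns True needs its Jinja expressions rewritten by hand, because bundles do not support dbx Jinja features.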
For a conf/deployment.yml file with the following example content:
build:
  python: "pip"
environments:
  default:
    workflows:
      - name: "workflow1"
        tasks:
          - task_key: "task1"
            python_wheel_task:
              package_name: "some-pkg"
              entry_point: "some-ep"
The corresponding databricks.yml file is as follows:
bundle:
  name: <some-unique-bundle-name>
targets:
  default:
    workspace:
      # See an example "workspace" mapping in the preceding section.
    resources:
      jobs:
        workflow1:
          tasks:
            - task_key: task1
              python_wheel_task:
                package_name: some-pkg
                entry_point: some-ep
The following object in this example's preceding conf/deployment.yml file is not supported in databricks.yml files and has no workarounds:
- build (although see Develop a Python wheel file using Databricks Asset Bundles)
The following additional allowed objects and functionality in conf/deployment.yml files are not supported in databricks.yml files and have no workarounds unless otherwise stated:
- access_control_list
- custom (use standard YAML anchors instead)
- deployment_config
- Azure Databricks Jobs 2.0 format (use Jobs 2.1 format instead)
- dbx Jinja features
- Name-based properties
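The deployment settings mapping in this section can be sketched in the same way. The function below turns each dbx workflow into a job keyed by the workflow's name and places the jobs under the matching target. The function name convert_deployment is hypothetical, the assumption that access_control_list and deployment_config appear at the workflow level is mine, and the sketch does not handle the Jobs 2.0 format, Jinja features, or name-based properties.

```python
# Workflow-level keys that this sketch reports instead of converting.
# Their placement at the workflow level is an assumption.
UNSUPPORTED_WORKFLOW_KEYS = ("access_control_list", "deployment_config")


def convert_deployment(deployment: dict) -> tuple:
    """Return (bundle_config, dropped_keys) for a parsed deployment file."""
    targets = {}
    # Top-level keys other than environments (for example, build)
    # have no equivalent in this sketch.
    dropped = [key for key in deployment if key != "environments"]
    for env_name, env in deployment.get("environments", {}).items():
        jobs = {}
        for workflow in env.get("workflows", []):
            name = workflow["name"]
            settings = {}
            for key, value in workflow.items():
                if key == "name":
                    continue  # the workflow name becomes the job's key
                if key in UNSUPPORTED_WORKFLOW_KEYS:
                    dropped.append(f"{name}.{key}")
                    continue
                settings[key] = value
            jobs[name] = settings
        targets[env_name] = {"resources": {"jobs": jobs}}
    return {"targets": targets}, dropped
```

Applied to the example deployment file above, the function returns a single job named workflow1 under the default target and reports build as dropped.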