Databricks Asset Bundles for MLOps Stacks
You can use Databricks Asset Bundles, the Databricks CLI, and the Databricks MLOps Stack repository on GitHub to create MLOps Stacks. An MLOps Stack is an MLOps project on Azure Databricks that follows production best practices out of the box. See What are Databricks Asset Bundles?.
To create, deploy, and run an MLOps Stacks project, complete the following steps:
Requirements
- Make sure that the target remote workspace has workspace files enabled. See What are workspace files?.
- On your development machine, make sure that Databricks CLI version 0.212.2 or above is installed. To check your installed Databricks CLI version, run the command databricks -v. To update your Databricks CLI version, see Install or update the Databricks CLI. (Bundles do not work with Databricks CLI versions 0.18 and below.)
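The version requirement above can also be checked from a script. The following is a minimal sketch, not an official utility: the helper names are hypothetical, and it assumes only that the output of databricks -v contains a dotted three-part version number (for example, text ending in v0.212.2).

```python
import re
import subprocess

# Minimum Databricks CLI version named in the requirements above.
MIN_VERSION = (0, 212, 2)


def parse_cli_version(output: str) -> tuple:
    """Extract (major, minor, patch) from text that contains a version like 0.212.2."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(output: str, minimum: tuple = MIN_VERSION) -> bool:
    """Return True if the version found in `output` is at least `minimum`."""
    return parse_cli_version(output) >= minimum


def installed_cli_output() -> str:
    """Run `databricks -v` and return whatever it prints (requires the CLI on PATH)."""
    result = subprocess.run(
        ["databricks", "-v"], capture_output=True, text=True, check=True
    )
    return result.stdout + result.stderr
```

To use it, call meets_minimum(installed_cli_output()) on your development machine. A result of False means the CLI should be updated before you continue.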
Step 1: Set up authentication
For more information about how to set up authentication, see databricks authentication.
Step 2: Create the bundle project
Use Databricks Asset Bundle templates to create your MLOps Stacks project's starter files. To do this, begin by running the following command:
databricks bundle init mlops-stacks
Answer the on-screen prompts. For guidance on answering these prompts, see Start a new project in the Databricks MLOps Stacks repository on GitHub.
The first prompt offers the option of setting up the ML code components, the CI/CD components, or both. This option simplifies the initial setup as you can choose to create only those components that are immediately relevant. (To set up the other components, run the initialization command again.) Select one of the following:
- CICD_and_Project (default) - Set up both ML code and CI/CD components.
- Project_Only - Set up ML code components only. This option is for data scientists to get started.
- CICD_Only - Set up CI/CD components only. This option is for ML engineers to set up infrastructure.
After you answer all of the on-screen prompts, the template creates your MLOps Stacks project's starter files and adds them to your current working directory.
Customize your MLOps Stacks project's starter files as desired. To do this, follow the guidance in the following files within your new project:
| Role | Goal | Docs |
| --- | --- | --- |
| First-time users of this repo | Understand the ML pipeline and code structure in this repo | README.md |
| Data Scientist | Get started writing ML code for a brand new project | <project-name>/README.md |
| Data Scientist | Update production ML code (for example, model training logic) for an existing project | docs/ml-pull-request.md |
| Data Scientist | Modify production model ML resources (for example, model training or inference jobs) | <project-name>/resources/README.md |
| MLOps / DevOps | Set up CI/CD for the current ML project | docs/mlops-setup.md |
For customizing experiments, the mappings within an experiment declaration correspond to the create experiment operation's request payload as defined in POST /api/2.0/mlflow/experiments/create in the REST API reference, expressed in YAML format.
For customizing jobs, the mappings within a job declaration correspond to the create job operation's request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format.
Tip
You can define, combine, and override the settings for new job clusters in bundles by using the techniques described in Override cluster settings in Databricks Asset Bundles.
For customizing models, the mappings within a model declaration correspond to the create model operation's request payload as defined in POST /api/2.0/mlflow/registered-models/create in the REST API reference, expressed in YAML format.
For customizing pipelines, the mappings within a pipeline declaration correspond to the create pipeline operation's request payload as defined in POST /api/2.0/pipelines in the REST API reference, expressed in YAML format.
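To make the correspondence between a resource declaration and a REST request payload more concrete, here is a small illustrative sketch. The job name, task key, notebook path, and the helper as_bundle_resource are all hypothetical, and the payload is deliberately incomplete (a real job also needs compute settings). It only shows how an API-style payload nests under the resources mapping of a bundle, keyed by resource type and then by a resource key of your choosing.

```python
import json

# Hypothetical job settings, shaped like the request body of POST /api/2.1/jobs/create.
# The name, task key, and notebook path are placeholders for illustration only.
job_payload = {
    "name": "example-model-training-job",
    "tasks": [
        {
            "task_key": "train",
            "notebook_task": {"notebook_path": "../training/notebooks/Train.py"},
        }
    ],
}


def as_bundle_resource(resource_type: str, resource_key: str, payload: dict) -> dict:
    """Nest an API-style payload under resources.<resource_type>.<resource_key>."""
    return {"resources": {resource_type: {resource_key: payload}}}


declaration = as_bundle_resource("jobs", "model_training_job", job_payload)

# Printed as JSON here; the same keys and nesting are what you would write in YAML.
print(json.dumps(declaration, indent=2))
```

The same nesting applies to the other resource types described above: only the resource type and the payload fields change.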
Step 3: Validate the bundle project
Check whether the bundle configuration is valid. To do this, run the Databricks CLI from the project's root, where the databricks.yml is located, as follows:
databricks bundle validate
If a summary of the bundle configuration is returned, then the validation succeeded. If any errors are returned, fix the errors, and then repeat this step.
Step 4: Deploy the bundle
Deploy the project's resources and artifacts to the desired remote workspace. To do this, run the Databricks CLI from the project's root, where the databricks.yml is located, as follows:
databricks bundle deploy -t <target-name>
Replace <target-name> with the name of the desired target within the databricks.yml file, for example dev, test, staging, or prod.
Step 5: Run the deployed bundle
The project's deployed Azure Databricks jobs automatically run on their predefined schedules. To run a deployed job immediately, run the Databricks CLI from the project's root, where the databricks.yml is located, as follows:
databricks bundle run -t <target-name> <job-name>
- Replace <target-name> with the name of the desired target within the databricks.yml file where the job was deployed, for example dev, test, staging, or prod.
- Replace <job-name> with the name of the job in one of the .yml files within <project-name>/databricks-resources, for example batch_inference_job, write_feature_table_job, or model_training_job.
A link to the Azure Databricks job appears, which you can copy into your web browser to open the job within the Azure Databricks UI.
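Steps 3 through 5 can be chained in a script so that a failed validation or deployment stops the sequence. The sketch below is one possible way to do this, not part of MLOps Stacks: the function names are hypothetical, it passes -t to validate as well (Step 3 shows the command without it), and it assumes the Databricks CLI exits with a non-zero status when a command fails.

```python
import subprocess
from typing import List, Optional


def bundle_command(action: str, target: str, job_name: Optional[str] = None) -> List[str]:
    """Build the argument list for `databricks bundle <action> -t <target> [<job-name>]`."""
    command = ["databricks", "bundle", action, "-t", target]
    if job_name is not None:
        command.append(job_name)
    return command


def validate_deploy_run(target: str, job_name: str, project_root: str = ".") -> None:
    """Run validate, deploy, and run in order from the project's root.

    Stops at the first failing step, assuming the CLI signals failure
    through a non-zero exit status.
    """
    steps = [
        bundle_command("validate", target),
        bundle_command("deploy", target),
        bundle_command("run", target, job_name),
    ]
    for command in steps:
        # check=True raises subprocess.CalledProcessError on a non-zero exit status.
        subprocess.run(command, cwd=project_root, check=True)
```

For example, validate_deploy_run("dev", "model_training_job") would validate the bundle, deploy it to the dev target, and then start the model_training_job job, provided that target and job exist in your project.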
Step 6: Delete the deployed bundle (optional)
To delete a deployed project's resources and artifacts if you no longer need them, run the Databricks CLI from the project's root, where the databricks.yml is located, as follows:
databricks bundle destroy -t <target-name>
Replace <target-name> with the name of the desired target within the databricks.yml file, for example dev, test, staging, or prod.
Answer the on-screen prompts to confirm the deletion of the previously deployed resources and artifacts.