Databricks Asset Bundles resources

Databricks Asset Bundles allows you to specify information about the Azure Databricks resources used by the bundle in the resources mapping in the bundle configuration. See resources mapping.

This article outlines supported resource types for bundles and provides details and examples for each supported type.

Supported resources

The following table lists supported resource types for bundles. Some resources can be created by defining them in a bundle and deploying the bundle, and some resources only support referencing an existing resource to include in the bundle.

Resources are defined using the corresponding Databricks REST API object's create operation request payload, where the object's supported fields, expressed as YAML, are the resource's supported properties. Links to documentation for each resource's corresponding payloads are listed in the table.

Tip

The databricks bundle validate command returns warnings if unknown resource properties are found in bundle configuration files.

Resource Create support Resource properties
cluster Cluster properties: POST /api/2.1/clusters/create
dashboard Dashboard properties: POST /api/2.0/lakeview/dashboards
experiment Experiment properties: POST /api/2.0/mlflow/experiments/create
job Job properties: POST /api/2.1/jobs/create
model (legacy) Model properties: POST /api/2.0/mlflow/registered-models/create
model_serving_endpoint Model serving endpoint properties: POST /api/2.0/serving-endpoints
pipeline Pipeline properties: POST /api/2.0/pipelines
quality_monitor Quality monitor properties: POST /api/2.1/unity-catalog/tables/{table_name}/monitor
registered_model (Unity Catalog) Unity Catalog model properties: POST /api/2.1/unity-catalog/models
schema (Unity Catalog) Unity Catalog schema properties: POST /api/2.1/unity-catalog/schemas
volume (Unity Catalog) Unity Catalog volume properties: POST /api/2.1/unity-catalog/volumes

cluster

The cluster resource allows you to create all-purpose clusters. The following example creates a cluster named my_cluster and sets that as the cluster to use to run the notebook in my_job:

bundle:
  name: clusters

resources:
  clusters:
    my_cluster:
      num_workers: 2
      node_type_id: "i3.xlarge"
      autoscale:
        min_workers: 2
        max_workers: 7
      spark_version: "13.3.x-scala2.12"
      spark_conf:
        "spark.executor.memory": "2g"

  jobs:
    my_job:
      tasks:
        - task_key: test_task
          notebook_task:
            notebook_path: "./src/my_notebook.py"

dashboard

The dashboard resource allows you to manage AI/BI dashboards in a bundle. For information about AI/BI dashboards, see Dashboards.

The following example includes and deploys the sample NYC Taxi Trip Analysis dashboard to the Databricks workspace.

resources:
  dashboards:
    nyc_taxi_trip_analysis:
      display_name: "NYC Taxi Trip Analysis"
      file_path: ../src/nyc_taxi_trip_analysis.lvdash.json
      warehouse_id: ${var.warehouse_id}

If you use the UI to modify the dashboard, modifications made through the UI are not applied to the dashboard JSON file in the local bundle unless you explicitly update it using bundle generate. You can use the --watch option to continuously poll and retrieve changes to the dashboard. See Generate a bundle configuration file.

In addition, if you attempt to deploy a bundle that contains a dashboard JSON file that is different than the one in the remote workspace, an error will occur. To force the deploy and overwrite the dashboard in the remote workspace with the local one, use the --force option. See Deploy a bundle.

experiment

The experiment resource allows you to define MLflow experiments in a bundle. For information about MLflow experiments, see MLflow experiments.

The following example defines an experiment that all users can view:

resources:
  experiments:
    experiment:
      name: my_ml_experiment
      permissions:
        - level: CAN_READ
          group_name: users
      description: MLflow experiment used to track runs

job

The job resource allows you to define jobs and their corresponding tasks in your bundle. For information about jobs, see Schedule and orchestrate workflows. For a tutorial that uses a Databricks Asset Bundles template to create a job, see Develop a job on Azure Databricks using Databricks Asset Bundles.

The following example defines a job with the resource key hello-job with one notebook task:

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          notebook_task:
            notebook_path: ./hello.py

For information about defining job tasks and overriding job settings, see Add tasks to jobs in Databricks Asset Bundles, Override job tasks settings in Databricks Asset Bundles, and Override cluster settings in Databricks Asset Bundles.

model_serving_endpoint

The model_serving_endpoint resource allows you to define model serving endpoints.

The following example defines an Unity Catalog model serving endpoint:

resources:
  model_serving_endpoints:
    uc_model_serving_endpoint:
      name: "uc-model-endpoint"
      config:
        served_entities:
        - entity_name: "myCatalog.mySchema.my-ads-model"
          entity_version: "10"
          workload_size: "Small"
          scale_to_zero_enabled: "true"
        traffic_config:
          routes:
          - served_model_name: "my-ads-model-10"
            traffic_percentage: "100"
      tags:
      - key: "team"
        value: "data science"

quality_monitor (Unity Catalog)

The quality_monitor resource allows you to define a Unity Catalog table monitor.

The following example defines a quality monitor:

resources:
  quality_monitors:
    my_quality_monitor:
      table_name: dev.mlops_schema.predictions
      output_schema_name: ${bundle.target}.mlops_schema
      assets_dir: /Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
      inference_log:
        granularities: [1 day]
        model_id_col: model_id
        prediction_col: prediction
        label_col: price
        problem_type: PROBLEM_TYPE_REGRESSION
        timestamp_col: timestamp
      schedule:
        quartz_cron_expression: 0 0 8 * * ? # Run Every day at 8am
        timezone_id: UTC

registered_model (Unity Catalog)

The registered model resource allows you to define models in Unity Catalog. For information about Unity Catalog registered models, see Manage model lifecycle in Unity Catalog.

The following example defines a registered model in Unity Catalog:

resources:
  registered_models:
      model:
        name: my_model
        catalog_name: ${bundle.target}
        schema_name: mlops_schema
        comment: Registered model in Unity Catalog for ${bundle.target} deployment target
        grants:
          - privileges:
              - EXECUTE
            principal: account users

pipeline

The pipeline resource allows you to create Delta Live Tables pipelines. For information about pipelines, see What is Delta Live Tables?. For a tutorial that uses the Databricks Asset Bundles template to create a pipeline, see Develop Delta Live Tables pipelines with Databricks Asset Bundles.

The following example defines a pipeline with the resource key hello-pipeline:

resources:
  pipelines:
    hello-pipeline:
      name: hello-pipeline
      clusters:
        - label: default
          num_workers: 1
      development: true
      continuous: false
      channel: CURRENT
      edition: CORE
      photon: false
      libraries:
        - notebook:
            path: ./pipeline.py

schema (Unity Catalog)

The schema resource type allows you to define Unity Catalog schemas for tables and other assets in your workflows and pipelines created as part of a bundle. A schema, different from other resource types, has the following limitations:

  • The owner of a schema resource is always the deployment user, and cannot be changed. If run_as is specified in the bundle, it will be ignored by operations on the schema.
  • Only fields supported by the corresponding Schemas object create API are available for the schema resource. For example, enable_predictive_optimization is not supported as it is only available on the update API.

The following example defines a pipeline with the resource key my_pipeline that creates a Unity Catalog schema with the key my_schema as the target:

resources:
  pipelines:
    my_pipeline:
      name: test-pipeline-{{.unique_id}}
      libraries:
        - notebook:
            path: ./nb.sql
      development: true
      catalog: main
      target: ${resources.schemas.my_schema.id}

  schemas:
    my_schema:
      name: test-schema-{{.unique_id}}
      catalog_name: main
      comment: This schema was created by DABs.

A top-level grants mapping is not supported by Databricks Asset Bundles, so if you want to set grants for a schema, define the grants for the schema within the schemas mapping. For more information about grants, see Show, grant, and revoke privileges.

The following example defines a Unity Catalog schema with grants:

resources:
  schemas:
    my_schema:
      name: test-schema
      grants:
        - principal: users
          privileges:
            - CAN_MANAGE
        - principal: my_team
          privileges:
            - CAN_READ
      catalog_name: main

volume (Unity Catalog)

The volume resource type allows you to define and create Unity Catalog volumes as part of a bundle. When deploying a bundle with a volume defined, note that:

  • A volume cannot be referenced in the artifact_path for the bundle until it exists in the workspace. Hence, if you want to use Databricks Asset Bundles to create the volume, you must first define the volume in the bundle, deploy it to create the volume, then reference it in the artifact_path in subsequent deployments.
  • Volumes in the bundle are not prepended with the dev_${workspace.current_user.short_name} prefix when the deployment target has mode: development configured. However, you can manually configure this prefix. See Custom presets.

The following example creates a Unity Catalog volume with the key my_volume:

resources:
  volumes:
    my_volume:
      catalog_name: main
      name: my_volume
      schema_name: my_schema