Databricks Asset Bundles resources

Databricks Asset Bundles allows you to specify information about the Azure Databricks resources used by the bundle in the resources mapping in the bundle configuration. See resources mapping and resources key reference.

This article outlines supported resource types for bundles and provides details and an example for each supported type. For additional examples, see Bundle configuration examples.

Tip

To generate YAML for any existing resource, use the databricks bundle generate command. See Generate a bundle configuration file.
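For example, to bootstrap configuration from a job that already exists in the workspace (the job ID shown is a placeholder; `--existing-job-id` is the flag as of recent CLI versions, so check `databricks bundle generate job --help` for your version):

```shell
# Generate YAML for an existing job and download its referenced files
# into the bundle project. Replace the ID with your own job's ID.
databricks bundle generate job --existing-job-id 6565621249
```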

Supported resources

The following table lists supported resource types for bundles. Some resources can be created by defining them in a bundle and deploying the bundle, and some resources can only be created by referencing an existing asset to include in the bundle.

Resources are defined using the corresponding Databricks REST API object's create operation request payload, where the object's supported fields, expressed as YAML, are the resource's supported properties. Links to documentation for each resource's corresponding payloads are listed in the table.

Tip

The databricks bundle validate command returns warnings if unknown resource properties are found in bundle configuration files.

cluster

Type: Map

The cluster resource defines a cluster.

clusters:
  <cluster-name>:
    <cluster-field-name>: <cluster-field-value>
Key Type Description
apply_policy_default_values Boolean When set to true, fixed and default values from the policy will be used for fields that are omitted. When set to false, only fixed values from the policy will be applied.
autoscale Map Parameters needed in order to automatically scale clusters up and down based on load. See autoscale.
autotermination_minutes Integer Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. Users can also set this value to 0 to explicitly disable automatic termination.
aws_attributes Map Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used. See aws_attributes.
azure_attributes Map Attributes related to clusters running on Azure. If not specified at cluster creation, a set of default values will be used. See azure_attributes.
cluster_log_conf Map The configuration for delivering spark logs to a long-term storage destination. See cluster_log_conf.
cluster_name String Cluster name requested by the user. This doesn't have to be unique. If not specified at creation, the cluster name will be an empty string.
custom_tags Map Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags. See custom_tags.
data_security_mode String The data governance model to use when accessing data from a cluster. See data_security_mode.
docker_image Map The custom docker image. See docker_image.
driver_instance_pool_id String The optional ID of the instance pool to which the driver of the cluster belongs. The cluster uses the instance pool with ID instance_pool_id if the driver pool is not assigned.
driver_node_type_id String The node type of the Spark driver. This field is optional; if unset, the driver node type is set to the same value as node_type_id. This field, along with node_type_id, should not be set if virtual_cluster_size is set. If driver_node_type_id, node_type_id, and virtual_cluster_size are all specified, driver_node_type_id and node_type_id take precedence.
enable_elastic_disk Boolean Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly - refer to the User Guide for more details.
enable_local_disk_encryption Boolean Whether to enable LUKS on cluster VMs' local disks.
gcp_attributes Map Attributes related to clusters running on Google Cloud Platform. If not specified at cluster creation, a set of default values will be used. See gcp_attributes.
init_scripts Sequence The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. See init_scripts.
instance_pool_id String The optional ID of the instance pool to which the cluster belongs.
is_single_node Boolean This field can only be used when kind = CLASSIC_PREVIEW. When set to true, Databricks automatically sets single-node related custom_tags, spark_conf, and num_workers.
kind String The kind of compute described by this compute specification.
node_type_id String This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the clusters/listNodeTypes API call.
num_workers Integer Number of worker nodes that this cluster should have. A cluster has one Spark Driver and num_workers Executors for a total of num_workers + 1 Spark nodes.
permissions Sequence The cluster permissions. See permissions.
policy_id String The ID of the cluster policy used to create the cluster if applicable.
runtime_engine String Determines the cluster's runtime engine, either STANDARD or PHOTON.
single_user_name String Single user name if data_security_mode is SINGLE_USER.
spark_conf Map An object containing a set of optional, user-specified Spark configuration key-value pairs. Users can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively. See spark_conf.
spark_env_vars Map An object containing a set of optional, user-specified environment variable key-value pairs.
spark_version String The Spark version of the cluster, e.g. 3.3.x-scala2.11. A list of available Spark versions can be retrieved by using the clusters/sparkVersions API call.
ssh_public_keys Sequence SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. Up to 10 keys can be specified.
use_ml_runtime Boolean This field can only be used when kind = CLASSIC_PREVIEW. effective_spark_version is determined by spark_version (the DBR release), this use_ml_runtime field, and whether node_type_id is a GPU node.
workload_type Map Cluster Attributes showing for clusters workload types. See workload_type.

Examples

The following example creates a dedicated (single-user) cluster for the current user with Databricks Runtime 15.4 LTS and a cluster policy:

resources:
  clusters:
    my_cluster:
      num_workers: 0
      node_type_id: 'i3.xlarge'
      driver_node_type_id: 'i3.xlarge'
      spark_version: '15.4.x-scala2.12'
      spark_conf:
        'spark.executor.memory': '2g'
      autotermination_minutes: 60
      enable_elastic_disk: true
      single_user_name: ${workspace.current_user.userName}
      policy_id: '000128DB309672CA'
      enable_local_disk_encryption: false
      data_security_mode: SINGLE_USER
      runtime_engine: STANDARD

This example creates a simple cluster my_cluster and sets that as the cluster to use to run the notebook in my_job:

bundle:
  name: clusters

resources:
  clusters:
    my_cluster:
      num_workers: 2
      node_type_id: 'i3.xlarge'
      autoscale:
        min_workers: 2
        max_workers: 7
      spark_version: '13.3.x-scala2.12'
      spark_conf:
        'spark.executor.memory': '2g'

  jobs:
    my_job:
      tasks:
        - task_key: test_task
          existing_cluster_id: ${resources.clusters.my_cluster.id}
          notebook_task:
            notebook_path: './src/my_notebook.py'

dashboard

Type: Map

The dashboard resource allows you to manage AI/BI dashboards in a bundle. For information about AI/BI dashboards, see Dashboards.

dashboards:
  <dashboard-name>:
    <dashboard-field-name>: <dashboard-field-value>
Key Type Description
display_name String The display name of the dashboard.
etag String The etag for the dashboard. Can be optionally provided on updates to ensure that the dashboard has not been modified since the last read.
file_path String The local path of the dashboard asset, including the file name. Exported dashboards always have the file extension .lvdash.json.
permissions Sequence The dashboard permissions. See permissions.
serialized_dashboard Any The contents of the dashboard in serialized string form.
warehouse_id String The warehouse ID used to run the dashboard.

Example

The following example includes and deploys the sample NYC Taxi Trip Analysis dashboard to the Databricks workspace.

resources:
  dashboards:
    nyc_taxi_trip_analysis:
      display_name: 'NYC Taxi Trip Analysis'
      file_path: ../src/nyc_taxi_trip_analysis.lvdash.json
      warehouse_id: ${var.warehouse_id}

If you modify the dashboard in the UI, those modifications are not applied to the dashboard JSON file in the local bundle unless you explicitly update it using bundle generate. You can use the --watch option to continuously poll and retrieve changes to the dashboard. See Generate a bundle configuration file.

In addition, if you attempt to deploy a bundle that contains a dashboard JSON file that is different from the one in the remote workspace, an error will occur. To force the deploy and overwrite the dashboard in the remote workspace with the local one, use the --force option. See Deploy a bundle.
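
The two options described above can be sketched as the following CLI workflow (the resource key is the one from the example above; confirm the exact flags with `databricks bundle generate dashboard --help` for your CLI version):

```shell
# Continuously poll the workspace and pull UI edits back into the
# local dashboard JSON file for the nyc_taxi_trip_analysis resource.
databricks bundle generate dashboard --resource nyc_taxi_trip_analysis --watch

# Overwrite the remote dashboard with the local copy when they diverge.
databricks bundle deploy --force
```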

experiment

Type: Map

The experiment resource allows you to define MLflow experiments in a bundle. For information about MLflow experiments, see Organize training runs with MLflow experiments.

experiments:
  <experiment-name>:
    <experiment-field-name>: <experiment-field-value>
Key Type Description
artifact_location String The location where artifacts for the experiment are stored.
name String The friendly name that identifies the experiment.
permissions Sequence The experiment's permissions. See permissions.
tags Sequence Additional metadata key-value pairs. See tags.

Example

The following example defines an experiment that all users can view:

resources:
  experiments:
    experiment:
      name: my_ml_experiment
      permissions:
        - level: CAN_READ
          group_name: users
      description: MLflow experiment used to track runs

job

Type: Map

The job resource allows you to define jobs and their corresponding tasks in your bundle. For information about jobs, see Orchestration using Databricks Jobs. For a tutorial that uses a Databricks Asset Bundles template to create a job, see Develop a job with Databricks Asset Bundles.

jobs:
  <job-name>:
    <job-field-name>: <job-field-value>
Key Type Description
budget_policy_id String The ID of the user-specified budget policy to use for this job. If not specified, a default budget policy may be applied when creating or modifying the job. See effective_budget_policy_id for the budget policy used by this workload.
continuous Map An optional continuous property for this job. The continuous property will ensure that there is always one run executing. Only one of schedule and continuous can be used. See continuous.
deployment Map Deployment information for jobs managed by external sources. See deployment.
description String An optional description for the job. The maximum length is 27700 characters in UTF-8 encoding.
edit_mode String Edit mode of the job, either UI_LOCKED or EDITABLE.
email_notifications Map An optional set of email addresses that is notified when runs of this job begin or complete as well as when this job is deleted. See email_notifications.
environments Sequence A list of task execution environment specifications that can be referenced by serverless tasks of this job. An environment is required to be present for serverless tasks. For serverless notebook tasks, the environment is accessible in the notebook environment panel. For other serverless tasks, the task environment is required to be specified using environment_key in the task settings. See environments.
format String The format of the job.
git_source Map An optional specification for a remote Git repository containing the source code used by tasks. The git_source field and task source field set to GIT are not recommended for bundles, because local relative paths may not point to the same content in the Git repository, and bundles expect that a deployed job has the same content as the local copy from where it was deployed. Instead, clone the repository locally and set up your bundle project within this repository, so that the source for tasks is the workspace.
health Map An optional set of health rules that can be defined for this job. See health.
job_clusters Sequence A list of job cluster specifications that can be shared and reused by tasks of this job. See clusters.
max_concurrent_runs Integer An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. See max_concurrent_runs.
name String An optional name for the job. The maximum length is 4096 bytes in UTF-8 encoding.
notification_settings Map Optional notification settings that are used when sending notifications to each of the email_notifications and webhook_notifications for this job. See notification_settings.
parameters Sequence Job-level parameter definitions. See parameters.
performance_target String Defines how performant or cost-efficient the execution of the run on serverless compute should be.
permissions Sequence The job's permissions. See permissions.
queue Map The queue settings of the job. See queue.
run_as Map Write-only setting. Specifies the user or service principal that the job runs as. If not specified, the job runs as the user who created the job. Either user_name or service_principal_name should be specified. If not, an error is thrown. See Specify a run identity for a Databricks Asset Bundles workflow.
schedule Map An optional periodic schedule for this job. The default behavior is that the job only runs when triggered by clicking "Run Now" in the Jobs UI or sending an API request to runNow. See schedule.
tags Map A map of tags associated with the job. These are forwarded to the cluster as cluster tags for jobs clusters, and are subject to the same limitations as cluster tags. A maximum of 25 tags can be added to the job.
tasks Sequence A list of task specifications to be executed by this job. See Add tasks to jobs in Databricks Asset Bundles.
timeout_seconds Integer An optional timeout applied to each run of this job. A value of 0 means no timeout.
trigger Map A configuration to trigger a run when certain conditions are met. See trigger.
webhook_notifications Map A collection of system notification IDs to notify when runs of this job begin or complete. See webhook_notifications.

Example

The following example defines a job with the resource key hello-job with one notebook task:

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          notebook_task:
            notebook_path: ./hello.py
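
As a sketch of the schedule key described in the table above, the same job can be run on a periodic schedule; the cron expression and time zone here are illustrative values:

```yaml
resources:
  jobs:
    hello-job:
      name: hello-job
      schedule:
        # Run every day at 6:00 AM UTC (illustrative values)
        quartz_cron_expression: '0 0 6 * * ?'
        timezone_id: UTC
        pause_status: UNPAUSED
      tasks:
        - task_key: hello-task
          notebook_task:
            notebook_path: ./hello.py
```

Remember that only one of schedule and continuous can be set on a job.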

For information about defining job tasks and overriding job settings, see Add tasks to jobs in Databricks Asset Bundles, Override job tasks settings in Databricks Asset Bundles, and Override cluster settings in Databricks Asset Bundles.

Important

The job git_source field and task source field set to GIT are not recommended for bundles, because local relative paths may not point to the same content in the Git repository, and bundles expect that a deployed job has the same content as the local copy from where it was deployed.

Instead, clone the repository locally and set up your bundle project within this repository, so that the source for tasks is the workspace.

model (legacy)

Type: Map

The model resource allows you to define legacy models in bundles. Databricks recommends you use Unity Catalog registered models instead.
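
A minimal sketch of a legacy model definition, assuming the models mapping and the legacy Model Registry create fields; the model name and permission principal are illustrative:

```yaml
resources:
  models:
    my_model:
      # Illustrative name for a Workspace Model Registry (legacy) model
      name: my_legacy_model
      description: Legacy model; prefer Unity Catalog registered models.
      permissions:
        - level: CAN_READ
          group_name: users
```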

pipeline

Type: Map

The pipeline resource allows you to create DLT pipelines. For information about pipelines, see DLT. For a tutorial that uses the Databricks Asset Bundles template to create a pipeline, see Develop DLT pipelines with Databricks Asset Bundles.

pipelines:
  <pipeline-name>:
    <pipeline-field-name>: <pipeline-field-value>
Key Type Description
allow_duplicate_names Boolean If false, deployment will fail if the name conflicts with that of another pipeline.
catalog String A catalog in Unity Catalog to publish data from this pipeline to. If target is specified, tables in this pipeline are published to a target schema inside catalog (for example, catalog.target.table). If target is not specified, no data is published to Unity Catalog.
channel String The DLT Release Channel that specifies which version of DLT to use.
clusters Sequence The cluster settings for this pipeline deployment. See cluster.
configuration Map The configuration for this pipeline execution.
continuous Boolean Whether the pipeline is continuous or triggered. This replaces trigger.
deployment Map Deployment type of this pipeline. See deployment.
development Boolean Whether the pipeline is in development mode. Defaults to false.
dry_run Boolean Whether the pipeline is a dry run pipeline.
edition String The pipeline product edition.
event_log Map The event log configuration for this pipeline. See event_log.
filters Map The filters that determine which pipeline packages to include in the deployed graph. See filters.
id String Unique identifier for this pipeline.
ingestion_definition Map The configuration for a managed ingestion pipeline. These settings cannot be used with the libraries, schema, target, or catalog settings. See ingestion_definition.
libraries Sequence Libraries or code needed by this deployment. See libraries.
name String A friendly name for this pipeline.
notifications Sequence The notification settings for this pipeline. See notifications.
permissions Sequence The pipeline's permissions. See permissions.
photon Boolean Whether Photon is enabled for this pipeline.
schema String The default schema (database) where tables are read from or published to.
serverless Boolean Whether serverless compute is enabled for this pipeline.
storage String The DBFS root directory for storing checkpoints and tables.
target String Target schema (database) to add tables in this pipeline to. Exactly one of schema or target must be specified. To publish to Unity Catalog, also specify catalog. This legacy field is deprecated for pipeline creation in favor of the schema field.
trigger Map Deprecated. Which pipeline trigger to use. Use continuous instead.

Example

The following example defines a pipeline with the resource key hello-pipeline:

resources:
  pipelines:
    hello-pipeline:
      name: hello-pipeline
      clusters:
        - label: default
          num_workers: 1
      development: true
      continuous: false
      channel: CURRENT
      edition: CORE
      photon: false
      libraries:
        - notebook:
            path: ./pipeline.py

quality_monitor (Unity Catalog)

Type: Map

The quality_monitor resource allows you to define a Unity Catalog table monitor.

quality_monitors:
  <quality_monitor-name>:
    <quality_monitor-field-name>: <quality_monitor-field-value>
Key Type Description
assets_dir String The directory to store monitoring assets (e.g. dashboard, metric tables).
baseline_table_name String Name of the baseline table from which drift metrics are computed. Columns in the monitored table should also be present in the baseline table.
custom_metrics Sequence Custom metrics to compute on the monitored table. These can be aggregate metrics, derived metrics (from already computed aggregate metrics), or drift metrics (comparing metrics across time windows). See custom_metrics.
inference_log Map Configuration for monitoring inference logs. See inference_log.
notifications Map The notification settings for the monitor. See notifications.
output_schema_name String Schema where output metric tables are created.
schedule Map The schedule for automatically updating and refreshing metric tables. See schedule.
skip_builtin_dashboard Boolean Whether to skip creating a default dashboard summarizing data quality metrics.
slicing_exprs Sequence List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complements. For high-cardinality columns, only the top 100 unique values by frequency will generate slices.
snapshot Map Configuration for monitoring snapshot tables.
table_name String The full name of the table.
time_series Map Configuration for monitoring time series tables. See time_series.
warehouse_id String Optional argument to specify the warehouse for dashboard creation. If not specified, the first running warehouse will be used.

Examples

For a complete example bundle that defines a quality_monitor, see the mlops_demo bundle.

The following examples define quality monitors for InferenceLog, TimeSeries, and Snapshot profile types.

# InferenceLog profile type
resources:
  quality_monitors:
    my_quality_monitor:
      table_name: dev.mlops_schema.predictions
      output_schema_name: ${bundle.target}.mlops_schema
      assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
      inference_log:
        granularities: [1 day]
        model_id_col: model_id
        prediction_col: prediction
        label_col: price
        problem_type: PROBLEM_TYPE_REGRESSION
        timestamp_col: timestamp
      schedule:
        quartz_cron_expression: 0 0 8 * * ? # Run Every day at 8am
        timezone_id: UTC
# TimeSeries profile type
resources:
  quality_monitors:
    my_quality_monitor:
      table_name: dev.mlops_schema.predictions
      output_schema_name: ${bundle.target}.mlops_schema
      assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
      time_series:
        granularities: [30 minutes]
        timestamp_col: timestamp
      schedule:
        quartz_cron_expression: 0 0 8 * * ? # Run Every day at 8am
        timezone_id: UTC
# Snapshot profile type
resources:
  quality_monitors:
    my_quality_monitor:
      table_name: dev.mlops_schema.predictions
      output_schema_name: ${bundle.target}.mlops_schema
      assets_dir: /Workspace/Users/${workspace.current_user.userName}/databricks_lakehouse_monitoring
      snapshot: {}
      schedule:
        quartz_cron_expression: 0 0 8 * * ? # Run Every day at 8am
        timezone_id: UTC

registered_model (Unity Catalog)

Type: Map

The registered model resource allows you to define models in Unity Catalog. For information about Unity Catalog registered models, see Manage model lifecycle in Unity Catalog.

registered_models:
  <registered_model-name>:
    <registered_model-field-name>: <registered_model-field-value>
Key Type Description
catalog_name String The name of the catalog where the schema and the registered model reside.
comment String The comment attached to the registered model.
grants Sequence The grants associated with the registered model. See grants.
name String The name of the registered model.
schema_name String The name of the schema where the registered model resides.
storage_location String The storage location on the cloud under which model version data files are stored.

Example

The following example defines a registered model in Unity Catalog:

resources:
  registered_models:
    model:
      name: my_model
      catalog_name: ${bundle.target}
      schema_name: mlops_schema
      comment: Registered model in Unity Catalog for ${bundle.target} deployment target
      grants:
        - privileges:
            - EXECUTE
          principal: account users

schema (Unity Catalog)

Type: Map

The schema resource type allows you to define Unity Catalog schemas for tables and other assets in your workflows and pipelines created as part of a bundle. Unlike other resource types, a schema has the following limitations:

  • The owner of a schema resource is always the deployment user, and cannot be changed. If run_as is specified in the bundle, it will be ignored by operations on the schema.
  • Only fields supported by the corresponding Schemas object create API are available for the schema resource. For example, enable_predictive_optimization is not supported as it is only available on the update API.
schemas:
  <schema-name>:
    <schema-field-name>: <schema-field-value>
Key Type Description
catalog_name String The name of the parent catalog.
comment String A user-provided free-form text description.
grants Sequence The grants associated with the schema. See grants.
name String The name of the schema, relative to the parent catalog.
properties Map A map of key-value properties attached to the schema.
storage_root String The storage root URL for managed tables within the schema.

Examples

The following example defines a pipeline with the resource key my_pipeline that creates a Unity Catalog schema with the key my_schema as the target:

resources:
  pipelines:
    my_pipeline:
      name: test-pipeline-{{.unique_id}}
      libraries:
        - notebook:
            path: ./nb.sql
      development: true
      catalog: main
      target: ${resources.schemas.my_schema.id}

  schemas:
    my_schema:
      name: test-schema-{{.unique_id}}
      catalog_name: main
      comment: This schema was created by DABs.

A top-level grants mapping is not supported by Databricks Asset Bundles, so if you want to set grants for a schema, define the grants for the schema within the schemas mapping. For more information about grants, see Show, grant, and revoke privileges.

The following example defines a Unity Catalog schema with grants:

resources:
  schemas:
    my_schema:
      name: test-schema
      grants:
        - principal: users
          privileges:
            - SELECT
        - principal: my_team
          privileges:
            - CAN_MANAGE
      catalog_name: main

volume (Unity Catalog)

Type: Map

The volume resource type allows you to define and create Unity Catalog volumes as part of a bundle. When deploying a bundle with a volume defined, note that:

  • A volume cannot be referenced in the artifact_path for the bundle until it exists in the workspace. Hence, if you want to use Databricks Asset Bundles to create the volume, you must first define the volume in the bundle, deploy it to create the volume, then reference it in the artifact_path in subsequent deployments.
  • Volumes in the bundle are not prepended with the dev_${workspace.current_user.short_name} prefix when the deployment target has mode: development configured. However, you can manually configure this prefix. See Custom presets.
volumes:
  <volume-name>:
    <volume-field-name>: <volume-field-value>
Key Type Description
catalog_name String The name of the catalog of the schema and volume.
comment String The comment attached to the volume.
grants Sequence The grants associated with the volume. See grants.
name String The name of the volume.
schema_name String The name of the schema where the volume is.
storage_location String The storage location on the cloud.
volume_type String The volume type, either EXTERNAL or MANAGED. An external volume is located in the specified external location. A managed volume is located in the default location which is specified by the parent schema, or the parent catalog, or the metastore.

Example

The following example creates a Unity Catalog volume with the key my_volume:

resources:
  volumes:
    my_volume:
      catalog_name: main
      name: my_volume
      schema_name: my_schema

For an example bundle that runs a job that writes to a file in a Unity Catalog volume, see the bundle-examples GitHub repository.
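
The two-step workflow described above (create the volume first, reference it later) can be sketched as follows; the /Volumes path and artifacts folder name are illustrative:

```yaml
# First deployment: define the volume so that deploying the bundle creates it.
resources:
  volumes:
    my_volume:
      catalog_name: main
      name: my_volume
      schema_name: my_schema

# Subsequent deployments: the volume now exists in the workspace,
# so it can be referenced in the bundle's artifact_path.
workspace:
  artifact_path: /Volumes/main/my_schema/my_volume/artifacts
```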

Common objects

grants

Type: Sequence

Key Type Description
principal String The name of the principal that will be granted privileges.
privileges Sequence The privileges to grant to the specified entity.