Databricks Asset Bundles resources

Databricks Asset Bundles allows you to specify information about the Azure Databricks resources used by the bundle in the resources mapping in the bundle configuration. See resources mapping and resources key reference.

This page provides a configuration reference for all supported bundle resource types, with details and an example for each. For additional examples, see Bundle configuration examples.

The JSON schema for bundles that is used to validate YAML configuration is in the Databricks CLI GitHub repository.

Tip

To generate YAML for any existing resource, use the databricks bundle generate command. See databricks bundle generate.

Supported resources

The following table lists supported resource types for bundles (YAML and Python, where applicable). Some resources can be created by defining them in a bundle and deploying the bundle, and some resources can only be created by referencing an existing asset to include in the bundle.

Resource configuration defines a Databricks object that corresponds to a Databricks REST API object. The REST API object's supported create request fields, expressed as YAML, are the resource's supported keys. Links to documentation for each resource's corresponding object are in the table below.

Tip

The databricks bundle validate command returns warnings if unknown resource properties are found in bundle configuration files.

Resource Python support Corresponding REST API object
cluster Cluster object
dashboard Dashboard object
experiment Experiment object
job Jobs Job object
model (legacy) Model (legacy) object
pipeline Pipelines Pipeline object
quality_monitor Quality monitor object
registered_model (Unity Catalog) Registered model object
schema (Unity Catalog) Schemas Schema object
secret_scope Secret scope object
sql_warehouse SQL warehouse object
volume (Unity Catalog) Volumes Volume object

cluster

Type: Map

The cluster resource defines a cluster.

clusters:
  <cluster-name>:
    <cluster-field-name>: <cluster-field-value>
Key Type Description
apply_policy_default_values Boolean When set to true, fixed and default values from the policy will be used for fields that are omitted. When set to false, only fixed values from the policy will be applied.
autoscale Map Parameters needed in order to automatically scale clusters up and down based on load. See autoscale.
autotermination_minutes Integer Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. Users can also set this value to 0 to explicitly disable automatic termination.
aws_attributes Map Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used. See aws_attributes.
azure_attributes Map Attributes related to clusters running on Azure. If not specified at cluster creation, a set of default values will be used. See azure_attributes.
cluster_log_conf Map The configuration for delivering Spark logs to a long-term storage destination. See cluster_log_conf.
cluster_name String Cluster name requested by the user. This doesn't have to be unique. If not specified at creation, the cluster name will be an empty string.
custom_tags Map Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags.
data_security_mode String The data governance model to use when accessing data from a cluster. Valid values include NONE, SINGLE_USER, USER_ISOLATION, LEGACY_SINGLE_USER, LEGACY_TABLE_ACL, LEGACY_PASSTHROUGH.
docker_image Map The custom docker image. See docker_image.
driver_instance_pool_id String The optional ID of the instance pool to use for the cluster driver. If the driver pool is not assigned, the driver uses the instance pool specified by instance_pool_id.
driver_node_type_id String The node type of the Spark driver. Note that this field is optional; if unset, the driver node type will be set as the same value as node_type_id defined above. This field, along with node_type_id, should not be set if virtual_cluster_size is set. If both driver_node_type_id, node_type_id, and virtual_cluster_size are specified, driver_node_type_id and node_type_id take precedence.
enable_elastic_disk Boolean Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly - refer to the User Guide for more details.
enable_local_disk_encryption Boolean Whether to enable LUKS on cluster VMs' local disks
gcp_attributes Map Attributes related to clusters running on Google Cloud Platform. If not specified at cluster creation, a set of default values will be used. See gcp_attributes.
init_scripts Sequence The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. See init_scripts.
instance_pool_id String The optional ID of the instance pool to which the cluster belongs.
is_single_node Boolean This field can only be used when kind = CLASSIC_PREVIEW. When set to true, Databricks will automatically set single-node-related custom_tags, spark_conf, and num_workers.
kind String The kind of compute described by this compute specification.
node_type_id String This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call.
num_workers Integer Number of worker nodes that this cluster should have. A cluster has one Spark Driver and num_workers Executors for a total of num_workers + 1 Spark nodes.
permissions Sequence The cluster permissions. See permissions.
policy_id String The ID of the cluster policy used to create the cluster if applicable.
runtime_engine String Determines the cluster's runtime engine, either STANDARD or PHOTON.
single_user_name String Single user name if data_security_mode is SINGLE_USER
spark_conf Map An object containing a set of optional, user-specified Spark configuration key-value pairs. Users can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.
spark_env_vars Map An object containing a set of optional, user-specified environment variable key-value pairs.
spark_version String The Spark version of the cluster, e.g. 3.3.x-scala2.11. A list of available Spark versions can be retrieved by using the :method:clusters/sparkVersions API call.
ssh_public_keys Sequence SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. Up to 10 keys can be specified.
use_ml_runtime Boolean This field can only be used when kind = CLASSIC_PREVIEW. The effective_spark_version is determined by spark_version (the DBR release), this use_ml_runtime field, and whether node_type_id is a GPU node.
workload_type Map Cluster Attributes showing for clusters workload types. See workload_type.

cluster.autoscale

Type: Map

Parameters for automatically scaling clusters up and down based on load.

Key Type Description
min_workers Integer The minimum number of workers to which the cluster can scale down when underutilized. It is also the initial number of workers the cluster will have after creation.
max_workers Integer The maximum number of workers to which the cluster can scale up when overloaded. max_workers must be strictly greater than min_workers.

cluster.azure_attributes

Type: Map

Attributes related to clusters running on Azure.

Key Type Description
first_on_demand Integer The first first_on_demand nodes of the cluster will be placed on on-demand instances.
availability String Availability type used for all subsequent nodes past the first_on_demand ones. Valid values are SPOT_AZURE, ON_DEMAND_AZURE, SPOT_WITH_FALLBACK_AZURE.
spot_bid_max_price Number The max price for Azure spot instances. Use -1 to specify lowest price.

cluster.cluster_log_conf

Type: Map

The configuration for delivering Spark logs to a long-term storage destination.

Key Type Description
dbfs Map DBFS location for cluster log delivery. See dbfs.
volumes Map Volumes location for cluster log delivery. See volumes.
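
For example, a cluster definition might deliver its logs to a Unity Catalog volume. The following is a minimal sketch; the node type and volume path are illustrative placeholders:

resources:
  clusters:
    logging_cluster:
      num_workers: 1
      spark_version: '15.4.x-scala2.12'
      node_type_id: 'Standard_DS3_v2'
      cluster_log_conf:
        volumes:
          destination: /Volumes/main/default/cluster_logs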

cluster.cluster_log_conf.dbfs

Type: Map

DBFS location for cluster log delivery.

Key Type Description
destination String The DBFS path for cluster log delivery (for example, dbfs:/cluster-logs).

cluster.cluster_log_conf.volumes

Type: Map

Volumes location for cluster log delivery.

Key Type Description
destination String The volume path for cluster log delivery (for example, /Volumes/catalog/schema/volume/cluster_log).

cluster.docker_image

Type: Map

The custom Docker image configuration.

Key Type Description
url String URL of the Docker image.
basic_auth Map Basic authentication for Docker repository. See basic_auth.

cluster.docker_image.basic_auth

Type: Map

Basic authentication for Docker repository.

Key Type Description
username String The username for Docker registry authentication.
password String The password for Docker registry authentication.

cluster.init_scripts

Type: Sequence

The configuration for storing init scripts. Each item in the sequence specifies the location of one init script; at least one location type must be specified per item.

Key Type Description
dbfs Map DBFS location of init script. See dbfs.
workspace Map Workspace location of init script. See workspace.
abfss Map ABFSS location of init script. See abfss.
volumes Map UC Volumes location of init script. See volumes.
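
For example, a cluster might run one init script stored in a Unity Catalog volume and another stored in the workspace. The following is a minimal sketch; the script paths and node type are hypothetical:

resources:
  clusters:
    cluster_with_init_scripts:
      num_workers: 1
      spark_version: '15.4.x-scala2.12'
      node_type_id: 'Standard_DS3_v2'
      init_scripts:
        - volumes:
            destination: /Volumes/main/default/scripts/install_deps.sh
        - workspace:
            destination: /Users/someone@example.com/init/configure_env.sh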

cluster.init_scripts.dbfs

Type: Map

DBFS location of init script.

Key Type Description
destination String The DBFS path of the init script.

cluster.init_scripts.workspace

Type: Map

Workspace location of init script.

Key Type Description
destination String The workspace path of the init script.

cluster.init_scripts.abfss

Type: Map

ABFSS location of init script.

Key Type Description
destination String The ABFSS path of the init script.

cluster.init_scripts.volumes

Type: Map

Volumes location of init script.

Key Type Description
destination String The UC Volumes path of the init script.

cluster.workload_type

Type: Map

Cluster attributes showing cluster workload types.

Key Type Description
clients Map Defines what type of clients can use the cluster. See clients.

cluster.workload_type.clients

Type: Map

The type of clients for this compute workload.

Key Type Description
jobs Boolean Whether the cluster can run jobs.
notebooks Boolean Whether the cluster can run notebooks.
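
For example, the following minimal sketch defines a cluster that can run jobs but not notebooks; the node type is illustrative:

resources:
  clusters:
    jobs_only_cluster:
      num_workers: 2
      spark_version: '15.4.x-scala2.12'
      node_type_id: 'Standard_DS3_v2'
      workload_type:
        clients:
          jobs: true
          notebooks: false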

Examples

The following example creates a dedicated (single-user) cluster for the current user with Databricks Runtime 15.4 LTS and a cluster policy:

resources:
  clusters:
    my_cluster:
      num_workers: 0
      node_type_id: 'i3.xlarge'
      driver_node_type_id: 'i3.xlarge'
      spark_version: '15.4.x-scala2.12'
      spark_conf:
        'spark.executor.memory': '2g'
      autotermination_minutes: 60
      enable_elastic_disk: true
      single_user_name: ${workspace.current_user.userName}
      policy_id: '000128DB309672CA'
      enable_local_disk_encryption: false
      data_security_mode: SINGLE_USER
      runtime_engine: STANDARD

This example creates a simple cluster my_cluster and sets that as the cluster to use to run the notebook in my_job:

bundle:
  name: clusters

resources:
  clusters:
    my_cluster:
      num_workers: 2
      node_type_id: 'i3.xlarge'
      autoscale:
        min_workers: 2
        max_workers: 7
      spark_version: '13.3.x-scala2.12'
      spark_conf:
        'spark.executor.memory': '2g'

  jobs:
    my_job:
      tasks:
        - task_key: test_task
          notebook_task:
            notebook_path: './src/my_notebook.py'
          existing_cluster_id: ${resources.clusters.my_cluster.id}

dashboard

Type: Map

The dashboard resource allows you to manage AI/BI dashboards in a bundle. For information about AI/BI dashboards, see Dashboards.

If you deploy a bundle that contains a dashboard from your local environment and then use the UI to modify that dashboard, the modifications made through the UI are not applied to the dashboard JSON file in the local bundle unless you explicitly update it using bundle generate. You can use the --watch option to continuously poll and retrieve changes to the dashboard. See databricks bundle generate.

In addition, if you attempt to deploy a bundle from your local environment that contains a dashboard JSON file that is different than the one in the remote workspace, an error will occur. To force the deploy and overwrite the dashboard in the remote workspace with the local one, use the --force option. See databricks bundle deploy.

Note

When using Databricks Asset Bundles with dashboard Git support, prevent duplicate dashboards from being generated by adding the sync mapping to exclude the dashboards from synchronizing as files:

sync:
  exclude:
    - src/*.lvdash.json

dashboards:
  <dashboard-name>:
    <dashboard-field-name>: <dashboard-field-value>
Key Type Description
display_name String The display name of the dashboard.
embed_credentials Boolean Whether the bundle deployment identity credentials are used to execute queries for all dashboard viewers. If it is set to false, a viewer's credentials are used. The default value is false.
etag String The etag for the dashboard. Can be optionally provided on updates to ensure that the dashboard has not been modified since the last read.
file_path String The local path of the dashboard asset, including the file name. Exported dashboards always have the file extension .lvdash.json.
permissions Sequence The dashboard permissions. See permissions.
serialized_dashboard Any The contents of the dashboard in serialized string form.
warehouse_id String The warehouse ID used to run the dashboard.

Example

The following example includes and deploys the sample NYC Taxi Trip Analysis dashboard to the Databricks workspace.

resources:
  dashboards:
    nyc_taxi_trip_analysis:
      display_name: 'NYC Taxi Trip Analysis'
      file_path: ../src/nyc_taxi_trip_analysis.lvdash.json
      warehouse_id: ${var.warehouse_id}

experiment

Type: Map

The experiment resource allows you to define MLflow experiments in a bundle. For information about MLflow experiments, see Organize training runs with MLflow experiments.

experiments:
  <experiment-name>:
    <experiment-field-name>: <experiment-field-value>
Key Type Description
artifact_location String The location where artifacts for the experiment are stored.
lifecycle Map Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.
name String The friendly name that identifies the experiment. An experiment name must be an absolute path in the Databricks workspace, for example /Workspace/Users/someone@example.com/my_experiment.
permissions Sequence The experiment's permissions. See permissions.
tags Sequence Additional metadata key-value pairs. See tags.

Example

The following example defines an experiment that all users can view:

resources:
  experiments:
    experiment:
      name: /Workspace/Users/someone@example.com/my_experiment
      permissions:
        - level: CAN_READ
          group_name: users
      description: MLflow experiment used to track runs

job

Type: Map

Jobs are supported in Python for Databricks Asset Bundles. See databricks.bundles.jobs.

The job resource allows you to define jobs and their corresponding tasks in your bundle.

For information about jobs, see Lakeflow Jobs. For a tutorial that uses a Databricks Asset Bundles template to create a job, see Develop a job with Databricks Asset Bundles.

jobs:
  <job-name>:
    <job-field-name>: <job-field-value>
Key Type Description
budget_policy_id String The id of the user-specified budget policy to use for this job. If not specified, a default budget policy may be applied when creating or modifying the job. See effective_budget_policy_id for the budget policy used by this workload.
continuous Map An optional continuous property for this job. The continuous property will ensure that there is always one run executing. Only one of schedule and continuous can be used. See continuous.
deployment Map Deployment information for jobs managed by external sources. See deployment.
description String An optional description for the job. The maximum length is 27700 characters in UTF-8 encoding.
edit_mode String Edit mode of the job, either UI_LOCKED or EDITABLE.
email_notifications Map An optional set of email addresses that is notified when runs of this job begin or complete as well as when this job is deleted. See email_notifications.
environments Sequence A list of task execution environment specifications that can be referenced by serverless tasks of this job. An environment is required to be present for serverless tasks. For serverless notebook tasks, the environment is accessible in the notebook environment panel. For other serverless tasks, the task environment is required to be specified using environment_key in the task settings. See environments.
format String Deprecated. The format of the job.
git_source Map An optional specification for a remote Git repository containing the source code used by tasks. See job.git_source.
Important: The git_source field and task source field set to GIT are not recommended for bundles, because local relative paths may not point to the same content in the Git repository, and bundles expect that a deployed job has the same content as the local copy from where it was deployed.
Instead, clone the repository locally and set up your bundle project within this repository, so that the source for tasks is the workspace.
health Map An optional set of health rules that can be defined for this job. See health.
job_clusters Sequence A list of job cluster specifications that can be shared and reused by tasks of this job. See job_clusters.
max_concurrent_runs Integer An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently.
name String An optional name for the job. The maximum length is 4096 bytes in UTF-8 encoding.
notification_settings Map Optional notification settings that are used when sending notifications to each of the email_notifications and webhook_notifications for this job. See notification_settings.
parameters Sequence Job-level parameter definitions.
performance_target String Defines how performant or cost efficient the execution of the run on serverless should be.
permissions Sequence The job's permissions. See permissions.
queue Map The queue settings of the job. See queue.
run_as Map Write-only setting. Specifies the user or service principal that the job runs as. If not specified, the job runs as the user who created the job. Either user_name or service_principal_name should be specified. If not, an error is thrown. See run_as.
schedule Map An optional periodic schedule for this job. The default behavior is that the job only runs when triggered by clicking "Run Now" in the Jobs UI or sending an API request to runNow. See schedule.
tags Map A map of tags associated with the job. These are forwarded to the cluster as cluster tags for jobs clusters, and are subject to the same limitations as cluster tags. A maximum of 25 tags can be added to the job.
tasks Sequence A list of task specifications to be executed by this job. See Add tasks to jobs in Databricks Asset Bundles.
timeout_seconds Integer An optional timeout applied to each run of this job. A value of 0 means no timeout.
trigger Map A configuration to trigger a run when certain conditions are met. See trigger.
webhook_notifications Map A collection of system notification IDs to notify when runs of this job begin or complete. See webhook_notifications.

job.continuous

Type: Map

Configuration for continuous job execution.

Key Type Description
pause_status String Whether the continuous job is paused or not. Valid values: PAUSED, UNPAUSED.
task_retry_mode String Indicates how the continuous job applies task-level retries. Valid values are NEVER and ON_FAILURE. Defaults to NEVER.
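
For example, the following minimal sketch defines a continuous job that is deployed in a paused state; the job and notebook names are hypothetical:

resources:
  jobs:
    my_continuous_job:
      name: my_continuous_job
      continuous:
        pause_status: PAUSED
      tasks:
        - task_key: process_stream
          notebook_task:
            notebook_path: ./src/stream_notebook.py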

job.deployment

Type: Map

Deployment information for jobs managed by external sources.

Key Type Description
kind String The kind of deployment. For example, BUNDLE.
metadata_file_path String The path to the metadata file for the deployment.

job.email_notifications

Type: Map

Email notification settings for job runs.

Key Type Description
on_start Sequence A list of email addresses to notify when a run starts.
on_success Sequence A list of email addresses to notify when a run succeeds.
on_failure Sequence A list of email addresses to notify when a run fails.
on_duration_warning_threshold_exceeded Sequence A list of email addresses to notify when a run duration exceeds the warning threshold.
no_alert_for_skipped_runs Boolean Whether to skip sending alerts for skipped runs.
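
For example, a job can notify one set of addresses when a run starts and another when it fails. The following is a minimal sketch; the addresses and paths are placeholders:

resources:
  jobs:
    notified_job:
      name: notified_job
      email_notifications:
        on_start:
          - team@example.com
        on_failure:
          - oncall@example.com
        no_alert_for_skipped_runs: true
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/main_notebook.py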

job.environments

Type: Sequence

A list of task execution environment specifications that can be referenced by serverless tasks of a job.

Each item in the list is a JobEnvironment:

Key Type Description
environment_key String The key of an environment. It has to be unique within a job.
spec Map The entity that represents a serverless environment. See job.environments.spec.

job.environments.spec

Type: Map

The entity that represents a serverless environment.

Key Type Description
client String Deprecated. The client version.
dependencies Sequence List of pip dependencies, as supported by the version of pip in this environment.
environment_version String Required. Environment version used by the environment. Each version comes with a specific Python version and a set of Python packages. The version is a string, consisting of an integer.
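
For example, a serverless job can declare an environment and reference it from a task using environment_key. The following is a minimal sketch; the environment version, package pin, and file path are illustrative assumptions:

resources:
  jobs:
    serverless_job:
      name: serverless_job
      environments:
        - environment_key: default
          spec:
            environment_version: '2'
            dependencies:
              - requests==2.32.3
      tasks:
        - task_key: main
          environment_key: default
          spark_python_task:
            python_file: ./src/main.py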

job.git_source

Type: Map

Git repository configuration for job source code.

Key Type Description
git_branch String The name of the branch to be checked out and used by this job. This field cannot be specified in conjunction with git_tag or git_commit.
git_commit String Commit to be checked out and used by this job. This field cannot be specified in conjunction with git_branch or git_tag.
git_provider String Unique identifier of the service used to host the Git repository. The value is case insensitive. Valid values are gitHub, bitbucketCloud, gitLab, azureDevOpsServices, gitHubEnterprise, bitbucketServer, gitLabEnterpriseEdition.
git_snapshot Map Read-only state of the remote repository at the time the job was run. This field is only included on job runs. See git_snapshot.
git_tag String Name of the tag to be checked out and used by this job. This field cannot be specified in conjunction with git_branch or git_commit.
git_url String URL of the repository to be cloned by this job.

job.git_source.git_snapshot

Type: Map

Read-only commit information snapshot.

Key Type Description
used_commit String Commit that was used to execute the run. If git_branch was specified, this points to the HEAD of the branch at the time of the run; if git_tag was specified, this points to the commit the tag points to.

job.health

Type: Map

Health monitoring configuration for the job.

Key Type Description
rules Sequence A list of job health rules. Each rule contains a metric, an op (operator), and a value. See job.health.rules.

job.health.rules

Type: Sequence

A list of job health rules.

Each item in the list is a JobHealthRule:

Key Type Description
metric String Specifies the health metric that is being evaluated for a particular health rule.
  • RUN_DURATION_SECONDS: Expected total time for a run in seconds.
  • STREAMING_BACKLOG_BYTES: An estimate of the maximum bytes of data waiting to be consumed across all streams. This metric is in Public Preview.
  • STREAMING_BACKLOG_RECORDS: An estimate of the maximum offset lag across all streams. This metric is in Public Preview.
  • STREAMING_BACKLOG_SECONDS: An estimate of the maximum consumer delay across all streams. This metric is in Public Preview.
  • STREAMING_BACKLOG_FILES: An estimate of the maximum number of outstanding files across all streams. This metric is in Public Preview.
op String Specifies the operator used to compare the health metric value with the specified threshold.
value Integer Specifies the threshold value that the health metric should obey to satisfy the health rule.
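
For example, the following minimal sketch defines a health rule that flags runs that take longer than 10 minutes; the job and notebook names are hypothetical:

resources:
  jobs:
    monitored_job:
      name: monitored_job
      health:
        rules:
          - metric: RUN_DURATION_SECONDS
            op: GREATER_THAN
            value: 600
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/main_notebook.py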

job.job_clusters

Type: Sequence

A list of job cluster specifications that can be shared and reused by tasks of this job. Libraries cannot be declared in a shared job cluster. You must declare dependent libraries in task settings.

Each item in the list is a JobCluster:

Key Type Description
job_cluster_key String A unique name for the job cluster. This field is required and must be unique within the job. JobTaskSettings may refer to this field to determine which cluster to launch for the task execution.
new_cluster Map A specification for the cluster that is created for each task that references this job cluster. See cluster.
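
For example, a job can define a shared job cluster and reference it from a task through job_cluster_key. The following is a minimal sketch; the node type and paths are illustrative:

resources:
  jobs:
    shared_cluster_job:
      name: shared_cluster_job
      job_clusters:
        - job_cluster_key: default_cluster
          new_cluster:
            spark_version: '15.4.x-scala2.12'
            node_type_id: 'Standard_DS3_v2'
            num_workers: 2
      tasks:
        - task_key: main
          job_cluster_key: default_cluster
          notebook_task:
            notebook_path: ./src/main_notebook.py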

job.notification_settings

Type: Map

Notification settings that apply to all notifications for the job.

Key Type Description
no_alert_for_skipped_runs Boolean Whether to skip sending alerts for skipped runs.
no_alert_for_canceled_runs Boolean Whether to skip sending alerts for canceled runs.

job.queue

Type: Map

Queue settings for the job.

Key Type Description
enabled Boolean Whether to enable queueing for the job.

job.schedule

Type: Map

Schedule configuration for periodic job execution.

Key Type Description
quartz_cron_expression String A Cron expression using Quartz syntax that specifies when the job runs. For example, 0 0 9 * * ? runs the job every day at 9:00 AM in the timezone specified by timezone_id.
timezone_id String The timezone for the schedule. For example, America/Los_Angeles or UTC.
pause_status String Whether the schedule is paused or not. Valid values: PAUSED, UNPAUSED.
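
For example, the following minimal sketch schedules a job to run every day at 9:00 AM in the America/Los_Angeles timezone; the job and notebook names are hypothetical:

resources:
  jobs:
    scheduled_job:
      name: scheduled_job
      schedule:
        quartz_cron_expression: '0 0 9 * * ?'
        timezone_id: America/Los_Angeles
        pause_status: UNPAUSED
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/main_notebook.py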

job.trigger

Type: Map

Trigger configuration for event-driven job execution.

Key Type Description
file_arrival Map Trigger based on file arrival. See file_arrival.
table Map Trigger based on a table. See table.
table_update Map Trigger based on table updates. See table_update.
periodic Map Periodic trigger. See periodic.

job.trigger.file_arrival

Type: Map

Trigger configuration based on file arrival.

Key Type Description
url String The file path to monitor for new files.
min_time_between_triggers_seconds Integer Minimum time in seconds between trigger events.
wait_after_last_change_seconds Integer Wait time in seconds after the last file change before triggering.
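
For example, a job can trigger when new files arrive in a Unity Catalog volume. The following is a minimal sketch; the volume path and names are hypothetical:

resources:
  jobs:
    file_triggered_job:
      name: file_triggered_job
      trigger:
        file_arrival:
          url: /Volumes/main/default/landing/
          min_time_between_triggers_seconds: 60
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./src/ingest_notebook.py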

job.trigger.table

Type: Map

Trigger configuration based on a table.

Key Type Description
table_names Sequence A list of table names to monitor.
condition String The SQL condition that must be met to trigger the job.

job.trigger.table_update

Type: Map

Trigger configuration based on table updates.

Key Type Description
table_names Sequence A list of table names to monitor for updates.
condition String The SQL condition that must be met to trigger the job.
wait_after_last_change_seconds Integer Wait time in seconds after the last table update before triggering.

job.trigger.periodic

Type: Map

Periodic trigger configuration.

Key Type Description
interval Integer The interval value for the periodic trigger.
unit String The unit of time for the interval. Valid values: SECONDS, MINUTES, HOURS, DAYS, WEEKS.

job.webhook_notifications

Type: Map

Webhook notification settings for job runs.

Key Type Description
on_start Sequence A list of webhook notification IDs to notify when a run starts.
on_success Sequence A list of webhook notification IDs to notify when a run succeeds.
on_failure Sequence A list of webhook notification IDs to notify when a run fails.
on_duration_warning_threshold_exceeded Sequence A list of webhook notification IDs to notify when a run duration exceeds the warning threshold.

Examples

The following example defines a job with the resource key hello-job with one notebook task:

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          notebook_task:
            notebook_path: ./hello.py

The following example defines a job with a SQL notebook:

resources:
  jobs:
    job_with_sql_notebook:
      name: 'Job to demonstrate using a SQL notebook with a SQL warehouse'
      tasks:
        - task_key: notebook
          notebook_task:
            notebook_path: ./select.sql
            warehouse_id: 799f096837fzzzz4

For additional job configuration examples, see Job configuration.

For information about defining job tasks and overriding job settings, see Add tasks to jobs in Databricks Asset Bundles and Override job tasks settings in Databricks Asset Bundles.

model (legacy)

Type: Map

The model resource allows you to define legacy models in bundles. Databricks recommends you use Unity Catalog registered models instead.

pipeline

Type: Map

Pipelines are supported in Python for Databricks Asset Bundles. See databricks.bundles.pipelines.

The pipeline resource allows you to create pipelines. For information about pipelines, see Lakeflow Spark Declarative Pipelines. For a tutorial that uses the Databricks Asset Bundles template to create a pipeline, see Develop Lakeflow Spark Declarative Pipelines with Databricks Asset Bundles.

pipelines:
  <pipeline-name>:
    <pipeline-field-name>: <pipeline-field-value>
Key Type Description
allow_duplicate_names Boolean If false, deployment will fail if name conflicts with that of another pipeline.
budget_policy_id String Budget policy of this pipeline.
catalog String A catalog in Unity Catalog to publish data from this pipeline to. If target is specified, tables in this pipeline are published to a target schema inside catalog (for example, catalog.target.table). If target is not specified, no data is published to Unity Catalog.
channel String The Lakeflow Spark Declarative Pipelines Release Channel that specifies which version of Lakeflow Spark Declarative Pipelines to use.
clusters Sequence The cluster settings for this pipeline deployment. See cluster.
configuration Map The configuration for this pipeline execution.
continuous Boolean Whether the pipeline is continuous or triggered. This replaces trigger.
deployment Map Deployment type of this pipeline. See deployment.
development Boolean Whether the pipeline is in development mode. Defaults to false.
dry_run Boolean Whether the pipeline is a dry run pipeline.
edition String The pipeline product edition.
environment Map The environment specification for this pipeline used to install dependencies on serverless compute. See environment. This key is only supported in Databricks CLI version 0.258 and above.
event_log Map The event log configuration for this pipeline. See event_log.
filters Map The filters that determine which pipeline packages to include in the deployed graph. See filters.
id String Unique identifier for this pipeline.
ingestion_definition Map The configuration for a managed ingestion pipeline. These settings cannot be used with the libraries, schema, target, or catalog settings. See ingestion_definition.
libraries Sequence A list of libraries or code needed by this deployment. See pipeline.libraries.
lifecycle Map Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.
name String A friendly name for this pipeline.
notifications Sequence The notification settings for this pipeline.
permissions Sequence The pipeline's permissions. See permissions.
photon Boolean Whether Photon is enabled for this pipeline.
root_path String The root path for this pipeline. This is used as the root directory when editing the pipeline in the Databricks user interface and it is added to sys.path when executing Python sources during pipeline execution.
run_as Map The identity that the pipeline runs as. If not specified, the pipeline runs as the user who created the pipeline. Only user_name or service_principal_name can be specified. If both are specified, an error is thrown. See run_as.
schema String The default schema (database) where tables are read from or published to.
serverless Boolean Whether serverless compute is enabled for this pipeline.
storage String The DBFS root directory for storing checkpoints and tables.
tags Map A map of tags associated with the pipeline. These are forwarded to the cluster as cluster tags, and are therefore subject to the same limitations. A maximum of 25 tags can be added to the pipeline.
target String Target schema (database) to add tables in this pipeline to. Exactly one of schema or target must be specified. To publish to Unity Catalog, also specify catalog. This legacy field is deprecated for pipeline creation in favor of the schema field.

pipeline.deployment

Type: Map

Deployment type configuration for the pipeline.

Key Type Description
kind String The kind of deployment. For example, BUNDLE.
metadata_file_path String The path to the metadata file for the deployment.

pipeline.environment

Type: Map

Environment specification for installing dependencies on serverless compute.

Key Type Description
dependencies Sequence A list of pip dependencies, as supported by the version of pip in this environment. Each dependency is a pip requirement file line.
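
For example, a serverless pipeline can install pip dependencies through its environment. The following is a minimal sketch; the catalog, schema, package pin, and notebook path are illustrative:

resources:
  pipelines:
    serverless_pipeline:
      name: serverless_pipeline
      serverless: true
      catalog: main
      schema: my_schema
      environment:
        dependencies:
          - pandas==2.2.2
      libraries:
        - notebook:
            path: ./src/pipeline_notebook.ipynb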

pipeline.event_log

Type: Map

Event log configuration for the pipeline.

Key Type Description
catalog String The Unity Catalog catalog the event log is published under.
name String The name the event log is published to in Unity Catalog.
schema String The Unity Catalog schema the event log is published under.
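
For example, a pipeline can publish its event log to a Unity Catalog table. The following is a minimal sketch; the catalog, schema, and table names are hypothetical:

resources:
  pipelines:
    pipeline_with_event_log:
      name: pipeline_with_event_log
      catalog: main
      schema: my_schema
      event_log:
        catalog: main
        schema: my_schema
        name: pipeline_event_log
      libraries:
        - notebook:
            path: ./src/pipeline_notebook.ipynb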

pipeline.filters

Type: Map

Filters that determine which pipeline packages to include in the deployed graph.

Key Type Description
include Sequence A list of package names to include.
exclude Sequence A list of package names to exclude.

pipeline.ingestion_definition

Type: Map

Configuration for a managed ingestion pipeline. These settings cannot be used with the libraries, schema, target, or catalog settings.

Key Type Description
connection_name String The name of the connection to use for ingestion.
ingestion_gateway_id String The ID of the ingestion gateway.
objects Sequence Required. Settings specifying tables to replicate and the destination for the replicated tables. Each object can be a SchemaSpec, TableSpec, or ReportSpec.
source_configuration Map Catalog-level source configuration parameters. See source_configuration.
table_configuration Map Configuration for the ingestion tables. See table_configuration.

SchemaSpec

Type: Map

Schema object specification for ingesting all tables from a schema.

Key Type Description
source_schema String The name of the source schema to ingest.
destination_catalog String The name of the destination catalog in Unity Catalog.
destination_schema String The name of the destination schema in Unity Catalog.
table_configuration Map Configuration to apply to all tables in this schema. See pipeline.ingestion_definition.table_configuration.

TableSpec

Type: Map

Table object specification for ingesting a specific table.

Key Type Description
source_schema String The name of the source schema containing the table.
source_table String The name of the source table to ingest.
destination_catalog String The name of the destination catalog in Unity Catalog.
destination_schema String The name of the destination schema in Unity Catalog.
destination_table String The name of the destination table in Unity Catalog.
table_configuration Map Configuration for this specific table. See pipeline.ingestion_definition.table_configuration.

ReportSpec

Type: Map

Report object specification for ingesting analytics reports.

Key Type Description
source_url String The URL of the source report.
source_report String The name or identifier of the source report.
destination_catalog String The name of the destination catalog in Unity Catalog.
destination_schema String The name of the destination schema in Unity Catalog.
destination_table String The name of the destination table for the report data.
table_configuration Map Configuration for the report table. See pipeline.ingestion_definition.table_configuration.

pipeline.ingestion_definition.source_configuration

Type: Map

Configuration for source.

Key Type Description
catalog Map Catalog-level source configuration parameters. See catalog.

pipeline.ingestion_definition.source_configuration.catalog

Type: Map

Catalog-level source configuration parameters.

Key Type Description
postgres Map Postgres-specific catalog-level configuration parameters. Contains one slot_config key that is a Map representing the Postgres slot configuration to use for logical replication.
source_catalog String The source catalog name.

pipeline.ingestion_definition.table_configuration

Type: Map

Configuration options for ingestion tables.

Key Type Description
exclude_columns Sequence A list of column names to be excluded for the ingestion. When not specified, include_columns fully controls which columns are ingested. When specified, all other columns, including future ones, will be automatically included for ingestion. This field is mutually exclusive with include_columns.
include_columns Sequence A list of column names to be included for the ingestion. When not specified, all columns except the ones in exclude_columns will be included. Future columns will be automatically included. When specified, all other future columns will be automatically excluded from ingestion. This field is mutually exclusive with exclude_columns.
primary_keys Sequence A list of column names to use as primary keys for the table.
sequence_by Sequence The column names specifying the logical order of events in the source data. Spark Declarative Pipelines uses this sequencing to handle change events that arrive out of order.

pipeline.libraries

Type: Sequence

Defines the list of libraries or code needed by this pipeline.

Each item in the list is a definition:

Key Type Description
file Map The path to a file that defines a pipeline and is stored in Databricks Repos. See pipeline.libraries.file.
glob Map The unified field to include source code. Each entry can be a notebook path, a file path, or a folder path that ends /**. This field cannot be used together with notebook or file. See pipeline.libraries.glob.
notebook Map The path to a notebook that defines a pipeline and is stored in the Databricks workspace. See pipeline.libraries.notebook.
whl String This field is deprecated.

pipeline.libraries.file

Type: Map

The path to a file that defines a pipeline and is stored in Databricks Repos.

Key Type Description
path String The absolute path of the source code.

pipeline.libraries.glob

Type: Map

The unified field to include source code. Each entry can be a notebook path, a file path, or a folder path that ends /**. This field cannot be used together with notebook or file.

Key Type Description
include String The source code to include for pipelines
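
For example, a pipeline can include every source file under a folder by using glob. The following is a minimal sketch; the catalog, schema, and folder path are hypothetical:

resources:
  pipelines:
    glob_pipeline:
      name: glob_pipeline
      catalog: main
      schema: my_schema
      libraries:
        - glob:
            include: ./src/transformations/**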

pipeline.libraries.notebook

Type: Map

The path to a notebook that defines a pipeline and is stored in the Databricks workspace.

Key Type Description
path String The absolute path of the source code.

Example

The following example defines a pipeline with the resource key hello-pipeline:

resources:
  pipelines:
    hello-pipeline:
      name: hello-pipeline
      clusters:
        - label: default
          num_workers: 1
      development: true
      continuous: false
      channel: CURRENT
      edition: CORE
      photon: false
      libraries:
        - notebook:
            path: ./pipeline.py

For additional pipeline configuration examples, see Pipeline configuration.

registered_model (Unity Catalog)

Type: Map

The registered model resource allows you to define models in Unity Catalog. For information about Unity Catalog registered models, see Manage model lifecycle in Unity Catalog.

registered_models:
  <registered_model-name>:
    <registered_model-field-name>: <registered_model-field-value>
Key Type Description
aliases Sequence List of aliases associated with the registered model. See registered_model.aliases.
browse_only Boolean Indicates whether the principal is limited to retrieving metadata for the associated object through the BROWSE privilege when include_browse is enabled in the request.
catalog_name String The name of the catalog where the schema and the registered model reside.
comment String The comment attached to the registered model.
full_name String The three-level (fully qualified) name of the registered model.
grants Sequence The grants associated with the registered model. See grant.
lifecycle Map Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.
name String The name of the registered model.
schema_name String The name of the schema where the registered model resides.
storage_location String The storage location on the cloud under which model version data files are stored.

registered_model.aliases

Type: Sequence

List of aliases associated with the registered model.

Key Type Description
alias_name String Name of the alias, e.g. 'champion' or 'latest_stable'.
catalog_name String The name of the catalog containing the model version.
id String The unique identifier of the alias.
model_name String The name of the parent registered model of the model version, relative to the parent schema.
schema_name String The name of the schema containing the model version, relative to the parent catalog.
version_num Integer Integer version number of the model version to which this alias points.

Example

The following example defines a registered model in Unity Catalog:

resources:
  registered_models:
    model:
      name: my_model
      catalog_name: ${bundle.target}
      schema_name: mlops_schema
      comment: Registered model in Unity Catalog for ${bundle.target} deployment target
      grants:
        - privileges:
            - EXECUTE
          principal: account users

schema (Unity Catalog)

Type: Map

Schemas are supported in Python for Databricks Asset Bundles. See databricks.bundles.schemas.

The schema resource type allows you to define Unity Catalog schemas for tables and other assets in your workflows and pipelines created as part of a bundle. Unlike other resource types, a schema has the following limitations:

  • The owner of a schema resource is always the deployment user, and cannot be changed. If run_as is specified in the bundle, it will be ignored by operations on the schema.
  • Only fields supported by the corresponding Schemas object create API are available for the schema resource. For example, enable_predictive_optimization is not supported as it is only available on the update API.
schemas:
  <schema-name>:
    <schema-field-name>: <schema-field-value>
Key Type Description
catalog_name String The name of the parent catalog.
comment String A user-provided free-form text description.
grants Sequence The grants associated with the schema. See grant.
lifecycle Map Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.
name String The name of schema, relative to the parent catalog.
properties Map A map of key-value properties attached to the schema.
storage_root String The storage root URL for managed tables within the schema.

Examples

The following example defines a pipeline with the resource key my_pipeline that creates a Unity Catalog schema with the key my_schema as the target:

resources:
  pipelines:
    my_pipeline:
      name: test-pipeline-{{.unique_id}}
      libraries:
        - notebook:
            path: ../src/nb.ipynb
        - file:
            path: ../src/range.sql
      development: true
      catalog: ${resources.schemas.my_schema.catalog_name}
      target: ${resources.schemas.my_schema.id}

  schemas:
    my_schema:
      name: test-schema-{{.unique_id}}
      catalog_name: main
      comment: This schema was created by Databricks Asset Bundles.

A top-level grants mapping is not supported by Databricks Asset Bundles, so if you want to set grants for a schema, define the grants for the schema within the schemas mapping. For more information about grants, see Show, grant, and revoke privileges.

The following example defines a Unity Catalog schema with grants:

resources:
  schemas:
    my_schema:
      name: test-schema
      grants:
        - principal: users
          privileges:
            - SELECT
        - principal: my_team
          privileges:
            - CAN_MANAGE
      catalog_name: main

secret_scope

Type: Map

The secret_scope resource allows you to define secret scopes in a bundle. For information about secret scopes, see Secret management.

secret_scopes:
  <secret_scope-name>:
    <secret_scope-field-name>: <secret_scope-field-value>
Key Type Description
backend_type String The backend type the scope will be created with. If not specified, this defaults to DATABRICKS.
keyvault_metadata Map The metadata for the secret scope if the backend_type is AZURE_KEYVAULT. See keyvault_metadata.
lifecycle Map Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.
name String Scope name requested by the user. Scope names are unique.
permissions Sequence The permissions to apply to the secret scope. Permissions are managed via secret scope ACLs. See permissions.

secret_scope.keyvault_metadata

Type: Map

The metadata for Azure Key Vault-backed secret scopes.

Key Type Description
resource_id String The Azure resource ID of the Key Vault.
dns_name String The DNS name of the Azure Key Vault.

Examples

The following example defines a secret scope that uses a key vault backend:

resources:
  secret_scopes:
    secret_scope_azure:
      name: test-secrets-azure-backend
      backend_type: 'AZURE_KEYVAULT'
      keyvault_metadata:
        resource_id: my_azure_keyvault_id
        dns_name: my_azure_keyvault_dns_name

The following example sets a custom ACL using secret scopes and permissions:

resources:
  secret_scopes:
    my_secret_scope:
      name: my_secret_scope
      permissions:
        - user_name: admins
          level: WRITE
        - user_name: users
          level: READ

For an example bundle that demonstrates how to define a secret scope and a job with a task that reads from it in a bundle, see the bundle-examples GitHub repository.

sql_warehouse

Type: Map

The SQL warehouse resource allows you to define a SQL warehouse in a bundle. For information about SQL warehouses, see Data warehousing on Azure Databricks.

sql_warehouses:
  <sql-warehouse-name>:
    <sql-warehouse-field-name>: <sql-warehouse-field-value>
Key Type Description
auto_stop_mins Integer The amount of time in minutes that a SQL warehouse must be idle (for example, no RUNNING queries), before it is automatically stopped. Valid values are 0, which indicates no autostop, or greater than or equal to 10. The default is 120.
channel Map The channel details. See channel
cluster_size String The size of the clusters allocated for this warehouse. Increasing the size of a Spark cluster allows you to run larger queries on it. If you want to increase the number of concurrent queries, tune max_num_clusters. For supported values, see cluster_size.
creator_name String The name of the user that created the warehouse.
enable_photon Boolean Whether the warehouse should use Photon optimized clusters. Defaults to false.
enable_serverless_compute Boolean Whether the warehouse should use serverless compute.
instance_profile_arn String Deprecated. Instance profile used to pass an IAM role to the cluster.
lifecycle Map Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.
max_num_clusters Integer The maximum number of clusters that the autoscaler will create to handle concurrent queries. Values must be less than or equal to 30 and greater than or equal to min_num_clusters. Defaults to min_clusters if unset.
min_num_clusters Integer The minimum number of available clusters that will be maintained for this SQL warehouse. Increasing this will ensure that a larger number of clusters are always running and therefore may reduce the cold start time for new queries. This is similar to reserved vs. revocable cores in a resource manager. Values must be greater than 0 and less than or equal to min(max_num_clusters, 30). Defaults to 1.
name String The logical name for the cluster. The name must be unique within an org and less than 100 characters.
permissions Sequence The permissions to apply to the warehouse. See permissions.
spot_instance_policy String Whether to use spot instances. Valid values are POLICY_UNSPECIFIED, COST_OPTIMIZED, RELIABILITY_OPTIMIZED. The default is COST_OPTIMIZED.
tags Map A set of key-value pairs that will be tagged on all resources (e.g., AWS instances and EBS volumes) associated with this SQL warehouse. The number of tags must be less than 45.
warehouse_type String The warehouse type, PRO or CLASSIC. If you want to use serverless compute, set this field to PRO and also set the field enable_serverless_compute to true.

sql_warehouse.channel

Type: Map

The channel configuration for the SQL warehouse.

Key Type Description
name String The name of the channel. Valid values include CHANNEL_NAME_CURRENT, CHANNEL_NAME_PREVIEW, CHANNEL_NAME_CUSTOM.
dbsql_version String The DBSQL version for custom channels.

Example

The following example defines a SQL warehouse:

resources:
  sql_warehouses:
    my_sql_warehouse:
      name: my_sql_warehouse
      cluster_size: X-Large
      enable_serverless_compute: true
      max_num_clusters: 3
      min_num_clusters: 1
      auto_stop_mins: 60
      warehouse_type: PRO

volume (Unity Catalog)

Type: Map

Volumes are supported in Python for Databricks Asset Bundles. See databricks.bundles.volumes.

The volume resource type allows you to define and create Unity Catalog volumes as part of a bundle. When deploying a bundle with a volume defined, note that:

  • A volume cannot be referenced in the artifact_path for the bundle until it exists in the workspace. Hence, if you want to use Databricks Asset Bundles to create the volume, you must first define the volume in the bundle, deploy it to create the volume, then reference it in the artifact_path in subsequent deployments.
  • Volumes in the bundle are not prepended with the dev_${workspace.current_user.short_name} prefix when the deployment target has mode: development configured. However, you can manually configure this prefix. See Custom presets.
volumes:
  <volume-name>:
    <volume-field-name>: <volume-field-value>
Key Type Description
catalog_name String The name of the catalog of the schema and volume.
comment String The comment attached to the volume.
grants Sequence The grants associated with the volume. See grant.
lifecycle Map Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed. See lifecycle.
name String The name of the volume.
schema_name String The name of the schema where the volume is.
storage_location String The storage location on the cloud.
volume_type String The volume type, either EXTERNAL or MANAGED. An external volume is located in the specified external location. A managed volume is located in the default location which is specified by the parent schema, or the parent catalog, or the metastore. See Managed versus external volumes.

Example

The following example creates a Unity Catalog volume with the key my_volume_id:

resources:
  volumes:
    my_volume_id:
      catalog_name: main
      name: my_volume
      schema_name: my_schema

For an example bundle that runs a job that writes to a file in a Unity Catalog volume, see the bundle-examples GitHub repository.

Common objects

grant

Type: Map

Defines the principal and privileges to grant to that principal. For more information about grants, see Show, grant, and revoke privileges.

Key Type Description
principal String The name of the principal that will be granted privileges. This can be a user, group, or service principal.
privileges Sequence The privileges to grant to the specified entity. Valid values depend on the resource type (for example, SELECT, MODIFY, CREATE, USAGE, READ_FILES, WRITE_FILES, EXECUTE, ALL_PRIVILEGES).

Example

The following example defines a Unity Catalog schema with grants:

resources:
  schemas:
    my_schema:
      name: test-schema
      grants:
        - principal: users
          privileges:
            - SELECT
        - principal: my_team
          privileges:
            - CAN_MANAGE
      catalog_name: main

lifecycle

Type: Map

Contains the lifecycle settings for a resource. It controls the behavior of the resource when it is deployed or destroyed.

Key Type Description
prevent_destroy Boolean Lifecycle setting to prevent the resource from being destroyed.
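
For example, the following minimal sketch sets prevent_destroy on a Unity Catalog schema so that the schema is not destroyed; the schema and catalog names are hypothetical:

resources:
  schemas:
    prod_schema:
      name: prod-schema
      catalog_name: main
      lifecycle:
        prevent_destroy: true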