Add tasks to jobs in Databricks Asset Bundles

This page provides information about how to define job tasks in Databricks Asset Bundles. For information about job tasks, see Configure and edit tasks in Lakeflow Jobs.

Important

The job git_source field and task source field set to GIT are not recommended for bundles, because local relative paths may not point to the same content in the Git repository. Bundles expect that a deployed job has the same files as the local copy from where it was deployed.

Instead, clone the repository locally and set up your bundle project within this repository, so that the source for tasks is the workspace.

Configure tasks

Define tasks for a job in a bundle using the tasks key within the job definition. Examples of task configuration for the available task types are in the Task settings section. For information about defining a job in a bundle, see job.

Tip

To quickly generate resource configuration for an existing job using the Databricks CLI, you can use the bundle generate job command. See bundle commands.

To set task values, most job task types have task-specific parameters, but you can also define job parameters that are passed to tasks. Dynamic value references are supported for job parameters, which enables values that are specific to a job run to be passed between tasks. For complete information about how to pass task values by task type, see Details by task type.

You can also override general job task settings with settings for a target workspace. See Override with target settings.
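
For example, the following sketch overrides a task's notebook path only in a dev target; the job key, task key, target name, and paths are placeholders for illustration. Task settings defined under a target are merged with the base job definition by task_key.

targets:
  dev:
    resources:
      jobs:
        my_job:
          tasks:
            - task_key: my_notebook_task
              notebook_task:
                notebook_path: ../src/dev_notebook.ipynb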

The following example configuration defines a job with two notebook tasks, and passes a task value from the first task to the second task.

resources:
  jobs:
    pass_task_values_job:
      name: pass_task_values_job
      tasks:
        # Output task
        - task_key: output_value
          notebook_task:
            notebook_path: ../src/output_notebook.ipynb

        # Input task
        - task_key: input_value
          depends_on:
            - task_key: output_value
          notebook_task:
            notebook_path: ../src/input_notebook.ipynb
            base_parameters:
              received_message: '{{tasks.output_value.values.message}}'

The output_notebook.ipynb contains the following code, which sets a task value for the message key:

# Databricks notebook source
# This first task sets a simple output value.

message = "Hello from the first task"

# Set the message to be used by other tasks
dbutils.jobs.taskValues.set(key="message", value=message)

print(f"Produced message: {message}")

The input_notebook.ipynb retrieves the value of the received_message parameter, which was set in the configuration for the task:

# This notebook receives the message as a parameter.

dbutils.widgets.text("received_message", "")
received_message = dbutils.widgets.get("received_message")

print(f"Received message: {received_message}")

Task settings

This section contains settings and examples for each job task type.

Condition task

The condition_task enables you to add a task with if/else conditional logic to your job. The task evaluates a condition that can be used to control the execution of other tasks. The condition task does not require a cluster to execute and does not support retries or notifications. For more information about the if/else condition task, see Add branching logic to a job with the If/else task.

The following keys are available for a condition task. For the corresponding REST API object definition, see condition_task.

Key Type Description
left String Required. The left operand of the condition. Can be a string value, a job state, or a dynamic value reference such as {{job.repair_count}} or {{tasks.task_key.values.output}}.
op String Required. The operator to use for comparison. Valid values are: EQUAL_TO, NOT_EQUAL, GREATER_THAN, GREATER_THAN_OR_EQUAL, LESS_THAN, LESS_THAN_OR_EQUAL.
right String Required. The right operand of the condition. Can be a string value, a job state, or a dynamic value reference.

Examples

The following example contains a condition task and a notebook task, where the notebook task only executes if the number of job repairs is less than 5.

resources:
  jobs:
    my-job:
      name: my-job
      tasks:
        - task_key: condition_task
          condition_task:
            op: LESS_THAN
            left: '{{job.repair_count}}'
            right: '5'
        - task_key: notebook_task
          depends_on:
            - task_key: condition_task
              outcome: 'true'
          notebook_task:
            notebook_path: ../src/notebook.ipynb

Dashboard task

You use this task to refresh a dashboard and send a snapshot to subscribers. For more information about dashboards in bundles, see dashboard.

The following keys are available for a dashboard task. For the corresponding REST API object definition, see dashboard_task.

Key Type Description
dashboard_id String Required. The identifier of the dashboard to be refreshed. The dashboard must already exist.
subscription Map The subscription configuration for sending the dashboard snapshot. Specifies destination settings for where snapshots are sent after the dashboard refresh completes. See subscription.
warehouse_id String The warehouse ID to execute the dashboard with for the schedule. If not specified, the default warehouse of the dashboard will be used.

Examples

The following example adds a dashboard task to a job. When the job is run, the dashboard with the specified ID is refreshed.

resources:
  jobs:
    my-dashboard-job:
      name: my-dashboard-job
      tasks:
        - task_key: my-dashboard-task
          dashboard_task:
            dashboard_id: 11111111-1111-1111-1111-111111111111
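
To refresh the dashboard with a specific SQL warehouse instead of the dashboard's default warehouse, also set warehouse_id. The following sketch uses placeholder IDs:

resources:
  jobs:
    my-dashboard-job:
      name: my-dashboard-job
      tasks:
        - task_key: my-dashboard-task
          dashboard_task:
            dashboard_id: 11111111-1111-1111-1111-111111111111
            warehouse_id: 1a111111a1111aa1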

dbt task

You use this task to run one or more dbt commands. For more information about dbt, see Connect to dbt Cloud.

The following keys are available for a dbt task. For the corresponding REST API object definition, see dbt_task.

Key Type Description
catalog String The name of the catalog to use. The catalog value can only be specified if a warehouse_id is specified. This field requires dbt-databricks >= 1.1.1.
commands Sequence Required. A list of dbt commands to execute in sequence. Each command must be a complete dbt command (e.g., dbt deps, dbt seed, dbt run, dbt test). A maximum of 10 commands can be provided.
profiles_directory String The path to the directory containing the dbt profiles.yml file. Can only be specified if no warehouse_id is specified. If no warehouse_id is specified and this folder is unset, the root directory is used.
project_directory String The path to the directory containing the dbt project. If not specified, defaults to the root of the repository or workspace directory. For projects stored in the Databricks workspace, the path must be absolute and begin with a slash. For projects in a remote repository, the path must be relative.
schema String The schema to write to. This parameter is only used when a warehouse_id is also provided. If not provided, the default schema is used.
source String The location type of the dbt project. Valid values are WORKSPACE and GIT. When set to WORKSPACE, the project will be retrieved from the Databricks workspace. When set to GIT, the project will be retrieved from a Git repository defined in git_source. If empty, the task uses GIT if git_source is defined and WORKSPACE otherwise.
warehouse_id String The ID of the SQL warehouse to use for running dbt commands. If not specified, the default warehouse will be used.

Examples

The following example adds a dbt task to a job. This dbt task uses the specified SQL warehouse to run the specified dbt commands.

To get a SQL warehouse's ID, open the SQL warehouse's settings page, then copy the ID found in parentheses after the name of the warehouse in the Name field on the Overview tab.

Tip

Databricks Asset Bundles also include a dbt-sql project template that defines a job with a dbt task, as well as dbt profiles for deployed dbt jobs. For information about Databricks Asset Bundles templates, see Default bundle templates.

resources:
  jobs:
    my-dbt-job:
      name: my-dbt-job
      tasks:
        - task_key: my-dbt-task
          dbt_task:
            commands:
              - 'dbt deps'
              - 'dbt seed'
              - 'dbt run'
            project_directory: /Users/someone@example.com/Testing
            warehouse_id: 1a111111a1111aa1
          libraries:
            - pypi:
                package: 'dbt-databricks>=1.0.0,<2.0.0'
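
If a warehouse_id is specified, you can also set catalog and schema to control where dbt writes output. The following sketch uses placeholder catalog and schema names, and pins dbt-databricks to 1.1.1 or later because the catalog field requires it:

resources:
  jobs:
    my-dbt-job:
      name: my-dbt-job
      tasks:
        - task_key: my-dbt-task
          dbt_task:
            commands:
              - 'dbt deps'
              - 'dbt run'
            project_directory: /Users/someone@example.com/Testing
            catalog: main
            schema: default
            warehouse_id: 1a111111a1111aa1
          libraries:
            - pypi:
                package: 'dbt-databricks>=1.1.1,<2.0.0'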

For each task

The for_each_task enables you to add a task with a for each loop to your job. The task executes a nested task for every input provided. For more information about the for_each_task, see Use a For each task to run another task in a loop.

The following keys are available for a for_each_task. For the corresponding REST API object definition, see for_each_task.

Key Type Description
concurrency Integer The maximum number of task iterations that can run concurrently. If not specified, all iterations may run in parallel subject to cluster and workspace limits.
inputs String Required. The input data for the loop. This can be a JSON string or a reference to an array parameter. Each element in the array will be passed to one iteration of the nested task.
task Map Required. The nested task definition to execute for each input. This object contains the complete task specification including task_key and the task type (e.g., notebook_task, python_wheel_task, etc.).

Examples

The following example adds a for_each_task to a job, where it loops over the values of another task and processes them.

resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: generate_countries_list
          notebook_task:
            notebook_path: ../src/generate_countries_list.ipynb
        - task_key: process_countries
          depends_on:
            - task_key: generate_countries_list
          for_each_task:
            inputs: '{{tasks.generate_countries_list.values.countries}}'
            task:
              task_key: process_countries_iteration
              notebook_task:
                notebook_path: ../src/process_countries_notebook.ipynb
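
To limit how many iterations run at the same time, set concurrency on the for_each_task. The following sketch is the same loop limited to two concurrent iterations:

resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: generate_countries_list
          notebook_task:
            notebook_path: ../src/generate_countries_list.ipynb
        - task_key: process_countries
          depends_on:
            - task_key: generate_countries_list
          for_each_task:
            concurrency: 2
            inputs: '{{tasks.generate_countries_list.values.countries}}'
            task:
              task_key: process_countries_iteration
              notebook_task:
                notebook_path: ../src/process_countries_notebook.ipynb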

JAR task

You use this task to run a JAR. You can reference local JAR libraries or those in a workspace, a Unity Catalog volume, or an external cloud storage location. See JAR file (Java or Scala).

For details on how to compile and deploy Scala JAR files on a Unity Catalog-enabled cluster in standard access mode, see Deploy Scala JARs on Unity Catalog clusters.

The following keys are available for a JAR task. For the corresponding REST API object definition, see spark_jar_task.

Key Type Description
jar_uri String Deprecated. The URI of the JAR to be executed. DBFS and cloud storage paths are supported. This field is deprecated and should not be used. Instead, use the libraries field to specify JAR dependencies.
main_class_name String Required. The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library. The code must use SparkContext.getOrCreate to obtain a Spark context; otherwise, runs of the job fail.
parameters Sequence The parameters passed to the main method. Use task parameter variables to set parameters containing information about job runs.

Examples

The following example adds a JAR task to a job. The path for the JAR is to a volume location.

resources:
  jobs:
    my-jar-job:
      name: my-jar-job
      tasks:
        - task_key: my-jar-task
          spark_jar_task:
            main_class_name: org.example.com.Main
          libraries:
            - jar: /Volumes/main/default/my-volume/my-project-0.1.0-SNAPSHOT.jar
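
To pass arguments to the main method, add parameters to the spark_jar_task. The following sketch uses placeholder argument values:

resources:
  jobs:
    my-jar-job:
      name: my-jar-job
      tasks:
        - task_key: my-jar-task
          spark_jar_task:
            main_class_name: org.example.com.Main
            parameters:
              - '--input'
              - '/Volumes/main/default/my-volume/input'
          libraries:
            - jar: /Volumes/main/default/my-volume/my-project-0.1.0-SNAPSHOT.jar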

Notebook task

You use this task to run a notebook. See Notebook task for jobs.

The following keys are available for a notebook task. For the corresponding REST API object definition, see notebook_task.

Key Type Description
base_parameters Map The base parameters to use for each run of this job.
  • If the run is initiated by a call to run-now with parameters specified, the two parameters maps are merged.
  • If the same key is specified in base_parameters and in run-now, the value from run-now is used. Use task parameter variables to set parameters containing information about job runs.
  • If the notebook takes a parameter that is not specified in the job's base_parameters or the run-now override parameters, the default value from the notebook is used. Retrieve these parameters in a notebook using dbutils.widgets.get.
notebook_path String Required. The path of the notebook in the Databricks workspace or remote repository, for example /Users/user.name@databricks.com/notebook_to_run. For notebooks stored in the Databricks workspace, the path must be absolute and begin with a slash. For notebooks stored in a remote repository, the path must be relative.
source String Location type of the notebook. Valid values are WORKSPACE and GIT. When set to WORKSPACE, the notebook will be retrieved from the local Databricks workspace. When set to GIT, the notebook will be retrieved from a Git repository defined in git_source. If the value is empty, the task will use GIT if git_source is defined and WORKSPACE otherwise.
warehouse_id String The ID of the warehouse to run the notebook on. Classic SQL warehouses are not supported; use serverless or pro SQL warehouses instead. Note that SQL warehouses only support SQL cells, so if the notebook contains non-SQL cells, the run fails. To run Python or other non-SQL cells, use serverless compute instead of a SQL warehouse.

Examples

The following example adds a notebook task to a job and sets a job parameter named my_job_run_id. The path for the notebook to deploy is relative to the configuration file in which this task is declared. The task gets the notebook from its deployed location in the Databricks workspace.

resources:
  jobs:
    my-notebook-job:
      name: my-notebook-job
      tasks:
        - task_key: my-notebook-task
          notebook_task:
            notebook_path: ./my-notebook.ipynb
      parameters:
        - name: my_job_run_id
          default: '{{job.run_id}}'
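
To pass widget values directly to the notebook instead of through job parameters, set base_parameters on the notebook_task. The following sketch uses a placeholder parameter name, which the notebook reads with dbutils.widgets.get:

resources:
  jobs:
    my-notebook-job:
      name: my-notebook-job
      tasks:
        - task_key: my-notebook-task
          notebook_task:
            notebook_path: ./my-notebook.ipynb
            base_parameters:
              environment: dev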

Pipeline task

You use this task to run a pipeline. See Lakeflow Declarative Pipelines.

The following keys are available for a pipeline task. For the corresponding REST API object definition, see pipeline_task.

Key Type Description
full_refresh Boolean If true, a full refresh of the pipeline will be triggered, which will completely recompute all datasets in the pipeline. If false or omitted, only incremental data will be processed. For details, see Pipeline refresh semantics.
pipeline_id String Required. The ID of the pipeline to run. The pipeline must already exist.

Examples

The following example adds a pipeline task to a job. This task runs the specified pipeline.

Tip

You can get a pipeline's ID by opening the pipeline in the workspace and copying the Pipeline ID value on the Pipeline details tab of the pipeline's settings page.

resources:
  jobs:
    my-pipeline-job:
      name: my-pipeline-job
      tasks:
        - task_key: my-pipeline-task
          pipeline_task:
            pipeline_id: 11111111-1111-1111-1111-111111111111
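
To recompute all datasets in the pipeline instead of processing only incremental data, set full_refresh. The following sketch uses a placeholder pipeline ID:

resources:
  jobs:
    my-pipeline-job:
      name: my-pipeline-job
      tasks:
        - task_key: my-pipeline-task
          pipeline_task:
            pipeline_id: 11111111-1111-1111-1111-111111111111
            full_refresh: true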

Python script task

You use this task to run a Python file.

The following keys are available for a Python script task. For the corresponding REST API object definition, see spark_python_task.

Key Type Description
parameters Sequence The parameters to pass to the Python file. Use task parameter variables to set parameters containing information about job runs.
python_file String Required. The URI of the Python file to be executed, for example /Users/someone@example.com/my-script.py. For Python files stored in the Databricks workspace, the path must be absolute and begin with /. For files stored in a remote repository, the path must be relative. This field does not support dynamic value references such as variables.
source String The location type of the Python file. Valid values are WORKSPACE and GIT. When set to WORKSPACE, the file will be retrieved from the local Databricks workspace. When set to GIT, the file will be retrieved from a Git repository defined in git_source. If the value is empty, the task will use GIT if git_source is defined and WORKSPACE otherwise.

Examples

The following example adds a Python script task to a job. The path for the Python file to deploy is relative to the configuration file in which this task is declared. The task gets the Python file from its deployed location in the Databricks workspace.

resources:
  jobs:
    my-python-script-job:
      name: my-python-script-job

      tasks:
        - task_key: my-python-script-task
          spark_python_task:
            python_file: ./my-script.py
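
To pass arguments to the script, add parameters to the spark_python_task. The following sketch uses placeholder values, which the script can read from sys.argv:

resources:
  jobs:
    my-python-script-job:
      name: my-python-script-job
      tasks:
        - task_key: my-python-script-task
          spark_python_task:
            python_file: ./my-script.py
            parameters:
              - '--env'
              - 'dev'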

Python wheel task

You use this task to run a Python wheel. See Build a Python wheel file using Databricks Asset Bundles.

The following keys are available for a Python wheel task. For the corresponding REST API object definition, see python_wheel_task.

Key Type Description
entry_point String Required. The named entry point to execute: a function or class. If the entry point does not exist in the package metadata, the function is executed directly from the package using $packageName.$entryPoint().
named_parameters Map The named parameters to pass to the Python wheel task, also known as keyword arguments. A named parameter is a key-value pair with a string key and a string value. parameters and named_parameters cannot both be specified. If named_parameters is specified, the parameters are passed as keyword arguments to the entry point function.
package_name String Required. The name of the Python package to execute. All dependencies must be installed in the environment. This does not check for or install any package dependencies.
parameters Sequence The parameters to pass to the Python wheel task, also known as positional arguments. Each parameter is a string. If specified, named_parameters must not be specified.

Examples

The following example adds a Python wheel task to a job. The path for the Python wheel file to deploy is relative to the configuration file in which this task is declared. See Databricks Asset Bundles library dependencies.

resources:
  jobs:
    my-python-wheel-job:
      name: my-python-wheel-job
      tasks:
        - task_key: my-python-wheel-task
          python_wheel_task:
            entry_point: run
            package_name: my_package
          libraries:
            - whl: ./my_package/dist/my_package-*.whl
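
To pass keyword arguments to the entry point, set named_parameters (or use parameters for positional arguments; the two cannot be combined). The following sketch uses placeholder parameter names and values:

resources:
  jobs:
    my-python-wheel-job:
      name: my-python-wheel-job
      tasks:
        - task_key: my-python-wheel-task
          python_wheel_task:
            entry_point: run
            package_name: my_package
            named_parameters:
              env: dev
              limit: '100'
          libraries:
            - whl: ./my_package/dist/my_package-*.whl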

Run job task

You use this task to run another job.

The following keys are available for a run job task. For the corresponding REST API object definition, see run_job_task.

Key Type Description
job_id Integer Required. The ID of the job to run. The job must already exist in the workspace.
job_parameters Map Job-level parameters to pass to the job being run. These parameters are accessible within the job's tasks.
pipeline_params Map Parameters for the pipeline task. Used only if the job being run contains a pipeline task. Can include full_refresh to trigger a full refresh of the pipeline.

Examples

The following example contains a run job task in the second job that runs the first job.

This example uses a substitution to retrieve the ID of the job to run. To get a job's ID from the UI, open the job in the workspace and copy the ID from the Job ID value in the Job details tab of the job's settings page.

resources:
  jobs:
    my-first-job:
      name: my-first-job
      tasks:
        - task_key: my-first-job-task
          new_cluster:
            spark_version: '13.3.x-scala2.12'
            node_type_id: 'i3.xlarge'
            num_workers: 2
          notebook_task:
            notebook_path: ./src/test.py
    my-second-job:
      name: my-second-job
      tasks:
        - task_key: my-second-job-task
          run_job_task:
            job_id: ${resources.jobs.my-first-job.id}
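
To pass job-level parameters to the job being run, set job_parameters on the run_job_task. The following sketch uses a placeholder parameter name, which the target job can reference as a job parameter:

resources:
  jobs:
    my-second-job:
      name: my-second-job
      tasks:
        - task_key: my-second-job-task
          run_job_task:
            job_id: ${resources.jobs.my-first-job.id}
            job_parameters:
              source_env: dev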

SQL task

You use this task to run a SQL file, query, or alert.

The following keys are available for a SQL task. For the corresponding REST API object definition, see sql_task.

Key Type Description
alert Map Configuration for running a SQL alert. Contains:
  • alert_id (String): Required. The canonical identifier of the SQL alert to run.
  • pause_subscriptions (Boolean): Whether to pause alert subscriptions.
  • subscriptions (Sequence): List of subscription settings.
dashboard Map Configuration for refreshing a SQL dashboard. Contains:
  • dashboard_id (String): Required. The canonical identifier of the SQL dashboard to refresh.
  • custom_subject (String): Custom subject for the email sent to dashboard subscribers.
  • pause_subscriptions (Boolean): Whether to pause dashboard subscriptions.
  • subscriptions (Sequence): List of subscription settings.
file Map Configuration for running a SQL file. Contains:
  • path (String): Required. The path of the SQL file in the workspace or remote repository. For files stored in the Databricks workspace, the path must be absolute and begin with a slash. For files stored in a remote repository, the path must be relative.
  • source (String): The location type of the SQL file. Valid values are WORKSPACE and GIT.
parameters Map Parameters to be used for each run of this task. SQL queries and files can use these parameters by referencing them with the syntax {{parameter_key}}. Use task parameter variables to set parameters containing information about job runs.
query Map Configuration for running a SQL query. Contains:
  • query_id (String): Required. The canonical identifier of the SQL query to run.
warehouse_id String Required. The ID of the SQL warehouse to use to run the SQL task. The SQL warehouse must already exist.

Examples

Tip

To get a SQL warehouse's ID, open the SQL warehouse's settings page, then copy the ID found in parentheses after the name of the warehouse in the Name field on the Overview tab.

The following example adds a SQL file task to a job. This SQL file task uses the specified SQL warehouse to run the specified SQL file.

resources:
  jobs:
    my-sql-file-job:
      name: my-sql-file-job
      tasks:
        - task_key: my-sql-file-task
          sql_task:
            file:
              path: /Users/someone@example.com/hello-world.sql
              source: WORKSPACE
            warehouse_id: 1a111111a1111aa1

The following example adds a SQL alert task to a job. This SQL alert task uses the specified SQL warehouse to refresh the specified SQL alert.

resources:
  jobs:
    my-sql-alert-job:
      name: my-sql-alert-job
      tasks:
        - task_key: my-sql-alert-task
          sql_task:
            warehouse_id: 1a111111a1111aa1
            alert:
              alert_id: 11111111-1111-1111-1111-111111111111

The following example adds a SQL query task to a job. This SQL query task uses the specified SQL warehouse to run the specified SQL query.

resources:
  jobs:
    my-sql-query-job:
      name: my-sql-query-job
      tasks:
        - task_key: my-sql-query-task
          sql_task:
            warehouse_id: 1a111111a1111aa1
            query:
              query_id: 11111111-1111-1111-1111-111111111111
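
SQL files and queries can also receive parameters, which the SQL references with the {{parameter_key}} syntax. The following sketch passes a placeholder parameter to a SQL file task:

resources:
  jobs:
    my-sql-file-job:
      name: my-sql-file-job
      tasks:
        - task_key: my-sql-file-task
          sql_task:
            file:
              path: /Users/someone@example.com/hello-world.sql
              source: WORKSPACE
            parameters:
              environment: dev
            warehouse_id: 1a111111a1111aa1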

Other task settings

The following task settings allow you to configure behaviors for all tasks. For the corresponding REST API object definitions, see tasks.

Key Type Description
compute_key String The key of the compute resource to use for this task. If specified, new_cluster, existing_cluster_id, and job_cluster_key cannot be specified.
depends_on Sequence An optional list of task dependencies. Each item contains:
  • task_key (String): Required. The key of the task this task depends on.
  • outcome (String): Can be specified only for condition_task. If specified, the dependent task will only run if the condition evaluates to the specified outcome (either true or false).
description String An optional description for the task.
disable_auto_optimization Boolean Whether to disable automatic optimization for this task. If true, automatic optimizations like adaptive query execution will be disabled.
email_notifications Map An optional set of email addresses to notify when a run begins, completes, or fails. Each item contains:
  • on_start (Sequence): List of email addresses to notify when a run starts.
  • on_success (Sequence): List of email addresses to notify when a run completes successfully.
  • on_failure (Sequence): List of email addresses to notify when a run fails.
  • on_duration_warning_threshold_exceeded (Sequence): List of email addresses to notify when run duration exceeds the threshold.
  • on_streaming_backlog_exceeded (Sequence): List of email addresses to notify when any streaming backlog thresholds are exceeded for any stream.
environment_key String The key of an environment defined in the job's environments configuration. Used to specify environment-specific settings. This field is required for Python script, Python wheel and dbt tasks when using serverless compute.
existing_cluster_id String The ID of an existing cluster that will be used for all runs of this task.
health Map An optional specification for health monitoring of this task that includes a rules key, which is a list of health rules to evaluate.
job_cluster_key String The key of a job cluster defined in the job's job_clusters configuration.
libraries Sequence An optional list of libraries to be installed on the cluster that will execute the task. Each library is specified as a map with keys like jar, egg, whl, pypi, maven, cran, or requirements.
max_retries Integer An optional maximum number of times to retry the task if it fails. If not specified, the task will not be retried.
min_retry_interval_millis Integer An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. If not specified, the default is 0 (immediate retry).
new_cluster Map A specification for a new cluster to be created for each run of this task. See cluster.
notification_settings Map Optional notification settings for this task. Each item contains:
  • no_alert_for_skipped_runs (Boolean): If true, do not send notifications for skipped runs.
  • no_alert_for_canceled_runs (Boolean): If true, do not send notifications for canceled runs.
  • alert_on_last_attempt (Boolean): If true, send notifications only on the last retry attempt.
retry_on_timeout Boolean An optional policy to specify whether to retry the task when it times out. If not specified, defaults to false.
run_if String An optional value indicating the condition under which the task should run. Valid values are:
  • ALL_SUCCESS (default): Run if all dependencies succeed.
  • AT_LEAST_ONE_SUCCESS: Run if at least one dependency succeeds.
  • NONE_FAILED: Run if no dependencies have failed.
  • ALL_DONE: Run when all dependencies complete, regardless of outcome.
  • AT_LEAST_ONE_FAILED: Run if at least one dependency fails.
  • ALL_FAILED: Run if all dependencies fail.
task_key String Required. A unique name for the task. This field is used to refer to this task from other tasks using the depends_on field.
timeout_seconds Integer An optional timeout applied to each run of this task. A value of 0 means no timeout. If not set, no timeout is applied.
webhook_notifications Map An optional set of system destinations to notify when a run begins, completes, or fails. Each item contains:
  • on_start (Sequence): List of notification destinations when a run starts.
  • on_success (Sequence): List of notification destinations when a run completes.
  • on_failure (Sequence): List of notification destinations when a run fails.
  • on_duration_warning_threshold_exceeded (Sequence): List of notification destinations when run duration exceeds threshold.
  • on_streaming_backlog_exceeded (Sequence): List of notification destinations to notify when any streaming backlog thresholds are exceeded for any stream.
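
The following sketch applies several of these settings to the tasks of a single job: a shared job cluster, a dependency, retry and timeout behavior, and an email notification on failure. The cluster settings, paths, and email address are placeholders.

resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: '13.3.x-scala2.12'
            node_type_id: 'i3.xlarge'
            num_workers: 2
      tasks:
        - task_key: prepare_data
          job_cluster_key: my-cluster
          notebook_task:
            notebook_path: ../src/prepare_data.ipynb
        - task_key: train_model
          depends_on:
            - task_key: prepare_data
          run_if: ALL_SUCCESS
          max_retries: 2
          min_retry_interval_millis: 60000
          timeout_seconds: 3600
          email_notifications:
            on_failure:
              - someone@example.com
          job_cluster_key: my-cluster
          notebook_task:
            notebook_path: ../src/train_model.ipynb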