Run tasks conditionally in an Azure Databricks job

By default, a job task runs when all of its dependencies have run and succeeded, but you can also configure tasks in an Azure Databricks job to run only when specific conditions are met. Azure Databricks Jobs supports the following methods to run tasks conditionally:

  • You can specify Run if dependencies to run a task based on the run status of the task's dependencies. For example, you can use Run if to run a task even when some or all of its dependencies have failed, allowing your job to recover from failures and continue running.
  • The If/else condition task is used to run a part of a job DAG based on the results of a boolean expression. The If/else condition task allows you to add branching logic to your job. For example, run transformation tasks only if the upstream ingestion task adds new data, and run a different branch of tasks otherwise.

Add a Run if condition to a task

You can configure a Run if condition when you edit a task with one or more dependencies. To add the condition to the task, select the condition from the Run if dependencies drop-down menu in the task configuration. The Run if condition is evaluated after completing all task dependencies. You can also add a Run if condition when you add a new task with one or more dependencies.

Run if condition options

You can add the following Run if conditions to a task:

  • All succeeded: All dependencies have run and succeeded. This is the default condition to run a task. The task is marked as Upstream failed if the condition is unmet.
  • At least one succeeded: At least one dependency has succeeded. The task is marked as Upstream failed if the condition is unmet.
  • None failed: None of the dependencies failed, and at least one dependency was run. The task is marked as Upstream failed if the condition is unmet.
  • All done: The task is run after all its dependencies have run, regardless of their status. This condition allows you to define a task that runs without depending on the outcome of its dependencies.
  • At least one failed: At least one dependency failed. The task is marked as Excluded if the condition is unmet.
  • All failed: All dependencies have failed. The task is marked as Excluded if the condition is unmet.
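
The six conditions above can be summarized as a small decision function. The following is an illustrative model of the documented behavior, not Databricks code; the status strings and function name are assumptions made for the sketch.

```python
# Illustrative model of how each Run if condition is evaluated against the
# statuses of a task's dependencies. The status names ("succeeded", "failed")
# and the function itself are assumptions for this sketch, not a Databricks API.
def run_if_met(condition, dep_statuses):
    """Return True if the task should run given its dependencies' run statuses."""
    succeeded = [s == "succeeded" for s in dep_statuses]
    failed = [s == "failed" for s in dep_statuses]
    if condition == "All succeeded":
        return all(succeeded)                 # the default condition
    if condition == "At least one succeeded":
        return any(succeeded)
    if condition == "None failed":
        return not any(failed) and len(dep_statuses) > 0
    if condition == "All done":
        return True                           # runs regardless of outcomes
    if condition == "At least one failed":
        return any(failed)
    if condition == "All failed":
        return all(failed) and len(dep_statuses) > 0
    raise ValueError(f"unknown condition: {condition}")

statuses = ["succeeded", "failed"]
print(run_if_met("At least one succeeded", statuses))  # True
print(run_if_met("All succeeded", statuses))           # False
print(run_if_met("At least one failed", statuses))     # True
```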

Note

  • Tasks configured to handle failures (with the At least one failed or All failed condition) are marked as Excluded if their Run if condition is unmet. Excluded tasks are skipped and are treated as successful.
  • If all task dependencies are excluded, the task is also excluded, regardless of its Run if condition.
  • If you cancel a task run, the cancellation propagates to downstream tasks. However, tasks with a Run if condition that handles failure still run, so that, for example, a cleanup task runs even when an upstream task run is canceled.

How does Azure Databricks Jobs determine job run status?

Azure Databricks Jobs determines whether a job run was successful based on the outcome of the job's leaf tasks. A leaf task is a task that has no downstream dependencies. A job run can have one of three outcomes:

  • Succeeded: All tasks were successful.
  • Succeeded with failures: Some tasks failed, but all leaf tasks were successful.
  • Failed: One or more leaf tasks failed.
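
The three outcomes follow mechanically from the leaf-task results. Here is a minimal sketch of that rule; the function, status strings, and example task names are illustrative, not part of the Jobs API.

```python
# Sketch: deriving a job run's overall status from its tasks' results.
# A leaf task is a task with no downstream dependencies. The function name,
# status strings, and task names below are assumptions for illustration.
def job_run_status(task_results, leaf_tasks):
    """task_results: dict of task name -> bool (True = succeeded)."""
    leaves_ok = all(task_results[t] for t in leaf_tasks)
    all_ok = all(task_results.values())
    if not leaves_ok:
        return "Failed"                       # one or more leaf tasks failed
    if all_ok:
        return "Succeeded"                    # every task succeeded
    return "Succeeded with failures"          # some tasks failed, all leaves passed

# "retry_ingest" failed, but the only leaf task ("report") succeeded.
results = {"ingest": True, "retry_ingest": False, "report": True}
print(job_run_status(results, leaf_tasks=["report"]))  # Succeeded with failures
```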

Add branching logic to your job with the If/else condition task

Use the If/else condition task to run a part of a job DAG based on a boolean expression. The expression consists of a boolean operator and a pair of operands, where the operands might reference job or task state using job and task parameter variables or use task values.

Note

  • Numeric and non-numeric values are handled differently depending on the boolean operator:
    • The == and != operators perform string comparison of their operands. For example, 12.0 == 12 evaluates to false.
    • The >, >=, <, and <= operators perform numeric comparisons of their operands. For example, 12.0 >= 12 evaluates to true, and 10.0 >= 12 evaluates to false.
    • Only numeric, string, and boolean values are allowed when referencing task values in an operand. Any other types will cause the condition expression to fail. Non-numeric value types are serialized to strings and are treated as strings in If/else condition expressions. For example, if a task value is set to a boolean value, it is serialized to "true" or "false".
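
The mixed string/numeric semantics above can be surprising, so here is a minimal sketch that mirrors the documented behavior. The `evaluate` helper is an illustration of the rules, not Databricks code.

```python
# Sketch of the documented comparison semantics: == and != compare operands
# as strings, while the ordering operators compare them as numbers.
# This mirrors the behavior described above; it is not the Databricks implementation.
def evaluate(left, op, right):
    if op in ("==", "!="):
        equal = str(left) == str(right)       # string comparison
        return equal if op == "==" else not equal
    l, r = float(left), float(right)          # numeric comparison
    return {">": l > r, ">=": l >= r, "<": l < r, "<=": l <= r}[op]

print(evaluate("12.0", "==", "12"))  # False: "12.0" and "12" differ as strings
print(evaluate("12.0", ">=", "12"))  # True: 12.0 >= 12 numerically
print(evaluate("10.0", ">=", "12"))  # False
```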

You can add an If/else condition task when you create a job or edit a task in an existing job. To configure an If/else condition task:

  1. In the Type drop-down menu, select If/else condition.
  2. In the first Condition text box, enter the operand to be evaluated. The operand can reference a job or task parameter variable or a task value.
  3. Select a boolean operator from the drop-down menu.
  4. In the second Condition text box, enter the value for evaluating the condition.

To configure dependencies on an If/else condition task:

  1. Select the If/else condition task in the DAG view and click + Add task.
  2. After entering details for the task, click Depends on and select <task-name> (true) where <task-name> is the name of the If/else condition task.
  3. Repeat for the condition evaluating to false.

For example, suppose you have a task named process_records that maintains a count of invalid records in a task value named bad_records, and you want to branch processing based on whether any invalid records were found. To add this logic to your workflow, create an If/else condition task with an expression like {{tasks.process_records.values.bad_records}} > 0, then add dependent tasks based on the results of the condition.
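
As a rough sketch, the tasks in this example might be expressed as Jobs-API-style task definitions like the following. The field names (condition_task, depends_on, outcome) follow the Databricks Jobs API, but the notebook path and task keys are hypothetical, and this is an illustrative fragment rather than a complete job specification.

```python
# Illustrative Jobs-API-style task definitions for the bad_records example.
# The notebook path and task keys are hypothetical; treat this as a sketch.
tasks = [
    {
        "task_key": "process_records",
        # Hypothetical notebook that sets the bad_records task value.
        "notebook_task": {"notebook_path": "/jobs/process_records"},
    },
    {
        "task_key": "check_bad_records",
        "depends_on": [{"task_key": "process_records"}],
        # Equivalent to the UI expression:
        # {{tasks.process_records.values.bad_records}} > 0
        "condition_task": {
            "left": "{{tasks.process_records.values.bad_records}}",
            "op": "GREATER_THAN",
            "right": "0",
        },
    },
    # Runs only when the condition evaluates to true...
    {
        "task_key": "handle_bad_records",
        "depends_on": [{"task_key": "check_bad_records", "outcome": "true"}],
    },
    # ...and this branch runs only when it evaluates to false.
    {
        "task_key": "continue_processing",
        "depends_on": [{"task_key": "check_bad_records", "outcome": "false"}],
    },
]
```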

After a job run containing an If/else condition task completes, you can view the result of the expression and the details of its evaluation in the job run details in the UI.