MERGE INTO

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Merges a set of updates, insertions, and deletions based on a source table into a target Delta table.

This statement is supported only for Delta Lake tables.

This page contains details for using the correct syntax with the MERGE command. See Upsert into a Delta Lake table using merge for more guidance on how to use MERGE operations to manage your data.

Syntax

MERGE [ WITH SCHEMA EVOLUTION ] INTO target_table_name [target_alias]
   USING source_table_reference [source_alias]
   ON merge_condition
   { WHEN MATCHED [ AND matched_condition ] THEN matched_action |
     WHEN NOT MATCHED [BY TARGET] [ AND not_matched_condition ] THEN not_matched_action |
     WHEN NOT MATCHED BY SOURCE [ AND not_matched_by_source_condition ] THEN not_matched_by_source_action } [...]

matched_action
 { DELETE |
   UPDATE SET * |
   UPDATE SET { column = { expr | DEFAULT } } [, ...] }

not_matched_action
 { INSERT * |
   INSERT (column1 [, ...] ) VALUES ( expr | DEFAULT ] [, ...] )

not_matched_by_source_action
 { DELETE |
   UPDATE SET { column = { expr | DEFAULT } } [, ...] }

Parameters

  • WITH SCHEMA EVOLUTION

    Applies to: check marked yes Databricks Runtime 15.2 and above

    Enables automatic schema evolution for this MERGE operation. When enabled, the schema of the target Delta table is automatically updated to match the schema of the source table.

  • target_table_name

    A Table name identifying the table being modified. The table referenced must be a Delta table.

    The table must not be a foreign table.

  • target_alias

    A Table alias for the target table. The alias must not include a column list.

  • source_table_reference

    A Table name identifying the source table to be merged into the target table.

  • source_alias

    A Table alias for the source table. The alias must not include a column list.

  • ON merge_condition

    How the rows from one relation are combined with the rows of another relation. An expression with a return type of BOOLEAN.

  • WHEN MATCHED [ AND matched_condition ]

    WHEN MATCHED clauses are executed when a source row matches a target table row based on the merge_condition and the optional match_condition.

  • matched_action

    • DELETE

      Deletes the matching target table row.

      Multiple matches are allowed when matches are unconditionally deleted. An unconditional delete is not ambiguous, even if there are multiple matches.

    • UPDATE

      Updates the matched target table row.

      To update all the columns of the target Delta table with the corresponding columns of the source dataset, use UPDATE SET *. This is equivalent to UPDATE SET col1 = source.col1 [, col2 = source.col2 ...] for all the columns of the target Delta table. Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.

      Note

      This behavior changes when automatic schema evolution is enabled. See Automatic schema evolution for Delta Lake merge for details.

      Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 11.3 LTS and above

      You can specify DEFAULT as expr to explicitly update the column to its default value.

    If there are multiple WHEN MATCHED clauses, then they are evaluated in the order they are specified. Each WHEN MATCHED clause, except the last one, must have a matched_condition. Otherwise, the query returns a NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION error.

    If none of the WHEN MATCHED conditions evaluate to true for a source and target row pair that matches the merge_condition, then the target row is left unchanged.

  • WHEN NOT MATCHED [BY TARGET] [ AND not_matched_condition ]

    WHEN NOT MATCHED clauses insert a row when a source row does not match any target row based on the merge_condition and the optional not_matched_condition.

    Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 12.2 LTS and above

    WHEN NOT MATCHED BY TARGET can be used as an alias for WHEN NOT MATCHED.

    not_matched_condition must be a Boolean expression.

    • INSERT *

      Inserts all the columns of the target Delta table with the corresponding columns of the source dataset. This is equivalent to INSERT (col1 [, col2 ...]) VALUES (source.col1 [, source.col2 ...]) for all the columns of the target Delta table. This action requires that the source table has the same columns as those in the target table.

      Note

      This behavior changes when automatic schema evolution is enabled. See Automatic schema evolution for Delta Lake merge for details.

    • INSERT ( ... ) VALUES ( ... )

      The new row is generated based on the specified column and corresponding expressions. All the columns in the target table do not need to be specified. For unspecified target columns, the column default is inserted, or NULL if none exists.

      Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 11.3 LTS and above

      You can specify DEFAULT as an expression to explicitly insert the column default for a target column.

    If there are multiple WHEN NOT MATCHED clauses, then they are evaluated in the order they are specified. All WHEN NOT MATCHED clauses, except the last one, must have not_matched_conditions. Otherwise, the query returns a NON_LAST_NOT_MATCHED_CLAUSE_OMIT_CONDITION error.

  • WHEN NOT MATCHED BY SOURCE [ AND not_matched_by_source_condition ]

    Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 12.2 LTS and above

    WHEN NOT MATCHED BY SOURCE clauses are executed when a target row does not match any rows in the source table based on the merge_condition and the optional not_match_by_source_condition evaluates to true.

    not_matched_by_source_condition must be a Boolean expression that only references columns from the target table.

  • not_matched_by_source_action

    • DELETE

      Deletes the target table row.

    • UPDATE

      Updates the target table row. expr may only reference columns from the target table, otherwise the query will throw an analysis error.

      Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 11.3 LTS and above

      You can specify DEFAULT as expr to explicitly update the column to its default value.

    Important

    Adding a WHEN NOT MATCHED BY SOURCE clause to update or delete target rows when the merge_condition evaluates to false can lead to a large number of target rows being modified. For best performance, apply not_matched_by_source_conditions to limit the number of target rows updated or deleted.

    If there are multiple WHEN NOT MATCHED BY SOURCE clauses, then they are evaluated in the order they are specified. Each WHEN NOT MATCHED BY SOURCE clause, except the last one, must have a not_matched_by_source_condition. Otherwise, the query returns a NON_LAST_NOT_MATCHED_BY_SOURCE_CLAUSE_OMIT_CONDITION error.

    If none of the WHEN NOT MATCHED BY SOURCE conditions evaluate to true for a target row that doesn't match any rows in the source table based on the merge_condition, then the target row is left unchanged.

Important

A MERGE operation can fail with a DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE error if multiple rows of the source dataset match and attempt to update the same rows of the target Delta table. According to the SQL semantics of merge, such an update operation is ambiguous as it is unclear which source row should be used to update the matched target row. You can preprocess the source table to eliminate the possibility of multiple matches. See the Change data capture example—it preprocesses the change dataset (that is, the source dataset) to retain only the latest change for each key before applying that change into the target Delta table.

Examples

You can use MERGE INTO for complex operations like deduplicating data, upserting change data, applying SCD Type 2 operations, etc. See Upsert into a Delta Lake table using merge for a few examples.

WHEN MATCHED

-- Delete all target rows that have a match in the source table.
> MERGE INTO target USING source
  ON target.key = source.key
  WHEN MATCHED THEN DELETE

-- Conditionally update target rows that have a match in the source table using the source value.
> MERGE INTO target USING source
  ON target.key = source.key
  WHEN MATCHED AND target.updated_at < source.updated_at THEN UPDATE SET *

-- Multiple MATCHED clauses conditionally deleting matched target rows and updating two columns for all other matched rows.
> MERGE INTO target USING source
  ON target.key = source.key
  WHEN MATCHED AND target.marked_for_deletion THEN DELETE
  WHEN MATCHED THEN UPDATE SET target.updated_at = source.updated_at, target.value = DEFAULT

WHEN NOT MATCHED [BY TARGET]

-- Insert all rows from the source that are not already in the target table.
> MERGE INTO target USING source
  ON target.key = source.key
  WHEN NOT MATCHED THEN INSERT *

-- Conditionally insert new rows in the target table using unmatched rows from the source table.
> MERGE INTO target USING source
  ON target.key = source.key
  WHEN NOT MATCHED BY TARGET AND source.created_at > now() - INTERVAL "1" DAY THEN INSERT (created_at, value) VALUES (source.created_at, DEFAULT)

WHEN NOT MATCHED BY SOURCE

-- Delete all target rows that have no matches in the source table.
> MERGE INTO target USING source
  ON target.key = source.key
  WHEN NOT MATCHED BY SOURCE THEN DELETE

-- Multiple NOT MATCHED BY SOURCE clauses conditionally deleting unmatched target rows and updating two columns for all other matched rows.
> MERGE INTO target USING source
  ON target.key = source.key
  WHEN NOT MATCHED BY SOURCE AND target.marked_for_deletion THEN DELETE
  WHEN NOT MATCHED BY SOURCE THEN UPDATE SET target.value = DEFAULT

WITH SCHEMA EVOLUTION

-- Multiple MATCHED and NOT MATCHED clauses with schema evolution enabled.
> MERGE WITH SCHEMA EVOLUTION INTO target USING source
  ON source.key = target.key
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
  WHEN NOT MATCHED BY SOURCE THEN DELETE