合并到(Azure Databricks 上的 Delta Lake)Merge Into (Delta Lake on Azure Databricks)
基于源表将一组更新、插入和删除操作合并到目标 Delta 表中。Merge a set of updates, insertions, and deletions based on a source table into a target Delta table.
MERGE INTO [db_name.]target_table [AS target_alias]
USING [db_name.]source_table [<time_travel_version>] [AS source_alias]
ON <merge_condition>
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]
其中where
<matched_action> =
DELETE |
UPDATE SET * |
UPDATE SET column1 = value1 [, column2 = value2 ...]
<not_matched_action> =
INSERT * |
INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])
<time_travel_version> =
TIMESTAMP AS OF timestamp_expression |
VERSION AS OF version
- 可以有任意数量的
WHEN MATCHED
和WHEN NOT MATCHED
子句。There can be any number ofWHEN MATCHED
andWHEN NOT MATCHED
clauses.
备注
在 Databricks Runtime 7.2 及更低版本中,MERGE
最多可以有 2 个 WHEN MATCHED
子句和最多 1 个 WHEN NOT MATCHED
子句。In Databricks Runtime 7.2 and below, MERGE
can have at most 2 WHEN MATCHED
clauses and at most 1 WHEN NOT MATCHED
clause.
- 当源行根据匹配条件与目标表行匹配时,将执行
WHEN MATCHED
子句。WHEN MATCHED
clauses are executed when a source row matches a target table row based on the match condition. 这些子句具有以下语义。These clauses have the following semantics.WHEN MATCHED
子句最多可以有 1 个UPDATE
和 1 个DELETE
操作。WHEN MATCHED
clauses can have at most onUPDATE
and oneDELETE
action.merge
中的UPDATE
操作只更新匹配目标行的指定列。TheUPDATE
action inmerge
only updates the specified columns of the matched target row.DELETE
操作将删除匹配的行。TheDELETE
action will delete the matched row.- 每个
WHEN MATCHED
子句都可以有一个可选条件。EachWHEN MATCHED
clause can have an optional condition. 如果存在此子句条件,则仅当该子句条件成立时,才对任何匹配的源-目标行对行执行UPDATE
或DELETE
操作。If this clause condition exists, theUPDATE
orDELETE
action is executed for any matching source-target row pair row only when when the clause condition is true. - 如果有多个
WHEN MATCHED
子句,则将按照指定的顺序对其进行求值(即,子句的顺序很重要)。If there are multipleWHEN MATCHED
clauses, then they are evaluated in order they are specified (that is, the order of the clauses matter). 除最后一个之外,所有WHEN MATCHED
子句都必须具有条件。AllWHEN MATCHED
clauses, except the last one, must have conditions. - 如果 2 个
WHEN MATCHED
子句都具有条件,并且对于匹配的源-目标行对都没有条件成立,那么匹配的目标行将保持不变。If bothWHEN MATCHED
clauses have conditions and neither of the conditions are true for a matching source-target row pair, then the matched target row is left unchanged. - 若要使用源数据集的相应列更新目标 Delta 表的所有列,请使用
UPDATE SET *
。To update all the columns of the target Delta table with the corresponding columns of the source dataset, useUPDATE SET *
. 这等效于目标 Delta 表的所有列UPDATE SET col1 = source.col1 [, col2 = source.col2 ...]
。This is equivalent toUPDATE SET col1 = source.col1 [, col2 = source.col2 ...]
for all the columns of the target Delta table. 因此,此操作假定源表的列与目标表的列相同,否则查询将引发分析错误。Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.- 启用自动架构迁移后,此行为将更改。This behavior changes when automatic schema migration is enabled. 有关详细信息,请参阅自动架构演变。See Automatic schema evolution for details.
- 当源行根据匹配条件与任何目标行都不匹配时,将执行
WHEN NOT MATCHED
子句。WHEN NOT MATCHED
clauses are executed when a source row does not match any target row based on the match condition. 这些子句具有以下语义。These clauses have the following semantics.WHEN NOT MATCHED
子句只能具有INSERT
操作。WHEN NOT MATCHED
clauses can only have theINSERT
action. 新行是基于指定的列和相应的表达式生成的。The new row is generated based on the specified column and corresponding expressions. 你无需指定目标表中的所有列。All the columns in the target table do not need to be specified. 对于未指定的目标列,将插入NULL
。For unspecified target columns,NULL
will be inserted.备注
在 Databricks Runtime 6.5 及更低版本中,必须为
INSERT
操作提供目标表中的所有列。In Databricks Runtime 6.5 and below, you must provide all columns in the target table for theINSERT
action.每个
WHEN NOT MATCHED
子句都可以有一个可选条件。EachWHEN NOT MATCHED
clause can have an optional condition. 如果存在子句条件,则仅当源条件对该行成立时才插入该行。If the clause condition is present, a source row is inserted only if that condition is true for that row. 否则,将忽略源列。Otherwise, the source column is ignored.如果有多个
WHEN NOT MATCHED
子句,则将按照指定的顺序对其进行求值(即,子句的顺序很重要)。If there are multipleWHEN NOT MATCHED
clauses, then they are evaluated in order they are specified (that is, the order of the clauses matter). 除最后一个之外,所有WHEN NOT MATCHED
子句都必须具有条件。AllWHEN NOT MATCHED
clauses, except the last one, must have conditions.若要使用源数据集的相应列插入目标 Delta 表的所有列,请使用
INSERT *
。To insert all the columns of the target Delta table with the corresponding columns of the source dataset, useINSERT *
. 这等效于目标 Delta 表的所有列INSERT (col1 [, col2 ...]) VALUES (source.col1 [, source.col2 ...])
。This is equivalent toINSERT (col1 [, col2 ...]) VALUES (source.col1 [, source.col2 ...])
for all the columns of the target Delta table. 因此,此操作假定源表的列与目标表的列相同,否则查询将引发分析错误。Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.备注
启用自动架构迁移后,此行为将更改。This behavior changes when automatic schema migration is enabled. 有关详细信息,请参阅自动架构演变。See Automatic schema evolution for details.
重要
如果源数据集的多行匹配并尝试更新目标 Delta 表的相同行,则 MERGE
操作可能会失败。A MERGE
operation can fail if multiple rows of the source dataset match and attempt to update the same rows of the target Delta table. 根据合并的 SQL 语义,这种更新操作模棱两可,因为尚不清楚应使用哪个源行来更新匹配的目标行。According to the SQL semantics of merge, such an update operation is ambiguous as it is unclear which source row should be used to update the matched target row. 你可以预处理源表来消除出现多个匹配项的可能性。You can preprocess the source table to eliminate the possibility of multiple matches. 请参阅变更数据捕获示例 - 它对变更数据集(即源数据集)进行预处理,以仅保留每键的最新更改,然后再将更改应用到目标 Delta 表中。See the Change data capture example—it preprocesses the change dataset (that is, the source dataset) to retain only the latest change for each key before applying that change into the target Delta table.
备注
在 Databricks Runtime 7.3 及更高版本中,无条件删除匹配项时允许多个匹配项(因为即使有多个匹配项,无条件删除也非常明确)。In Databricks Runtime 7.3 and above, multiple matches are allowed when matches are unconditionally deleted (since unconditional delete is not ambiguous even if there are multiple matches).
示例Examples
可以将 MERGE
用于复杂的操作,如删除重复数据、更新插入更改数据、应用 SCD 类型 2 操作等。请参阅合并示例获取一些示例。You can use MERGE
for complex operations like deduplicating data, upserting change data, applying SCD Type 2 operations, etc. See Merge examples for a few examples.