Isolation levels

The isolation level of a table defines the degree to which a transaction must be isolated from modifications made by concurrent transactions. Delta Lake on Azure Databricks supports two isolation levels: Serializable and WriteSerializable.

  • Serializable: The strongest isolation level. It ensures that committed write operations and all reads are serializable. Operations are allowed as long as there exists a serial sequence of executing them one at a time that generates the same outcome as that seen in the table. For write operations, the serial sequence is exactly the same as that seen in the table's history.

  • WriteSerializable (default): A weaker isolation level than Serializable. It ensures only that write operations (that is, not reads) are serializable. However, this is still stronger than snapshot isolation. WriteSerializable is the default isolation level because it provides a good balance of data consistency and availability for most common operations.

    In this mode, the content of the Delta table may differ from what the sequence of operations in the table history would lead you to expect. This is because this mode allows certain pairs of concurrent writes (say, operations X and Y) to proceed such that the result is as if Y was performed before X (that is, the pair is still serializable) even though the history shows that Y was committed after X. To disallow this reordering, set the table isolation level to Serializable, which causes these transactions to fail.

Read operations always use snapshot isolation. The write isolation level determines whether a reader can see a snapshot of a table that, according to the history, "never existed".

For the Serializable level, a reader always sees only tables that conform to the history. For the WriteSerializable level, a reader could see a table state that does not exist in the Delta log.

For example, consider txn1, a long-running delete, and txn2, which inserts data that txn1 deletes. txn2 and txn1 complete and are recorded in the history in that order. According to the history, the data inserted by txn2 should not exist in the table. For the Serializable level, a reader would never see the data inserted by txn2. However, for the WriteSerializable level, a reader could at some point see the data inserted by txn2.
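The scenario above can be sketched as two sessions running at the same time. This is only an illustration: the table name events, its two-column schema (id INT, event_date DATE), and the specific values are assumptions and do not come from the original text.

```sql
-- Hypothetical table: events (id INT, event_date DATE). Names are illustrative only.

-- Session A (txn1): a long-running delete that starts first.
DELETE FROM events WHERE event_date < '2020-01-01';

-- Session B (txn2): runs while txn1 is still in progress and inserts a row
-- that matches txn1's delete predicate.
INSERT INTO events VALUES (42, DATE'2019-06-15');

-- If txn2 commits before txn1, the history lists the insert first and the
-- delete second. Inspect the recorded order of commits:
DESCRIBE HISTORY events;
```

Under WriteSerializable, both statements may succeed, and the row inserted by txn2 can remain visible even though the history lists the delete after it. Under Serializable, this reordering is disallowed, so one of the transactions fails instead.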

For more information on which types of operations can conflict with each other in each isolation level and the possible errors, see Concurrency control.

Set the isolation level

You set the isolation level using the ALTER TABLE command.

ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.isolationLevel' = <level-name>)

where <level-name> is Serializable or WriteSerializable.

For example, to change the isolation level from the default WriteSerializable to Serializable, run:

ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.isolationLevel' = 'Serializable')
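
To confirm that the change took effect, one option is to read the property back and look at the table history. The following is a minimal sketch that assumes a table named events (a hypothetical name used only for illustration).

```sql
-- Read back the property that was set with ALTER TABLE.
-- "events" is a hypothetical table name.
SHOW TBLPROPERTIES events ('delta.isolationLevel');

-- List the table's commits; the ALTER TABLE above should appear as the
-- most recent operation.
DESCRIBE HISTORY events;
```

The history output should also include a column reporting the isolation level recorded for each commit, which can help when checking how concurrent writes were handled.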