Use row tracking for Delta tables
Delta Lake row tracking allows Azure Databricks to track row-level lineage in a Delta table. This feature is required for some incremental updates for materialized views.
Important
Delta Lake row tracking is available in Databricks Runtime 14.1 and above.
Row tracking is a table feature and uses a higher table writer protocol than some Delta Lake clients. Table protocol versions cannot be downgraded, and tables with row tracking enabled are not writeable by Delta Lake clients that do not support all enabled Delta writer protocol table features. See How does Azure Databricks manage Delta Lake feature compatibility?.
Enable row tracking
You must explicitly enable row tracking by setting the table property delta.enableRowTracking = true
.
You can do this during table creation, as in the following example:
CREATE TABLE table_name
TBLPROPERTIES (delta.enableRowTracking = true)
AS SELECT * FROM source_table;
You can alter an existing table to enable row tracking, as in the following example:
ALTER TABLE table_name SET TBLPROPERTIES (delta.enableRowTracking = true);
Important
Enabling row tracking on existing tables automatically assigns row IDs and row commit versions to all existing rows in the table. This process can result in the creation of multiple new versions of the table and could take a significant amount of time.
Cloning a Delta table creates a separate history, so the row IDs and row commit versions on cloned tables do not match those in the original table.
What is the schema of the row tracking metadata fields?
Row tracking adds two hidden metadata fields to the table. You can explicitly add these fields to your query to return the values.
Column name | Type | Values | Explanation |
---|---|---|---|
_metadata.row_id |
Long | The unique identifier of the row. | A row keeps the same ID whenever it is modified using a MERGE or UPDATE statement. |
_metadata.row_commit_version |
Long | The Delta log or table version at which the row was last inserted or updated. | A row is assigned a new version whenever it is modified using a MERGE or UPDATE statement. |
Some operations store these metadata fields using the transaction log. Running OPTIMIZE
or REORG
operations on a table with row tracking enabled rewrites data files to store these fields.
Disable row tracking
You can disable row tracking by setting the table property to false
.
ALTER TABLE table_name SET TBLPROPERTIES (delta.enableRowTracking = false);
Important
Disabling row tracking does not remove the corresponding table feature and does not downgrade the table protocol version. It also does not remove the metadata fields from the target table.
With row tracking disabled, the row IDs generated are no longer reliable for tracking unique rows.
Limitations
The row IDs and row commit versions metadata fields cannot be accessed while reading the change data feed. See Use Delta Lake change data feed on Azure Databricks.