FSCK REPAIR TABLE
Applies to: Databricks SQL, Databricks Runtime
Removes from the transaction log of a Delta table the file entries whose files can no longer be found in the underlying file system. This can happen when the files have been manually deleted.
Syntax
FSCK REPAIR TABLE table_name [DRY RUN]
Parameters
table_name
Identifies an existing Delta table. The name must not include a temporal specification.
DRY RUN
Shows information about the file entries that FSCK REPAIR TABLE would remove from the transaction log of a Delta table because the files can no longer be found in the underlying file system. This can happen when the files have been manually deleted. A file entry is either a data file path or a combination of a data file path and a deletion vector file path. A file entry is included in the output when the data file is missing, when the deletion vector file is missing, or when both are missing.
By default, DRY RUN returns only the first 1000 files. You can increase this threshold by setting the SparkSession variable spark.databricks.delta.fsck.maxNumEntriesInResult to a higher value before running the command in a notebook.
Returns
For DRY RUN
A report of the form:
dataFilePath STRING NOT NULL
dataFileMissing BOOLEAN NOT NULL
deletionVectorPath STRING
deletionVectorFileMissing BOOLEAN NOT NULL
Examples
-- Assume file1.parquet is missing and no deletion vector is expected.
> FSCK REPAIR TABLE t DRY RUN;
dataFilePath  dataFileMissing deletionVectorPath deletionVectorFileMissing
------------- --------------- ------------------ -------------------------
file1.parquet true            null               false
-- Assume dv1.bin is missing.
> FSCK REPAIR TABLE t DRY RUN;
dataFilePath  dataFileMissing deletionVectorPath deletionVectorFileMissing
------------- --------------- ------------------ -------------------------
file1.parquet false           dv1.bin            true
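The rule that decides which file entries appear in the DRY RUN report can be modeled in a few lines of Python. This is a minimal sketch for illustration only, not how Delta implements the command: the names FsckRow and dry_run_report, the representation of the transaction log as (data file path, deletion vector path) pairs, and the set of existing files are all hypothetical. The max_entries parameter stands in for the 1000-entry default described under DRY RUN.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class FsckRow:
    """One row of the DRY RUN report (column names taken from the Returns section)."""
    dataFilePath: str
    dataFileMissing: bool
    deletionVectorPath: Optional[str]
    deletionVectorFileMissing: bool


def dry_run_report(
    entries: Iterable[Tuple[str, Optional[str]]],
    existing_files: Set[str],
    max_entries: int = 1000,
) -> List[FsckRow]:
    """Hypothetical model: report entries whose data file or deletion vector file is missing."""
    rows: List[FsckRow] = []
    for data_path, dv_path in entries:
        if len(rows) >= max_entries:
            break  # the report is capped, mirroring the default limit
        data_missing = data_path not in existing_files
        # A deletion vector can only be missing if the entry references one.
        dv_missing = dv_path is not None and dv_path not in existing_files
        if data_missing or dv_missing:
            rows.append(FsckRow(data_path, data_missing, dv_path, dv_missing))
    return rows


# First example above: file1.parquet is missing and has no deletion vector.
print(dry_run_report([("file1.parquet", None)], existing_files=set()))

# Second example above: the data file exists but dv1.bin is missing.
print(dry_run_report([("file1.parquet", "dv1.bin")], existing_files={"file1.parquet"}))
```

An entry whose data file and deletion vector file both exist produces no row, which is why a healthy table yields an empty report in this model.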