Vacuum

Clean up files associated with a table. This command has different versions for Apache Spark tables and Delta tables.

Vacuum a Spark table

VACUUM [ [db_name.]table_name | path] [RETAIN num HOURS]

RETAIN num HOURS

The retention threshold, in hours.

Recursively vacuum directories associated with the Spark table and remove uncommitted files older than the retention threshold. The default threshold is 7 days. Azure Databricks automatically triggers VACUUM operations as data is written. See Clean up uncommitted files.
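
As an illustration, the statements below sketch both forms of the Spark table syntax. The table name `sales_db.events` and the path `/mnt/data/events` are hypothetical placeholders, not names taken from this article.

```sql
-- Hypothetical table name: remove uncommitted files older than the default threshold (7 days).
VACUUM sales_db.events;

-- Hypothetical path: remove uncommitted files older than 24 hours.
VACUUM '/mnt/data/events' RETAIN 24 HOURS;
```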

Vacuum a Delta table (Delta Lake on Azure Databricks)

VACUUM [ [db_name.]table_name | path] [RETAIN num HOURS] [DRY RUN]

Recursively vacuum directories associated with the Delta table and remove files that are no longer in the latest state of the table's transaction log and are older than the retention threshold. The default threshold is 7 days. Azure Databricks does not automatically trigger VACUUM operations on Delta tables. See Vacuum.

If you run VACUUM on a Delta table, you lose the ability to time travel back to any version older than the specified data retention period.

RETAIN num HOURS

The retention threshold, in hours.
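
For instance, assuming a Delta table named `events` (a hypothetical name used only for illustration), the default 7-day threshold corresponds to 168 hours, so the two statements below should select the same files:

```sql
-- Uses the default retention threshold of 7 days.
VACUUM events;

-- States the same threshold explicitly: 7 days * 24 hours = 168 hours.
VACUUM events RETAIN 168 HOURS;
```

After either statement completes, time travel to versions of `events` older than 168 hours may no longer be possible.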

DRY RUN

Return the list of files that would be deleted, without deleting them.
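
A cautious workflow, sketched here with the hypothetical table name `events` and an arbitrary 240-hour threshold, is to preview the deletion first and run the real command only after reviewing the returned list.

```sql
-- Preview only: list the files that would be deleted; nothing is removed.
VACUUM events RETAIN 240 HOURS DRY RUN;

-- After reviewing the list, run the same command without DRY RUN to delete the files.
VACUUM events RETAIN 240 HOURS;
```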