Shallow clone for Unity Catalog tables
Important
Shallow clone support for Unity Catalog managed tables is in Public Preview in Databricks Runtime 13.3 LTS and above. Shallow clone support for Unity Catalog external tables is in Public Preview in Databricks Runtime 14.2 and above.
You can use shallow clone to create new Unity Catalog tables from existing Unity Catalog tables. Shallow clone support for Unity Catalog allows you to create tables with access control privileges independent from their parent tables without needing to copy underlying data files.
Important
You can only clone Unity Catalog managed tables to Unity Catalog managed tables and Unity Catalog external tables to Unity Catalog external tables. VACUUM behavior differs between managed and external tables. See Vacuum and Unity Catalog shallow clones.
For more on Delta clone, see Clone a table on Azure Databricks.
For more on Unity Catalog tables, see What are tables and views?.
Create a shallow clone on Unity Catalog
You can create a shallow clone in Unity Catalog using the same syntax available for shallow clones throughout the product, as shown in the following syntax example:
CREATE TABLE <catalog-name>.<schema-name>.<target-table-name> SHALLOW CLONE <catalog-name>.<schema-name>.<source-table-name>
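As a concrete illustration, the sketch below fills in the placeholders. All object names (the catalog main, the schemas prod and dev, and the tables orders, orders_clone, orders_ext, and orders_ext_clone) are hypothetical, and the storage path is left as a placeholder. The second statement assumes that an external clone is given an explicit LOCATION clause.

```sql
-- Hypothetical names: main.prod.orders is a managed source table,
-- main.dev.orders_clone is the managed target created by the clone.
CREATE TABLE main.dev.orders_clone
  SHALLOW CLONE main.prod.orders;

-- Assumed pattern for an external source table: the clone is also external,
-- so a LOCATION is supplied. The path is a placeholder, not a real location.
CREATE TABLE main.dev.orders_ext_clone
  SHALLOW CLONE main.prod.orders_ext
  LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/<path>';
```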
To create a shallow clone on Unity Catalog, you must have sufficient privileges on both the source and target resources, as detailed in the following table:
Resource | Permissions required |
---|---|
Source table | SELECT |
Source schema | USE SCHEMA |
Source catalog | USE CATALOG |
Target schema | USE SCHEMA, CREATE TABLE |
Target catalog | USE CATALOG |
Target external location (external tables only) | CREATE EXTERNAL TABLE |
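The privileges in the table above could be granted along these lines. This is a minimal sketch: the principal dev_team, the catalog main, the schemas prod and dev, the table orders, and the external location clone_location are all hypothetical names chosen for illustration.

```sql
-- Source side: read access to the table being cloned (hypothetical names).
GRANT USE CATALOG ON CATALOG main TO `dev_team`;
GRANT USE SCHEMA ON SCHEMA main.prod TO `dev_team`;
GRANT SELECT ON TABLE main.prod.orders TO `dev_team`;

-- Target side: permission to create the clone in the target schema.
-- Source and target share the catalog main here, so USE CATALOG is already granted.
GRANT USE SCHEMA, CREATE TABLE ON SCHEMA main.dev TO `dev_team`;

-- External tables only: permission on the target external location.
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION clone_location TO `dev_team`;
```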
Like other create table statements, the user who creates a shallow clone is the owner of the target table. The owner of a target cloned table can control the access rights for that table independently of the source table.
Note
The owner of a cloned table might be different from the owner of the source table.
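One way to check and adjust ownership is sketched below. The table names main.prod.orders and main.dev.orders_clone and the principal dev_team are hypothetical.

```sql
-- The Owner field in the extended description shows who owns each table.
DESCRIBE TABLE EXTENDED main.prod.orders;
DESCRIBE TABLE EXTENDED main.dev.orders_clone;

-- Transfer ownership of the clone only; this statement does not touch the source table.
ALTER TABLE main.dev.orders_clone OWNER TO `dev_team`;
```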
Query or modify a shallow cloned table on Unity Catalog
Important
The instructions in this section describe privileges needed for compute configured with shared access mode. For Single User access mode, see Work with shallow cloned tables in Single User access mode.
To query a shallow clone on Unity Catalog, you must have sufficient privileges on the table and containing resources, as detailed in the following table:
Resource | Permissions required |
---|---|
Catalog | USE CATALOG |
Schema | USE SCHEMA |
Table | SELECT |
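As a sketch of the query privileges listed above, the grants below give a hypothetical group named analysts read access to a hypothetical clone main.dev.orders_clone. No privilege on the source table is granted here, which is the case the shared access mode instructions describe.

```sql
-- Hypothetical principal and object names.
GRANT USE CATALOG ON CATALOG main TO `analysts`;
GRANT USE SCHEMA ON SCHEMA main.dev TO `analysts`;
GRANT SELECT ON TABLE main.dev.orders_clone TO `analysts`;

-- A member of analysts can then read the clone.
SELECT * FROM main.dev.orders_clone LIMIT 10;
```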
You must also have MODIFY permissions on the target of the clone operation to complete the following actions:
- Insert records
- Delete records
- Update records
- MERGE
- CREATE OR REPLACE TABLE
- DROP TABLE
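The statements below sketch what this looks like in practice. The principal dev_team, the table main.dev.orders_clone, and the columns order_id and status are hypothetical and stand in for your own schema.

```sql
-- Grant write access on the clone (the target of the clone operation).
GRANT MODIFY ON TABLE main.dev.orders_clone TO `dev_team`;

-- Examples of operations that need MODIFY on the clone.
INSERT INTO main.dev.orders_clone (order_id, status) VALUES (1001, 'new');
UPDATE main.dev.orders_clone SET status = 'shipped' WHERE order_id = 1001;
DELETE FROM main.dev.orders_clone WHERE order_id = 1001;
```

These statements write to the clone only. Changes made to a clone do not modify the source table.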
Vacuum and Unity Catalog shallow clones
Important
This behavior is in Public Preview in Databricks Runtime 13.3 LTS and above for managed tables and Databricks Runtime 14.2 and above for external tables.
When you use Unity Catalog tables for the source and target of a shallow clone operation, Unity Catalog manages the underlying data files to improve reliability for the source and target of the clone operation. Running VACUUM on the source of a shallow clone does not break the cloned table.
Normally, when VACUUM identifies valid files for a given retention threshold, only the metadata for the current table is considered. Shallow clone support for Unity Catalog tracks the relationships between all cloned tables and the source data files, so the set of valid files expands to include the data files needed to answer queries against any shallow cloned table as well as the source table.
This means that for Unity Catalog shallow clone VACUUM semantics, a valid data file is any file within the specified retention threshold for the source table or any cloned table. Managed tables and external tables have slightly different semantics.
This enhanced tracking of metadata changes how VACUUM operations impact data files backing the Delta tables, with the following semantics:
- For managed tables, VACUUM operations against either the source or target of a shallow clone operation might delete data files from the source table.
- For external tables, VACUUM operations only remove data files from the source table when run against the source table.
- Only data files not considered valid for the source table or any shallow clone against the source are removed.
- If multiple shallow clones are defined against a single source table, running VACUUM on any of the cloned tables does not remove valid data files for other cloned tables.
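The semantics above can be exercised with a sequence like the following sketch. The tables main.prod.orders (source) and main.dev.orders_clone (shallow clone) are hypothetical, and both are assumed to be Unity Catalog tables.

```sql
-- Preview which files VACUUM would delete from the source, without deleting them.
VACUUM main.prod.orders DRY RUN;

-- Run VACUUM on the source with the table's default retention threshold.
VACUUM main.prod.orders;

-- Per the semantics described above, files still needed by the clone are
-- treated as valid, so the clone is expected to remain queryable.
SELECT COUNT(*) FROM main.dev.orders_clone;
```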
Note
Databricks recommends never running VACUUM with a retention setting of less than 7 days to avoid corrupting ongoing long-running transactions. If you need to run VACUUM with a lower retention threshold, make sure you understand how VACUUM on shallow clones in Unity Catalog differs from how VACUUM interacts with other cloned tables on Azure Databricks. See Clone a table on Azure Databricks.
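For reference, a retention threshold can be stated explicitly in hours, and 7 days corresponds to 168 hours. The table name below is hypothetical.

```sql
-- 7 days x 24 hours = 168 hours, matching the recommended minimum retention.
VACUUM main.dev.orders_clone RETAIN 168 HOURS;
```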
Work with shallow cloned tables in Single User access mode
When working with Unity Catalog shallow clones in Single User access mode, you must have permissions on the resources for the cloned table source as well as the target table.
This means that for simple queries, in addition to the required permissions on the target table, you must have USE CATALOG permissions on the source catalog, USE SCHEMA permissions on the source schema, and SELECT permissions on the source table. For any queries that update or insert records in the target table, you must also have MODIFY permissions on the source table.
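A sketch of the additional source-side grants for Single User access mode follows. The principal user@example.com and the objects main, main.prod, and main.prod.orders are hypothetical, and the grants on the target table described earlier are assumed to be in place already.

```sql
-- Source-side privileges needed to query the clone in Single User access mode.
GRANT USE CATALOG ON CATALOG main TO `user@example.com`;
GRANT USE SCHEMA ON SCHEMA main.prod TO `user@example.com`;
GRANT SELECT ON TABLE main.prod.orders TO `user@example.com`;

-- Needed in addition when the user updates or inserts records in the clone.
GRANT MODIFY ON TABLE main.prod.orders TO `user@example.com`;
```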
Databricks recommends working with Unity Catalog clones on compute with shared access mode as this allows independent evolution of permissions for Unity Catalog shallow clone targets and their source tables.
Limitations
- Shallow clones on external tables must be external tables. Shallow clones on managed tables must be managed tables.
- You cannot share shallow clones using Delta Sharing.
- You cannot nest shallow clones, meaning you cannot make a shallow clone from a shallow clone.
- For managed tables, dropping the source table breaks the target table for shallow clones. Data files backing external tables are not removed by DROP TABLE operations, and so shallow clones of external tables are not impacted by dropping the source.
- Unity Catalog allows users to UNDROP managed tables for around 7 days after a DROP TABLE command. In Databricks Runtime 13.3 LTS and above, managed shallow clones based on a dropped managed table continue to work during this 7-day period. If you do not UNDROP the source table in this window, the shallow clone stops functioning once the source table's data files are garbage collected.
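If a managed source table was dropped by mistake, recovery within the window might look like the following sketch. The schema main.prod and the table orders are hypothetical names.

```sql
-- List dropped tables in the schema that are still within the recovery window.
SHOW TABLES DROPPED IN main.prod;

-- Restore the dropped source table so its shallow clones keep working.
UNDROP TABLE main.prod.orders;
```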