CLUSTER BY
clause (TABLE)
Applies to: Databricks SQL Databricks Runtime 13.3 LTS and above Delta Lake only
Defines liquid, multi-dimensional clustering for a Delta Lake table.
You can use this clause when you:
- Create a table using CREATE TABLE
- Alter a table with ALTER TABLE to change the clustering columns. To cluster rows with altered clustering columns, you must run OPTIMIZE. Note that rows clustered by previous clustering columns are not affected.
Updated rows do not get automatically re-clustered. Run OPTIMIZE to re-cluster updated rows..
For more information on liquid clustering see Use liquid clustering for Delta tables
Syntax
CLUSTER BY { ( column_name [, ...] ] ) |
NONE }
Parameters
-
Specifies columns of the table by which to cluster the data. The column order does not matter. To benefit from altering clustering you should run OPTIMIZE.
NONE
Turns off clustering for the table being altered. Newly inserted or updated data will not be clustered by OPTIMIZE. To not use clustering when creating a table, omit the
CLUSTER BY
clause.
Examples
You can find more examples in Use liquid clustering for Delta tables.
-- Create a table with a clustering column
> CREATE TABLE t(a int, b string) CLUSTER BY (a);
-- The clustering of an existing Delta table to add a second dimension
> ALTER TABLE t CLUSTER BY (a, b);
-- Recluster the table
> OPTIMIZE t;
-- Remove the clustering
> ALTER TABLE t CLUSTER BY NONE;