CLUSTER BY clause (TABLE)

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 13.3 LTS and above check marked yes Delta Lake only

Defines liquid, multi-dimensional clustering for a Delta Lake table.

You can use this clause when you:

  • Create a table using CREATE TABLE
  • Alter a table with ALTER TABLE to change the clustering columns. To cluster rows with altered clustering columns, you must run OPTIMIZE. Note that rows clustered by previous clustering columns are not affected.

Updated rows do not get automatically re-clustered. Run OPTIMIZE to re-cluster updated rows..

For more information on liquid clustering see Use liquid clustering for Delta tables

Syntax

CLUSTER BY { ( column_name [, ...] ] ) |
             NONE }

Parameters

  • column_name

    Specifies columns of the table by which to cluster the data. The column order does not matter. To benefit from altering clustering you should run OPTIMIZE.

  • NONE

    Turns off clustering for the table being altered. Newly inserted or updated data will not be clustered by OPTIMIZE. To not use clustering when creating a table, omit the CLUSTER BY clause.

Examples

You can find more examples in Use liquid clustering for Delta tables.

-- Create a table with a clustering column
> CREATE TABLE t(a int, b string) CLUSTER BY (a);

-- The clustering of an existing Delta table to add a second dimension
> ALTER TABLE t CLUSTER BY (a, b);

-- Recluster the table
> OPTIMIZE t;

-- Remove the clustering
> ALTER TABLE t CLUSTER BY NONE;