What is Apache Iceberg in Azure Databricks?

Important

Unity Catalog-managed Apache Iceberg tables are available in Public Preview in Databricks Runtime 16.4 LTS and above. Foreign Iceberg tables are also in Public Preview in Databricks Runtime 16.4 LTS and above.

Apache Iceberg is an open source table format for analytics workloads. It supports features like schema evolution, time travel, and hidden partitioning. Like Delta Lake, Iceberg provides an abstraction layer that enables ACID transactions on data stored in object storage. Azure Databricks supports Iceberg tables that use the Apache Parquet file format. Iceberg maintains atomicity and consistency by writing new metadata files for each table change.

An Iceberg catalog is the top-level layer of the Iceberg table architecture. It handles operations like creating, dropping, and renaming tables. Its main responsibility is to provide the current metadata when a table is loaded. Azure Databricks supports Iceberg tables managed by Unity Catalog (managed Iceberg tables) and Iceberg tables managed by catalogs outside Unity Catalog (foreign Iceberg tables).

All Iceberg tables in Azure Databricks follow the open Iceberg table format specification. See the Iceberg table spec.
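
The commit model described above, where each table change writes a new metadata file and the catalog provides the current metadata, can be illustrated with a toy example. The following Python sketch is not Iceberg's implementation: `ToyCatalog`, `TableMetadata`, and `append_files` are hypothetical names, and the in-memory compare-and-swap only stands in for the atomic pointer update that a real catalog performs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class TableMetadata:
    """One immutable metadata file: a version plus the data files it references."""
    version: int
    data_files: Tuple[str, ...]


class ToyCatalog:
    """Hypothetical in-memory catalog that stores only a pointer to the current metadata."""

    def __init__(self) -> None:
        self._current: Dict[str, TableMetadata] = {}

    def load(self, table: str) -> Optional[TableMetadata]:
        return self._current.get(table)

    def commit(self, table: str, expected: Optional[TableMetadata], new: TableMetadata) -> bool:
        # Compare-and-swap: the pointer moves only if no other writer committed first.
        if self._current.get(table) is not expected:
            return False
        self._current[table] = new
        return True


def append_files(catalog: ToyCatalog, table: str, new_files: Sequence[str]) -> bool:
    """Write a new metadata object for the change, then try to swap the pointer."""
    base = catalog.load(table)
    version = 0 if base is None else base.version + 1
    files = () if base is None else base.data_files
    candidate = TableMetadata(version, files + tuple(new_files))
    return catalog.commit(table, base, candidate)


catalog = ToyCatalog()
append_files(catalog, "sales", ["a.parquet"])
stale = catalog.load("sales")  # a second writer reads the table here
append_files(catalog, "sales", ["b.parquet"])

# The second writer still holds `stale`, so its commit is rejected and must be retried.
conflict = catalog.commit("sales", stale, TableMetadata(1, ("a.parquet", "c.parquet")))
```

Because readers only ever follow the pointer to a complete metadata object, they see either the table before a change or the table after it, never a partial write.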

Create Iceberg tables in Unity Catalog

Iceberg tables created in Unity Catalog are managed Iceberg tables. You can create these tables from Azure Databricks or from external Iceberg engines that connect through the Iceberg REST Catalog API.

Managed Iceberg tables are fully integrated with Azure Databricks platform features. Unity Catalog manages lifecycle tasks like snapshot expiration and file compaction on these tables. Managed Iceberg tables also support liquid clustering, which improves query performance.
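
As an illustration, the following Python sketch builds a `CREATE TABLE` statement for a managed Iceberg table with liquid clustering. It assumes that the `USING ICEBERG` and `CLUSTER BY` clauses are accepted for managed Iceberg tables in your runtime, so check the SQL reference before relying on it. The helper `create_managed_iceberg_sql` and the table and column names are hypothetical.

```python
from typing import Mapping, Sequence


def create_managed_iceberg_sql(
    table: str,
    columns: Mapping[str, str],
    cluster_by: Sequence[str] = (),
) -> str:
    """Build a CREATE TABLE statement; clause support is an assumption, not verified here."""
    unknown = [name for name in cluster_by if name not in columns]
    if unknown:
        raise ValueError(f"Clustering columns not in schema: {unknown}")
    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    statement = f"CREATE TABLE {table} ({column_list}) USING ICEBERG"
    if cluster_by:
        # Liquid clustering keys, assumed to use the CLUSTER BY clause.
        statement += f" CLUSTER BY ({', '.join(cluster_by)})"
    return statement


sql = create_managed_iceberg_sql(
    "main.sales.orders",  # hypothetical catalog.schema.table
    {"order_id": "BIGINT", "order_date": "DATE"},
    cluster_by=["order_date"],
)
# In an Azure Databricks notebook, the statement could then be run with spark.sql(sql).
```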

Read Iceberg tables managed by other catalogs

A foreign Iceberg table is an Iceberg table managed by a catalog outside Unity Catalog. The external catalog stores the table's current metadata. Azure Databricks uses Lakehouse Federation to retrieve metadata and read the table from object storage.

Foreign Iceberg tables are read-only in Azure Databricks and have limited platform support.

Access Iceberg tables using external systems

You can access all Iceberg tables in Unity Catalog using the Iceberg REST Catalog API. This open API supports read and write operations from external Iceberg engines across different languages and platforms. See Access Azure Databricks tables from Apache Iceberg clients.

The REST Catalog supports credential vending, which delivers temporary credentials to external engines for accessing the underlying storage. For more information, see Unity Catalog credential vending for external system access.
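
The following Python sketch shows the kind of connection properties an external Iceberg client typically needs for a REST catalog. It is a sketch under assumptions: the endpoint path `/api/2.1/unity-catalog/iceberg-rest`, the workspace URL, and the helper name `iceberg_rest_properties` are not taken from this article, so confirm the endpoint in the linked client access documentation. The property keys (`type`, `uri`, `warehouse`, `token`) follow PyIceberg's REST catalog configuration.

```python
from typing import Dict


def iceberg_rest_properties(workspace_url: str, uc_catalog: str, token: str) -> Dict[str, str]:
    """Return REST catalog properties for an external Iceberg client."""
    base = workspace_url.rstrip("/")
    return {
        "type": "rest",
        # Assumed endpoint path; verify it against the current documentation.
        "uri": f"{base}/api/2.1/unity-catalog/iceberg-rest",
        # The Unity Catalog catalog name is passed as the warehouse.
        "warehouse": uc_catalog,
        "token": token,
    }


props = iceberg_rest_properties(
    "https://adb-1234567890123456.7.azuredatabricks.net/",  # hypothetical workspace URL
    "main",
    "<personal-access-token>",
)
# With PyIceberg installed, these could be passed on as:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("unity", **props)
```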

Iceberg table limitations

The following limitations apply to Iceberg tables in Azure Databricks and are subject to change:

  • Iceberg tables support only the Apache Parquet file format.
  • Azure Databricks supports versions 1 and 2 of the Apache Iceberg specification, with the following exceptions:
    • Row-level deletes, including position deletes and equality-based deletes, aren't supported.
    • Branching and tagging aren't supported. Only the main branch is accessible when reading foreign Iceberg tables.
    • Partitioning:
      • Partition evolution is supported on managed Iceberg tables only when interacting from external Iceberg engines.
      • Foreign Iceberg tables don't support partition evolution.
      • Partitioning by BINARY type is not supported.
    • The following data types aren't supported:
      • UUID
      • Fixed(L)
      • TIME
      • Nested STRUCT with required fields
  • Managed Iceberg tables do not support primary key or foreign key constraints.
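
Some of the limitations above can be checked before a table is created or registered. The following Python sketch flags the unsupported UUID, Fixed(L), and TIME types and partitioning by BINARY. The function `find_limitations` and the string representation of types are hypothetical, and the check for nested STRUCT types with required fields is omitted for brevity.

```python
import re
from typing import Dict, List, Sequence

# Matches uuid, time, fixed[L], or fixed(L); timestamp types do not match.
_UNSUPPORTED_TYPE = re.compile(r"^(uuid|time|fixed\s*[\[(]\s*\d+\s*[\])])$", re.IGNORECASE)


def find_limitations(columns: Dict[str, str], partition_by: Sequence[str] = ()) -> List[str]:
    """Return a message for each column that hits a documented limitation."""
    problems: List[str] = []
    for name, dtype in columns.items():
        if _UNSUPPORTED_TYPE.match(dtype.strip()):
            problems.append(f"{name}: type {dtype} is not supported")
    for name in partition_by:
        if columns.get(name, "").strip().lower() == "binary":
            problems.append(f"{name}: partitioning by BINARY is not supported")
    return problems


issues = find_limitations(
    {"id": "uuid", "payload": "binary", "created": "timestamp", "digest": "fixed[16]"},
    partition_by=["payload"],
)
for issue in issues:
    print(issue)
```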

Managed Iceberg table limitations

The following limitations apply specifically to managed Iceberg tables:

  • Vector search isn't supported on managed Iceberg tables.
  • Apache Iceberg doesn't support change data feed. As a result, incremental processing is not supported when reading managed Iceberg tables as a source for:
    • Materialized views and streaming tables
    • Lakehouse Monitoring
    • Online tables
    • Lakebase
    • Data classification
  • The following table properties are managed by Unity Catalog and cannot be manually set:
    • write.location-provider.impl
    • write.data.path
    • write.metadata.path
    • write.format.default
    • write.delete.format.default
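
A client that sets table properties programmatically might filter out the properties listed above first. The following Python sketch shows one way to do that. The helper `split_table_properties` is hypothetical, and passing this check doesn't guarantee that Azure Databricks accepts the remaining properties.

```python
from typing import Dict, List, Tuple

# Table properties managed by Unity Catalog on managed Iceberg tables.
RESERVED_PROPERTIES = frozenset({
    "write.location-provider.impl",
    "write.data.path",
    "write.metadata.path",
    "write.format.default",
    "write.delete.format.default",
})


def split_table_properties(requested: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Separate requested properties into those not reserved and those reserved."""
    allowed = {key: value for key, value in requested.items() if key not in RESERVED_PROPERTIES}
    rejected = sorted(key for key in requested if key in RESERVED_PROPERTIES)
    return allowed, rejected


allowed, rejected = split_table_properties({
    "write.format.default": "parquet",
    "write.parquet.compression-codec": "zstd",
})
```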

Foreign Iceberg table limitations

The following limitations apply specifically to foreign Iceberg tables:

  • Time travel is supported only for Iceberg snapshots that have been previously read in Azure Databricks (that is, snapshots where a SELECT statement was executed).

  • Using bucket transform functions for Iceberg partitioning can degrade query performance when conditional filters are used.

  • Cloud storage tiering products are not integrated with foreign Iceberg tables. Accessing foreign Iceberg tables in Azure Databricks can restore data archived in lower-cost storage tiers.

  • On dedicated access mode clusters, reads and REFRESH FOREIGN TABLE operations on Iceberg tables require ALL PRIVILEGES.