Unity Catalog GA release note

Important

This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See What is Unity Catalog?.

August 25, 2022

Unity Catalog is now generally available on Azure Databricks.

This article describes Unity Catalog as of the date of its GA release. It focuses primarily on the features and updates added to Unity Catalog since the Public Preview. For current information about Unity Catalog, see What is Unity Catalog?. For release notes that describe updates to Unity Catalog since GA, see Azure Databricks platform release notes and Databricks Runtime release notes versions and compatibility.

Metastore limits and resource quotas

As of August 25, 2022:

  • Your Azure Databricks account can have only one metastore per region.
  • A metastore can have up to 1000 catalogs.
  • A catalog can have up to 10,000 schemas.
  • A schema can have up to 10,000 tables.

For current Unity Catalog quotas, see Resource quotas.
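
To see how close a metastore is to these limits, one option is to count objects through information_schema. The following is a minimal sketch, assuming the metastore-scoped system.information_schema views are available. The 8000 threshold is an arbitrary illustration, and the tables view also lists views, so the count can overstate the number of tables.

```sql
-- List schemas that are approaching the 10,000-table limit.
-- The 8000 threshold is an example value chosen for illustration, not an official quota.
SELECT table_catalog, table_schema, count(*) AS object_count
FROM system.information_schema.tables
GROUP BY table_catalog, table_schema
HAVING count(*) > 8000
ORDER BY object_count DESC;
```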

Supported storage formats at GA

As of August 25, 2022:

  • All managed Unity Catalog tables store data with Delta Lake.
  • External Unity Catalog tables and external locations support Delta Lake, JSON, CSV, Avro, Parquet, ORC, and text data.

For current Unity Catalog supported table formats, see File format support.
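
The difference between the two table types can be sketched as follows. The catalog, schema, and table names (main.default.orders_managed, main.default.orders_external) and the columns are hypothetical, and the storage path is a placeholder that must fall under an external location you have access to.

```sql
-- Managed table: no format or location is given, so the data is stored with Delta Lake.
CREATE TABLE main.default.orders_managed (id BIGINT, amount DECIMAL(10, 2));

-- External table: the format and the storage path are specified explicitly.
-- Replace the placeholders with a path covered by one of your external locations.
CREATE TABLE main.default.orders_external (id BIGINT, amount DECIMAL(10, 2))
  USING PARQUET
  LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/<path>';
```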

Manage Unity Catalog resources from the account console

Use the Azure Databricks account console UI to:

Supported cluster types and Databricks Runtime versions

Unity Catalog requires clusters that run Databricks Runtime 11.1 or above. Unity Catalog is supported by default on all SQL warehouse compute versions.

Earlier versions of Databricks Runtime supported preview versions of Unity Catalog. Clusters running on earlier versions of Databricks Runtime do not provide support for all Unity Catalog GA features and functionality.

Unity Catalog requires one of the following access modes when you create a new cluster:

  • Shared
    • Languages: SQL or Python
    • A secure cluster that can be shared by multiple users. Cluster users are fully isolated so that they cannot see each other's data and credentials.
  • Single user
    • Languages: SQL, Scala, Python, R
    • A secure cluster that can be used exclusively by a specified single user.

For more information about cluster access modes, see Access modes.

For information about updated Unity Catalog functionality in later Databricks Runtime versions, see the release notes for those versions.

System tables

information_schema is fully supported for Unity Catalog data assets. Each metastore includes a catalog named system that contains a metastore-scoped information_schema. See Information schema. You can use information_schema to answer questions like the following:

"Count the number of tables per catalog"

SELECT table_catalog, count(table_name)
FROM system.information_schema.tables
GROUP BY 1
ORDER BY 2 DESC

"Show me all of the tables that have been altered in the last 24 hours"

SELECT table_name, table_owner, created_by, last_altered, last_altered_by, table_catalog
FROM system.information_schema.tables
WHERE last_altered >= now() - INTERVAL 24 HOURS

Structured Streaming support

Structured Streaming workloads are now supported with Unity Catalog. For details and limitations, see Limitations.

See also Using Unity Catalog with Structured Streaming.

SQL functions

User-defined SQL functions are now fully supported on Unity Catalog. For information about how to create and use SQL UDFs, see CREATE FUNCTION (SQL and Python).
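
As a brief illustration, a SQL UDF can be registered under a catalog and schema and then called by its three-level name. This is only a sketch: the function name main.default.to_fahrenheit and its parameter are hypothetical.

```sql
-- Create a scalar SQL UDF in a Unity Catalog schema (names are hypothetical).
CREATE FUNCTION main.default.to_fahrenheit(celsius DOUBLE)
  RETURNS DOUBLE
  RETURN celsius * 9 / 5 + 32;

-- Call the function by its three-level name.
SELECT main.default.to_fahrenheit(100.0) AS fahrenheit;
```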

SQL syntax for external locations in Unity Catalog

Standard data definition language (DDL) commands are now supported in Spark SQL for external locations, including the following:

CREATE | DROP | ALTER | DESCRIBE | SHOW EXTERNAL LOCATION

You can also manage and view permissions with GRANT, REVOKE, and SHOW for external locations with SQL. See External locations.

Example syntax:

CREATE EXTERNAL LOCATION <your-location-name>
  URL '<your-location-path>'
  WITH (CREDENTIAL <your-credential-name>);

GRANT READ FILES, WRITE FILES, CREATE EXTERNAL TABLE ON EXTERNAL LOCATION `<your-location-name>`
  TO `finance`;
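
The remaining commands follow the same pattern. The sketch below reuses the placeholder location name and the `finance` principal from the example above. Treat it as an outline, not a complete reference for each command's options.

```sql
-- List all external locations, then inspect one of them.
SHOW EXTERNAL LOCATIONS;
DESCRIBE EXTERNAL LOCATION `<your-location-name>`;

-- Review the privileges on the location and revoke one of them.
SHOW GRANTS ON EXTERNAL LOCATION `<your-location-name>`;
REVOKE WRITE FILES ON EXTERNAL LOCATION `<your-location-name>` FROM `finance`;

-- Transfer ownership, or drop the location when it is no longer needed.
ALTER EXTERNAL LOCATION `<your-location-name>` OWNER TO `finance`;
DROP EXTERNAL LOCATION IF EXISTS `<your-location-name>`;
```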

Unity Catalog limitations at GA

As of August 25, 2022, Unity Catalog had the following limitations. For current limitations, see Limitations.

  • Scala, R, and workloads using the Machine Learning Runtime are supported only on clusters using the single user access mode. Workloads in these languages do not support the use of dynamic views for row-level or column-level security.
  • Shallow clones are not supported when using Unity Catalog as the source or target of the clone.
  • Bucketing is not supported for Unity Catalog tables. Commands that try to create a bucketed table in Unity Catalog throw an exception.
  • Writing to the same path or Delta Lake table from workspaces in multiple regions can lead to unreliable performance if some clusters access Unity Catalog and others do not.
  • Overwrite mode for DataFrame write operations into Unity Catalog is supported only for Delta tables, not for other file formats. The user must have the CREATE privilege on the parent schema and must be the owner of the existing object.
  • Streaming currently has the following limitations:
    • It is not supported in clusters using shared access mode. For streaming workloads, you must use single user access mode.
    • Asynchronous checkpointing is not yet supported.
    • On Databricks Runtime version 11.2 and below, streaming queries that last more than 30 days on all-purpose or jobs clusters will throw an exception. For long-running streaming queries, configure automatic job retries or use Databricks Runtime 11.3 and above.
  • Referencing Unity Catalog tables from Delta Live Tables pipelines is currently not supported.
  • Groups previously created in a workspace cannot be used in Unity Catalog GRANT statements. This is to ensure a consistent view of groups that can span across workspaces. To use groups in GRANT statements, create your groups in the account console and update any automation for principal or group management (such as SCIM, Okta and Microsoft Entra ID connectors, and Terraform) to reference account endpoints instead of workspace endpoints.
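
To illustrate the group and dynamic view points above, the following sketch grants access to an account-level group and defines a dynamic view for column-level security. It assumes that `finance` is a group created in the account console and that main.default.orders is a hypothetical table with id and amount columns. Per the limitations above, such a view is intended for SQL or Python workloads.

```sql
-- Grant read access to an account-level group (a workspace-local group cannot be used here).
GRANT SELECT ON TABLE main.default.orders TO `finance`;

-- Dynamic view: only members of the account-level group see the amount column.
CREATE VIEW main.default.orders_redacted AS
SELECT
  id,
  CASE WHEN is_account_group_member('finance') THEN amount ELSE NULL END AS amount
FROM main.default.orders;
```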