This page gives an overview of how to govern data using Unity Catalog in Azure Databricks.
Note
This page focuses on the governance of data. Related security topics, such as the following, are covered in Security and compliance:
- Authentication and access control
- Network configuration
- Data security and encryption
What is Unity Catalog?
Unity Catalog is a centralized data catalog that provides fine-grained access control for tabular and unstructured data in multiple formats on multiple platforms, along with governance of AI assets like machine learning models. It also includes the tools you need to discover data, track usage, capture lineage, and monitor data quality.
Unity Catalog is open-source and supports multiple platforms. It is deeply integrated into Azure Databricks.
The Unity Catalog data governance model
Data governance with Unity Catalog provides the following:
- Data unification: a unified view of all data and AI assets, across platforms, reducing duplication and sprawl.
- Data access control: tools to ensure that data is easy to access, but only for the right users.
- Data discoverability: tools that make it easy to find the data you need.
- Data quality: tools to ensure that data is accurate, complete, consistent, and secure throughout its lifecycle.
- Data collaboration and sharing: the ability to share data securely not just within your organization but across organizational and platform boundaries.
- Auditing: tools that capture who uses the data and how.
This page explains how your organization can meet these needs using Unity Catalog in Azure Databricks.
Data access control
To make sure that users access only the data they should, Unity Catalog provides a hierarchical privilege model that lets you grant users, groups, and service principals access to data and AI assets, from the account level down to individual table rows and columns. You can control access to assets stored in dedicated Unity Catalog storage as well as assets stored on other platforms, such as cloud storage or database systems. The key point is that, from within Azure Databricks, Unity Catalog can give your users access to all of your data wherever it lives, while Unity Catalog itself controls that access and tracks how the data is used.
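Unity Catalog names objects with a three-level namespace (`catalog.schema.table`), and privileges granted on a parent object are inherited by its children. The toy Python model below illustrates that inheritance rule; it is a sketch, not Unity Catalog's implementation. It assumes only the `USE CATALOG`, `USE SCHEMA`, and `SELECT` privileges, treats principal names such as the hypothetical `analysts` group as opaque strings, and ignores ownership and group membership. The `main` catalog and its schemas are also hypothetical. In a workspace you would issue the equivalent SQL instead, for example `GRANT SELECT ON SCHEMA main.sales TO analysts`.

```python
"""Toy model of Unity Catalog privilege inheritance (illustration only).

Assumptions: three object levels (catalog, schema, table) and three privileges
(USE CATALOG, USE SCHEMA, SELECT); ownership and group membership are ignored.
"""
from collections import defaultdict

# grants[principal] -> set of (privilege, securable) pairs.
grants: dict[str, set[tuple[str, str]]] = defaultdict(set)


def grant(privilege: str, securable: str, principal: str) -> None:
    """Record a grant, mirroring GRANT <privilege> ON <securable> TO <principal>."""
    grants[principal].add((privilege, securable))


def has_privilege(principal: str, privilege: str, securable: str) -> bool:
    """True if the privilege was granted on the securable or on any ancestor."""
    parts = securable.split(".")
    # Check the object itself, then each parent: "a.b.c", "a.b", "a".
    ancestors = [".".join(parts[:i]) for i in range(len(parts), 0, -1)]
    return any((privilege, obj) in grants[principal] for obj in ancestors)


def can_select(principal: str, table: str) -> bool:
    """Reading a table needs USE CATALOG, USE SCHEMA, and SELECT (possibly inherited)."""
    catalog, schema, _ = table.split(".")
    return (
        has_privilege(principal, "USE CATALOG", catalog)
        and has_privilege(principal, "USE SCHEMA", f"{catalog}.{schema}")
        and has_privilege(principal, "SELECT", table)
    )


grant("USE CATALOG", "main", "analysts")
grant("USE SCHEMA", "main", "analysts")    # inherited by every schema in main
grant("SELECT", "main.sales", "analysts")  # inherited by every table in main.sales

print(can_select("analysts", "main.sales.orders"))  # True
print(can_select("analysts", "main.hr.salaries"))   # False: no SELECT on main.hr
```

Because `has_privilege` walks from an object up to its catalog, a single grant on a catalog or schema also covers objects created there later, which is the practical benefit of an inherited, hierarchical model.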
Task | Description |
---|---|
Manage privileges | Learn about the securable objects that Unity Catalog manages and how to control access to them. |
Manage attribute-based access control (ABAC) | Learn how to control access to data using ABAC in Unity Catalog. |
Manage identities | Learn how to manage identities in the context of Unity Catalog. |
Fine-grained access control | Learn how to control access to table data using row filters and column masks. |
Manage access to external storage and data platforms | Learn how to control access to cloud storage, external data platforms, and external non-data services using Unity Catalog. |
Manage access from external platforms | Learn how Unity Catalog can manage access to your data from external platforms that use the Apache Iceberg or open-source Unity Catalog APIs. |
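In Azure Databricks, row filters and column masks are SQL user-defined functions attached to a table (with `ALTER TABLE ... SET ROW FILTER` and `ALTER TABLE ... ALTER COLUMN ... SET MASK`) and evaluated at query time, often by checking group membership with `is_account_group_member`. The Python sketch below only simulates those semantics so the effect is easy to see: the filter decides which rows a user sees, and the mask rewrites a column's value for users who should not see it. The `employees` table, its columns, and the `admin`, `hr`, and `analysts` groups are hypothetical.

```python
"""Toy simulation of row-filter and column-mask semantics (illustration only)."""
from typing import Callable

Row = dict[str, object]


def us_only_filter(row: Row, user_groups: set[str]) -> bool:
    """Row filter: admins see every row; everyone else sees only US rows."""
    return "admin" in user_groups or row["region"] == "US"


def ssn_mask(value: object, user_groups: set[str]) -> object:
    """Column mask: only members of the hr group see the real value."""
    return value if "hr" in user_groups else "***-**-****"


def query(
    rows: list[Row],
    user_groups: set[str],
    row_filter: Callable[[Row, set[str]], bool],
    masks: dict[str, Callable[[object, set[str]], object]],
) -> list[Row]:
    """Apply the row filter, then each column mask, as a reader would see the table."""
    visible = [r for r in rows if row_filter(r, user_groups)]
    return [
        {col: masks[col](val, user_groups) if col in masks else val for col, val in r.items()}
        for r in visible
    ]


employees = [
    {"name": "Ana", "region": "US", "ssn": "123-45-6789"},
    {"name": "Ben", "region": "EU", "ssn": "987-65-4321"},
]

analyst_view = query(employees, {"analysts"}, us_only_filter, {"ssn": ssn_mask})
hr_admin_view = query(employees, {"hr", "admin"}, us_only_filter, {"ssn": ssn_mask})
print(analyst_view)        # [{'name': 'Ana', 'region': 'US', 'ssn': '***-**-****'}]
print(len(hr_admin_view))  # 2
```

The same stored data yields different results per reader, which is why these controls are defined once on the table rather than in every query or view.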
Data discoverability
Azure Databricks and Unity Catalog provide the following tools to help users find the data they need:
Feature | Description |
---|---|
Catalog Explorer | Browse and search for data and AI assets using asset names and metadata such as comments and tags. |
Catalog browsers | Find data and AI assets using browsers that are built into the notebook and SQL query editors. See Navigate the Databricks notebook and file editor and Write queries and explore data in the new SQL editor. |
Table insights | Use a UI built into Catalog Explorer to view the most frequent users and queries of any table in Unity Catalog. |
Data lineage | Capture and visualize the way data flows through your organization. For feature and model lineage, see Feature governance and lineage. |
Entity relationship diagrams (ERD) | Display relationships for tables that have foreign keys defined. |
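Table-level lineage captured by Unity Catalog can also be queried from the `system.access.table_lineage` system table, where each row links a source table to a target table. Assuming you have already fetched those (source, target) pairs, a breadth-first walk recovers every table that feeds a given table, directly or transitively. The sample edges below are hypothetical, and this is an illustrative sketch rather than a Databricks API.

```python
"""Walk table-level lineage edges to find every upstream source of a table."""
from collections import defaultdict, deque

# Hypothetical (source, target) pairs, shaped like the source_table_full_name
# and target_table_full_name columns of system.access.table_lineage.
edges = [
    ("main.raw.orders", "main.silver.orders"),
    ("main.raw.customers", "main.silver.customers"),
    ("main.silver.orders", "main.gold.revenue"),
    ("main.silver.customers", "main.gold.revenue"),
]


def upstream(table: str, lineage: list[tuple[str, str]]) -> set[str]:
    """Return all tables that feed into `table`, directly or transitively."""
    parents: dict[str, set[str]] = defaultdict(set)
    for source, target in lineage:
        parents[target].add(source)
    seen: set[str] = set()
    queue = deque([table])
    while queue:
        for parent in parents[queue.popleft()]:
            if parent not in seen:  # the seen set also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream("main.gold.revenue", edges)))
# ['main.raw.customers', 'main.raw.orders', 'main.silver.customers', 'main.silver.orders']
```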
See also Discover data.
Data collaboration and sharing
Unity Catalog lets your users collaborate on the same data across all of your account's workspaces in the same region. When you require collaboration across workspace regions, across organizations, and across platforms, Unity Catalog provides the foundation for the following sharing tools.
Feature | Description |
---|---|
Delta Sharing | A secure data sharing platform that lets you share data and AI assets in Azure Databricks with users outside your organization, whether those users use Databricks or not. |
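Delta Sharing is also an open protocol, so a recipient outside Databricks can read a shared table with the open-source `delta-sharing` Python client, which addresses a table as `<profile-file>#<share>.<schema>.<table>`. The sketch below builds that URL and imports the client lazily, so the URL helper runs even without the package installed. The profile path and the share, schema, and table names are hypothetical, and the name check is a defensive assumption of this sketch, not documented client behavior.

```python
"""Build a Delta Sharing table URL and, optionally, read it with the open-source client."""


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Format '<profile-file>#<share>.<schema>.<table>' for the delta-sharing client."""
    # Assumption for this sketch: reject empty parts or parts containing '.' or '#',
    # since either would make the three-part name ambiguous.
    for part in (share, schema, table):
        if not part or "." in part or "#" in part:
            raise ValueError(f"invalid name component: {part!r}")
    return f"{profile_path}#{share}.{schema}.{table}"


def read_shared_table(url: str):
    """Load a shared table into pandas; needs `pip install delta-sharing` and a real profile."""
    import delta_sharing  # imported lazily so table_url() works without the package

    return delta_sharing.load_as_pandas(url)


url = table_url("config.share", "sales_share", "retail", "orders")
print(url)  # config.share#sales_share.retail.orders
# read_shared_table(url) would fetch the data from the provider's sharing server.
```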
Auditing
Audit logs capture fine-grained details about who accessed a given dataset and the actions that they performed. Unity Catalog provides system tables, which are the easiest way to access and query your account's audit logs.
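On Azure Databricks you would typically answer audit questions with SQL against the `system.access.audit` system table, filtering on `service_name = 'unityCatalog'`. The Python sketch below shows the same kind of "who did what to this table" summary over records that have already been fetched. The flattened record shape, user names, and action names are assumptions for illustration, not the exact schema of the audit table.

```python
"""Summarize who touched a table and how, from audit-log-like records."""
from collections import Counter

# Hypothetical, flattened records loosely modeled on rows of system.access.audit;
# the field names and action names here are assumptions for this sketch.
events = [
    {"user": "ana@example.com", "action": "getTable", "table": "main.sales.orders"},
    {"user": "ana@example.com", "action": "getTable", "table": "main.sales.orders"},
    {"user": "ben@example.com", "action": "createTable", "table": "main.sales.orders"},
    {"user": "ben@example.com", "action": "getTable", "table": "main.hr.salaries"},
]


def access_summary(records: list[dict[str, str]], table: str) -> Counter:
    """Count (user, action) pairs for a single table."""
    return Counter((r["user"], r["action"]) for r in records if r["table"] == table)


for (user, action), n in sorted(access_summary(events, "main.sales.orders").items()):
    print(f"{user} {action} x{n}")
# ana@example.com getTable x2
# ben@example.com createTable x1
```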
Legacy Azure Databricks data governance tools
Azure Databricks also provides these legacy governance features. Databricks recommends that you use Unity Catalog instead.
Feature | Description |
---|---|
Table access control | A legacy data governance model that lets you programmatically grant and revoke access to objects managed by your workspace's built-in Hive metastore. |
Azure Data Lake Storage credential passthrough | A legacy data governance feature that lets you authenticate automatically to Azure Storage from Azure Databricks clusters using the same Microsoft Entra ID identity that you use to log in to Azure Databricks. |
Next steps
- Learn more about Unity Catalog: What is Unity Catalog?
- Get started with Unity Catalog: Get started with Unity Catalog
- Review best practices: Unity Catalog best practices