Create a Unity Catalog metastore
This article shows how to create a Unity Catalog metastore and link it to workspaces.
Important
For workspaces that were enabled for Unity Catalog automatically, the instructions in this article are unnecessary. Databricks began to enable new workspaces for Unity Catalog automatically on November 9, 2023, with a rollout proceeding gradually across accounts. You need to follow the instructions in this article only if you have a workspace and don't already have a metastore in your workspace region. To determine whether a metastore already exists in your region, see Automatic enablement of Unity Catalog.
A metastore is the top-level container for data in Unity Catalog. Unity Catalog metastores register metadata about securable objects (such as tables, volumes, external locations, and shares) and the permissions that govern access to them. Each metastore exposes a three-level namespace (catalog.schema.table) by which data can be organized. You must have one metastore for each region in which your organization operates. To work with Unity Catalog, users must be on a workspace that is attached to a metastore in their region.
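As an illustration of the three-level namespace, the following minimal sketch splits a fully qualified name into its catalog, schema, and table parts. The helper name and the example name main.sales.orders are hypothetical. The sketch assumes plain identifiers and does not handle backtick-quoted names that themselves contain dots.

```python
def split_three_level_name(full_name: str) -> tuple[str, str, str]:
    """Split a name of the form catalog.schema.table into its three parts.

    Simplified sketch: assumes plain identifiers, so backtick-quoted
    names containing dots are not supported.
    """
    parts = full_name.split(".")
    # Exactly three non-empty parts are required: catalog, schema, table.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    catalog, schema, table = parts
    return catalog, schema, table


print(split_three_level_name("main.sales.orders"))  # ('main', 'sales', 'orders')
```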
To create a metastore, you do the following:
In your Azure account, optionally create a storage location for metastore-level storage of managed tables and volumes.
For information to help you decide whether you need metastore-level storage, see (Optional) Create metastore-level storage and Data is physically separated in storage.
In your Azure account, create an Azure managed identity or service principal that gives access to that storage location.
In Azure Databricks, create the metastore, attach the storage location (if you created one), and assign workspaces to the metastore.
Note
In addition to the approaches described in this article, you can also create a metastore by using the Databricks Terraform provider, specifically the databricks_metastore resource. To enable Unity Catalog to access the metastore, use databricks_metastore_data_access. To link workspaces to a metastore, use databricks_metastore_assignment.
Before you begin
Before you begin, you should familiarize yourself with the basic Unity Catalog concepts, including metastores and managed storage. See What is Unity Catalog?.
You should also confirm that you meet the following requirements for all setup steps:
You must be an Azure Databricks account admin.
The first Azure Databricks account admin must be a Microsoft Entra ID Global Administrator at the time that they first log in to the Azure Databricks account console. Upon first login, that user becomes an Azure Databricks account admin and no longer needs the Microsoft Entra ID Global Administrator role to access the Azure Databricks account. The first account admin can assign users in the Microsoft Entra ID tenant as additional account admins (who can themselves assign more account admins). Additional account admins do not require specific roles in Microsoft Entra ID.
The workspaces that you attach to the metastore must be on the Azure Databricks Premium plan.
If you want to set up metastore-level root storage, you must have permission to create the following in your Azure tenant:
- A storage account to use with Azure Data Lake Storage Gen2. See Create a storage account to use with Azure Data Lake Storage Gen2.
- A new resource to hold a system-assigned managed identity. This requires that you be a Contributor or Owner of a resource group in any subscription in the tenant.
Step 1 (Optional): Create a storage container for metastore-level managed storage
In this step, which is optional, you create a storage account and container to store managed table and volume data at the metastore level. To determine whether you need metastore-level storage, see (Optional) Create metastore-level storage.
Create a storage account for Azure Data Lake Storage Gen2.
This storage account will contain Unity Catalog managed tables and volumes. This must be an Azure Data Lake Storage Gen2 account in the same region as your Azure Databricks workspaces. See Create a storage account to use with Azure Data Lake Storage Gen2.
Create a storage container that will hold your managed tables and volume data at the metastore level.
You can create only one metastore per region. You must use the same region for your metastore and storage container.
This metastore-level storage location can be overridden at the catalog and schema levels. See Specify a managed storage location in Unity Catalog.
Make a note of the ADLS Gen 2 URI for the container, which is in the following format:
abfss://<container-name>@<storage-account-name>.dfs.core.chinacloudapi.cn/<metastore-name>
In the steps that follow, replace <storage-container> with this URI.
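The URI noted in this step can be assembled from its parts. The following is a minimal sketch, assuming the Azure China endpoint suffix shown in the format above. The container, storage account, and metastore names in the example are hypothetical. The validation checks follow Azure's naming rules for storage accounts (3 to 24 lowercase letters and digits) and containers (3 to 63 lowercase letters, digits, and hyphens, with no leading, trailing, or consecutive hyphens); they are a convenience, not a replacement for Azure's own validation.

```python
import re

# Endpoint suffix taken from the URI format above (Azure China cloud).
DFS_SUFFIX = "dfs.core.chinacloudapi.cn"


def metastore_storage_uri(container: str, storage_account: str, metastore_name: str) -> str:
    """Return abfss://<container>@<account>.<DFS_SUFFIX>/<metastore-name>."""
    # Storage account names: 3-24 characters, lowercase letters and digits only.
    if not re.fullmatch(r"[a-z0-9]{3,24}", storage_account):
        raise ValueError(f"invalid storage account name: {storage_account!r}")
    # Container names: 3-63 characters, lowercase letters, digits, and hyphens,
    # starting and ending with a letter or digit, with no consecutive hyphens.
    if not (3 <= len(container) <= 63 and re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", container)):
        raise ValueError(f"invalid container name: {container!r}")
    # The metastore name is used as a single path segment under the container.
    if not metastore_name or "/" in metastore_name:
        raise ValueError(f"invalid metastore path segment: {metastore_name!r}")
    return f"abfss://{container}@{storage_account}.{DFS_SUFFIX}/{metastore_name}"


print(metastore_storage_uri("uc-root", "contosouc", "metastore-cn-east"))
```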
Step 2 (Optional): Create a managed identity to access the managed storage location
In this step, which is required only if you completed step 1, you create an Azure Databricks access connector that holds a managed identity and give it access to the storage container.
Follow the instructions in Use Azure managed identities in Unity Catalog to access storage.
Note
You can use either an Azure managed identity or a service principal as the identity that gives access to the metastore's storage container. Databricks strongly recommends managed identities, because they do not require you to maintain credentials or rotate secrets, and they let you connect to an Azure Data Lake Storage Gen2 account that is protected by a storage firewall. If you want to use a service principal, see Create Unity Catalog managed storage using a service principal (legacy).
Step 3: Create the metastore and attach a workspace
Each Azure Databricks region requires its own Unity Catalog metastore.
You create a metastore for each region in which your organization operates. You can link each of these regional metastores to any number of workspaces in that region. Each linked workspace has the same view of the data in the metastore, and data access control can be managed across workspaces. You can access data in other metastores using Delta Sharing.
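The region rule can be modeled as a small planning check. This sketch does not call any Databricks API. It takes a hypothetical list of metastores and a hypothetical mapping of workspace names to regions. It rejects a plan with two metastores in one region and maps each workspace to the single metastore in its own region. All metastore names, workspace names, and region strings in the example are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metastore:
    name: str
    region: str


def link_workspaces(metastores: list[Metastore], workspaces: dict[str, str]) -> dict[str, str]:
    """Return {workspace name: metastore name}, matching on region.

    `workspaces` maps each workspace name to its region.
    """
    by_region: dict[str, Metastore] = {}
    for metastore in metastores:
        # Only one metastore is allowed per region.
        if metastore.region in by_region:
            raise ValueError(f"more than one metastore in region {metastore.region!r}")
        by_region[metastore.region] = metastore

    links: dict[str, str] = {}
    for workspace, region in workspaces.items():
        # A workspace can be linked only to the metastore in its own region.
        if region not in by_region:
            raise ValueError(f"no metastore in region {region!r} for workspace {workspace!r}")
        links[workspace] = by_region[region].name
    return links


metastores = [Metastore("uc-east", "chinaeast2"), Metastore("uc-north", "chinanorth3")]
workspaces = {"analytics": "chinaeast2", "etl": "chinaeast2", "ml": "chinanorth3"}
print(link_workspaces(metastores, workspaces))
```

Several workspaces in the same region resolve to the same metastore, which is why they share the same view of its data.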
If you chose to create metastore-level storage, the metastore will use the storage container and Azure managed identity that you created in the previous steps.
To create a metastore:
If you chose to create metastore-level storage, make sure that you have the path to the storage container and the resource ID of the Azure Databricks access connector that you created in the previous steps.
Log in to your workspace as an account admin.
Click your username in the top bar of the Azure Databricks workspace and select Manage Account.
If prompted, log in to the Azure Databricks account console.
Click Catalog.
Click Create metastore.
Enter the following:
Name for the metastore.
Region where the metastore will be deployed.
This must be the same region as the workspaces you want to use to access the data. If you chose to create a storage container for metastore-level storage, the container must also be in this region.
(Optional) ADLS Gen 2 path: Enter the path to the storage container that you will use as root storage for the metastore.
The abfss:// prefix is added automatically.
(Optional) Access Connector ID: Enter the Azure Databricks access connector's resource ID in the format:
/subscriptions/12f34567-8ace-9c10-111c-aea8eba12345c/resourceGroups/<resource-group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>
Click Create.
When prompted, select workspaces to link to the metastore.
For details, see Enable a workspace for Unity Catalog.
Transfer the metastore admin role to a group.
The user who creates a metastore is its owner, also called the metastore admin. The metastore admin can create top-level objects in the metastore such as catalogs and can manage access to tables and other objects. Databricks recommends that you reassign the metastore admin role to a group. See Assign a metastore admin.
Enable Azure Databricks management of uploads to managed volumes.
Azure Databricks uses cross-origin resource sharing (CORS) to upload data to managed volumes in Unity Catalog. See Configure Unity Catalog storage account for CORS.
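Before pasting a value into the Access Connector ID field in Step 3, you might want to check its shape. The following minimal sketch parses a resource ID in the format shown in that step. The helper name, resource group, and connector name are hypothetical. Segment names are matched case-insensitively, and the subscription segment is accepted as-is rather than validated as a GUID.

```python
import re

# Shape of the resource ID expected in the Access Connector ID field.
# The subscription segment is accepted as-is, not validated as a GUID.
ACCESS_CONNECTOR_ID = re.compile(
    r"/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Databricks/accessConnectors/(?P<connector>[^/]+)",
    re.IGNORECASE,
)


def parse_access_connector_id(resource_id: str) -> dict[str, str]:
    """Split an access connector resource ID into its named segments."""
    match = ACCESS_CONNECTOR_ID.fullmatch(resource_id.strip())
    if match is None:
        raise ValueError(f"not an access connector resource ID: {resource_id!r}")
    return match.groupdict()


example = (
    "/subscriptions/12f34567-8ace-9c10-111c-aea8eba12345c"
    "/resourceGroups/rg-unity-catalog"
    "/providers/Microsoft.Databricks/accessConnectors/uc-connector"
)
print(parse_access_connector_id(example))
```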
Next steps
- Create catalogs
- Create schemas
- What is a table?
- Learn more about Unity Catalog
Add managed storage to an existing metastore
Metastore-level managed storage is optional, and it is not included for metastores that were created automatically. You might want to add metastore-level storage to your metastore if you prefer a data isolation model that stores data centrally for multiple workspaces. You need metastore-level storage if you want to share notebooks using Delta Sharing or if you are an Azure Databricks partner who uses personal staging locations.
See also Specify a managed storage location in Unity Catalog.
Requirements
- You must have at least one workspace attached to the Unity Catalog metastore.
- Azure Databricks permissions required:
  - To create an external location, you must be a metastore admin or a user with the CREATE EXTERNAL LOCATION and CREATE STORAGE CREDENTIAL privileges.
  - To add the storage location to the metastore definition, you must be an account admin. For instructions on enabling the account admin role in your account, see Establish your first account admin.
- Azure tenant permissions required:
  - Permission to create a storage account to use with Azure Data Lake Storage Gen2. This storage account must have a hierarchical namespace. See Create a storage account to use with Azure Data Lake Storage Gen2.
  - Permission to create a new resource to hold a system-assigned managed identity. This requires that you be a Contributor or Owner of a resource group in any subscription in the tenant.
Step 1: Create the storage location
Follow the instructions in Step 1 (Optional): Create a storage container for metastore-level managed storage and Step 2 (Optional): Create a managed identity to access the managed storage location to create a storage container in Azure Data Lake Storage Gen2 and an Azure Databricks access connector that holds a managed identity that has access to the storage container.
Step 2: Create an external location in Unity Catalog
In this step, you create an external location in Unity Catalog that references the ADLS Gen 2 path that you just created.
Create a storage credential.
The storage credential will represent the Azure managed identity that you created in Step 1: Create the storage location.
Follow the instructions in Create a storage credential for connecting to Azure Data Lake Storage Gen2.
Create an external location that references the storage credential that you created in the previous step and the ADLS Gen 2 storage container that you created in Step 1: Create the storage location.
Follow the instructions in Create an external location to connect cloud storage to Azure Databricks.
Grant yourself the CREATE MANAGED STORAGE privilege on the external location.
- Click the external location name to open the details pane.
- On the Permissions tab, click Grant.
- On the Grant on <external location> dialog, select yourself in the Principals field and select CREATE MANAGED STORAGE.
- Click Grant.
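The steps above use the Catalog Explorer UI. The same external location and grant can also be expressed in Databricks SQL with CREATE EXTERNAL LOCATION ... WITH (STORAGE CREDENTIAL ...) and GRANT CREATE MANAGED STORAGE ON EXTERNAL LOCATION. The following sketch only builds the statement text. The helper names and all object and principal names in the example are hypothetical. It assumes that the storage credential already exists, and that you run the statements yourself, for example in a notebook or the SQL editor, as a user with the privileges listed in Requirements. Check the exact syntax against the Databricks SQL reference before relying on it.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def external_location_statements(location: str, url: str, credential: str, principal: str) -> list[str]:
    """Build SQL that creates the external location and grants CREATE MANAGED STORAGE."""
    # Keep the URL literal simple: reject embedded single quotes instead of escaping them.
    if "'" in url:
        raise ValueError("URL must not contain a single quote")
    create = (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {quote_identifier(location)} "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL {quote_identifier(credential)})"
    )
    grant = (
        f"GRANT CREATE MANAGED STORAGE ON EXTERNAL LOCATION {quote_identifier(location)} "
        f"TO {quote_identifier(principal)}"
    )
    return [create, grant]


for statement in external_location_statements(
    "metastore-root",
    "abfss://uc-root@contosouc.dfs.core.chinacloudapi.cn/metastore-cn-east",
    "uc-root-credential",
    "someone@example.com",
):
    print(statement)
```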
Step 3: Add the storage location to the metastore
After you have created an external location that represents the metastore storage container, you can add it to the metastore.
As an account admin, log in to the account console.
Click Catalog.
Click the metastore name.
Confirm that you are the Metastore Admin.
If you are not, click Edit and assign yourself as the metastore admin. You can unassign yourself when you are done with this procedure.
On the Configuration tab, next to ADLS Gen 2 path, click Set.
On the Set metastore root dialog, enter the ADLS Gen 2 path that you used to create the external location, and click Update.
You cannot modify this path once you set it.
Delete a metastore
If you are closing your Azure Databricks account or have another reason to delete access to data managed by your Unity Catalog metastore, you can delete the metastore.
Warning
All objects managed by the metastore will become inaccessible using Azure Databricks workspaces. This action cannot be undone.
Managed table data and metadata will be auto-deleted after 30 days. External table data in your cloud storage is not affected by metastore deletion.
To delete a metastore:
- As a metastore admin, log in to the account console.
- Click Catalog.
- Click the metastore name.
- On the Configuration tab, click the three-dot menu at the far upper right and select Delete.
- On the confirmation dialog, enter the name of the metastore and click Delete.