Create Unity Catalog managed storage using a service principal (legacy)

Important

This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. To view current admin documentation, see Manage your Azure Databricks account.

This article describes a legacy method of creating external locations and managed storage using a service principal. Instead of service principals, Databricks strongly recommends that you use an Azure managed identity as the identity that gives access to the storage container. See Use Azure managed identities in Unity Catalog to access storage.

However, if you still want to use a service principal, or if you have existing Unity Catalog managed storage or external locations that use a service principal and you want to understand the process, the instructions are provided in this archive article.

Create a metastore that is accessed using a service principal (legacy)

To create a Unity Catalog metastore that is accessed by a service principal:

  1. Create a storage account for Azure Data Lake Storage Gen2.

    A storage container in this account will store all of the metastore's managed tables, except those that are in a catalog or schema with their own managed storage location.

    See Create a storage account to use with Azure Data Lake Storage Gen2. This must be a Premium performance Azure Data Lake Storage Gen2 account in the same region as your Azure Databricks workspaces.

  2. Create a container in the new storage account.

    Make a note of the ADLSv2 URI for the container, which is in the following format:

    abfss://<container-name>@<storage-account-name>.dfs.core.chinacloudapi.cn/<metastore-name>
    

    In the steps that follow, replace <storage-container> with this URI.

  3. In Microsoft Entra ID, create a service principal and assign it permissions to the storage account, using the instructions in Access storage with Microsoft Entra ID (formerly Azure Active Directory) using a service principal.

    Unity Catalog will use this service principal to access containers in the storage account on behalf of Unity Catalog users. Generate a client secret for the service principal. See Microsoft Entra ID service principal authentication. Make a note of the client secret, the client application ID, and the ID of the directory in which you created the service principal. In the following steps, replace <client-secret>, <client-application-id>, and <directory-id> with these values.

  4. Make a note of these properties, which you will use when you create a metastore:

    • <aad-application-id>
    • The storage account region
    • <storage-container>
    • The service principal's <client-secret>, <client-application-id>, and <directory-id>
  5. You cannot create a metastore that is accessed using a service principal in the account console UI. Instead, use the Account Metastores API. For example:

    curl -n -X POST --header 'Content-Type: application/json' \
    https://<account-domain>/api/2.0/accounts/<account-id>/metastores \
    --data '{
       "metastore_info": {
          "name": "<metastore-name>",
          "storage_root": "<storage-container>",
          "region": "<region>"
       }
    }'
    

    To learn how to authenticate to account-level APIs, see Microsoft Entra ID service principal authentication.

    The user who creates a metastore is its owner. Databricks recommends that you reassign ownership of the metastore to a group. See Assign a metastore admin.

  6. Make a note of the metastore's ID. When you view the metastore's properties, the metastore's ID is the portion of the URL after /data and before /configuration.

  7. The metastore has been created, but Unity Catalog cannot yet write data to it. To finish setting up the metastore:

    1. In a separate browser, log in as a workspace admin to a workspace that is assigned to the metastore.

    2. Make a note of the workspace URL, which is the first portion of the URL, after https:// and inclusive of databricks.azure.cn.

    3. Generate a personal access token. See the Token management API.

    4. Add the personal access token to the .netrc file in your home directory. This improves security by preventing the personal access token from appearing in your shell's command history. See the Token management API.

    5. Run the following cURL command to create the root storage credential for the metastore. Replace the placeholder values:

      • <workspace-url>: The URL of the workspace where the personal access token was generated.
      • <credential-name>: A name for the storage credential.
      • <directory-id>: The directory ID for the service principal you created.
      • <application-id>: The application ID for the service principal you created.
      • <client-secret>: The value of the client secret you generated for the service principal (not the client secret ID).
      curl -n -X POST --header 'Content-Type: application/json' \
      https://<workspace-url>/api/2.0/unity-catalog/storage-credentials \
      --data '{
         "name": "<credential-name>",
         "azure_service_principal": {
            "directory_id": "<directory-id>",
            "application_id": "<application-id>",
            "client_secret": "<client-secret>"
         }
      }'
      

      Make a note of the storage credential ID, which is the value of id from the cURL command's response.

  8. Run the following cURL command to update the metastore with the new root storage credential. Replace the placeholder values:

    • <workspace-url>: The URL of the workspace where the personal access token was generated.
    • <metastore-id>: The metastore's ID.
    • <storage-credential-id>: The storage credential's ID from the previous command.
    curl -n -X PATCH --header 'Content-Type: application/json' \
    https://<workspace-url>/api/2.0/unity-catalog/metastores/<metastore-id> \
    --data '{"storage_root_credential_id": "<storage-credential-id>"}'
    

You can now add workspaces to the metastore.
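
Several values in the steps above are derived from other strings: the container URI, the metastore ID, and the storage credential ID. As a rough sketch, the following shell helpers show one way to derive them. The function names are hypothetical and not part of any Databricks tool, the example host is a placeholder, and the sed-based extraction of id assumes that the response contains a single string-valued id field.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative only.

# Build the container URI in the format shown in step 2.
# $1 = container name, $2 = storage account name, $3 = metastore name
abfss_uri() {
  printf 'abfss://%s@%s.dfs.core.chinacloudapi.cn/%s\n' "$1" "$2" "$3"
}

# Extract the metastore ID: the part of the URL after /data/
# and before /configuration (step 6).
metastore_id_from_url() {
  printf '%s\n' "$1" | sed -n 's|.*/data/\([^/]*\)/configuration.*|\1|p'
}

# Extract the value of "id" from a storage credential response (step 7).
# Assumption: exactly one string-valued "id" field is present.
credential_id_from_response() {
  printf '%s\n' "$1" |
    sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example calls with made-up values.
abfss_uri mycontainer myaccount mymetastore
metastore_id_from_url 'https://example.test/data/abc-123/configuration'
credential_id_from_response '{"name":"my-cred","id":"1234-abcd"}'
```

Pattern matching on JSON text is brittle. If a JSON-aware tool such as jq is installed, it is likely a more robust way to read the id field.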

Create a storage credential that uses a service principal (legacy)

To create a storage credential using a service principal, you must be an Azure Databricks account admin. The account admin who creates the service principal storage credential can delegate ownership to another user or group to manage permissions on it.

First, create a service principal and grant it access to your storage account following Access storage with Microsoft Entra ID (formerly Azure Active Directory) using a service principal.

You cannot add a service principal storage credential using Catalog Explorer. Instead, use the Storage Credentials API. For example:

curl -X POST -n \
--header 'Content-Type: application/json' \
https://<databricks-instance>/api/2.1/unity-catalog/storage-credentials \
-d '{
   "name": "<storage-credential-name>",
   "read_only": true,
   "azure_service_principal": {
      "directory_id": "<directory-id>",
      "application_id": "<application-id>",
      "client_secret": "<client-secret>"
   },
   "skip_validation": false
}'

You can also create a storage credential by using the Databricks Terraform provider and the databricks_storage_credential resource.
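
When you script the Storage Credentials API request shown earlier in this section, it can help to build the request body in a function and pipe it to curl, so that the client secret is not typed into the command line. The following is a minimal sketch under stated assumptions: the function name and the environment variable names are hypothetical, and none of the values may contain characters that need JSON escaping (such as a double quote or a backslash).

```shell
#!/bin/sh
# Hypothetical helper: prints the JSON body for a service principal
# storage credential. Values are inserted verbatim, so they must not
# contain characters that require JSON escaping.
# $1 = credential name, $2 = directory ID, $3 = application ID, $4 = client secret
storage_credential_payload() {
  printf '{"name":"%s","azure_service_principal":{"directory_id":"%s","application_id":"%s","client_secret":"%s"}}\n' \
    "$1" "$2" "$3" "$4"
}

# Example (commented out because it calls the API). DIRECTORY_ID,
# APPLICATION_ID, and CLIENT_SECRET are hypothetical variable names.
# curl reads the request body from standard input with --data @-.
#
#   storage_credential_payload my-cred "$DIRECTORY_ID" "$APPLICATION_ID" "$CLIENT_SECRET" |
#     curl -n -X POST --header 'Content-Type: application/json' \
#       "https://<databricks-instance>/api/2.1/unity-catalog/storage-credentials" \
#       --data @-
```

This sketch omits the optional read_only and skip_validation fields from the earlier example. Add them to the format string if you need them.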