Enable data access configuration

This article describes the data access configurations performed by Azure Databricks administrators for all SQL warehouses using the UI.

Note

If your workspace is enabled for Unity Catalog, you don't need to perform the steps in this article. Unity Catalog supports SQL warehouses by default.

Databricks recommends using Unity Catalog volumes or external locations to connect to cloud object storage instead of instance profiles. Unity Catalog simplifies the security and governance of your data by providing a central place to administer and audit data access across multiple workspaces in your account. See What is Unity Catalog? and Recommendations for using external locations.

To configure all SQL warehouses using the REST API, see SQL Warehouses API.

Important

Changing these settings restarts all running SQL warehouses.

For a general overview of how to enable access to data, see Access control lists.

Requirements

  • You must be an Azure Databricks workspace admin to configure settings for all SQL warehouses.

Configure a service principal

To configure access for your SQL warehouses to an Azure Data Lake Storage Gen2 storage account using service principals, follow these steps:

  1. Register a Microsoft Entra ID (formerly Azure Active Directory) application and record the following properties:

    • Application (client) ID: An ID that uniquely identifies the Microsoft Entra ID application.
    • Directory (tenant) ID: An ID that uniquely identifies the Microsoft Entra ID instance (called directory (tenant) ID in Azure Databricks).
    • Client secret: The value of a client secret created for this application registration. The application will use this secret string to prove its identity.
  2. On your storage account, add a role assignment for the application registered at the previous step to give it access to the storage account.

  3. Create an Azure Key Vault-backed secret scope or a Databricks-scoped secret scope, and record the value of the scope name property:

    • Scope name: The name of the created secret scope.
  4. If using Azure Key Vault, go to the Secrets section and See Create a secret in an Azure Key Vault-backed scope. Then, use the "client secret" you obtained in Step 1 to populate the "value" field of this secret. Keep a record of the secret name that you just chose.

    • Secret name: The name of the created Azure Key Vault secret.
  5. If using a Databricks-backed scope, create a new secret using the Databricks CLI and use it to store the client secret that you have obtained in Step 1. Keep a record of the secret key that you entered at this step.

    • Secret key: The key of the created Databricks-backed secret.

    Note

    Optionally, you can create an additional secret to store the client ID obtained at Step 1.

  6. Click your username in the top bar of the workspace and select Settings from the drop-down.

  7. Click the Compute tab.

  8. Click Manage next to SQL warehouses.

  9. In the Data Access Configuration field, click the Add Service Principal button.

  10. Configure the properties for your Azure Data Lake Storage Gen2 storage account.

  11. Click Add.

    ADLS2 storage account

    You will see that new entries have been added to the Data Access Configuration textbox.

  12. Click Save.

You can also edit the Data Access Configuration textbox entries directly.

Configure data access properties for SQL warehouses

To configure all warehouses with data access properties:

  1. Click your username in the top bar of the workspace and select Settings from the drop-down.

  2. Click the Compute tab.

  3. Click Manage next to SQL warehouses.

  4. In the Data Access Configuration textbox, specify key-value pairs containing metastore properties.

    Important

    To set a Spark configuration property to the value of a secret without exposing the secret value to Spark, set the value to {{secrets/<secret-scope>/<secret-name>}}. Replace <secret-scope> with the secret scope and <secret-name> with the secret name. The value must start with {{secrets/ and end with }}. For more information about this syntax, see Syntax for referencing secrets in a Spark configuration property or environment variable.

  5. Click Save.

You can also configure data access properties using the Databricks Terraform provider and databricks_sql_global_config.

Supported properties

  • For an entry that ends with *, all properties within that prefix are supported.

    For example, spark.sql.hive.metastore.* indicates that both spark.sql.hive.metastore.jars and spark.sql.hive.metastore.version are supported, and any other properties that start with spark.sql.hive.metastore.

  • For properties whose values contain sensitive information, you can store the sensitive information in a secret and set the property's value to the secret name using the following syntax: secrets/<secret-scope>/<secret-name>.

The following properties are supported for SQL warehouses:

  • spark.sql.hive.metastore.*
  • spark.sql.warehouse.dir
  • spark.hadoop.datanucleus.*
  • spark.hadoop.fs.*
  • spark.hadoop.hive.*
  • spark.hadoop.javax.jdo.option.*
  • spark.hive.*

For more information about how to set these properties, see External Hive metastore.