Hive metastore privileges and securable objects (legacy)
This article describes the privilege model for the legacy Azure Databricks Hive metastore, which is built in to each Azure Databricks workspace. It also describes how to grant, deny, and revoke privileges for objects in the built-in Hive metastore. Unity Catalog uses a different model for granting privileges. See Unity Catalog privileges and securable objects.
Note
Table access control for data managed by the Hive metastore is a legacy data governance model. Databricks recommends that you upgrade the tables managed by the Hive metastore to the Unity Catalog metastore. Unity Catalog simplifies security and governance of your data by providing a central place to administer and audit data access across multiple workspaces in your account. To learn more about how the legacy privilege model differs from the Unity Catalog privilege model, see Work with Unity Catalog and the legacy Hive metastore.
Requirements
- An administrator must enable and enforce table access control for the workspace.
- The cluster must be enabled for table access control.
Note
- Data access control is always enabled in Databricks SQL even if table access control is not enabled for the workspace.
- If table access control is enabled for the workspace and you have already specified ACLs (granted and denied privileges) in the workspace, those ACLs are respected in Databricks SQL.
Manage privileges on objects in the Hive metastore
Privileges on data objects managed by the Hive metastore can be granted by either a workspace admin or the owner of an object. You can manage privileges for Hive metastore objects by using SQL commands.
To manage privileges in SQL, use GRANT, REVOKE, DENY, MSCK, and SHOW GRANTS statements in a notebook or the Databricks SQL query editor, with the following syntax:
GRANT privilege_type ON securable_object TO principal
Where:
- privilege_type is a Hive metastore privilege type.
- securable_object is a securable object in the Hive metastore.
- principal is a user, service principal (represented by its applicationId value), or group. You must enclose users, service principals, and group names with special characters in backticks (` `). See Principal.
To grant a privilege to all users in your workspace, grant the privilege to the users group. For example:
GRANT SELECT ON TABLE <schema-name>.<table-name> TO users
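The other statements follow the same pattern. For example, the following sketches, using hypothetical schema, table, and group names, deny a privilege, revoke a previously granted privilege, and list the grants on a table:
-- Deny MODIFY on a table to a group
DENY MODIFY ON TABLE sales.transactions TO `data-readers`
-- Revoke a previously granted privilege
REVOKE SELECT ON TABLE sales.transactions FROM `data-readers`
-- List the privileges granted on a table
SHOW GRANTS ON TABLE sales.transactions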
For more information about managing privileges for objects in the Hive metastore using SQL commands, see Privileges and securable objects in the Hive metastore.
You can also manage table access control in a fully automated setup using the Databricks Terraform provider and databricks_sql_permissions.
Object ownership
When table access control is enabled on a cluster or SQL warehouse, a user who creates a schema, table, view, or function becomes its owner. The owner is granted all privileges and can grant privileges to other users.
Groups may own objects, in which case all members of that group are considered owners.
Either the owner of an object or a workspace admin can transfer ownership of an object using the following command:
ALTER <object> OWNER TO `<user-name>@<user-domain>.com`
Note
When table access control is disabled on a cluster or SQL warehouse, owners are not registered when a schema, table, or view is created. A workspace admin must assign an owner to the object using the ALTER <object> OWNER TO command.
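For example, the following sketches, using hypothetical object and principal names, transfer ownership of a table to a user and of a schema to a group:
-- Transfer ownership of a table to a user
ALTER TABLE sales.transactions OWNER TO `someone@example.com`
-- Transfer ownership of a schema to a group
ALTER SCHEMA sales OWNER TO `data-owners`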
Securable objects in the Hive metastore
The securable objects are:
- CATALOG: controls access to the entire data catalog.
- SCHEMA: controls access to a schema.
- TABLE: controls access to a managed or external table.
- VIEW: controls access to SQL views.
- FUNCTION: controls access to a named function.
- ANONYMOUS FUNCTION: controls access to anonymous or temporary functions.
Note
ANONYMOUS FUNCTION objects are not supported in Databricks SQL.
- ANY FILE: controls access to the underlying filesystem.
Warning
Users granted access to ANY FILE can bypass the restrictions put on the catalog, schemas, tables, and views by reading from the filesystem directly.
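Each of these securable objects can appear in a GRANT, DENY, or REVOKE statement. The following sketches use hypothetical schema, view, and group names:
-- Allow a group to interact with objects in a schema
GRANT USAGE ON SCHEMA sales TO `data-readers`
-- Give read access to a view
GRANT SELECT ON VIEW sales.sales_redacted TO `data-readers`
-- Give direct filesystem access (use with care; see the warning above)
GRANT SELECT ON ANY FILE TO `trusted-etl`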
Note
Privileges on global and local temporary views are not supported. Local temporary views are visible only within the same session, and views created in the global_temp schema are visible to all users sharing a cluster or SQL warehouse. However, privileges on the underlying tables and views referenced by any temporary views are enforced.
Privileges you can grant on Hive metastore objects
- SELECT: gives read access to an object.
- CREATE: gives the ability to create an object (for example, a table in a schema).
- MODIFY: gives the ability to add, delete, and modify data to or from an object.
- USAGE: does not give any abilities, but is an additional requirement to perform any action on a schema object.
- READ_METADATA: gives the ability to view an object and its metadata.
- CREATE_NAMED_FUNCTION: gives the ability to create a named UDF in an existing catalog or schema.
- MODIFY_CLASSPATH: gives the ability to add files to the Spark class path.
- ALL PRIVILEGES: gives all privileges (is translated into all of the above privileges).
Note
The MODIFY_CLASSPATH privilege is not supported in Databricks SQL.
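You can grant multiple privilege types in a single statement. For example, the following sketches use hypothetical schema, table, and group names:
-- Give read access plus metadata visibility on a table
GRANT SELECT, READ_METADATA ON TABLE sales.transactions TO `data-readers`
-- Allow a group to create tables and named functions in a schema
GRANT CREATE, CREATE_NAMED_FUNCTION ON SCHEMA sales TO `data-engineers`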
USAGE privilege
To perform an action on a schema object in the Hive metastore, a user must have the USAGE privilege on that schema in addition to the privilege to perform that action. Any one of the following satisfies the USAGE requirement:
- Be a workspace admin
- Have the USAGE privilege on the schema or be in a group that has the USAGE privilege on the schema
- Have the USAGE privilege on the CATALOG or be in a group that has the USAGE privilege
- Be the owner of the schema or be in a group that owns the schema
Even the owner of an object inside a schema must have the USAGE privilege in order to use it.
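For example, granting SELECT on a table is not sufficient on its own; the grantee also needs USAGE on the enclosing schema, as in the following sketch with hypothetical names:
-- USAGE on the schema is required before the table-level privilege takes effect
GRANT USAGE ON SCHEMA sales TO `data-readers`
GRANT SELECT ON TABLE sales.transactions TO `data-readers`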
Privilege hierarchy
When table access control is enabled on the workspace and on all clusters, SQL objects in Azure Databricks are hierarchical and privileges are inherited downward. This means that granting or denying a privilege on the CATALOG automatically grants or denies the privilege to all schemas in the catalog. Similarly, privileges granted on a schema object are inherited by all objects in that schema. This pattern is true for all securable objects.
If you deny a user privileges on a table, the user can't see that the table exists when they attempt to list all tables in the schema. If you deny a user privileges on a schema, the user can't see that the schema exists when they attempt to list all schemas in the catalog.
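For example, the following sketch, using hypothetical names, grants read access at the schema level and then withholds one sensitive table from the same group (the deny takes precedence over the inherited grant):
-- The schema-level grant is inherited by every table in the schema
GRANT USAGE, SELECT ON SCHEMA sales TO `analysts`
-- The deny overrides the inherited grant for this one table
DENY SELECT ON TABLE sales.salaries TO `analysts`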
Dynamic view functions
Azure Databricks includes two user functions that allow you to express column- and row-level permissions dynamically in the body of a view definition that is managed by the Hive metastore.
- current_user(): returns the current user name.
- is_member(): determines whether the current user is a member of a specific Azure Databricks group at the workspace level.
The following example combines both functions to determine if a user has the appropriate group membership:
-- Return: true if the user is a member and false if they are not
SELECT
current_user as user,
-- Check to see if the current user is a member of the "Managers" group.
is_member("Managers") as admin
Column-level permissions
You can use dynamic views to limit the columns a specific group or user can see. Consider the following example, where only users who belong to the auditors group are able to see email addresses from the sales_raw table. At analysis time, Spark replaces the CASE statement with either the literal 'REDACTED' or the column email. This behavior allows for all the usual performance optimizations provided by Spark.
-- Alias the field 'email' to itself (as 'email') to prevent the
-- permission logic from showing up directly in the column name results.
CREATE VIEW sales_redacted AS
SELECT
user_id,
CASE WHEN
is_member('auditors') THEN email
ELSE 'REDACTED'
END AS email,
country,
product,
total
FROM sales_raw
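To put the redacted view to use, access is typically granted on the view rather than on the underlying table, for example with a hypothetical workspace-wide grant:
-- All users can read the redacted view; only auditors see raw email values
GRANT SELECT ON VIEW sales_redacted TO users
-- Do not grant SELECT on sales_raw itself to non-auditors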
Row-level permissions
Using dynamic views you can specify permissions down to the row or field level. Consider the following example, where only users who belong to the managers
group are able to see transaction amounts (total
column) greater than $1,000,000.00:
CREATE VIEW sales_redacted AS
SELECT
user_id,
country,
product,
total
FROM sales_raw
WHERE
CASE
WHEN is_member('managers') THEN TRUE
ELSE total <= 1000000
END;
Data masking
As shown in the preceding examples, you can implement column-level masking to prevent users from seeing specific column data unless they are in the correct group. Because these views are standard Spark SQL, you can do more advanced types of masking with more complex SQL expressions. The following example lets all users perform analysis on email domains, but lets members of the auditors group see users' full email addresses.
-- The regexp_extract function takes an email address such as
-- user.x.lastname@example.com and extracts 'example.com', allowing
-- analysts to query the domain name
CREATE VIEW sales_redacted AS
SELECT
user_id,
region,
CASE
WHEN is_member('auditors') THEN email
ELSE regexp_extract(email, '^.*@(.*)$', 1)
END
FROM sales_raw