Enable table access control for a cluster

This article describes how to enable table access control for a cluster.

For information about how to set privileges on a data object once table access control has been enabled on a cluster, see Data object privileges.

Enable table access control for a cluster

Table access control is available in two versions:

  • SQL-only table access control, which:
    • Is generally available.
    • Restricts cluster users to SQL commands. Users are restricted to the Apache Spark SQL API, and therefore cannot use Python, Scala, R, RDD APIs, or clients that directly read the data from cloud storage, such as DBUtils.
  • Python and SQL table access control, which:
    • Is in Public Preview.
    • Allows users to run SQL, Python, and PySpark commands. Users are restricted to the Spark SQL API and DataFrame API, and therefore cannot use Scala, R, RDD APIs, or clients that directly read the data from cloud storage, such as DBUtils.

SQL-only table access control

This version of table access control restricts users on the cluster to SQL commands only.

To enable SQL-only table access control on a cluster and restrict that cluster to use only SQL commands, set the following flag in the cluster's Spark conf:

spark.databricks.acl.sqlOnly true
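As a sketch, the same flag can also be supplied when creating the cluster programmatically. The snippet below only builds a candidate request body for the Clusters API; the cluster name, runtime version, and node type are illustrative assumptions, not values from this article, and no API call is made.

```python
# Sketch of a Clusters API request body that enables SQL-only table
# access control. Only the spark_conf key comes from this article;
# every other value is an illustrative placeholder.
import json

cluster_spec = {
    "cluster_name": "sql-only-table-acl",   # hypothetical name
    "spark_version": "7.3.x-scala2.12",     # example Databricks runtime
    "node_type_id": "Standard_DS3_v2",      # example Azure node type
    "num_workers": 2,
    "spark_conf": {
        # The flag from this article: restrict the cluster to SQL commands.
        "spark.databricks.acl.sqlOnly": "true",
    },
}

print(json.dumps(cluster_spec, indent=2))
```

Submitting this body to the workspace's cluster-creation endpoint would create a cluster with the restriction already in place, instead of editing the Spark conf in the UI.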

Note

Access to SQL-only table access control is not affected by the Enable Table Access Control setting in the Admin Console. That setting controls only the workspace-wide enablement of Python and SQL table access control.

Python and SQL table access control

Important

This feature is in Public Preview.

This version of table access control lets users run Python commands that use the DataFrame API as well as SQL. When it is enabled on a cluster, users on that cluster or pool:

  • Can access Spark only via the Spark SQL API or DataFrame API. In both cases, access to tables and views is restricted by administrators according to the Azure Databricks data governance model.
  • Cannot acquire direct access to data in the cloud via DBFS or by reading credentials from the cloud provider's metadata service.
  • Must run their commands on cluster nodes as a low-privilege user forbidden from accessing sensitive parts of the filesystem or creating network connections to ports other than 80 and 443.
    • Only built-in Spark functions can create network connections on ports other than 80 and 443.
    • Only admin users or users with ANY FILE privilege can read data from external databases through the PySpark JDBC connector.
    • If you want Python processes to be able to access additional outbound ports, you can set the Spark config spark.databricks.pyspark.iptable.outbound.whitelisted.ports to the ports you want to whitelist. The supported format of the configuration value is [port[:port][,port[:port]]...], for example: 21,22,9000:9999. Ports must be within the valid range, that is, 0-65535.
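To make the documented value format concrete, here is a minimal sketch of a validator for the [port[:port][,port[:port]]...] syntax described above. The function name is hypothetical; it simply parses a candidate config value and rejects ports outside 0-65535, which can help catch typos before applying the Spark config.

```python
import re

def parse_port_list(value: str) -> list[tuple[int, int]]:
    """Parse a port list in the documented format
    [port[:port][,port[:port]]...], e.g. "21,22,9000:9999".
    Returns (low, high) ranges; a single port p becomes (p, p).
    Raises ValueError for malformed entries or out-of-range ports."""
    ranges = []
    for entry in value.split(","):
        m = re.fullmatch(r"(\d+)(?::(\d+))?", entry.strip())
        if not m:
            raise ValueError(f"malformed entry: {entry!r}")
        low = int(m.group(1))
        high = int(m.group(2)) if m.group(2) else low
        if not (0 <= low <= 65535 and 0 <= high <= 65535):
            raise ValueError(f"port out of range 0-65535: {entry!r}")
        ranges.append((low, high))
    return ranges

print(parse_port_list("21,22,9000:9999"))  # [(21, 21), (22, 22), (9000, 9999)]
```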

Attempts to get around these restrictions fail with an exception. These restrictions are in place so that your users can never access unprivileged data through the cluster.

Requirements

Before users can configure Python and SQL table access control, an Azure Databricks admin must:

  • Enable table access control for the Azure Databricks workspace.
  • Deny users access to clusters that are not enabled for table access control. In practice, that means denying most users permission to create clusters and denying users the Can Attach To permission for clusters that are not enabled for table access control.

For information on both of these requirements, see Enable table access control for your workspace.

Create a cluster enabled for table access control

When you create a cluster, select the Enable table access control and only allow Python and SQL commands option. This option is available only for High Concurrency clusters.

[Image: Enable table access control]

To create the cluster using the REST API, see Create cluster enabled for table access control example.
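As a hedged sketch of what that linked example involves, the request body below enables Python and SQL table access control through the Spark conf. The two spark_conf keys are the ones commonly shown in Databricks documentation for this mode; treat them, along with the illustrative cluster name, runtime, and node type, as assumptions to verify against the linked example for your workspace.

```python
# Sketch of a Clusters API request body for a cluster with Python and
# SQL table access control. The spark_conf keys are assumptions to
# verify against the linked REST API example; other fields are
# illustrative placeholders.
import json

cluster_spec = {
    "cluster_name": "python-sql-table-acl",  # hypothetical name
    "spark_version": "7.3.x-scala2.12",      # example Databricks runtime
    "node_type_id": "Standard_DS3_v2",       # example Azure node type
    "autoscale": {"min_workers": 1, "max_workers": 3},
    "spark_conf": {
        # Turn on table access control for DataFrames and SQL.
        "spark.databricks.acl.dfAclsEnabled": "true",
        # Restrict the REPL to the languages this mode supports.
        "spark.databricks.repl.allowedLanguages": "python,sql",
    },
}

print(json.dumps(cluster_spec, indent=2))
```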

Set privileges on a data object

See Data object privileges.