创建视图

本文展示了如何在 Unity Catalog 中创建视图。

视图是由元存储中的一个或多个表和视图组成的只读对象。 它位于 Unity Catalog 的三级命名空间的第三层。 可以从多个架构和目录中的表和其他视图创建视图。

动态视图除了用于数据掩码外,还可用于提供行级和列级访问控制。

创建视图的语法示例:

CREATE VIEW main.default.experienced_employee
  (id COMMENT 'Unique identification number', Name)
  COMMENT 'View for experienced employees'
AS SELECT id, name
   FROM all_employee
   WHERE working_years > 5;

注意

如果为视图提供支持的数据源不是 Delta 表,则视图可能具有不同的执行语义。 Databricks 建议始终使用表或视图名称引用数据源来定义视图。 通过指定路径或 URI 定义数据集视图可能会导致数据治理需求混乱。

要求

若要创建视图,请执行以下操作:

  • 你必须具有对父目录的 USE CATALOG 权限和对父架构的 USE SCHEMACREATE TABLE 权限。 元存储管理员或目录所有者可授予所有这些权限。 架构所有者可授予对架构的 USE SCHEMACREATE TABLE 特权。
  • 你必须能够读取视图中引用的表和视图(表或视图上的 SELECT,以及目录上的 USE CATALOG 和架构上的 USE SCHEMA)。
  • 如果视图引用工作区本地 Hive 元存储中的表,则只能从包含工作区本地表的工作区访问该视图。 因此,Databricks 建议仅从 Unity Catalog 元存储中的表或视图创建视图。
  • 创建的视图不能引用已使用 Delta Sharing 与你共享的视图。 请参阅使用 Delta Sharing 安全共享数据和 AI 资产

若要读取视图,需要的权限由群集访问模式决定:

  • 对于共享群集,需要对视图本身执行 SELECT,对其父目录执行 USE CATALOG,并对其父架构执行 USE SCHEMA
  • 对于单用户群集,除了对父目录执行 USE CATALOG 和对父架构执行 USE SCHEMA 之外,还必须对视图引用的所有表和视图执行 SELECT

不能使用单用户群集创建或读取动态视图。

创建视图

若要创建视图,请运行以下 SQL 命令。 括号中的项是可选的。 替换占位符值替:

  • <catalog-name>:目录的名称。
  • <schema-name>:架构的名称。
  • <view-name>:视图的名称。
  • <query>:用于组成视图的查询、列、表和视图。

SQL

CREATE VIEW <catalog-name>.<schema-name>.<view-name> AS
SELECT <query>;

Python

spark.sql("CREATE VIEW <catalog-name>.<schema-name>.<view-name> AS "
  "SELECT <query>")

R

library(SparkR)

sql(paste("CREATE VIEW <catalog-name>.<schema-name>.<view-name> AS ",
  "SELECT <query>",
  sep = ""))

Scala

spark.sql("CREATE VIEW <catalog-name>.<schema-name>.<view-name> AS " +
  "SELECT <query>")

例如,要通过 sales_raw 表中的列命名创建名为 sales_redacted 的视图:

SQL

CREATE VIEW sales_metastore.sales.sales_redacted AS
SELECT
  user_id,
  email,
  country,
  product,
  total
FROM sales_metastore.sales.sales_raw;

Python

spark.sql("CREATE VIEW sales_metastore.sales.sales_redacted AS "
  "SELECT "
  "  user_id, "
  "  email, "
  "  country, "
  "  product, "
  "  total "
  "FROM sales_metastore.sales.sales_raw")

R

library(SparkR)

sql(paste("CREATE VIEW sales_metastore.sales.sales_redacted AS ",
  "SELECT ",
  "  user_id, ",
  "  email, ",
  "  country, ",
  "  product, ",
  "  total ",
  "FROM sales_metastore.sales.sales_raw",
  sep = ""))

Scala

spark.sql("CREATE VIEW sales_metastore.sales.sales_redacted AS " +
  "SELECT " +
  "  user_id, " +
  "  email, " +
  "  country, " +
  "  product, " +
  "  total " +
  "FROM sales_metastore.sales.sales_raw")

还可以使用 Databricks Terraform 提供程序databricks_table 创建视图。 可以使用 databricks_views 检索视图全名列表。

创建动态试图

在 Unity Catalog 中,可以使用动态视图配置细粒度访问控制,包括:

  • 列或行级别的安全性。
  • 数据掩码。

注意

使用动态视图的精细访问控制在单用户访问模式的群集上不可用。

Unity Catalog 引入了以下功能,这些功能允许你动态限制哪些用户可以访问视图中的行、列或记录:

  • current_user():返回当前用户的电子邮件地址。
  • is_account_group_member():如果当前用户是特定帐户级别组的成员,则返回 TRUE。 建议在针对 Unity Catalog 数据的动态视图中使用。
  • is_member():如果当前用户是特定工作区级别组的成员,则返回 TRUE。 提供此函数是为了与现有 Hive 元存储兼容。 请避免将它与针对 Unity Catalog 数据的视图一起使用,因为它不评估帐户级别的组成员身份。

Azure Databricks 建议不要授予用户读取视图中引用的表和视图的权利。

下面的示例说明如何在 Unity Catalog 中创建动态视图。

列级权限

使用动态视图,可以限制特定用户或组可以访问的列。 在下面的示例中,只有 auditors 组的成员可以从 sales_raw 表中访问电子邮件地址。 在查询分析过程中,Apache Spark 将 CASE 语句替换为文字字符串 REDACTED 或替换为电子邮件地址列的实际内容。 其他列按正常方式返回。 此策略对查询性能没有负面影响。

SQL

-- Alias the field 'email' to itself (as 'email') to prevent the
-- permission logic from showing up directly in the column name results.
CREATE VIEW sales_redacted AS
SELECT
  user_id,
  CASE WHEN
    is_account_group_member('auditors') THEN email
    ELSE 'REDACTED'
  END AS email,
  country,
  product,
  total
FROM sales_raw

Python

# Alias the field 'email' to itself (as 'email') to prevent the
# permission logic from showing up directly in the column name results.
spark.sql("CREATE VIEW sales_redacted AS "
  "SELECT "
  "  user_id, "
  "  CASE WHEN "
  "    is_account_group_member('auditors') THEN email "
  "  ELSE 'REDACTED' "
  "  END AS email, "
  "  country, "
  "  product, "
  "  total "
  "FROM sales_raw")

R

library(SparkR)

# Alias the field 'email' to itself (as 'email') to prevent the
# permission logic from showing up directly in the column name results.
sql(paste("CREATE VIEW sales_redacted AS ",
  "SELECT ",
  "  user_id, ",
  "  CASE WHEN ",
  "    is_account_group_member('auditors') THEN email ",
  "  ELSE 'REDACTED' ",
  "  END AS email, ",
  "  country, ",
  "  product, ",
  "  total ",
  "FROM sales_raw",
  sep = ""))

Scala

// Alias the field 'email' to itself (as 'email') to prevent the
// permission logic from showing up directly in the column name results.
spark.sql("CREATE VIEW sales_redacted AS " +
  "SELECT " +
  "  user_id, " +
  "  CASE WHEN " +
  "    is_account_group_member('auditors') THEN email " +
  "  ELSE 'REDACTED' " +
  "  END AS email, " +
  "  country, " +
  "  product, " +
  "  total " +
  "FROM sales_raw")

行级权限

使用动态视图,你可以指定低至行级或字段级的权限。 在下面的示例中,当交易金额超过 1,000,000 美元时,只有 managers 组的成员才能查看交易金额。 对于其他用户,匹配结果将被筛选掉。

SQL

 CREATE VIEW sales_redacted AS
 SELECT
   user_id,
   country,
   product,
   total
 FROM sales_raw
 WHERE
   CASE
     WHEN is_account_group_member('managers') THEN TRUE
     ELSE total <= 1000000
   END;

Python

 spark.sql("CREATE VIEW sales_redacted AS "
   "SELECT "
   "  user_id, "
   "  country, "
   "  product, "
   "  total "
   "FROM sales_raw "
   "WHERE "
   "CASE "
   "  WHEN is_account_group_member('managers') THEN TRUE "
   "  ELSE total <= 1000000 "
   "END")

R

 library(SparkR)

 sql(paste("CREATE VIEW sales_redacted AS ",
   "SELECT ",
   "  user_id, ",
   "  country, ",
   "  product, ",
   "  total ",
   "FROM sales_raw ",
   "WHERE ",
   "CASE ",
   "  WHEN is_account_group_member('managers') THEN TRUE ",
   "  ELSE total <= 1000000 ",
   "END",
   sep = ""))

Scala

 spark.sql("CREATE VIEW sales_redacted AS " +
   "SELECT " +
   "  user_id, " +
   "  country, " +
   "  product, " +
   "  total " +
   "FROM sales_raw " +
   "WHERE " +
   "CASE " +
   "  WHEN is_account_group_member('managers') THEN TRUE " +
   "  ELSE total <= 1000000 " +
   "END")

数据屏蔽

因为 Unity Catalog 中的视图使用 Spark SQL,所以可以通过使用更复杂的 SQL 表达式和正则表达式来实现高级数据屏蔽。 在下面的示例中,所有用户都可以分析电子邮件域,但只有 auditors 组的成员才能查看用户的整个电子邮件地址。

SQL

-- The regexp_extract function takes an email address such as
-- user.x.lastname@example.com and extracts 'example', allowing
-- analysts to query the domain name.

CREATE VIEW sales_redacted AS
SELECT
  user_id,
  region,
  CASE
    WHEN is_account_group_member('auditors') THEN email
    ELSE regexp_extract(email, '^.*@(.*)$', 1)
  END
  FROM sales_raw

Python

# The regexp_extract function takes an email address such as
# user.x.lastname@example.com and extracts 'example', allowing
# analysts to query the domain name.

spark.sql("CREATE VIEW sales_redacted AS "
  "SELECT "
  "  user_id, "
  "  region, "
  "  CASE "
  "    WHEN is_account_group_member('auditors') THEN email "
  "    ELSE regexp_extract(email, '^.*@(.*)$', 1) "
  "  END "
  "  FROM sales_raw")

R

library(SparkR)

# The regexp_extract function takes an email address such as
# user.x.lastname@example.com and extracts 'example', allowing
# analysts to query the domain name.

sql(paste("CREATE VIEW sales_redacted AS ",
  "SELECT ",
  "  user_id, ",
  "  region, ",
  "  CASE ",
  "    WHEN is_account_group_member('auditors') THEN email ",
  "    ELSE regexp_extract(email, '^.*@(.*)$', 1) ",
  "  END ",
  "  FROM sales_raw",
  sep = ""))

Scala

// The regexp_extract function takes an email address such as
// user.x.lastname@example.com and extracts 'example', allowing
// analysts to query the domain name.

spark.sql("CREATE VIEW sales_redacted AS " +
  "SELECT " +
  "  user_id, " +
  "  region, " +
  "  CASE " +
  "    WHEN is_account_group_member('auditors') THEN email " +
  "    ELSE regexp_extract(email, '^.*@(.*)$', 1) " +
  "  END " +
  "  FROM sales_raw")

删除视图

你必须是视图的所有者才能删除视图。 若要删除视图,请运行以下 SQL 命令:

DROP VIEW IF EXISTS catalog_name.schema_name.view_name;

后续步骤