创建视图

本文展示了如何在 Unity Catalog 中创建视图。

视图是由元存储中的一个或多个表和视图组成的只读对象。 它位于 Unity Catalog 的三级命名空间的第三层。 可以从多个架构和目录中的表和其他视图创建视图。

动态视图除了用于数据掩码外,还可用于提供行级和列级访问控制。

创建视图的语法示例:

CREATE VIEW main.default.experienced_employee
  (id COMMENT 'Unique identification number', Name)
  COMMENT 'View for experienced employees'
AS SELECT id, name
   FROM all_employee
   WHERE working_years > 5;

注意

如果为视图提供支持的数据源不是 Delta 表,则视图可能具有不同的执行语义。 Databricks 建议始终使用表或视图名称引用数据源来定义视图。 通过指定路径或 URI 定义数据集视图可能会导致数据治理需求混乱。

要求

若要创建视图,请执行以下操作:

  • 你必须具有对父目录的 USE CATALOG 权限和对父架构的 USE SCHEMACREATE TABLE 权限。 元存储管理员或目录所有者可授予所有这些权限。 架构所有者可授予对架构的 USE SCHEMACREATE TABLE 特权。
  • 你必须能够读取视图中引用的表和视图(表或视图上的 SELECT,以及目录上的 USE CATALOG 和架构上的 USE SCHEMA)。
  • 如果视图引用工作区本地 Hive 元存储中的表,则只能从包含工作区本地表的工作区访问该视图。 因此,Databricks 建议仅从 Unity Catalog 元存储中的表或视图创建视图。
  • 创建的视图不能引用已使用 Delta Sharing 与你共享的视图。 请参阅使用 Delta Sharing 安全共享数据和 AI 资产

要读取视图,所需权限取决于计算类型和访问模式:

  • 对于共享群集和 SQL 仓库,需要对视图本身具有 SELECT 权限,对其父目录具有 USE CATALOG 全向,并对其父架构具有 USE SCHEMA 权限。
  • 对于单用户群集,除了对父目录执行 USE CATALOG 和对父架构执行 USE SCHEMA 之外,还必须对视图引用的所有表和视图执行 SELECT

创建或读取动态视图:

  • 除必须使用共享群集或 SQL 仓库创建或读取动态视图以外,对动态视图的要求与前面各部分中列出的要求相同。 不能使用单用户群集。

创建视图

若要创建视图,请运行以下 SQL 命令。 括号中的项是可选的。 替换占位符值替:

  • <catalog-name>:目录的名称。
  • <schema-name>:架构的名称。
  • <view-name>:视图的名称。
  • <query>:用于组成视图的查询、列、表和视图。

SQL

CREATE VIEW <catalog-name>.<schema-name>.<view-name> AS
SELECT <query>;

Python

spark.sql("CREATE VIEW <catalog-name>.<schema-name>.<view-name> AS "
  "SELECT <query>")

R

library(SparkR)

sql(paste("CREATE VIEW <catalog-name>.<schema-name>.<view-name> AS ",
  "SELECT <query>",
  sep = ""))

Scala

spark.sql("CREATE VIEW <catalog-name>.<schema-name>.<view-name> AS " +
  "SELECT <query>")

例如,要通过 sales_raw 表中的列命名创建名为 sales_redacted 的视图:

SQL

CREATE VIEW sales_metastore.sales.sales_redacted AS
SELECT
  user_id,
  email,
  country,
  product,
  total
FROM sales_metastore.sales.sales_raw;

Python

spark.sql("CREATE VIEW sales_metastore.sales.sales_redacted AS "
  "SELECT "
  "  user_id, "
  "  email, "
  "  country, "
  "  product, "
  "  total "
  "FROM sales_metastore.sales.sales_raw")

R

library(SparkR)

sql(paste("CREATE VIEW sales_metastore.sales.sales_redacted AS ",
  "SELECT ",
  "  user_id, ",
  "  email, ",
  "  country, ",
  "  product, ",
  "  total ",
  "FROM sales_metastore.sales.sales_raw",
  sep = ""))

Scala

spark.sql("CREATE VIEW sales_metastore.sales.sales_redacted AS " +
  "SELECT " +
  "  user_id, " +
  "  email, " +
  "  country, " +
  "  product, " +
  "  total " +
  "FROM sales_metastore.sales.sales_raw")

还可以使用 Databricks Terraform 提供程序databricks_table 创建视图。 可以使用 databricks_views 检索视图全名列表。

创建动态试图

在 Unity Catalog 中,可以使用动态视图配置细粒度访问控制,包括:

  • 列或行级别的安全性。
  • 数据掩码。

注意

使用动态视图的精细访问控制在采用单一用户访问模式的群集上不可用。

Unity Catalog 引入了以下功能,这些功能允许你动态限制哪些用户可以访问视图中的行、列或记录:

  • current_user():返回当前用户的电子邮件地址。
  • is_account_group_member():如果当前用户是特定帐户级别组的成员,则返回 TRUE。 建议在针对 Unity Catalog 数据的动态视图中使用。
  • is_member():如果当前用户是特定工作区级别组的成员,则返回 TRUE。 提供此函数是为了与现有 Hive 元存储兼容。 请避免将它与针对 Unity Catalog 数据的视图一起使用,因为它不评估帐户级别的组成员身份。

Azure Databricks 建议不要授予用户读取视图中引用的表和视图的权利。

下面的示例说明如何在 Unity Catalog 中创建动态视图。

列级权限

使用动态视图,可以限制特定用户或组可以访问的列。 在下面的示例中,只有 auditors 组的成员可以从 sales_raw 表中访问电子邮件地址。 在查询分析过程中,Apache Spark 将 CASE 语句替换为文字字符串 REDACTED 或替换为电子邮件地址列的实际内容。 其他列按正常方式返回。 此策略对查询性能没有负面影响。

SQL

-- Alias the field 'email' to itself (as 'email') to prevent the
-- permission logic from showing up directly in the column name results.
CREATE VIEW sales_redacted AS
SELECT
  user_id,
  CASE WHEN
    is_account_group_member('auditors') THEN email
    ELSE 'REDACTED'
  END AS email,
  country,
  product,
  total
FROM sales_raw

Python

# Alias the field 'email' to itself (as 'email') to prevent the
# permission logic from showing up directly in the column name results.
spark.sql("CREATE VIEW sales_redacted AS "
  "SELECT "
  "  user_id, "
  "  CASE WHEN "
  "    is_account_group_member('auditors') THEN email "
  "  ELSE 'REDACTED' "
  "  END AS email, "
  "  country, "
  "  product, "
  "  total "
  "FROM sales_raw")

R

library(SparkR)

# Alias the field 'email' to itself (as 'email') to prevent the
# permission logic from showing up directly in the column name results.
sql(paste("CREATE VIEW sales_redacted AS ",
  "SELECT ",
  "  user_id, ",
  "  CASE WHEN ",
  "    is_account_group_member('auditors') THEN email ",
  "  ELSE 'REDACTED' ",
  "  END AS email, ",
  "  country, ",
  "  product, ",
  "  total ",
  "FROM sales_raw",
  sep = ""))

Scala

// Alias the field 'email' to itself (as 'email') to prevent the
// permission logic from showing up directly in the column name results.
spark.sql("CREATE VIEW sales_redacted AS " +
  "SELECT " +
  "  user_id, " +
  "  CASE WHEN " +
  "    is_account_group_member('auditors') THEN email " +
  "  ELSE 'REDACTED' " +
  "  END AS email, " +
  "  country, " +
  "  product, " +
  "  total " +
  "FROM sales_raw")

行级权限

使用动态视图,你可以指定低至行级或字段级的权限。 在下面的示例中,当交易金额超过 1,000,000 美元时,只有 managers 组的成员才能查看交易金额。 对于其他用户,匹配结果将被筛选掉。

SQL

 CREATE VIEW sales_redacted AS
 SELECT
   user_id,
   country,
   product,
   total
 FROM sales_raw
 WHERE
   CASE
     WHEN is_account_group_member('managers') THEN TRUE
     ELSE total <= 1000000
   END;

Python

 spark.sql("CREATE VIEW sales_redacted AS "
   "SELECT "
   "  user_id, "
   "  country, "
   "  product, "
   "  total "
   "FROM sales_raw "
   "WHERE "
   "CASE "
   "  WHEN is_account_group_member('managers') THEN TRUE "
   "  ELSE total <= 1000000 "
   "END")

R

 library(SparkR)

 sql(paste("CREATE VIEW sales_redacted AS ",
   "SELECT ",
   "  user_id, ",
   "  country, ",
   "  product, ",
   "  total ",
   "FROM sales_raw ",
   "WHERE ",
   "CASE ",
   "  WHEN is_account_group_member('managers') THEN TRUE ",
   "  ELSE total <= 1000000 ",
   "END",
   sep = ""))

Scala

 spark.sql("CREATE VIEW sales_redacted AS " +
   "SELECT " +
   "  user_id, " +
   "  country, " +
   "  product, " +
   "  total " +
   "FROM sales_raw " +
   "WHERE " +
   "CASE " +
   "  WHEN is_account_group_member('managers') THEN TRUE " +
   "  ELSE total <= 1000000 " +
   "END")

数据屏蔽

因为 Unity Catalog 中的视图使用 Spark SQL,所以可以通过使用更复杂的 SQL 表达式和正则表达式来实现高级数据屏蔽。 在下面的示例中,所有用户都可以分析电子邮件域,但只有 auditors 组的成员才能查看用户的整个电子邮件地址。

SQL

-- The regexp_extract function takes an email address such as
-- user.x.lastname@example.com and extracts 'example', allowing
-- analysts to query the domain name.

CREATE VIEW sales_redacted AS
SELECT
  user_id,
  region,
  CASE
    WHEN is_account_group_member('auditors') THEN email
    ELSE regexp_extract(email, '^.*@(.*)$', 1)
  END
  FROM sales_raw

Python

# The regexp_extract function takes an email address such as
# user.x.lastname@example.com and extracts 'example', allowing
# analysts to query the domain name.

spark.sql("CREATE VIEW sales_redacted AS "
  "SELECT "
  "  user_id, "
  "  region, "
  "  CASE "
  "    WHEN is_account_group_member('auditors') THEN email "
  "    ELSE regexp_extract(email, '^.*@(.*)$', 1) "
  "  END "
  "  FROM sales_raw")

R

library(SparkR)

# The regexp_extract function takes an email address such as
# user.x.lastname@example.com and extracts 'example', allowing
# analysts to query the domain name.

sql(paste("CREATE VIEW sales_redacted AS ",
  "SELECT ",
  "  user_id, ",
  "  region, ",
  "  CASE ",
  "    WHEN is_account_group_member('auditors') THEN email ",
  "    ELSE regexp_extract(email, '^.*@(.*)$', 1) ",
  "  END ",
  "  FROM sales_raw",
  sep = ""))

Scala

// The regexp_extract function takes an email address such as
// user.x.lastname@example.com and extracts 'example', allowing
// analysts to query the domain name.

spark.sql("CREATE VIEW sales_redacted AS " +
  "SELECT " +
  "  user_id, " +
  "  region, " +
  "  CASE " +
  "    WHEN is_account_group_member('auditors') THEN email " +
  "    ELSE regexp_extract(email, '^.*@(.*)$', 1) " +
  "  END " +
  "  FROM sales_raw")

删除视图

你必须是视图的所有者才能删除视图。 若要删除视图,请运行以下 SQL 命令:

DROP VIEW IF EXISTS catalog_name.schema_name.view_name;

后续步骤