在 SQL 数据仓库中使用用户定义架构Using user-defined schemas in SQL Data Warehouse

有关在 Azure SQL 数据仓库中使用 T-SQL 用户定义架构开发解决方案的技巧。Tips for using T-SQL user-defined schemas in Azure SQL Data Warehouse for developing solutions.

应用程序边界的架构Schemas for application boundaries

传统数据仓库通常使用不同的数据库,根据工作负荷、域或安全性来创建应用程序边界。Traditional data warehouses often use separate databases to create application boundaries based on either workload, domain or security. 例如,传统 SQL Server 数据仓库可能包含过渡数据库、数据仓库数据库和某些数据集市数据库。For example, a traditional SQL Server data warehouse might include a staging database, a data warehouse database, and some data mart databases. 在此拓扑中,每个数据库均作为体系结构中的工作负荷和安全边界来运行。In this topology each database operates as a workload and security boundary in the architecture.

相比之下,SQL 数据仓库在一个数据库中运行整个数据仓库工作负荷。By contrast, SQL Data Warehouse runs the entire data warehouse workload within one database. 不允许跨数据库联接。Cross database joins are not permitted. 因此,SQL 数据仓库预期仓库使用的所有表都存储在一个数据库中。Therefore SQL Data Warehouse expects all tables used by the warehouse to be stored within the one database.

Note

SQL 数据仓库不支持任何种类的跨数据库查询。SQL Data Warehouse does not support cross database queries of any kind. 因此,需要修改利用此模式的数据仓库实现。Consequently, data warehouse implementations that leverage this pattern will need to be revised.

建议Recommendations

以下是针对使用用户定义的架构合并工作负荷、安全性、域和功能边界的一些建议These are recommendations for consolidating workloads, security, domain and functional boundaries by using user defined schemas

  1. 使用一个 SQL 数据仓库数据库来运行整个数据仓库工作负荷Use one SQL Data Warehouse database to run your entire data warehouse workload
  2. 合并现有的数据仓库环境,以使用一个 SQL 数据仓库数据库Consolidate your existing data warehouse environment to use one SQL Data Warehouse database
  3. 利用 用户定义的架构 来提供以前使用数据库实现的边界。Leverage user-defined schemas to provide the boundary previously implemented using databases.

如果以前尚未使用用户定义的架构,则就不会存在任何记录。If user-defined schemas have not been used previously then you have a clean slate. 只需使用旧数据库名称作为 SQL 数据仓库数据库中用户定义架构的基础。Simply use the old database name as the basis for your user-defined schemas in the SQL Data Warehouse database.

如果已使用架构,则可以采用以下几个选项:If schemas have already been used then you have a few options:

  1. 删除旧架构名称并重新开始Remove the legacy schema names and start fresh
  2. 在表名称前面附加旧架构名称,以保留旧架构名称。Retain the legacy schema names by pre-pending the legacy schema name to the table name
  3. 在额外架构中的表上实现视图来重建旧架构结构,以保留旧架构名称。Retain the legacy schema names by implementing views over the table in an extra schema to re-create the old schema structure.

Note

在首次检查时,选项 3 似乎像是最吸引人的选项。On first inspection option 3 may seem like the most appealing option. 但是,细节决定成败。However, the devil is in the detail. SQL 数据仓库中的视图为只读状态。Views are read only in SQL Data Warehouse. 任何表修改或数据修改只能针对基础表执行。Any data or table modification would need to be performed against the base table. 选项 3 还在系统中引入了一个视图层。Option 3 also introduces a layer of views into your system. 如果已在体系结构中使用视图,可以再三考虑一下此选项。You might want to give this some additional thought if you are using views in your architecture already.

示例:Examples:

基于数据库名称实现用户定义的架构Implement user-defined schemas based on database names

CREATE SCHEMA [stg]; -- stg previously database name for staging database
GO
CREATE SCHEMA [edw]; -- edw previously database name for the data warehouse
GO
CREATE TABLE [stg].[customer] -- create staging tables in the stg schema
(       CustKey BIGINT NOT NULL
,       ...
);
GO
CREATE TABLE [edw].[customer] -- create data warehouse tables in the edw schema
(       CustKey BIGINT NOT NULL
,       ...
);

在表名称前面附加旧架构名称,以保留旧架构名称。Retain legacy schema names by pre-pending them to the table name. 使用工作负荷边界的架构。Use schemas for the workload boundary.

CREATE SCHEMA [stg]; -- stg defines the staging boundary
GO
CREATE SCHEMA [edw]; -- edw defines the data warehouse boundary
GO
CREATE TABLE [stg].[dim_customer] --pre-pend the old schema name to the table and create in the staging boundary
(       CustKey BIGINT NOT NULL
,       ...
);
GO
CREATE TABLE [edw].[dim_customer] --pre-pend the old schema name to the table and create in the data warehouse boundary
(       CustKey BIGINT NOT NULL
,       ...
);

使用视图保留旧架构名称Retain legacy schema names using views

CREATE SCHEMA [stg]; -- stg defines the staging boundary
GO
CREATE SCHEMA [edw]; -- stg defines the data warehouse boundary
GO
CREATE SCHEMA [dim]; -- edw defines the legacy schema name boundary
GO
CREATE TABLE [stg].[customer] -- create the base staging tables in the staging boundary
(       CustKey    BIGINT NOT NULL
,       ...
)
GO
CREATE TABLE [edw].[customer] -- create the base data warehouse tables in the data warehouse boundary
(       CustKey    BIGINT NOT NULL
,       ...
)
GO
CREATE VIEW [dim].[customer] -- create a view in the legacy schema name boundary for presentation consistency purposes only
AS
SELECT  CustKey
,       ...
FROM    [edw].customer
;

Note

如果架构策略发生任何更改,则需要检查数据库的安全模型。Any change in schema strategy needs a review of the security model for the database. 在许多情况下,可以在架构级别分配权限,以简化安全模型。In many cases you might be able to simplify the security model by assigning permissions at the schema level. 如果需要更高粒度的权限,可以使用数据库角色。If more granular permissions are required then you can use database roles.

后续步骤Next steps

有关更多开发技巧,请参阅 开发概述For more development tips, see development overview.