跨扩展云数据库进行报告(预览)Reporting across scaled-out cloud databases (preview)

跨分片进行查询

分片数据库跨扩展数据层分布行。Sharded databases distribute rows across a scaled out data tier. 所有分区(也称为横向分区)数据库的架构都是一样的。The schema is identical on all participating databases, also known as horizontal partitioning. 使用弹性查询,可以创建跨分片数据库中的所有数据库的报表。Using an elastic query, you can create reports that span all databases in a sharded database.

有关快速入门,请参阅跨扩展云数据库进行报告For a quick start, see Reporting across scaled-out cloud databases.

对于非分片数据库,请参阅跨具有不同架构的云数据库进行查询For non-sharded databases, see Query across cloud databases with different schemas.

必备条件Prerequisites

概述Overview

这些语句在弹性查询数据库中创建元数据表示形式的分片数据层。These statements create the metadata representation of your sharded data tier in the elastic query database.

  1. CREATE MASTER KEYCREATE MASTER KEY
  2. CREATE DATABASE SCOPED CREDENTIALCREATE DATABASE SCOPED CREDENTIAL
  3. CREATE EXTERNAL DATA SOURCECREATE EXTERNAL DATA SOURCE
  4. 创建外部表CREATE EXTERNAL TABLE

1.1 创建数据库范围的主密钥和凭据1.1 Create database scoped master key and credentials

弹性查询使用此凭据连接到远程数据库。The credential is used by the elastic query to connect to your remote databases.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password';
CREATE DATABASE SCOPED CREDENTIAL <credential_name>  WITH IDENTITY = '<username>',  
SECRET = '<password>'
[;]

Note

请确保“<username>”中不包括任何“@servername”后缀。Make sure that the "<username>" does not include any "@servername" suffix.

1.2 创建外部数据源1.2 Create external data sources

语法:Syntax:

<External_Data_Source> ::=
CREATE EXTERNAL DATA SOURCE <data_source_name> WITH
        (TYPE = SHARD_MAP_MANAGER,
                   LOCATION = '<fully_qualified_server_name>',
        DATABASE_NAME = '<shardmap_database_name>',
        CREDENTIAL = <credential_name>,
        SHARD_MAP_NAME = '<shardmapname>'
               ) [;]

示例Example

CREATE EXTERNAL DATA SOURCE MyExtSrc
WITH
(
    TYPE=SHARD_MAP_MANAGER,
    LOCATION='myserver.database.chinacloudapi.cn',
    DATABASE_NAME='ShardMapDatabase',
    CREDENTIAL= SMMUser,
    SHARD_MAP_NAME='ShardMap'
);

检索当前外部数据源的列表:Retrieve the list of current external data sources:

select * from sys.external_data_sources;

外部数据源引用分片映射。The external data source references your shard map. 然后,弹性查询使用外部数据源和基础分片映射,枚举参与数据层的数据库。An elastic query then uses the external data source and the underlying shard map to enumerate the databases that participate in the data tier. 在弹性查询处理过程中,相同的凭据用于读取分片映射和访问上分片的数据。The same credentials are used to read the shard map and to access the data on the shards during the processing of an elastic query.

1.3 创建外部表1.3 Create external tables

语法:Syntax:

CREATE EXTERNAL TABLE [ database_name . [ schema_name ] . | schema_name. ] table_name  
    ( { <column_definition> } [ ,...n ])
    { WITH ( <sharded_external_table_options> ) }
) [;]  

<sharded_external_table_options> ::=
  DATA_SOURCE = <External_Data_Source>,
  [ SCHEMA_NAME = N'nonescaped_schema_name',]
  [ OBJECT_NAME = N'nonescaped_object_name',]
  DISTRIBUTION = SHARDED(<sharding_column_name>) | REPLICATED |ROUND_ROBIN

示例Example

CREATE EXTERNAL TABLE [dbo].[order_line](
     [ol_o_id] int NOT NULL,
     [ol_d_id] tinyint NOT NULL,
     [ol_w_id] int NOT NULL,
     [ol_number] tinyint NOT NULL,
     [ol_i_id] int NOT NULL,
     [ol_delivery_d] datetime NOT NULL,
     [ol_amount] smallmoney NOT NULL,
     [ol_supply_w_id] int NOT NULL,
     [ol_quantity] smallint NOT NULL,
     [ol_dist_info] char(24) NOT NULL
)

WITH
(
    DATA_SOURCE = MyExtSrc,
     SCHEMA_NAME = 'orders',
     OBJECT_NAME = 'order_details',
    DISTRIBUTION=SHARDED(ol_w_id)
);

从当前数据库中检索外部表的列表:Retrieve the list of external tables from the current database:

SELECT * from sys.external_tables;

要删除外部表:To drop external tables:

DROP EXTERNAL TABLE [ database_name . [ schema_name ] . | schema_name. ] table_name[;]

备注Remarks

DATA_SOURCE 子句定义了用于外部表的外部数据源(分片映射)。The DATA_SOURCE clause defines the external data source (a shard map) that is used for the external table.

SCHEMA_NAME 和 OBJECT_NAME 子句将外部表定义映射到不同架构的表。The SCHEMA_NAME and OBJECT_NAME clauses map the external table definition to a table in a different schema. 如果省略,则假定远程对象的架构是“dbo”,并假定其名称与所定义的外部表名称相同。If omitted, the schema of the remote object is assumed to be "dbo" and its name is assumed to be identical to the external table name being defined. 如果远程表的名称已在要在其中创建外部表的数据库中使用,那么该做法很有用。This is useful if the name of your remote table is already taken in the database where you want to create the external table. 例如,希望定义一个外部表,用于获取扩展数据层上目录视图或 DMV 的聚合视图。For example, you want to define an external table to get an aggregate view of catalog views or DMVs on your scaled out data tier. 由于目录视图和 DMV 已在本地存在,因此不能在外部表定义中使用其名称。Since catalog views and DMVs already exist locally, you cannot use their names for the external table definition. 而是改用不同名称,并在 SCHEMA_NAME 和/或 OBJECT_NAME 子句中使用目录视图或 DMV 的名称。Instead, use a different name and use the catalog view's or the DMV's name in the SCHEMA_NAME and/or OBJECT_NAME clauses. (请参阅下面的示例。)(See the example below.)

DISTRIBUTION 子句指定用于此表的数据分布。The DISTRIBUTION clause specifies the data distribution used for this table. 查询处理器利用 DISTRIBUTION 子句中提供的信息来构建最有效的查询计划。The query processor utilizes the information provided in the DISTRIBUTION clause to build the most efficient query plans.

  1. SHARDED 表示数据在各数据库之间横向分区。SHARDED means data is horizontally partitioned across the databases. 数据分布的分区键为 <sharding_column_name> 参数。The partitioning key for the data distribution is the <sharding_column_name> parameter.
  2. REPLICATED 表示每个数据库都存在表的相同副本。REPLICATED means that identical copies of the table are present on each database. 要负责确保各数据库上的副本是相同的。It is your responsibility to ensure that the replicas are identical across the databases.
  3. ROUND_ROBIN 表示使用依赖于应用程序的分布方法对表进行横向分区。ROUND_ROBIN means that the table is horizontally partitioned using an application-dependent distribution method.

数据层引用:外部表 DDL 引用外部数据源。Data tier reference: The external table DDL refers to an external data source. 外部数据源指定分片映射,后者为外部表提供在数据层中找到所有数据库所需的信息。The external data source specifies a shard map which provides the external table with the information necessary to locate all the databases in your data tier.

安全注意事项Security considerations

有权访问外部表的用户在使用外部数据源定义中提供的凭据时自动获得对基础远程表的访问权。Users with access to the external table automatically gain access to the underlying remote tables under the credential given in the external data source definition. 避免通过外部数据源的凭据进行不必要的权限提升。Avoid undesired elevation of privileges through the credential of the external data source. 将外部表当作常规表,在其中使用 GRANT 或 REVOKE。Use GRANT or REVOKE for an external table just as though it were a regular table.

定义外部数据源和外部表后,可以对外部表使用完整的 T-SQL。Once you have defined your external data source and your external tables, you can now use full T-SQL over your external tables.

示例:查询横向分区的数据库Example: querying horizontal partitioned databases

下面的查询在仓库、订单和订单行之间执行三向联接,并使用多个聚合和选择性筛选器。The following query performs a three-way join between warehouses, orders and order lines and uses several aggregates and a selective filter. 它假定 (1) 执行横向分区(分片),(2) 仓库、订单和订单行按仓库 ID 列进行分片,弹性查询可以在分片上并置联接,以及并行处理分片上成本较高部分的查询。It assumes (1) horizontal partitioning (sharding) and (2) that warehouses, orders and order lines are sharded by the warehouse id column, and that the elastic query can co-locate the joins on the shards and process the expensive part of the query on the shards in parallel.

    select  
         w_id as warehouse,
         o_c_id as customer,
         count(*) as cnt_orderline,
         max(ol_quantity) as max_quantity,
         avg(ol_amount) as avg_amount,
         min(ol_delivery_d) as min_deliv_date
    from warehouse
    join orders
    on w_id = o_w_id
    join order_line
    on o_id = ol_o_id and o_w_id = ol_w_id
    where w_id > 100 and w_id < 200
    group by w_id, o_c_id

远程 T-SQL 执行的存储过程:sp_execute_remoteStored procedure for remote T-SQL execution: sp_execute_remote

弹性查询还引入了一个存储过程,以便提供对分片的直接访问。Elastic query also introduces a stored procedure that provides direct access to the shards. 该存储过程名为 sp_execute _remote,可用于执行远程存储过程或远程数据库上的 T-SQL 代码。The stored procedure is called sp_execute _remote and can be used to execute remote stored procedures or T-SQL code on the remote databases. 它采用了以下参数:It takes the following parameters:

  • 数据源名称 (nvarchar):RDBMS 类型的外部数据源名称。Data source name (nvarchar): The name of the external data source of type RDBMS.
  • 查询 (nvarchar):要在每个分片上执行的 T-SQL 查询。Query (nvarchar): The T-SQL query to be executed on each shard.
  • 参数声明 (nvarchar) - 可选:在查询参数(如 sp_executesql)中使用的参数的字符串(包含数据类型定义)。Parameter declaration (nvarchar) - optional: String with data type definitions for the parameters used in the Query parameter (like sp_executesql).
  • 参数值列表 - 可选:以逗号分隔的参数值(如 sp_executesql)的列表。Parameter value list - optional: Comma-separated list of parameter values (like sp_executesql).

Sp_execute_remote 使用调用参数中提供的外部数据源在远程数据库上执行给定的 T-SQL 语句。The sp_execute_remote uses the external data source provided in the invocation parameters to execute the given T-SQL statement on the remote databases. 它使用外部数据源的凭据连接到分片映射管理器数据库和远程数据库。It uses the credential of the external data source to connect to the shardmap manager database and the remote databases.

示例:Example:

    EXEC sp_execute_remote
        N'MyExtSrc',
        N'select count(w_id) as foo from warehouse'

工具的连接Connectivity for tools

可以使用常规 SQL Server 连接字符串将应用程序、BI 和数据集成工具连接到具有外部表定义的数据库。Use regular SQL Server connection strings to connect your application, your BI and data integration tools to the database with your external table definitions. 请确保支持将 SQL Server 用作工具的数据源。Make sure that SQL Server is supported as a data source for your tool. 然后,引用弹性查询数据库,就像引用连接到工具的其他任何 SQL Server 数据库一样,并从工具或应用程序使用外部表,就像使用本地表一样。Then reference the elastic query database like any other SQL Server database connected to the tool, and use external tables from your tool or application as if they were local tables.

最佳做法Best practices

  • 确保已向弹性查询终结点数据库授予通过 SQL 数据库防火墙访问分片映射数据库和所有分片的权限。Ensure that the elastic query endpoint database has been given access to the shardmap database and all shards through the SQL DB firewalls.
  • 验证或强制执行由外部表定义的数据分布。Validate or enforce the data distribution defined by the external table. 如果实际的数据分布不同于表定义中指定的分布,查询可能会产生意外的结果。If your actual data distribution is different from the distribution specified in your table definition, your queries may yield unexpected results.
  • 当分片键上的谓词允许安全地从处理中排除某些分片时,弹性查询当前不执行分片消除。Elastic query currently does not perform shard elimination when predicates over the sharding key would allow it to safely exclude certain shards from processing.
  • 弹性查询最适合大部分计算可以在分片上完成的查询。Elastic query works best for queries where most of the computation can be done on the shards. 使用可以在分片或联接上通过分区键求值的选择性筛选器谓词(可以在所有分片上以分区对齐方式执行),通常可以获得最佳查询性能。You typically get the best query performance with selective filter predicates that can be evaluated on the shards or joins over the partitioning keys that can be performed in a partition-aligned way on all shards. 其他查询模式可能需要从分片将大量数据加载到头节点并且可能会执行效果不佳Other query patterns may need to load large amounts of data from the shards to the head node and may perform poorly

后续步骤Next steps