教程:将数据加载到 Azure Synapse Analytics SQL 池Tutorial: Load data to Azure Synapse Analytics SQL pool

本教程使用 PolyBase 将 WideWorldImportersDW 数据仓库从 Azure Blob 存储加载到 Azure Synapse Analytics SQL 池中的数据仓库。This tutorial uses PolyBase to load the WideWorldImportersDW data warehouse from Azure Blob storage to your data warehouse in Azure Synapse Analytics SQL pool. 本教程使用 Azure 门户SQL Server Management Studio (SSMS) 执行以下操作:The tutorial uses the Azure portal and SQL Server Management Studio (SSMS) to:

  • 在 Azure 门户中使用 SQL 池创建数据仓库Create a data warehouse using SQL pool in the Azure portal
  • 在 Azure 门户中设置服务器级防火墙规则Set up a server-level firewall rule in the Azure portal
  • 使用 SSMS 连接到 SQL 池Connect to the SQL pool with SSMS
  • 创建专用于加载数据的用户Create a user designated for loading data
  • 创建使用 Azure Blob 作为数据源的外部表Create external tables that use Azure blob as the data source
  • 使用 CTAS T-SQL 语句将数据加载到数据仓库Use the CTAS T-SQL statement to load data into your data warehouse
  • 查看正在加载的数据的进度View the progress of data as it is loading
  • 在日期维度和销售事实数据表中生成一年的数据Generate a year of data in the date dimension and sales fact tables
  • 创建新加载的数据的统计信息Create statistics on the newly loaded data

如果没有 Azure 订阅,可在开始前创建一个试用帐户If you don't have an Azure subscription, create a Trial before you begin.

准备阶段Before you begin

开始本教程之前,请下载并安装最新版 SQL Server Management Studio (SSMS)。Before you begin this tutorial, download and install the newest version of SQL Server Management Studio (SSMS).

登录到 Azure 门户Sign in to the Azure portal

登录到 Azure 门户Sign in to the Azure portal.

在 SQL 池中创建空白数据仓库Create a blank data warehouse in SQL pool

SQL 池是使用定义的一组计算资源创建的。A SQL pool is created with a defined set of compute resources. SQL 池在 Azure 资源组逻辑 SQL Server 中进行创建。The SQL pool is created within an Azure resource group and in a logical SQL server.

按照以下步骤创建空白的 SQL 池。Follow these steps to create a blank SQL pool.

  1. 在 Azure 门户中选择“创建资源”。Select Create a resource in the the Azure portal.

  2. 在“新建”页中选择“数据库”,然后在“新建”页上的“特色”下选择“Azure Synapse Analytics” 。Select Databases from the New page, and select Azure Synapse Analytics under Featured on the New page.

    创建 SQL 池

  3. 使用以下信息填写“项目详细信息”部分:Fill out the Project Details section with the following information:

    设置Setting 示例Example 说明Description
    订阅Subscription 你的订阅Your subscription 有关订阅的详细信息,请参阅订阅For details about your subscriptions, see Subscriptions.
    资源组Resource group MyResourceGroupmyResourceGroup 有关有效的资源组名称,请参阅 Naming rules and restrictions(命名规则和限制)。For valid resource group names, see Naming rules and restrictions.
  4. 在“SQL 池详细信息”下,提供 SQL 池的名称。Under SQL pool details, provide a name for your SQL pool. 接下来,从下拉菜单中选择现有的服务器,或选择“服务器”设置下的“新建”以创建新服务器 。Next, either select an existing server from the drop down, or select Create new under the Server settings to create a new server. 使用以下信息填写窗体:Fill out the form with the following information:

    设置Setting 建议的值Suggested value 说明Description
    SQL 池名称SQL pool name SampleDWSampleDW 如需有效的数据库名称,请参阅 Database Identifiers(数据库标识符)。For valid database names, see Database Identifiers.
    服务器名称Server name 任何全局唯一名称Any globally unique name 有关有效的服务器名称,请参阅 Naming rules and restrictions(命名规则和限制)。For valid server names, see Naming rules and restrictions.
    服务器管理员登录名Server admin login 任何有效的名称Any valid name 有关有效的登录名,请参阅 Database Identifiers(数据库标识符)。For valid login names, see Database Identifiers.
    密码Password 任何有效的密码Any valid password 密码必须至少有八个字符,且必须包含以下类别中的三个类别的字符:大写字符、小写字符、数字以及非字母数字字符。Your password must have at least eight characters and must contain characters from three of the following categories: upper case characters, lower case characters, numbers, and non-alphanumeric characters.
    位置Location 任何有效的位置Any valid location 有关区域的信息,请参阅 Azure 区域For information about regions, see Azure Regions.

    创建服务器

  5. 选择性能级别。Select performance level. 滑块默认设置为“DW1000c”。The slider by default is set to DW1000c. 向上和向下移动滑块可以选择所需的性能规模。Move the slider up and down to choose the desired performance scale.

    创建服务器 2

  6. 在“其他设置”页上,将“使用现有数据”设置为“无”,并将“排序规则”保留为默认设置“SQL_Latin1_General_CP1_CI_AS” 。On the Additional Settings page, set the Use existing data to None, and leave the Collation at the default of SQL_Latin1_General_CP1_CI_AS.

  7. 选择“查看 + 创建”以查看设置,然后选择“创建”以创建数据仓库 。Select Review + create to review your settings, and then select Create to create your data warehouse. 可以通过从“通知”菜单打开“部署正在进行”页来监视进度 。You can monitor your progress by opening the deployment in progress page from the Notifications menu.

    通知

创建服务器级防火墙规则Create a server-level firewall rule

Azure Synapse Analytics 服务在服务器级别创建一个防火墙,阻止外部应用程序和工具连接到服务器或服务器上的任何数据库。The Azure Synapse Analytics service creates a firewall at the server-level that prevents external applications and tools from connecting to the server or any databases on the server. 要启用连接,可以添加防火墙规则,为特定 IP 地址启用连接。To enable connectivity, you can add firewall rules that enable connectivity for specific IP addresses. 按照以下步骤为客户端的 IP 地址创建服务器级防火墙规则Follow these steps to create a server-level firewall rule for your client's IP address.

备注

Azure Synapse Analytics SQL 池通过端口 1433 进行通信。The Azure Synapse Analytics SQL pool communicates over port 1433. 如果尝试从企业网络内部进行连接,则该网络的防火墙可能不允许经端口 1433 的出站流量。If you are trying to connect from within a corporate network, outbound traffic over port 1433 might not be allowed by your network's firewall. 如果是这样,则无法连接到服务器,除非 IT 部门打开了端口 1433。If so, you cannot connect to your server unless your IT department opens port 1433.

  1. 部署完成后,使用导航菜单中的搜索框搜索池名称,然后选择 SQL 池资源。After the deployment completes, search for your pool name in the search box in the navigation menu, and select the SQL pool resource. 选择服务器名称。Select the server name.

    转到你的资源

  2. 选择服务器名称。Select the server name. 服务器名称server name

  3. 选择“显示防火墙设置”。Select Show firewall settings. 此时会打开服务器的“防火墙设置”页面。The Firewall settings page for the server opens.

    服务器设置

  4. 在“防火墙和虚拟网络”页上选择“添加客户端 IP”,将当前 IP 地址添加到新的防火墙规则 。On the Firewalls and virtual networks page, select Add client IP to add your current IP address to a new firewall rule. 防火墙规则可以针对单个 IP 地址或一系列 IP 地址打开端口 1433。A firewall rule can open port 1433 for a single IP address or a range of IP addresses.

    服务器防火墙规则

  5. 选择“保存” 。Select Save. 此时会针对当前的 IP 地址创建服务器级防火墙规则,在服务器上打开端口 1433。A server-level firewall rule is created for your current IP address opening port 1433 on the server.

现在可以使用客户端 IP 地址连接到服务器。You can now connect to the server using your client IP address. 可从 SQL Server Management Studio 或另一种所选工具进行连接。The connection works from SQL Server Management Studio or another tool of your choice. 连接时,使用之前创建的 serveradmin 帐户。When you connect, use the serveradmin account you created previously.

重要

默认情况下,所有 Azure 服务都允许通过 SQL 数据库防火墙进行访问。By default, access through the SQL Database firewall is enabled for all Azure services. 单击此页上的“关闭”,然后单击“保存”,对所有 Azure 服务禁用防火墙。Click OFF on this page and then click Save to disable the firewall for all Azure services.

获取完全限定的服务器名称Get the fully qualified server name

将使用完全限定的服务器名称连接到服务器。The fully qualified server name is what is used to connect to the server. 在 Azure 门户中转到你的 SQL 池资源,然后在“服务器名称”下查看完全限定的名称。Go to your SQL pool resource in the Azure portal and view the fully qualified name under Server name.

服务器名称

以服务器管理员的身份连接到服务器Connect to the server as server admin

本部分使用 SQL Server Management Studio (SSMS) 来建立与服务器的连接。This section uses SQL Server Management Studio (SSMS) to establish a connection to your server.

  1. 打开 SQL Server Management Studio。Open SQL Server Management Studio.

  2. 在“连接到服务器”对话框中,输入以下信息:In the Connect to Server dialog box, enter the following information:

    设置Setting       建议的值Suggested value 说明Description
    服务器类型Server type 数据库引擎Database engine 此值是必需的This value is required
    服务器名称Server name 完全限定的服务器名称The fully qualified server name 例如,sqlpoolservername.database.chinacloudapi.cn 就是完全限定的服务器名称。For example, sqlpoolservername.database.chinacloudapi.cn is a fully qualified server name.
    身份验证Authentication SQL Server 身份验证SQL Server Authentication SQL 身份验证是本教程中配置的唯一身份验证类型。SQL Authentication is the only authentication type that is configured in this tutorial.
    登录Login 服务器管理员帐户The server admin account 这是在创建服务器时指定的帐户。This is the account that you specified when you created the server.
    密码Password 服务器管理员帐户的密码The password for your server admin account 这是在创建服务器时指定的密码。This is the password that you specified when you created the server.

    连接到服务器

  3. 单击“连接” 。Click Connect. 此时会在 SSMS 中打开“对象资源管理器”窗口。The Object Explorer window opens in SSMS.

  4. 在“对象资源管理器”中,展开“数据库”。In Object Explorer, expand Databases. 然后展开“系统数据库”和“master”,查看 master 数据库中的对象。Then expand System databases and master to view the objects in the master database. 展开“SampleDW”,查看新数据库中的对象。Expand SampleDW to view the objects in your new database.

    数据库对象

创建用于加载数据的用户Create a user for loading data

服务器管理员帐户用于执行管理操作,不适合对用户数据运行查询。The server admin account is meant to perform management operations, and is not suited for running queries on user data. 加载数据是一种内存密集型操作。Loading data is a memory-intensive operation. 内存最大值是根据所用 SQL 池的代系、数据仓库单位资源类定义的。Memory maximums are defined according to the Generation of SQL pool you're using, data warehouse units, and resource class.

最好创建专用于加载数据的登录名和用户。It's best to create a login and user that is dedicated for loading data. 然后,将加载用户添加到启用相应最大内存分配的资源类Then add the loading user to a resource class that enables an appropriate maximum memory allocation.

由于当前是以服务器管理员的身份连接的,因此可以创建登录名和用户。Since you are currently connected as the server admin, you can create logins and users. 使用以下步骤创建名为 LoaderRC60 的登录名和用户。Use these steps to create a login and user called LoaderRC60. 然后将该用户分配到 staticrc60 资源类。Then assign the user to the staticrc60 resource class.

  1. 在 SSMS 中,右键单击“master”,然后在显示的下拉菜单中选择“新建查询”。In SSMS, right-click master to show a drop-down menu, and choose New Query. 此时将打开一个新的查询窗口。A new query window opens.

    在 Master 中新建查询

  2. 在查询窗口中,输入以下 T-SQL 命令,创建名为 LoaderRC60 的登录名和用户,并将“a123STRONGpassword!”替换为自己的密码。In the query window, enter these T-SQL commands to create a login and user named LoaderRC60, substituting your own password for 'a123STRONGpassword!'.

    CREATE LOGIN LoaderRC60 WITH PASSWORD = 'a123STRONGpassword!';
    CREATE USER LoaderRC60 FOR LOGIN LoaderRC60;
    
  3. 单击“执行” 。Click Execute.

  4. 右键单击“SampleDW”,并选择“新建查询”。 Right-click SampleDW, and choose New Query. 此时会打开一个新的查询窗口。A new query Window opens.

    针对示例数据仓库的新查询

  5. 输入以下 T-SQL 命令,为 LoaderRC60 登录名创建名为 LoaderRC60 的数据库用户。Enter the following T-SQL commands to create a database user named LoaderRC60 for the LoaderRC60 login. 第二行为新用户授予对新数据仓库的 CONTROL 权限。The second line grants the new user CONTROL permissions on the new data warehouse. 这些权限类似于使用户成为数据库所有者。These permissions are similar to making the user the owner of the database. 第三行将新用户添加为 staticrc60 资源类的成员。The third line adds the new user as a member of the staticrc60 resource class.

    CREATE USER LoaderRC60 FOR LOGIN LoaderRC60;
    GRANT CONTROL ON DATABASE::[SampleDW] to LoaderRC60;
    EXEC sp_addrolemember 'staticrc60', 'LoaderRC60';
    
  6. 单击“执行” 。Click Execute.

以加载用户的身份连接到服务器Connect to the server as the loading user

加载数据的第一步是以 LoaderRC60 的身份登录。The first step toward loading data is to login as LoaderRC60.

  1. 在对象资源管理器中,单击“连接”下拉菜单,然后选择“数据库引擎”。In Object Explorer, click the Connect drop down menu and select Database Engine. 此时会显示“连接到服务器”对话框。The Connect to Server dialog box appears.

    使用新登录名连接

  2. 输入完全限定的服务器名称,并输入 LoaderRC60 作为登录名。Enter the fully qualified server name, and enter LoaderRC60 as the Login. 输入 LoaderRC60 的密码。Enter your password for LoaderRC60.

  3. 单击“连接” 。Click Connect.

  4. 当连接准备就绪时,对象资源管理器中出现两个服务器连接。When your connection is ready, you will see two server connections in Object Explorer. 一个是作为 ServerAdmin 连接,另一个是作为 LoaderRC60 连接。One connection as ServerAdmin and one connection as LoaderRC60.

    连接成功

创建外部表和对象Create external tables and objects

已准备好开始将数据加载到新的数据仓库。You are ready to begin the process of loading data into your new data warehouse. 若要了解如何将数据置于 Azure Blob 存储或将其直接从源加载到 SQL 池以供将来参考,请参阅加载概述For future reference, to learn how to get your data to Azure Blob storage or to load it directly from your source into SQL pool, see the loading overview.

运行以下 SQL 脚本,指定有关想要加载的数据的信息。Run the following SQL scripts to specify information about the data you wish to load. 此信息包括数据所在的位置、数据内容的格式以及数据的表定义。This information includes where the data is located, the format of the contents of the data, and the table definition for the data. 数据位于全局 Azure Blob 中。The data is located in a global Azure Blob.

  1. 在前一部分,我们以 LoaderRC60 的身份登录到了数据仓库。In the previous section, you logged into your data warehouse as LoaderRC60. 在 SSMS 中,右键单击 LoaderRC60 连接下面的“SampleDW”,并选择“新建查询”。 In SSMS, right-click SampleDW under your LoaderRC60 connection and select New Query. 此时会显示一个新的查询窗口。A new query window appears.

    新的加载查询窗口

  2. 比较查询窗口和之间的图像。Compare your query window to the previous image. 验证新查询窗口是否以 LoaderRC60 的身份运行,并对 SampleDW 数据库执行查询。Verify your new query window is running as LoaderRC60 and performing queries on your SampleDW database. 使用此查询窗口执行所有加载步骤。Use this query window to perform all of the loading steps.

  3. 创建 SampleDW 数据库的主密钥。Create a master key for the SampleDW database. 只需要为每个数据库创建一次主密钥。You only need to create a master key once per database.

    CREATE MASTER KEY;
    
  4. 运行以下 CREATE EXTERNAL DATA SOURCE 语句,定义 Azure Blob 的位置。Run the following CREATE EXTERNAL DATA SOURCE statement to define the location of the Azure blob. 这是外部全球进口商数据的位置。This is the location of the external worldwide importers data. 要运行追加到查询窗口的命令,请突出显示要运行的命令,然后单击“执行”。To run a command that you have appended to the query window, highlight the commands you wish to run and click Execute.

    CREATE EXTERNAL DATA SOURCE WWIStorage
    WITH
    (
        TYPE = Hadoop,
        LOCATION = 'wasbs://wideworldimporters@sqldwholdata.blob.core.windows.net'
    );
    
  5. 运行以下 CREATE EXTERNAL FILE FORMAT T-SQL 语句,指定外部数据文件的格式设置特征和选项。Run the following CREATE EXTERNAL FILE FORMAT T-SQL statement to specify the formatting characteristics and options for the external data file. 此语句指定外部数据需存储为文本,且值由竖线(“|”)字符分隔。This statement specifies the external data is stored as text and the values are separated by the pipe ('|') character.

    CREATE EXTERNAL FILE FORMAT TextFileFormat
    WITH
    (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS
        (
            FIELD_TERMINATOR = '|',
            USE_TYPE_DEFAULT = FALSE
        )
    );
    
  6. 运行以下 CREATE SCHEMA 语句,创建外部文件格式的架构。Run the following CREATE SCHEMA statements to create a schema for your external file format. ext 架构提供组织即将创建的外部表的方法。The ext schema provides a way to organize the external tables you are about to create. wwi 架构组织要包含数据的标准表。The wwi schema organizes the standard tables that will contain the data.

    CREATE SCHEMA ext;
    GO
    CREATE SCHEMA wwi;
    
  7. 创建外部表。Create the external tables. 表定义存储在数据库中,但表引用数据存储在 Azure Blob 存储中。The table definitions are stored in the database, but the tables reference data that is stored in Azure blob storage. 运行以下 T-SQL 命令创建若干外部表,这些表都指向之前在外部数据源中定义的 Azure Blob。Run the following T-SQL commands to create several external tables that all point to the Azure blob you defined previously in the external data source.

    CREATE EXTERNAL TABLE [ext].[dimension_City](
        [City Key] [int] NOT NULL,
        [WWI City ID] [int] NOT NULL,
        [City] [nvarchar](50) NOT NULL,
        [State Province] [nvarchar](50) NOT NULL,
        [Country] [nvarchar](60) NOT NULL,
        [Continent] [nvarchar](30) NOT NULL,
        [Sales Territory] [nvarchar](50) NOT NULL,
        [Region] [nvarchar](30) NOT NULL,
        [Subregion] [nvarchar](30) NOT NULL,
        [Location] [nvarchar](76) NULL,
        [Latest Recorded Population] [bigint] NOT NULL,
        [Valid From] [datetime2](7) NOT NULL,
        [Valid To] [datetime2](7) NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH (LOCATION='/v1/dimension_City/',
        DATA_SOURCE = WWIStorage,  
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );  
    CREATE EXTERNAL TABLE [ext].[dimension_Customer] (
        [Customer Key] [int] NOT NULL,
        [WWI Customer ID] [int] NOT NULL,
        [Customer] [nvarchar](100) NOT NULL,
        [Bill To Customer] [nvarchar](100) NOT NULL,
           [Category] [nvarchar](50) NOT NULL,
        [Buying Group] [nvarchar](50) NOT NULL,
        [Primary Contact] [nvarchar](50) NOT NULL,
        [Postal Code] [nvarchar](10) NOT NULL,
        [Valid From] [datetime2](7) NOT NULL,
        [Valid To] [datetime2](7) NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH (LOCATION='/v1/dimension_Customer/',
        DATA_SOURCE = WWIStorage,  
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );  
    CREATE EXTERNAL TABLE [ext].[dimension_Employee] (
        [Employee Key] [int] NOT NULL,
        [WWI Employee ID] [int] NOT NULL,
        [Employee] [nvarchar](50) NOT NULL,
        [Preferred Name] [nvarchar](50) NOT NULL,
        [Is Salesperson] [bit] NOT NULL,
        [Photo] [varbinary](300) NULL,
        [Valid From] [datetime2](7) NOT NULL,
        [Valid To] [datetime2](7) NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH ( LOCATION='/v1/dimension_Employee/',
        DATA_SOURCE = WWIStorage,  
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
    CREATE EXTERNAL TABLE [ext].[dimension_PaymentMethod] (
        [Payment Method Key] [int] NOT NULL,
        [WWI Payment Method ID] [int] NOT NULL,
        [Payment Method] [nvarchar](50) NOT NULL,
        [Valid From] [datetime2](7) NOT NULL,
        [Valid To] [datetime2](7) NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH ( LOCATION ='/v1/dimension_PaymentMethod/',
        DATA_SOURCE = WWIStorage,  
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
    CREATE EXTERNAL TABLE [ext].[dimension_StockItem](
        [Stock Item Key] [int] NOT NULL,
        [WWI Stock Item ID] [int] NOT NULL,
        [Stock Item] [nvarchar](100) NOT NULL,
        [Color] [nvarchar](20) NOT NULL,
        [Selling Package] [nvarchar](50) NOT NULL,
        [Buying Package] [nvarchar](50) NOT NULL,
        [Brand] [nvarchar](50) NOT NULL,
        [Size] [nvarchar](20) NOT NULL,
        [Lead Time Days] [int] NOT NULL,
        [Quantity Per Outer] [int] NOT NULL,
        [Is Chiller Stock] [bit] NOT NULL,
        [Barcode] [nvarchar](50) NULL,
        [Tax Rate] [decimal](18, 3) NOT NULL,
        [Unit Price] [decimal](18, 2) NOT NULL,
        [Recommended Retail Price] [decimal](18, 2) NULL,
        [Typical Weight Per Unit] [decimal](18, 3) NOT NULL,
        [Photo] [varbinary](300) NULL,
        [Valid From] [datetime2](7) NOT NULL,
        [Valid To] [datetime2](7) NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH ( LOCATION ='/v1/dimension_StockItem/',
        DATA_SOURCE = WWIStorage,  
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
    CREATE EXTERNAL TABLE [ext].[dimension_Supplier](
        [Supplier Key] [int] NOT NULL,
        [WWI Supplier ID] [int] NOT NULL,
        [Supplier] [nvarchar](100) NOT NULL,
        [Category] [nvarchar](50) NOT NULL,
        [Primary Contact] [nvarchar](50) NOT NULL,
        [Supplier Reference] [nvarchar](20) NULL,
        [Payment Days] [int] NOT NULL,
        [Postal Code] [nvarchar](10) NOT NULL,
        [Valid From] [datetime2](7) NOT NULL,
        [Valid To] [datetime2](7) NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH ( LOCATION ='/v1/dimension_Supplier/',
        DATA_SOURCE = WWIStorage,  
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
    CREATE EXTERNAL TABLE [ext].[dimension_TransactionType](
        [Transaction Type Key] [int] NOT NULL,
        [WWI Transaction Type ID] [int] NOT NULL,
        [Transaction Type] [nvarchar](50) NOT NULL,
        [Valid From] [datetime2](7) NOT NULL,
        [Valid To] [datetime2](7) NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH ( LOCATION ='/v1/dimension_TransactionType/',
        DATA_SOURCE = WWIStorage,  
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
    CREATE EXTERNAL TABLE [ext].[fact_Movement] (
        [Movement Key] [bigint] NOT NULL,
        [Date Key] [date] NOT NULL,
        [Stock Item Key] [int] NOT NULL,
        [Customer Key] [int] NULL,
        [Supplier Key] [int] NULL,
        [Transaction Type Key] [int] NOT NULL,
        [WWI Stock Item Transaction ID] [int] NOT NULL,
        [WWI Invoice ID] [int] NULL,
        [WWI Purchase Order ID] [int] NULL,
        [Quantity] [int] NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH ( LOCATION ='/v1/fact_Movement/',
        DATA_SOURCE = WWIStorage,  
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
    CREATE EXTERNAL TABLE [ext].[fact_Order] (
        [Order Key] [bigint] NOT NULL,
        [City Key] [int] NOT NULL,
        [Customer Key] [int] NOT NULL,
        [Stock Item Key] [int] NOT NULL,
        [Order Date Key] [date] NOT NULL,
        [Picked Date Key] [date] NULL,
        [Salesperson Key] [int] NOT NULL,
        [Picker Key] [int] NULL,
        [WWI Order ID] [int] NOT NULL,
        [WWI Backorder ID] [int] NULL,
        [Description] [nvarchar](100) NOT NULL,
        [Package] [nvarchar](50) NOT NULL,
        [Quantity] [int] NOT NULL,
        [Unit Price] [decimal](18, 2) NOT NULL,
        [Tax Rate] [decimal](18, 3) NOT NULL,
        [Total Excluding Tax] [decimal](18, 2) NOT NULL,
        [Tax Amount] [decimal](18, 2) NOT NULL,
        [Total Including Tax] [decimal](18, 2) NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH ( LOCATION ='/v1/fact_Order/',
        DATA_SOURCE = WWIStorage,
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
    CREATE EXTERNAL TABLE [ext].[fact_Purchase] (
        [Purchase Key] [bigint] NOT NULL,
        [Date Key] [date] NOT NULL,
        [Supplier Key] [int] NOT NULL,
        [Stock Item Key] [int] NOT NULL,
        [WWI Purchase Order ID] [int] NULL,
        [Ordered Outers] [int] NOT NULL,
        [Ordered Quantity] [int] NOT NULL,
        [Received Outers] [int] NOT NULL,
        [Package] [nvarchar](50) NOT NULL,
        [Is Order Finalized] [bit] NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH ( LOCATION ='/v1/fact_Purchase/',
        DATA_SOURCE = WWIStorage,  
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
    CREATE EXTERNAL TABLE [ext].[fact_Sale] (
        [Sale Key] [bigint] NOT NULL,
        [City Key] [int] NOT NULL,
        [Customer Key] [int] NOT NULL,
        [Bill To Customer Key] [int] NOT NULL,
        [Stock Item Key] [int] NOT NULL,
        [Invoice Date Key] [date] NOT NULL,
        [Delivery Date Key] [date] NULL,
        [Salesperson Key] [int] NOT NULL,
        [WWI Invoice ID] [int] NOT NULL,
        [Description] [nvarchar](100) NOT NULL,
        [Package] [nvarchar](50) NOT NULL,
        [Quantity] [int] NOT NULL,
        [Unit Price] [decimal](18, 2) NOT NULL,
        [Tax Rate] [decimal](18, 3) NOT NULL,
        [Total Excluding Tax] [decimal](18, 2) NOT NULL,
        [Tax Amount] [decimal](18, 2) NOT NULL,
        [Profit] [decimal](18, 2) NOT NULL,
        [Total Including Tax] [decimal](18, 2) NOT NULL,
        [Total Dry Items] [int] NOT NULL,
        [Total Chiller Items] [int] NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH ( LOCATION ='/v1/fact_Sale/',
        DATA_SOURCE = WWIStorage,  
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
    CREATE EXTERNAL TABLE [ext].[fact_StockHolding] (
        [Stock Holding Key] [bigint] NOT NULL,
        [Stock Item Key] [int] NOT NULL,
        [Quantity On Hand] [int] NOT NULL,
        [Bin Location] [nvarchar](20) NOT NULL,
        [Last Stocktake Quantity] [int] NOT NULL,
        [Last Cost Price] [decimal](18, 2) NOT NULL,
        [Reorder Level] [int] NOT NULL,
        [Target Stock Level] [int] NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH ( LOCATION ='/v1/fact_StockHolding/',
        DATA_SOURCE = WWIStorage,  
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
    CREATE EXTERNAL TABLE [ext].[fact_Transaction] (
        [Transaction Key] [bigint] NOT NULL,
        [Date Key] [date] NOT NULL,
        [Customer Key] [int] NULL,
        [Bill To Customer Key] [int] NULL,
        [Supplier Key] [int] NULL,
        [Transaction Type Key] [int] NOT NULL,
        [Payment Method Key] [int] NULL,
        [WWI Customer Transaction ID] [int] NULL,
        [WWI Supplier Transaction ID] [int] NULL,
        [WWI Invoice ID] [int] NULL,
        [WWI Purchase Order ID] [int] NULL,
        [Supplier Invoice Number] [nvarchar](20) NULL,
        [Total Excluding Tax] [decimal](18, 2) NOT NULL,
        [Tax Amount] [decimal](18, 2) NOT NULL,
        [Total Including Tax] [decimal](18, 2) NOT NULL,
        [Outstanding Balance] [decimal](18, 2) NOT NULL,
        [Is Finalized] [bit] NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH ( LOCATION ='/v1/fact_Transaction/',
        DATA_SOURCE = WWIStorage,  
        FILE_FORMAT = TextFileFormat,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
    
  8. 在对象资源管理器中展开“SampleDW”,查看已创建的外部表列表。In Object Explorer, expand SampleDW to see the list of external tables you created.

    查看外部表

将数据加载到 SQL 池中Load the data into SQL pool

本部分使用已定义的外部表将示例数据从 Azure Blob 加载到 SQL 池。This section uses the external tables you defined to load the sample data from Azure Blob to SQL pool.

备注

本教程直接将数据加载到最终表。This tutorial loads the data directly into the final table. 在生产环境中,通常使用 CREATE TABLE AS SELECT 将数据加载到临时表。In a production environment, you will usually use CREATE TABLE AS SELECT to load into a staging table. 数据在临时表中时,可以执行任何必要的转换。While data is in the staging table you can perform any necessary transformations. 要将临时表中的数据追加到生产表,可以使用 INSERT...SELECT 语句。To append the data in the staging table to a production table, you can use the INSERT...SELECT statement. 有关详细信息,请参阅将数据插入到生产表For more information, see Inserting data into a production table.

下面的脚本使用 CREATE TABLE AS SELECT (CTAS) T-SQL 语句将数据从 Azure 存储 Blob 加载到数据仓库中的新表。The script uses the CREATE TABLE AS SELECT (CTAS) T-SQL statement to load the data from Azure Storage Blob into new tables in your data warehouse. CTAS 基于 select 语句的结果创建新表。CTAS creates a new table based on the results of a select statement. 新表包含与 select 语句结果相同的列和数据类型。The new table has the same columns and data types as the results of the select statement. 当 select 语句从外部表进行选择时,数据将导入到数据仓库的关系表中。When the select statement selects from an external table, the data is imported into a relational table in the data warehouse.

此脚本不会将数据载入 wwi.dimension_Date 和 wwi.fact_Sale 表。This script does not load data into the wwi.dimension_Date and wwi.fact_Sale tables. 稍后的步骤会生成这些表,使表中包含数目可调整的行。These tables are generated in a later step in order to make the tables have a sizeable number of rows.

  1. 运行以下脚本,将数据加载到数据仓库中的新表。Run the following script to load the data into new tables in your data warehouse.

    CREATE TABLE [wwi].[dimension_City]
    WITH
    (
        DISTRIBUTION = REPLICATE,
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[dimension_City]
    OPTION (LABEL = 'CTAS : Load [wwi].[dimension_City]')
    ;
    
    CREATE TABLE [wwi].[dimension_Customer]
    WITH
    (
        DISTRIBUTION = REPLICATE,
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[dimension_Customer]
    OPTION (LABEL = 'CTAS : Load [wwi].[dimension_Customer]')
    ;
    
    CREATE TABLE [wwi].[dimension_Employee]
    WITH
    (
        DISTRIBUTION = REPLICATE,
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[dimension_Employee]
    OPTION (LABEL = 'CTAS : Load [wwi].[dimension_Employee]')
    ;
    
    CREATE TABLE [wwi].[dimension_PaymentMethod]
    WITH
    (
        DISTRIBUTION = REPLICATE,
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[dimension_PaymentMethod]
    OPTION (LABEL = 'CTAS : Load [wwi].[dimension_PaymentMethod]')
    ;
    
    CREATE TABLE [wwi].[dimension_StockItem]
    WITH
    (
        DISTRIBUTION = REPLICATE,
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[dimension_StockItem]
    OPTION (LABEL = 'CTAS : Load [wwi].[dimension_StockItem]')
    ;
    
    CREATE TABLE [wwi].[dimension_Supplier]
    WITH
    (
        DISTRIBUTION = REPLICATE,
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[dimension_Supplier]
    OPTION (LABEL = 'CTAS : Load [wwi].[dimension_Supplier]')
    ;
    
    CREATE TABLE [wwi].[dimension_TransactionType]
    WITH
    (
        DISTRIBUTION = REPLICATE,
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[dimension_TransactionType]
    OPTION (LABEL = 'CTAS : Load [wwi].[dimension_TransactionType]')
    ;
    
    CREATE TABLE [wwi].[fact_Movement]
    WITH
    (
        DISTRIBUTION = HASH([Movement Key]),
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[fact_Movement]
    OPTION (LABEL = 'CTAS : Load [wwi].[fact_Movement]')
    ;
    
    CREATE TABLE [wwi].[fact_Order]
    WITH
    (
        DISTRIBUTION = HASH([Order Key]),
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[fact_Order]
    OPTION (LABEL = 'CTAS : Load [wwi].[fact_Order]')
    ;
    
    CREATE TABLE [wwi].[fact_Purchase]
    WITH
    (
        DISTRIBUTION = HASH([Purchase Key]),
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[fact_Purchase]
    OPTION (LABEL = 'CTAS : Load [wwi].[fact_Purchase]')
    ;
    
    CREATE TABLE [wwi].[seed_Sale]
    WITH
    (
        DISTRIBUTION = HASH([WWI Invoice ID]),
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[fact_Sale]
    OPTION (LABEL = 'CTAS : Load [wwi].[seed_Sale]')
    ;
    
    CREATE TABLE [wwi].[fact_StockHolding]
    WITH
    (
        DISTRIBUTION = HASH([Stock Holding Key]),
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[fact_StockHolding]
    OPTION (LABEL = 'CTAS : Load [wwi].[fact_StockHolding]')
    ;
    
    CREATE TABLE [wwi].[fact_Transaction]
    WITH
    (
        DISTRIBUTION = HASH([Transaction Key]),
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM [ext].[fact_Transaction]
    OPTION (LABEL = 'CTAS : Load [wwi].[fact_Transaction]')
    ;
    
  2. 在加载数据的同时查看数据。View your data as it loads. 假设要加载几个 GB 的数据,并将其压缩成高性能群集列存储索引。You're loading several GBs of data and compressing it into highly performant clustered columnstore indexes. 在 SampleDW 中打开新查询窗口,并运行以下查询显示负载状态。Open a new query window on SampleDW, and run the following query to show the status of the load. 开始查询后,在 SQL 池执行繁重任务的同时,你可以享用咖啡和小吃。After starting the query, grab a coffee and a snack while SQL pool does some heavy lifting.

    SELECT
        r.command,
        s.request_id,
        r.status,
        count(distinct input_name) as nbr_files,
        sum(s.bytes_processed)/1024/1024/1024 as gb_processed
    FROM
        sys.dm_pdw_exec_requests r
        INNER JOIN sys.dm_pdw_dms_external_work s
        ON r.request_id = s.request_id
    WHERE
        r.[label] = 'CTAS : Load [wwi].[dimension_City]' OR
        r.[label] = 'CTAS : Load [wwi].[dimension_Customer]' OR
        r.[label] = 'CTAS : Load [wwi].[dimension_Employee]' OR
        r.[label] = 'CTAS : Load [wwi].[dimension_PaymentMethod]' OR
        r.[label] = 'CTAS : Load [wwi].[dimension_StockItem]' OR
        r.[label] = 'CTAS : Load [wwi].[dimension_Supplier]' OR
        r.[label] = 'CTAS : Load [wwi].[dimension_TransactionType]' OR
        r.[label] = 'CTAS : Load [wwi].[fact_Movement]' OR
        r.[label] = 'CTAS : Load [wwi].[fact_Order]' OR
        r.[label] = 'CTAS : Load [wwi].[fact_Purchase]' OR
        r.[label] = 'CTAS : Load [wwi].[fact_StockHolding]' OR
        r.[label] = 'CTAS : Load [wwi].[fact_Transaction]'
    GROUP BY
        r.command,
        s.request_id,
        r.status
    ORDER BY
        nbr_files desc,
        gb_processed desc;
    
  3. 查看所有系统查询。View all system queries.

    SELECT * FROM sys.dm_pdw_exec_requests;
    
  4. 结果让人欣慰,数据已顺利载入数据仓库。Enjoy seeing your data nicely loaded into your data warehouse.

    查看已加载的表

创建表以及用于生成日期和销售表的过程Create tables and procedures to generate the Date and Sales tables

本部分将创建 wwi.dimension_Date 和 wwi.fact_Sale 表。This section creates the wwi.dimension_Date and wwi.fact_Sale tables. 此外,还会创建可在 wwi.dimension_Date 和 wwi.fact_Sale 表中生成数百万行的存储过程。It also creates stored procedures that can generate millions of rows in the wwi.dimension_Date and wwi.fact_Sale tables.

  1. 创建 dimension_Date 和 fact_Sale 表。Create the dimension_Date and fact_Sale tables.

    CREATE TABLE [wwi].[dimension_Date]
    (
        [Date] [datetime] NOT NULL,
        [Day Number] [int] NOT NULL,
        [Day] [nvarchar](10) NOT NULL,
        [Month] [nvarchar](10) NOT NULL,
        [Short Month] [nvarchar](3) NOT NULL,
        [Calendar Month Number] [int] NOT NULL,
        [Calendar Month Label] [nvarchar](20) NOT NULL,
        [Calendar Year] [int] NOT NULL,
        [Calendar Year Label] [nvarchar](10) NOT NULL,
        [Fiscal Month Number] [int] NOT NULL,
        [Fiscal Month Label] [nvarchar](20) NOT NULL,
        [Fiscal Year] [int] NOT NULL,
        [Fiscal Year Label] [nvarchar](10) NOT NULL,
        [ISO Week Number] [int] NOT NULL
    )
    WITH
    (
        DISTRIBUTION = REPLICATE,
        CLUSTERED INDEX ([Date])
    );
    CREATE TABLE [wwi].[fact_Sale]
    (
        [Sale Key] [bigint] IDENTITY(1,1) NOT NULL,
        [City Key] [int] NOT NULL,
        [Customer Key] [int] NOT NULL,
        [Bill To Customer Key] [int] NOT NULL,
        [Stock Item Key] [int] NOT NULL,
        [Invoice Date Key] [date] NOT NULL,
        [Delivery Date Key] [date] NULL,
        [Salesperson Key] [int] NOT NULL,
        [WWI Invoice ID] [int] NOT NULL,
        [Description] [nvarchar](100) NOT NULL,
        [Package] [nvarchar](50) NOT NULL,
        [Quantity] [int] NOT NULL,
        [Unit Price] [decimal](18, 2) NOT NULL,
        [Tax Rate] [decimal](18, 3) NOT NULL,
        [Total Excluding Tax] [decimal](18, 2) NOT NULL,
        [Tax Amount] [decimal](18, 2) NOT NULL,
        [Profit] [decimal](18, 2) NOT NULL,
        [Total Including Tax] [decimal](18, 2) NOT NULL,
        [Total Dry Items] [int] NOT NULL,
        [Total Chiller Items] [int] NOT NULL,
        [Lineage Key] [int] NOT NULL
    )
    WITH
    (
        DISTRIBUTION = HASH ( [WWI Invoice ID] ),
        CLUSTERED COLUMNSTORE INDEX
    )
    
  2. 创建 [wwi].[InitialSalesDataPopulation],以便以 8 为乘数增加 [wwi].[seed_Sale] 中的行数。Create [wwi].[InitialSalesDataPopulation] to increase the number of rows in [wwi].[seed_Sale] by a factor of eight.

    CREATE PROCEDURE [wwi].[InitialSalesDataPopulation] AS
    BEGIN
        INSERT INTO [wwi].[seed_Sale] (
            [Sale Key], [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
        )
        SELECT
            [Sale Key], [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
        FROM [wwi].[seed_Sale]
    
        INSERT INTO [wwi].[seed_Sale] (
            [Sale Key], [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
        )
        SELECT
            [Sale Key], [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
        FROM [wwi].[seed_Sale]
    
        INSERT INTO [wwi].[seed_Sale] (
            [Sale Key], [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
        )
        SELECT
            [Sale Key], [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
        FROM [wwi].[seed_Sale]
    END
    
  3. 创建此存储过程,以便在 wwi.dimension_Date 中填充行。Create this stored procedure that populates rows into wwi.dimension_Date.

    CREATE PROCEDURE [wwi].[PopulateDateDimensionForYear] @Year [int] AS
    BEGIN
        IF OBJECT_ID('tempdb..#month', 'U') IS NOT NULL
            DROP TABLE #month
        CREATE TABLE #month (
            monthnum int,
            numofdays int
        )
        WITH ( DISTRIBUTION = ROUND_ROBIN, heap )
        INSERT INTO #month
            SELECT 1, 31 UNION SELECT 2, CASE WHEN (@YEAR % 4 = 0 AND @YEAR % 100 <> 0) OR @YEAR % 400 = 0 THEN 29 ELSE 28 END UNION SELECT 3,31 UNION SELECT 4,30 UNION SELECT 5,31 UNION SELECT 6,30 UNION SELECT 7,31 UNION SELECT 8,31 UNION SELECT 9,30 UNION SELECT 10,31 UNION SELECT 11,30 UNION SELECT 12,31
    
        IF OBJECT_ID('tempdb..#days', 'U') IS NOT NULL
            DROP TABLE #days
        CREATE TABLE #days (days int)
        WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
    
        INSERT INTO #days
            SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12 UNION SELECT 13 UNION SELECT 14 UNION SELECT 15 UNION SELECT 16 UNION SELECT 17 UNION SELECT 18 UNION SELECT 19 UNION SELECT 20    UNION SELECT 21 UNION SELECT 22 UNION SELECT 23 UNION SELECT 24 UNION SELECT 25 UNION SELECT 26 UNION SELECT 27 UNION SELECT 28 UNION SELECT 29 UNION SELECT 30 UNION SELECT 31
    
        INSERT [wwi].[dimension_Date] (
            [Date], [Day Number], [Day], [Month], [Short Month], [Calendar Month Number], [Calendar Month Label], [Calendar Year], [Calendar Year Label], [Fiscal Month Number], [Fiscal Month Label], [Fiscal Year], [Fiscal Year Label], [ISO Week Number]
        )
        SELECT
            CAST(CAST(monthnum AS VARCHAR(2)) + '/' + CAST([days] AS VARCHAR(3)) + '/' + CAST(@year AS CHAR(4)) AS DATE) AS [Date]
            ,DAY(CAST(CAST(monthnum AS VARCHAR(2)) + '/' + CAST([days] AS VARCHAR(3)) + '/' + CAST(@year AS CHAR(4)) AS DATE)) AS [Day Number]
            ,CAST(DATENAME(day, CAST(CAST(monthnum AS VARCHAR(2)) + '/' + CAST([days] AS VARCHAR(3)) + '/' + CAST(@year AS CHAR(4)) AS DATE)) AS NVARCHAR(10)) AS [Day]
            ,CAST(DATENAME(month, CAST(CAST(monthnum AS VARCHAR(2)) + '/' + CAST([days] AS VARCHAR(3)) + '/' + CAST(@year as char(4)) AS DATE)) AS nvarchar(10)) AS [Month]
            ,CAST(SUBSTRING(DATENAME(month, CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)), 1, 3) AS nvarchar(3)) AS [Short Month]
            ,MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) AS [Calendar Month Number]
            ,CAST(N'CY' + CAST(YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) AS nvarchar(4)) + N'-' + SUBSTRING(DATENAME(month, CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)), 1, 3) AS nvarchar(10)) AS [Calendar Month Label]
            ,YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) AS [Calendar Year]
            ,CAST(N'CY' + CAST(YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) AS nvarchar(4)) AS nvarchar(10)) AS [Calendar Year Label]
            ,CASE WHEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) IN (11, 12)
            THEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) - 10
            ELSE MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) + 2 END AS [Fiscal Month Number]
            ,CAST(N'FY' + CAST(CASE WHEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) IN (11, 12)
            THEN YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) + 1
            ELSE YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) END AS nvarchar(4)) + N'-' + SUBSTRING(DATENAME(month, CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)), 1, 3) AS nvarchar(20)) AS [Fiscal Month Label]
            ,CASE WHEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) IN (11, 12)
            THEN YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) + 1
            ELSE YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) END AS [Fiscal Year]
            ,CAST(N'FY' + CAST(CASE WHEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) IN (11, 12)
            THEN YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) + 1
            ELSE YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE))END AS nvarchar(4)) AS nvarchar(10)) AS [Fiscal Year Label]
            , DATEPART(ISO_WEEK, CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) AS [ISO Week Number]
    FROM #month m
        CROSS JOIN #days d
    WHERE d.days <= m.numofdays
    
    DROP table #month;
    DROP table #days;
    END;
    
  4. 创建此过程,以便填充 wwi.dimension_Date 和 wwi.fact_Sale 表。Create this procedure that populates the wwi.dimension_Date and wwi.fact_Sale tables. 此过程调用 [wwi].[PopulateDateDimensionForYear] 来填充 wwi.dimension_Date。It calls [wwi].[PopulateDateDimensionForYear] to populate wwi.dimension_Date.

    CREATE PROCEDURE [wwi].[Configuration_PopulateLargeSaleTable] @EstimatedRowsPerDay [bigint],@Year [int] AS
    BEGIN
        SET NOCOUNT ON;
        SET XACT_ABORT ON;
    
        EXEC [wwi].[PopulateDateDimensionForYear] @Year;
    
        DECLARE @OrderCounter bigint = 0;
        DECLARE @NumberOfSalesPerDay bigint = @EstimatedRowsPerDay;
        DECLARE @DateCounter date;
        DECLARE @StartingSaleKey bigint;
        DECLARE @MaximumSaleKey bigint = (SELECT MAX([Sale Key]) FROM wwi.seed_Sale);
        DECLARE @MaxDate date;
        SET @MaxDate = (SELECT MAX([Invoice Date Key]) FROM wwi.fact_Sale)
        IF ( @MaxDate < CAST(@YEAR AS CHAR(4)) + '1231') AND (@MaxDate > CAST(@YEAR AS CHAR(4)) + '0101')
            SET @DateCounter = @MaxDate
        ELSE
            SET @DateCounter= CAST(@Year as char(4)) + '0101';
    
        PRINT 'Targeting ' + CAST(@NumberOfSalesPerDay AS varchar(20)) + ' sales per day.';
    
        DECLARE @OutputCounter varchar(20);
        DECLARE @variance DECIMAL(18,10);
        DECLARE @VariantNumberOfSalesPerDay BIGINT;
    
        WHILE @DateCounter < CAST(@YEAR AS CHAR(4)) + '1231'
        BEGIN
            SET @OutputCounter = CONVERT(varchar(20), @DateCounter, 112);
            RAISERROR(@OutputCounter, 0, 1);
            SET @variance = (SELECT RAND() * 10)*.01 + .95
            SET @VariantNumberOfSalesPerDay = FLOOR(@NumberOfSalesPerDay * @variance)
    
            SET @StartingSaleKey = @MaximumSaleKey - @VariantNumberOfSalesPerDay - FLOOR(RAND() * 20000);
            SET @OrderCounter = 0;
    
            INSERT [wwi].[fact_Sale] (
                [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], Profit, [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
            )
            SELECT TOP(@VariantNumberOfSalesPerDay)
                [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], @DateCounter, DATEADD(day, 1, @DateCounter), [Salesperson Key], [WWI Invoice ID], [Description], Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], Profit, [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
            FROM [wwi].[seed_Sale]
            WHERE
                 --[Sale Key] > @StartingSaleKey and /* IDENTITY DOES NOT WORK THE SAME IN SQLDW AND CAN'T USE THIS METHOD FOR VARIANT */
                [Invoice Date Key] >=cast(@YEAR AS CHAR(4)) + '-01-01'
            ORDER BY [Sale Key];
    
            SET @DateCounter = DATEADD(day, 1, @DateCounter);
        END;
    
    END;
    

生成数百万行Generate millions of rows

使用创建的存储过程在 wwi.fact_Sale 表中生成数百万行,并在 wwi.dimension_Date 表中生成相应的数据。Use the stored procedures you created to generate millions of rows in the wwi.fact_Sale table, and corresponding data in the wwi.dimension_Date table.

  1. 运行此过程,在 [wwi].[seed_Sale] 中播种更多行。Run this procedure to seed the [wwi].[seed_Sale] with more rows.

    EXEC [wwi].[InitialSalesDataPopulation]
    
  2. 运行此过程,以便每天在 wwi.fact_Sale 中填充 100,000 行,这些行对应于 2000 年的每一天。Run this procedure to populate wwi.fact_Sale with 100,000 rows per day for each day in the year 2000.

    EXEC [wwi].[Configuration_PopulateLargeSaleTable] 100000, 2000
    
  3. 在上一步骤中,由于生成的数据跨越一整年,因此,该过程可能需要花费较长时间。The data generation in the previous step might take a while as it progresses through the year. 若要查看当前进程正在处理哪一天的数据,请打开新的查询,并运行以下 SQL 命令:To see which day the current process is on, open a new query and run this SQL command:

    SELECT MAX([Invoice Date Key]) FROM wwi.fact_Sale;
    
  4. 运行以下命令查看已用的空间。Run the following command to see the space used.

    EXEC sp_spaceused N'wwi.fact_Sale';
    

填充复制表缓存Populate the replicated table cache

SQL 池通过将数据缓存到每个计算节点来复制表。SQL pool replicates a table by caching the data to each Compute node. 针对该表运行查询时,将会填充缓存。The cache gets populated when a query runs against the table. 因此,针对复制表运行的第一个查询可能需要花费额外的时间来填充缓存。Therefore, the first query on a replicated table might require extra time to populate the cache. 填充缓存后,针对复制表运行查询的速度会加快。After the cache is populated, queries on replicated tables run faster.

运行这些 SQL 查询可填充计算节点上的复制表缓存。Run these SQL queries to populate the replicated table cache on the Compute nodes.

SELECT TOP 1 * FROM [wwi].[dimension_City];
SELECT TOP 1 * FROM [wwi].[dimension_Customer];
SELECT TOP 1 * FROM [wwi].[dimension_Date];
SELECT TOP 1 * FROM [wwi].[dimension_Employee];
SELECT TOP 1 * FROM [wwi].[dimension_PaymentMethod];
SELECT TOP 1 * FROM [wwi].[dimension_StockItem];
SELECT TOP 1 * FROM [wwi].[dimension_Supplier];
SELECT TOP 1 * FROM [wwi].[dimension_TransactionType];

创建新加载的数据的统计信息Create statistics on newly loaded data

若要实现较高的查询性能,必须在首次加载后基于每个表的每个列创建统计信息。To achieve high query performance, it's important to create statistics on each column of each table after the first load. 此外,在对数据做出重大更改后,必须更新统计信息。It's also important to update statistics after substantial changes in the data.

  1. 创建此存储过程,以便更新所有表的所有列中的统计信息。Create this stored procedure that updates statistics on all columns of all tables.

    CREATE PROCEDURE    [dbo].[prc_sqldw_create_stats]
    (   @create_type    tinyint -- 1 default 2 Fullscan 3 Sample
    ,   @sample_pct     tinyint
    )
    AS
    
    IF @create_type IS NULL
    BEGIN
        SET @create_type = 1;
    END;
    
    IF @create_type NOT IN (1,2,3)
    BEGIN
        THROW 151000,'Invalid value for @stats_type parameter. Valid range 1 (default), 2 (fullscan) or 3 (sample).',1;
    END;
    
    IF @sample_pct IS NULL
    BEGIN;
        SET @sample_pct = 20;
    END;
    
    IF OBJECT_ID('tempdb..#stats_ddl') IS NOT NULL
    BEGIN;
        DROP TABLE #stats_ddl;
    END;
    
    CREATE TABLE #stats_ddl
    WITH    (   DISTRIBUTION    = HASH([seq_nmbr])
            ,   LOCATION        = USER_DB
            )
    AS
    WITH T
    AS
    (
    SELECT      t.[name]                        AS [table_name]
    ,           s.[name]                        AS [table_schema_name]
    ,           c.[name]                        AS [column_name]
    ,           c.[column_id]                   AS [column_id]
    ,           t.[object_id]                   AS [object_id]
    ,           ROW_NUMBER()
                OVER(ORDER BY (SELECT NULL))    AS [seq_nmbr]
    FROM        sys.[tables] t
    JOIN        sys.[schemas] s         ON  t.[schema_id]       = s.[schema_id]
    JOIN        sys.[columns] c         ON  t.[object_id]       = c.[object_id]
    LEFT JOIN   sys.[stats_columns] l   ON  l.[object_id]       = c.[object_id]
                                        AND l.[column_id]       = c.[column_id]
                                        AND l.[stats_column_id] = 1
    LEFT JOIN    sys.[external_tables] e    ON    e.[object_id]        = t.[object_id]
    WHERE       l.[object_id] IS NULL
    AND            e.[object_id] IS NULL -- not an external table
    )
    SELECT  [table_schema_name]
    ,       [table_name]
    ,       [column_name]
    ,       [column_id]
    ,       [object_id]
    ,       [seq_nmbr]
    ,       CASE @create_type
            WHEN 1
            THEN    CAST('CREATE STATISTICS '+QUOTENAME('stat_'+table_schema_name+ '_' + table_name + '_'+column_name)+' ON '+QUOTENAME(table_schema_name)+'.'+QUOTENAME(table_name)+'('+QUOTENAME(column_name)+')' AS VARCHAR(8000))
            WHEN 2
            THEN    CAST('CREATE STATISTICS '+QUOTENAME('stat_'+table_schema_name+ '_' + table_name + '_'+column_name)+' ON '+QUOTENAME(table_schema_name)+'.'+QUOTENAME(table_name)+'('+QUOTENAME(column_name)+') WITH FULLSCAN' AS VARCHAR(8000))
            WHEN 3
            THEN    CAST('CREATE STATISTICS '+QUOTENAME('stat_'+table_schema_name+ '_' + table_name + '_'+column_name)+' ON '+QUOTENAME(table_schema_name)+'.'+QUOTENAME(table_name)+'('+QUOTENAME(column_name)+') WITH SAMPLE '+CONVERT(varchar(4),@sample_pct)+' PERCENT' AS VARCHAR(8000))
            END AS create_stat_ddl
    FROM T
    ;
    
    DECLARE @i INT              = 1
    ,       @t INT              = (SELECT COUNT(*) FROM #stats_ddl)
    ,       @s NVARCHAR(4000)   = N''
    ;
    
    WHILE @i <= @t
    BEGIN
        SET @s=(SELECT create_stat_ddl FROM #stats_ddl WHERE seq_nmbr = @i);
        PRINT @s
        EXEC sp_executesql @s
        SET @i+=1;
    END
    
    DROP TABLE #stats_ddl;
    
  2. 运行此命令,在数据仓库中所有表的所有列中创建统计信息。Run this command to create statistics on all columns of all tables in the data warehouse.

    EXEC [dbo].[prc_sqldw_create_stats] 1, NULL;
    

清理资源Clean up resources

需要为加载到数据仓库中的计算资源和数据付费。You are being charged for compute resources and data that you loaded into your data warehouse. 这些需要单独计费。These are billed separately.

请按照下列步骤按需清理资源。Follow these steps to clean up resources as you desire.

  1. 登录到 Azure 门户,单击数据仓库。Sign in to the Azure portal, click on your data warehouse.

    清理资源

  2. 如果想要将数据保留在存储中,可以在不使用数据仓库时暂停计算。If you want to keep the data in storage, you can pause compute when you aren't using the data warehouse. 暂停计算后,仅需为数据存储付费,并且随时都可在准备处理数据时恢复计算。By pausing compute, you will only be charge for data storage and you can resume the compute whenever you are ready to work with the data. 要暂停计算,请单击“暂停”按钮。To pause compute, click the Pause button. 暂停数据仓库后,可看到“启动”按钮。When the data warehouse is paused, you will see a Start button. 要恢复计算,请单击“启动”。To resume compute, click Start.

  3. 如果不想支付将来的费用,则可以删除数据仓库。If you want to remove future charges, you can delete the data warehouse. 要删除数据仓库,以便不再为计算或存储付费,请单击“删除”。To remove the data warehouse so you won't be charged for compute or storage, click Delete.

  4. 要删除创建的 SQL Server,请单击上图中的“sample-svr.database.chinacloudapi.cn”,然后单击“删除”。To remove the SQL server you created, click sample-svr.database.chinacloudapi.cn in the previous image, and then click Delete. 请审慎执行此操作,因为删除服务器会删除分配给该服务器的所有数据库。Be careful with this as deleting the server will delete all databases assigned to the server.

  5. 若要删除资源组,请单击“SampleRG”,然后单击“删除资源组”。 To remove the resource group, click SampleRG, and then click Delete resource group.

后续步骤Next steps

在本教程中,已学习了如何创建数据仓库以及用于加载数据的用户。In this tutorial, you learned how to create a data warehouse and create a user for loading data. 创建了外部表以定义 Azure 存储 Blob 中存储的数据的结构,然后使用 PolyBase CREATE TABLE AS SELECT 语句将数据加载到数据仓库。You created external tables to define the structure for data stored in Azure Storage Blob, and then used the PolyBase CREATE TABLE AS SELECT statement to load data into your data warehouse.

完成了以下操作:You did these things:

  • 在 Azure 门户中使用 SQL 池创建数据仓库Created a data warehouse using SQL pool in the Azure portal
  • 在 Azure 门户中设置服务器级防火墙规则Set up a server-level firewall rule in the Azure portal
  • 使用 SSMS 连接到 SQL 池Connected to the SQL pool with SSMS
  • 创建了专用于加载数据的用户Created a user designated for loading data
  • 在 Azure 存储 Blob 中为数据创建了外部表Created external tables for data in Azure Storage Blob
  • 使用 CTAS T-SQL 语句将数据加载到数据仓库Used the CTAS T-SQL statement to load data into your data warehouse
  • 查看了正在加载的数据的进度Viewed the progress of data as it is loading
  • 创建了新加载的数据的统计信息Created statistics on the newly loaded data

转到开发概述,了解如何将现有数据库迁移到 Azure Synapse SQL 池。Advance to the development overview to learn how to migrate an existing database to Azure Synapse SQL pool.