教程:加载纽约出租车数据集Tutorial: Load the New York Taxicab dataset

本教程使用 COPY 语句从 Azure Blob 存储帐户加载纽约出租车数据。This tutorial uses the COPY statement to load New York Taxicab dataset from an Azure blob storage account. 本教程使用 Azure 门户SQL Server Management Studio (SSMS) 执行以下操作:The tutorial uses the Azure portal and SQL Server Management Studio (SSMS) to:

  • 在 Azure 门户中创建 SQL 池Create a SQL pool in the Azure portal
  • 在 Azure 门户中设置服务器级防火墙规则Set up a server-level firewall rule in the Azure portal
  • 使用 SSMS 连接到数据仓库Connect to the data warehouse with SSMS
  • 创建专用于加载数据的用户Create a user designated for loading data
  • 为示例数据集创建表Create the tables for the sample dataset
  • 使用 COPY T-SQL 语句将数据加载到数据仓库Use the COPY T-SQL statement to load data into your data warehouse
  • 查看正在加载的数据的进度View the progress of data as it is loading

如果没有 Azure 订阅,请在开始前创建一个试用帐户If you don't have an Azure subscription, create a trial account before you begin.

准备阶段Before you begin

开始本教程之前,请下载并安装最新版 SQL Server Management Studio (SSMS)。Before you begin this tutorial, download and install the newest version of SQL Server Management Studio (SSMS).

登录到 Azure 门户Log in to the Azure portal

登录到 Azure 门户Log in to the Azure portal.

创建空数据库Create a blank database

SQL 池是使用定义的一组计算资源创建的。A SQL pool is created with a defined set of compute resources. 数据库在 Azure 资源组逻辑 SQL 服务器中创建。The database is created within an Azure resource group and in a logical SQL server.

按照以下步骤创建空白的数据库。Follow these steps to create a blank database.

  1. 在 Azure 门户的左上角选择“创建资源”。Select Create a resource in the upper left-hand corner of the Azure portal.

  2. 在“新建”页中选择“数据库”,然后在“新建”页上的“特色”下选择“Azure Synapse Analytics” 。Select Databases from the New page, and select Azure Synapse Analytics under Featured on the New page.

    创建数据仓库

  3. 使用以下信息填写窗体:Fill out the form with the following information:

    设置Setting 建议的值Suggested value 说明Description
    名称*Name* mySampleDataWarehousemySampleDataWarehouse 如需有效的数据库名称,请参阅 Database Identifiers(数据库标识符)。For valid database names, see Database Identifiers.
    订阅Subscription 订阅Your subscription 有关订阅的详细信息,请参阅订阅For details about your subscriptions, see Subscriptions.
    资源组Resource group myResourceGroupmyResourceGroup 如需有效的资源组名称,请参阅 Naming rules and restrictions(命名规则和限制)。For valid resource group names, see Naming rules and restrictions.
    选择源Select source 空白数据库Blank database 指定创建空白数据库。Specifies to create a blank database. 请注意,数据仓库是一种数据库。Note, a data warehouse is one type of database.

    创建数据仓库

  4. 选择“服务器”,为新数据库创建并配置新服务器。Select Server to create and configure a new server for your new database. 使用以下信息填写“新建服务器”窗体:Fill out the New server form with the following information:

    设置Setting 建议的值Suggested value 说明Description
    服务器名称Server name 任何全局唯一名称Any globally unique name 如需有效的服务器名称,请参阅 Naming rules and restrictions(命名规则和限制)。For valid server names, see Naming rules and restrictions.
    服务器管理员登录名Server admin login 任何有效的名称Any valid name 如需有效的登录名,请参阅 Database Identifiers(数据库标识符)。For valid login names, see Database Identifiers.
    密码Password 任何有效的密码Any valid password 密码必须至少有八个字符,且必须包含以下类别中的三个类别的字符:大写字符、小写字符、数字以及非字母数字字符。Your password must have at least eight characters and must contain characters from three of the following categories: upper case characters, lower case characters, numbers, and non-alphanumeric characters.
    位置Location 任何有效的位置Any valid location 有关区域的信息,请参阅 Azure 区域For information about regions, see Azure Regions.

    创建服务器

  5. 选择“选择”。Select Select.

  6. 选择“性能级别”,指定数据仓库是 Gen1 还是 Gen2,以及数据仓库单位的数量。Select Performance level to specify whether the data warehouse is Gen1 or Gen2, and the number of data warehouse units.

  7. 对于本教程,请选择 SQL 池“Gen2”。For this tutorial, select SQL pool Gen2. 滑块默认设置为“DW1000c”。The slider is set to DW1000c by default. 请尝试上下移动滑块,以查看其工作原理。Try moving it up and down to see how it works.

    配置性能

  8. 选择“应用”。Select Apply.

  9. 在预配边栏选项卡中,为空白数据库选择一个排序规则。In the provisioning blade, select a collation for the blank database. 对于本教程,请使用默认值。For this tutorial, use the default value. 有关排序规则的详细信息,请参阅 Collations(排序规则)For more information about collations, see Collations

  10. 填写窗体后,请选择“创建”以预配数据库。Now that you have completed the form, select Create to provision the database. 预配需要数分钟。Provisioning takes a few minutes.

  11. 在工具栏上,选择“通知”监视部署过程。On the toolbar, select Notifications to monitor the deployment process.

    通知

创建服务器级防火墙规则Create a server-level firewall rule

服务器级别的防火墙会阻止外部应用程序和工具连接到服务器或服务器上的任何数据库。A firewall at the server-level that prevents external applications and tools from connecting to the server or any databases on the server. 要启用连接,可以添加防火墙规则,为特定 IP 地址启用连接。To enable connectivity, you can add firewall rules that enable connectivity for specific IP addresses. 按照以下步骤为客户端的 IP 地址创建服务器级防火墙规则Follow these steps to create a server-level firewall rule for your client's IP address.

备注

SQL 数据仓库通过端口 1433 进行通信。SQL Data Warehouse communicates over port 1433. 如果尝试从企业网络内部进行连接,则该网络的防火墙可能不允许经端口 1433 的出站流量。If you are trying to connect from within a corporate network, outbound traffic over port 1433 might not be allowed by your network's firewall. 如果是这样,则无法连接到服务器,除非 IT 部门打开了端口 1433。If so, you cannot connect to your server unless your IT department opens port 1433.

  1. 部署完成后,在左侧菜单中选择“SQL 数据库”,然后在“SQL 数据库”页上选择“mySampleDatabase”。After the deployment completes, select SQL databases from the left-hand menu and then select mySampleDatabase on the SQL databases page. 此时会打开数据库的概述页,显示完全限定的服务器名称(例如 mynewserver-20181129.database.chinacloudapi.cn),并且会提供进行进一步配置所需的选项。The overview page for your database opens, showing you the fully qualified server name (such as mynewserver-20181129.database.chinacloudapi.cn) and provides options for further configuration.

  2. 在后续的快速入门中,请复制此完全限定的服务器名称,将其用于连接到服务器及其数据库。Copy this fully qualified server name for use to connect to your server and its databases in subsequent quick starts. 然后选择服务器名称,打开服务器设置。Then select the server name to open server settings.

    查找服务器名称

  3. 选择服务器名称,打开服务器设置。Select the server name to open server settings.

    服务器设置

  4. 选择“显示防火墙设置”。Select Show firewall settings. 此时会打开服务器的“防火墙设置”页面。The Firewall settings page for the server opens.

    服务器防火墙规则

  5. 在工具栏上选择“添加客户端 IP”,将当前的 IP 地址添加到新的防火墙规则。Select Add client IP on the toolbar to add your current IP address to a new firewall rule. 防火墙规则可以针对单个 IP 地址或一系列 IP 地址打开端口 1433。A firewall rule can open port 1433 for a single IP address or a range of IP addresses.

  6. 选择“保存” 。Select Save. 此时会针对当前的 IP 地址创建服务器级防火墙规则,在服务器上打开端口 1433。A server-level firewall rule is created for your current IP address opening port 1433 on the server.

  7. 选择“确定”,然后关闭“防火墙设置”页。 Select OK and then close the Firewall settings page.

现在,可使用此 IP 地址连接到服务器及其数据仓库。You can now connect to the server and its data warehouses using this IP address. 可从 SQL Server Management Studio 或另一种所选工具进行连接。The connection works from SQL Server Management Studio or another tool of your choice. 连接时,请使用之前创建的 ServerAdmin 帐户。When you connect, use the ServerAdmin account you created previously.

重要

默认情况下,所有 Azure 服务都允许通过 SQL 数据库防火墙进行访问。By default, access through the SQL Database firewall is enabled for all Azure services. 在此页上选择“关闭”,然后选择“保存”,对所有 Azure 服务禁用防火墙。 Select OFF on this page and then select Save to disable the firewall for all Azure services.

获取完全限定的服务器名称Get the fully qualified server name

请在 Azure 门户中获取服务器的完全限定的服务器名称。Get the fully qualified server name for your server in the Azure portal. 稍后,在连接到服务器时,将使用该完全限定的名称。Later you will use the fully qualified name when connecting to the server.

  1. 登录到 Azure 门户Log in to the Azure portal.

  2. 在左侧菜单中选择“Azure Synapse Analytics”,然后在“Azure Synapse Analytics”页上选择你的数据库。 Select Azure Synapse Analytics from the left-hand menu, and select your database on the Azure Synapse Analytics page.

  3. 在数据库的“Azure 门户”页的“概要”窗格中,找到并复制“服务器名称”。 In the Essentials pane in the Azure portal page for your database, locate and then copy the Server name. 在此示例中,完全限定的名称为 mynewserver-20181129.database.chinacloudapi.cn。In this example, the fully qualified name is mynewserver-20181129.database.chinacloudapi.cn.

    连接信息

以服务器管理员的身份连接到服务器Connect to the server as server admin

本部分使用 SQL Server Management Studio (SSMS) 来建立与服务器的连接。This section uses SQL Server Management Studio (SSMS) to establish a connection to your server.

  1. 打开 SQL Server Management Studio。Open SQL Server Management Studio.

  2. 在“连接到服务器”对话框中,输入以下信息:In the Connect to Server dialog box, enter the following information:

    设置Setting 建议的值Suggested value 说明Description
    服务器类型Server type 数据库引擎Database engine 此值是必需的This value is required
    服务器名称Server name 完全限定的服务器名称The fully qualified server name 该名称应类似于:mynewserver-20181129.database.chinacloudapi.cnThe name should be something like this: mynewserver-20181129.database.chinacloudapi.cn.
    身份验证Authentication SQL Server 身份验证SQL Server Authentication SQL 身份验证是本教程中配置的唯一身份验证类型。SQL Authentication is the only authentication type that we have configured in this tutorial.
    登录Login 服务器管理员帐户The server admin account 这是在创建服务器时指定的帐户。This is the account that you specified when you created the server.
    密码Password 服务器管理员帐户的密码The password for your server admin account 这是在创建服务器时指定的密码。This is the password that you specified when you created the server.

    连接到服务器

  3. 选择“连接” 。Select Connect. 此时会在 SSMS 中打开“对象资源管理器”窗口。The Object Explorer window opens in SSMS.

  4. 在“对象资源管理器”中,展开“数据库”。In Object Explorer, expand Databases. 然后展开“系统数据库”和“master”,查看 master 数据库中的对象。Then expand System databases and master to view the objects in the master database. 展开“mySampleDatabase”,查看新数据库中的对象。Expand mySampleDatabase to view the objects in your new database.

    数据库对象

创建用于加载数据的用户Create a user for loading data

服务器管理员帐户用于执行管理操作,不适合对用户数据运行查询。The server admin account is meant to perform management operations, and is not suited for running queries on user data. 加载数据是一种内存密集型操作。Loading data is a memory-intensive operation. 内存最大值根据配置的数据仓库单位资源类定义。Memory maximums are defined according to the data warehouse units and resource class configured.

最好创建专用于加载数据的登录名和用户。It's best to create a login and user that is dedicated for loading data. 然后,将加载用户添加到启用相应最大内存分配的资源类Then add the loading user to a resource class that enables an appropriate maximum memory allocation.

由于当前是以服务器管理员的身份连接的,因此可以创建登录名和用户。Since you are currently connected as the server admin, you can create logins and users. 使用以下步骤创建名为 LoaderRC20 的登录名和用户。Use these steps to create a login and user called LoaderRC20. 然后将该用户分配到 staticrc20 资源类。Then assign the user to the staticrc20 resource class.

  1. 在 SSMS 中右键单击“master”,然后在显示的下拉菜单中选择“新建查询”。In SSMS, right-select master to show a drop-down menu, and choose New Query. 此时将打开一个新的查询窗口。A new query window opens.

    在 Master 中新建查询

  2. 在查询窗口中,输入以下 T-SQL 命令,创建一个名为 LoaderRC20 的登录名和用户,并将“a123STRONGpassword!”替换为自己的密码。In the query window, enter these T-SQL commands to create a login and user named LoaderRC20, substituting your own password for 'a123STRONGpassword!'.

    CREATE LOGIN LoaderRC20 WITH PASSWORD = 'a123STRONGpassword!';
    CREATE USER LoaderRC20 FOR LOGIN LoaderRC20;
    
  3. 选择“执行”。Select Execute.

  4. 右键单击“mySampleDataWarehouse”,然后选择“新建查询”。Right-click mySampleDataWarehouse, and choose New Query. 此时会打开一个新的查询窗口。A new query Window opens.

    针对示例数据仓库的新查询

  5. 输入以下 T-SQL 命令,为 LoaderRC20 登录名创建名为 LoaderRC20 的数据库用户。Enter the following T-SQL commands to create a database user named LoaderRC20 for the LoaderRC20 login. 第二行为新用户授予对新数据仓库的 CONTROL 权限。The second line grants the new user CONTROL permissions on the new data warehouse. 这些权限类似于使用户成为数据库所有者。These permissions are similar to making the user the owner of the database. 第三行将新用户添加为 staticrc20 资源类的成员。The third line adds the new user as a member of the staticrc20 resource class.

    CREATE USER LoaderRC20 FOR LOGIN LoaderRC20;
    GRANT CONTROL ON DATABASE::[mySampleDataWarehouse] to LoaderRC20;
    EXEC sp_addrolemember 'staticrc20', 'LoaderRC20';
    
  6. 选择“执行”。Select Execute.

以加载用户的身份连接到服务器Connect to the server as the loading user

加载数据的第一步是以 LoaderRC20 的身份登录。The first step toward loading data is to login as LoaderRC20.

  1. 在对象资源管理器中选择“连接”下拉菜单,然后选择“数据库引擎”。In Object Explorer, select the Connect drop down menu and select Database Engine. 此时会显示“连接到服务器”对话框。The Connect to Server dialog box appears.

    使用新登录名连接

  2. 输入完全限定的服务器名称,并输入“LoaderRC20”作为登录名。Enter the fully qualified server name, and enter LoaderRC20 as the Login. 输入 LoaderRC20 的密码。Enter your password for LoaderRC20.

  3. 选择“连接” 。Select Connect.

  4. 当连接准备就绪时,对象资源管理器中出现两个服务器连接。When your connection is ready, you will see two server connections in Object Explorer. 一个是作为 ServerAdmin 连接,另一个是作为 MedRCLogin 连接。One connection as ServerAdmin and one connection as MedRCLogin.

    连接成功

为示例数据创建表Create tables for the sample data

已准备好开始将数据加载到新的数据仓库。You are ready to begin the process of loading data into your new data warehouse. 本教程的此部分说明如何使用 COPY 语句从 Azure 存储 Blob 加载纽约市出租车数据。This part of the tutorial shows you how to use the COPY statement to load the New York City taxi cab dataset from an Azure Storage blob. 若要了解如何将数据置于 Azure Blob 存储或直接从源加载数据以供将来参考,请参阅加载概述For future reference, to learn how to get your data to Azure blob storage or to load it directly from your source, see the loading overview.

运行以下 SQL 脚本,并指定要加载的数据的相关信息。Run the following SQL scripts and specify information about the data you wish to load. 此信息包括数据所在的位置、数据内容的格式以及数据的表定义。This information includes where the data is located, the format of the contents of the data, and the table definition for the data.

  1. 在前一节中,已经以 LoaderRC20 的身份登录数据仓库。In the previous section, you logged into your data warehouse as LoaderRC20. 在 SSMS 中,右键单击 LoaderRC20 连接,然后选择“新建查询”。In SSMS, right-click your LoaderRC20 connection and select New Query. 此时会显示一个新的查询窗口。A new query window appears.

    新的加载查询窗口

  2. 比较查询窗口和之间的图像。Compare your query window to the previous image. 验证新的查询窗口以 LoaderRC20 的身份运行,并对 MySampleDataWarehouse 数据库执行查询。Verify your new query window is running as LoaderRC20 and performing queries on your MySampleDataWarehouse database. 使用此查询窗口执行所有加载步骤。Use this query window to perform all of the loading steps.

  3. 运行以下 T-SQL 语句以创建表:Run the following T-SQL statements to create the tables:

    CREATE TABLE [dbo].[Date]
    (
        [DateID] int NOT NULL,
        [Date] datetime NULL,
        [DateBKey] char(10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DayOfMonth] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DaySuffix] varchar(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DayName] varchar(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DayOfWeek] char(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DayOfWeekInMonth] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DayOfWeekInYear] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DayOfQuarter] varchar(3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DayOfYear] varchar(3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [WeekOfMonth] varchar(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [WeekOfQuarter] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [WeekOfYear] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [Month] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [MonthName] varchar(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [MonthOfQuarter] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [Quarter] char(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [QuarterName] varchar(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [Year] char(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [YearName] char(7) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [MonthYear] char(10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [MMYYYY] char(6) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [FirstDayOfMonth] date NULL,
        [LastDayOfMonth] date NULL,
        [FirstDayOfQuarter] date NULL,
        [LastDayOfQuarter] date NULL,
        [FirstDayOfYear] date NULL,
        [LastDayOfYear] date NULL,
        [IsHolidayUSA] bit NULL,
        [IsWeekday] bit NULL,
        [HolidayUSA] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    )
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN,
        CLUSTERED COLUMNSTORE INDEX
    );
    
    CREATE TABLE [dbo].[Geography]
    (
        [GeographyID] int NOT NULL,
        [ZipCodeBKey] varchar(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [County] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [City] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [State] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [Country] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [ZipCode] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    )
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN,
        CLUSTERED COLUMNSTORE INDEX
    );
    
    CREATE TABLE [dbo].[HackneyLicense]
    (
        [HackneyLicenseID] int NOT NULL,
        [HackneyLicenseBKey] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [HackneyLicenseCode] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    )
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN,
        CLUSTERED COLUMNSTORE INDEX
    );
    
    CREATE TABLE [dbo].[Medallion]
    (
        [MedallionID] int NOT NULL,
        [MedallionBKey] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [MedallionCode] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    )
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN,
        CLUSTERED COLUMNSTORE INDEX
    );
    
    CREATE TABLE [dbo].[Time]
    (
        [TimeID] int NOT NULL,
        [TimeBKey] varchar(8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [HourNumber] tinyint NOT NULL,
        [MinuteNumber] tinyint NOT NULL,
        [SecondNumber] tinyint NOT NULL,
        [TimeInSecond] int NOT NULL,
        [HourlyBucket] varchar(15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [DayTimeBucketGroupKey] int NOT NULL,
        [DayTimeBucket] varchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
    )
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN,
        CLUSTERED COLUMNSTORE INDEX
    );
    
    CREATE TABLE [dbo].[Trip]
    (
        [DateID] int NOT NULL,
        [MedallionID] int NOT NULL,
        [HackneyLicenseID] int NOT NULL,
        [PickupTimeID] int NOT NULL,
        [DropoffTimeID] int NOT NULL,
        [PickupGeographyID] int NULL,
        [DropoffGeographyID] int NULL,
        [PickupLatitude] float NULL,
        [PickupLongitude] float NULL,
        [PickupLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DropoffLatitude] float NULL,
        [DropoffLongitude] float NULL,
        [DropoffLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [PassengerCount] int NULL,
        [TripDurationSeconds] int NULL,
        [TripDistanceMiles] float NULL,
        [PaymentType] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [FareAmount] money NULL,
        [SurchargeAmount] money NULL,
        [TaxAmount] money NULL,
        [TipAmount] money NULL,
        [TollsAmount] money NULL,
        [TotalAmount] money NULL
    )
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN,
        CLUSTERED COLUMNSTORE INDEX
    );
    
    CREATE TABLE [dbo].[Weather]
    (
        [DateID] int NOT NULL,
        [GeographyID] int NOT NULL,
        [PrecipitationInches] float NOT NULL,
        [AvgTemperatureFahrenheit] float NOT NULL
    )
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN,
        CLUSTERED COLUMNSTORE INDEX
    );
    

将数据加载到数据仓库Load the data into your data warehouse

本部分使用 COPY 语句从 Azure 存储 Blob 加载示例数据。This section uses the COPY statement to load the sample data from Azure Storage Blob.

备注

本教程直接将数据加载到最终表。This tutorial loads the data directly into the final table. 对于生产工作负载,通常要加载到临时表中。You would typically load into a staging table for your production workloads. 数据在临时表中时,可以执行任何必要的转换。While data is in the staging table you can perform any necessary transformations.

  1. 运行以下语句以加载数据:Run the following statements to load the data:

    COPY INTO [dbo].[Date]
    FROM 'https://nytaxiblob.blob.core.windows.net/2013/Date'
    WITH
    (
        FILE_TYPE = 'CSV',
        FIELDTERMINATOR = ',',
        FIELDQUOTE = ''
    )
    OPTION (LABEL = 'COPY : Load [dbo].[Date] - Taxi dataset');
    
    
    COPY INTO [dbo].[Geography]
    FROM 'https://nytaxiblob.blob.core.windows.net/2013/Geography'
    WITH
    (
        FILE_TYPE = 'CSV',
        FIELDTERMINATOR = ',',
        FIELDQUOTE = ''
    )
    OPTION (LABEL = 'COPY : Load [dbo].[Geography] - Taxi dataset');
    
    COPY INTO [dbo].[HackneyLicense]
    FROM 'https://nytaxiblob.blob.core.windows.net/2013/HackneyLicense'
    WITH
    (
        FILE_TYPE = 'CSV',
        FIELDTERMINATOR = ',',
        FIELDQUOTE = ''
    )
    OPTION (LABEL = 'COPY : Load [dbo].[HackneyLicense] - Taxi dataset');
    
    COPY INTO [dbo].[Medallion]
    FROM 'https://nytaxiblob.blob.core.windows.net/2013/Medallion'
    WITH
    (
        FILE_TYPE = 'CSV',
        FIELDTERMINATOR = ',',
        FIELDQUOTE = ''
    )
    OPTION (LABEL = 'COPY : Load [dbo].[Medallion] - Taxi dataset');
    
    COPY INTO [dbo].[Time]
    FROM 'https://nytaxiblob.blob.core.windows.net/2013/Time'
    WITH
    (
        FILE_TYPE = 'CSV',
        FIELDTERMINATOR = ',',
        FIELDQUOTE = ''
    )
    OPTION (LABEL = 'COPY : Load [dbo].[Time] - Taxi dataset');
    
    COPY INTO [dbo].[Weather]
    FROM 'https://nytaxiblob.blob.core.windows.net/2013/Weather'
    WITH
    (
        FILE_TYPE = 'CSV',
        FIELDTERMINATOR = ',',
        FIELDQUOTE = '',
        ROWTERMINATOR='0X0A'
    )
    OPTION (LABEL = 'COPY : Load [dbo].[Weather] - Taxi dataset');
    
    COPY INTO [dbo].[Trip]
    FROM 'https://nytaxiblob.blob.core.windows.net/2013/Trip2013'
    WITH
    (
        FILE_TYPE = 'CSV',
        FIELDTERMINATOR = '|',
        FIELDQUOTE = '',
        ROWTERMINATOR='0X0A',
        COMPRESSION = 'GZIP'
    )
    OPTION (LABEL = 'COPY : Load [dbo].[Trip] - Taxi dataset');
    
  2. 在加载数据的同时查看数据。View your data as it loads. 假设要加载几个 GB 的数据,并将其压缩成高性能群集列存储索引。You're loading several GBs of data and compressing it into highly performant clustered columnstore indexes. 运行以下使用动态管理视图 (DMV) 的查询以显示负载的状态。Run the following query that uses a dynamic management views (DMVs) to show the status of the load.

    SELECT  r.[request_id]                           
    ,       r.[status]                               
    ,       r.resource_class                         
    ,       r.command
    ,       sum(bytes_processed) AS bytes_processed
    ,       sum(rows_processed) AS rows_processed
    FROM    sys.dm_pdw_exec_requests r
                  JOIN sys.dm_pdw_dms_workers w
                         ON r.[request_id] = w.request_id
    WHERE [label] = 'COPY : Load [dbo].[Date] - Taxi dataset' OR
        [label] = 'COPY : Load [dbo].[Geography] - Taxi dataset' OR
        [label] = 'COPY : Load [dbo].[HackneyLicense] - Taxi dataset' OR
        [label] = 'COPY : Load [dbo].[Medallion] - Taxi dataset' OR
        [label] = 'COPY : Load [dbo].[Time] - Taxi dataset' OR
        [label] = 'COPY : Load [dbo].[Weather] - Taxi dataset' OR
        [label] = 'COPY : Load [dbo].[Trip] - Taxi dataset' 
    and session_id <> session_id() and type = 'WRITER'
    GROUP BY r.[request_id]                           
    ,       r.[status]                               
    ,       r.resource_class                         
    ,       r.command;
    
  3. 查看所有系统查询。View all system queries.

    SELECT * FROM sys.dm_pdw_exec_requests;
    
  4. 结果让人欣慰,数据已顺利载入数据仓库。Enjoy seeing your data nicely loaded into your data warehouse.

    查看已加载的表

清理资源Clean up resources

需要为加载到数据仓库中的计算资源和数据付费。You are being charged for compute resources and data that you loaded into your data warehouse. 这些需要单独计费。These are billed separately.

  • 如果想要将数据保留在存储中,可以在不使用数据仓库时暂停计算。If you want to keep the data in storage, you can pause compute when you aren't using the data warehouse. 暂停计算后,仅需为数据存储付费,并且随时都可在准备处理数据时恢复计算。By pausing compute you will only be charge for data storage and you can resume the compute whenever you are ready to work with the data.
  • 如果不想支付将来的费用,则可以删除数据仓库。If you want to remove future charges, you can delete the data warehouse.

请按照下列步骤按需清理资源。Follow these steps to clean up resources as you desire.

  1. 登录到 Azure 门户,选择你的数据仓库。Log in to the Azure portal, select your data warehouse.

    清理资源

  2. 若要暂停计算,请选择“暂停”按钮。To pause compute, select the Pause button. 暂停数据仓库后,可看到“启动”按钮。When the data warehouse is paused, you will see a Start button. 若要恢复计算,请选择“启动”。To resume compute, select Start.

  3. 若要删除数据仓库,以便不再为计算或存储付费,请选择“删除”。To remove the data warehouse so you won't be charged for compute or storage, select Delete.

  4. 若要删除所创建的服务器,请选择上图中的“mynewserver-20181129.database.chinacloudapi.cn”,然后选择“删除” 。To remove the server you created, select mynewserver-20181129.database.chinacloudapi.cn in the previous image, and then select Delete. 请审慎执行此操作,因为删除服务器会删除分配给该服务器的所有数据库。Be careful with this as deleting the server will delete all databases assigned to the server.

  5. 若要删除资源组,请选择“myResourceGroup”,然后选择“删除资源组”。To remove the resource group, select myResourceGroup, and then select Delete resource group.

后续步骤Next steps

在本教程中,已学习了如何创建数据仓库以及用于加载数据的用户。In this tutorial, you learned how to create a data warehouse and create a user for loading data. 使用简单的 COPY 语句将数据加载到数据仓库。You used the simple COPY statement to load data into your data warehouse.

完成了以下操作:You did these things:

  • 在 Azure 门户中创建数据仓库Created a data warehouse in the Azure portal
  • 在 Azure 门户中设置服务器级防火墙规则Set up a server-level firewall rule in the Azure portal
  • 使用 SSMS 连接到数据仓库Connected to the data warehouse with SSMS
  • 创建了专用于加载数据的用户Created a user designated for loading data
  • 为示例数据创建表Created the tables for the sample data
  • 使用 COPY T-SQL 语句将数据加载到数据仓库Used the COPY T-SQL statement to load data into your data warehouse
  • 查看了正在加载的数据的进度Viewed the progress of data as it is loading

转到开发概述,了解如何将现有数据库迁移到 Azure Synapse Analytics:Advance to the development overview to learn how to migrate an existing database to Azure Synapse Analytics:

有关加载示例和参考的详细信息,请查看以下文档:For more loading examples and references, view the following documentation: