快速入门:使用 COPY 语句批量加载数据Quickstart: Bulk load data using the COPY statement

在本快速入门中,你将使用简单灵活的 COPY 语句将数据批量加载到 SQL 池中,以实现高吞吐量数据引入。In this quickstart, you'll bulk load data into your SQL pool using the simple and flexible COPY statement for high-throughput data ingestion. COPY 语句是推荐使用的加载实用工具,因为它提供以下功能,可以无缝且灵活地加载数据:The COPY statement is the recommended loading utility as it enables you to seamlessly and flexibly load data by providing functionality to:

  • 允许权限较低的用户进行加载,不需要对数据仓库有严格的控制权限Allow lower privileged users to load without needing strict CONTROL permissions on the data warehouse
  • 只使用一个 T-SQL 语句,不需要创建任何其他数据库对象Leverage only a single T-SQL statement without having to create any additional database objects
  • 使用更精细的权限模型,无需使用共享访问签名 (SAS) 来公开存储帐户密钥Leverage a finer permission model without exposing storage account keys using Share Access Signatures (SAS)
  • 为 ERRORFILE 位置 (REJECTED_ROW_LOCATION) 指定一个不同的存储帐户Specify a different storage account for the ERRORFILE location (REJECTED_ROW_LOCATION)
  • 为每个目标列自定义默认值,并指定要加载到特定目标列中的源数据字段Customize default values for each target column and specify source data fields to load into specific target columns
  • 为 CSV 文件指定自定义行终止符Specify a custom row terminator for CSV files
  • 为 CSV 文件转义字符串、字段和行分隔符Escape string, field, and row delimiters for CSV files
  • 对 CSV 文件使用 SQL Server 日期格式Leverage SQL Server Date formats for CSV files
  • 在存储位置路径中指定通配符和多个文件Specify wildcards and multiple files in the storage location path

先决条件Prerequisites

本快速入门假设你已有一个 SQL 池。This quickstart assumes you already have a SQL pool. 如果尚未创建 SQL 池,请使用创建和连接 - 门户快速入门。If a SQL pool hasn't been created, use the Create and Connect - portal quickstart.

设置所需权限Set up the required permissions

-- List the permissions for your user
select  princ.name
,       princ.type_desc
,       perm.permission_name
,       perm.state_desc
,       perm.class_desc
,       object_name(perm.major_id)
from    sys.database_principals princ
left join
        sys.database_permissions perm
on      perm.grantee_principal_id = princ.principal_id
where name = '<yourusername>';

--Make sure your user has the permissions to CREATE tables in the [dbo] schema
GRANT CREATE TABLE TO <yourusername>;
GRANT ALTER ON SCHEMA::dbo TO <yourusername>;

--Make sure your user has ADMINISTER DATABASE BULK OPERATIONS permissions
GRANT ADMINISTER DATABASE BULK OPERATIONS TO <yourusername>

--Make sure your user has INSERT permissions on the target table
GRANT INSERT ON <yourtable> TO <yourusername>

创建目标表Create the target table

在此示例中,我们将从纽约出租车数据集加载数据。In this example, we'll be loading data from the New York taxi dataset. 我们将加载一个名为 Trip 的表,该表提供一年内的出租车行程。We'll load a table called Trip that represents taxi trips taken within a single year. 运行下列语句来创建此表:Run the following to create the table:

CREATE TABLE [dbo].[Trip]
(
    [DateID] int NOT NULL,
    [MedallionID] int NOT NULL,
    [HackneyLicenseID] int NOT NULL,
    [PickupTimeID] int NOT NULL,
    [DropoffTimeID] int NOT NULL,
    [PickupGeographyID] int NULL,
    [DropoffGeographyID] int NULL,
    [PickupLatitude] float NULL,
    [PickupLongitude] float NULL,
    [PickupLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [DropoffLatitude] float NULL,
    [DropoffLongitude] float NULL,
    [DropoffLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [PassengerCount] int NULL,
    [TripDurationSeconds] int NULL,
    [TripDistanceMiles] float NULL,
    [PaymentType] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [FareAmount] money NULL,
    [SurchargeAmount] money NULL,
    [TaxAmount] money NULL,
    [TipAmount] money NULL,
    [TollsAmount] money NULL,
    [TotalAmount] money NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
);

运行 COPY 语句Run the COPY statement

运行以下 COPY 语句,将数据从 Azure blob 存储帐户加载到 Trip 表中。Run the following COPY statement that will load data from the Azure blob storage account into the Trip table.

COPY INTO [dbo].[Trip] FROM 'https://nytaxiblob.blob.core.windows.net/2013/Trip2013/'
WITH (
   FIELDTERMINATOR='|',
   ROWTERMINATOR='0x0A'
) OPTION (LABEL = 'COPY: dbo.trip');

监视加载情况Monitor the load

通过定期运行以下查询来检查加载是否正在进行:Check whether your load is making progress by periodically running the following query:

SELECT  r.[request_id]                           
,       r.[status]                               
,       r.resource_class                         
,       r.command
,       sum(bytes_processed) AS bytes_processed
,       sum(rows_processed) AS rows_processed
FROM    sys.dm_pdw_exec_requests r
              JOIN sys.dm_pdw_dms_workers w
                     ON r.[request_id] = w.request_id
WHERE [label] = 'COPY: dbo.trip' and session_id <> session_id() and type = 'WRITER'
GROUP BY r.[request_id]                           
,       r.[status]                               
,       r.resource_class                         
,       r.command;

后续步骤Next steps