使用 Synapse SQL 安全地加载数据Securely load data using Synapse SQL

本文重点介绍 COPY 语句的安全身份验证机制,并提供示例。This article highlights and provides examples on the secure authentication mechanisms for the COPY statement. COPY 语句是在 Synapse SQL 中批量加载数据的最灵活且安全的方法。The COPY statement is the most flexible and secure way of bulk loading data in Synapse SQL.

支持的身份验证机制Supported authentication mechanisms

下表介绍了每种文件类型和存储帐户所支持的身份验证方法。The following matrix describes the supported authentication methods for each file type and storage account. 这适用于源存储位置和错误文件位置。This applies to the source storage location and the error file location.

CSVCSV ParquetParquet ORCORC
Azure blob 存储Azure blob storage SAS/MSI/SERVICE PRINCIPAL/KEY/AADSAS/MSI/SERVICE PRINCIPAL/KEY/AAD SAS/KEYSAS/KEY SAS/KEYSAS/KEY
Azure Data Lake Gen2Azure Data Lake Gen2 SAS/MSI/SERVICE PRINCIPAL/KEY/AADSAS/MSI/SERVICE PRINCIPAL/KEY/AAD SAS(blob 终结点)/MSI(dfs 终结点)/服务主体/密钥/AADSAS (blob endpoint)/MSI (dfs endpoint)/SERVICE PRINCIPAL/KEY/AAD SAS(blob 终结点)/MSI(dfs 终结点)/服务主体/密钥/AADSAS (blob endpoint)/MSI (dfs endpoint)/SERVICE PRINCIPAL/KEY/AAD

A.A. 以 LF 作为行终止符的存储帐户密钥(Unix 样式的新行)Storage account key with LF as the row terminator (Unix-style new line)

--Note when specifying the column list, input field numbers start from 1
COPY INTO target_table (Col_one default 'myStringDefault' 1, Col_two default 1 3)
FROM 'https://adlsgen2account.dfs.core.chinacloudapi.cn/myblobcontainer/folder1/'
WITH (
    FILE_TYPE = 'CSV'
    ,CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET='<Your_Account_Key>')
    --CREDENTIAL should look something like this:
    --CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET='x6RWv4It5F2msnjelv3H4DA80n0QW0daPdw43jM0nyetx4c6CpDkdj3986DX5AHFMIf/YN4y6kkCnU8lb+Wx0Pj+6MDw=='),
    ,ROWTERMINATOR='0x0A' --0x0A specifies to use the Line Feed character (Unix based systems)
)

重要

  • 使用十六进制值 (0x0A) 指定换行符。Use the hexadecimal value (0x0A) to specify the Line Feed/Newline character. 请注意,COPY 语句会将“\n”字符串解释为“\r\n”(回车换行符)。Note the COPY statement will interpret the '\n' string as '\r\n' (carriage return newline).

B.B. 使用 CRLF 作为行终止符的共享访问签名 (SAS)(Windows 样式新行)Shared Access Signatures (SAS) with CRLF as the row terminator (Windows style new line)

COPY INTO target_table
FROM 'https://adlsgen2account.dfs.core.chinacloudapi.cn/myblobcontainer/folder1/'
WITH (
    FILE_TYPE = 'CSV'
    ,CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>')
    --CREDENTIAL should look something like this:
    --CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='?sv=2018-03-28&ss=bfqt&srt=sco&sp=rl&st=2016-10-17T20%3A14%3A55Z&se=2021-10-18T20%3A19%3A00Z&sig=IEoOdmeYnE9%2FKiJDSFSYsz4AkNa%2F%2BTx61FuQ%2FfKHefqoBE%3D'),
    ,ROWTERMINATOR='\n'-- COPY command automatically prefixes the \r character when \n (newline) is specified. This results in carriage return newline (\r\n) for Windows based systems.
)

重要

  • 请勿将 ROWTERMINATOR 指定为“\r\n”,它将被解释为“\r\r\n”,并可能导致解析问题Do not specify the ROWTERMINATOR as '\r\n' which will be interpreted as '\r\r\n' and can result in parsing issues

C.C. 托管标识Managed Identity

将存储帐户附加到 VNet 时,需要进行托管标识身份验证。Managed Identity authentication is required when your storage account is attached to a VNet.

先决条件Prerequisites

  1. 按照此指南安装 Azure PowerShell。Install Azure PowerShell using this guide.
  2. 如果有常规用途 v1 或 Blob 存储帐户,则必须先按照此指南将该帐户升级到常规用途 v2 帐户。If you have a general-purpose v1 or blob storage account, you must first upgrade to general-purpose v2 using this guide.
  3. 必须在 Azure 存储帐户的“防火墙和虚拟网络”设置菜单下启用“允许受信任的 Microsoft 服务访问此存储帐户”。You must have Allow trusted Microsoft services to access this storage account turned on under Azure Storage account Firewalls and Virtual networks settings menu. 有关详细信息,请参阅此指南Refer to this guide for more information.

步骤Steps

  1. 在 PowerShell 中,将 SQL Server 注册到 Azure Active Directory (AAD):In PowerShell, register your SQL server with Azure Active Directory (AAD):

    Connect-AzAccount -Environment AzureChinaCloud
    Select-AzSubscription -SubscriptionId your-subscriptionId
    Set-AzSqlServer -ResourceGroupName your-database-server-resourceGroup -ServerName your-database-servername -AssignIdentity
    
  2. 按照此指南创建常规用途 v2 存储帐户Create a general-purpose v2 Storage Account using this guide.

    备注

    如果有常规用途 v1 或 Blob 存储帐户,则必须先按照此指南将该帐户升级到 v2 帐户。If you have a general-purpose v1 or blob storage account, you must first upgrade to v2 using this guide.

  3. 在存储帐户下导航到“访问控制(标识和访问管理)”,然后选择“添加角色分配”。Under your storage account, navigate to Access Control (IAM), and select Add role assignment. 为 SQL Server 分配存储 Blob 数据所有者、参与者或读取者 Azure 角色。Assign Storage Blob Data Owner, Contributor, or Reader Azure role to your SQL server.

    备注

    只有具有“所有者”特权的成员能够执行此步骤。Only members with Owner privilege can perform this step. 有关各种 Azure 内置角色,请参阅此指南For various Azure built-in roles, refer to this guide.

    重要

    指定存储 Blob 数据所有者、参与者或读取着 Azure 角色 。Specify the Storage Blob Data Owner, Contributor, or Reader Azure role. 这些角色不同于所有者、参与者和读取者 Azure 内置角色。These roles are different than the Azure built-in roles of Owner, Contributor, and Reader.

    授予 RBAC 加载权限

  4. 现在可以运行指定“托管标识”的 COPY 语句:You can now run the COPY statement specifying "Managed Identity":

    COPY INTO dbo.target_table
    FROM 'https://myaccount.blob.core.chinacloudapi.cn/myblobcontainer/folder1/*.txt'
    WITH (
        FILE_TYPE = 'CSV',
        CREDENTIAL = (IDENTITY = 'Managed Identity'),
    )
    

D.D. Azure Active Directory 身份验证 ((AAD))Azure Active Directory Authentication (AAD)

步骤Steps

  1. 在存储帐户下导航到“访问控制(标识和访问管理)”,然后选择“添加角色分配”。Under your storage account, navigate to Access Control (IAM), and select Add role assignment. 为 AAD Server 分配存储 Blob 数据所有者、参与者或读取者 Azure 角色。Assign Storage Blob Data Owner, Contributor, or Reader Azure role to your AAD user.

    重要

    指定存储 Blob 数据所有者、参与者或读取着 Azure 角色 。Specify the Storage Blob Data Owner, Contributor, or Reader Azure role. 这些角色不同于所有者、参与者和读取者 Azure 内置角色。These roles are different than the Azure built-in roles of Owner, Contributor, and Reader.

    授予 RBAC 加载权限

  2. 按照以下文档中的步骤配置 Azure AD 身份验证。Configure Azure AD authentication by going through the following documentation.

  3. 使用 Active Directory 连接到 SQL 池,现在可以在其中运行 COPY 语句,而无需指定任何凭据:Connect to your SQL pool using Active Directory where you can now run the COPY statement without specifying any credentials:

    COPY INTO dbo.target_table
    FROM 'https://myaccount.blob.core.chinacloudapi.cn/myblobcontainer/folder1/*.txt'
    WITH (
        FILE_TYPE = 'CSV'
    )
    

后续步骤Next steps