Copy and transform data in Azure Synapse Analytics (formerly Azure SQL Data Warehouse) by using Azure Data Factory

APPLIES TO: Azure Data Factory, Azure Synapse Analytics (Preview)

This article outlines how to use Copy Activity in Azure Data Factory to copy data from and to Azure Synapse Analytics. To learn about Azure Data Factory, read the introductory article.


Supported capabilities

This Azure Synapse Analytics connector is supported for the following activities:

For the Copy activity, this Azure Synapse Analytics connector supports these functions:

  • Copy data by using SQL authentication and Azure Active Directory (Azure AD) application token authentication with a service principal or managed identities for Azure resources.
  • As a source, retrieve data by using a SQL query or a stored procedure.
  • As a sink, load data by using PolyBase, the COPY statement (preview), or bulk insert. We recommend PolyBase or the COPY statement (preview) for better copy performance.

Important

If you copy data by using the Azure Data Factory integration runtime, configure an Azure SQL server firewall so that Azure services can access the server. If you copy data by using a self-hosted integration runtime, configure the Azure SQL server firewall to allow the appropriate IP range. This range includes the machine's IP that's used to connect to Azure Synapse Analytics.
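As a sketch, the server-level firewall rule that lets Azure services connect can be created with the Azure CLI; the resource group and server names below are placeholders, and the special 0.0.0.0 start and end addresses denote "Allow Azure services":

```azurecli
az sql server firewall-rule create \
    --resource-group <resource-group-name> \
    --server <servername> \
    --name AllowAllWindowsAzureIps \
    --start-ip-address 0.0.0.0 \
    --end-ip-address 0.0.0.0
```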

Get started

Tip

To achieve the best performance, use PolyBase to load data into Azure Synapse Analytics. The Use PolyBase to load data into Azure Synapse Analytics section has details. For a walkthrough with a use case, see Load 1 TB into Azure Synapse Analytics under 15 minutes with Azure Data Factory.

You can use one of the following tools or SDKs to use the copy activity with a pipeline. Select a link for step-by-step instructions:

The following sections provide details about properties that define Data Factory entities specific to the Azure Synapse Analytics connector.

Linked service properties

The following properties are supported for an Azure Synapse Analytics linked service:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property must be set to AzureSqlDW. | Yes |
| connectionString | Specify the information needed to connect to the Azure Synapse Analytics instance for the connectionString property. Mark this field as a SecureString to store it securely in Data Factory. You can also put the password or service principal key in Azure Key Vault and, if it's SQL authentication, pull the password configuration out of the connection string. See the JSON example below the table and the Store credentials in Azure Key Vault article for more details. | Yes |
| servicePrincipalId | Specify the application's client ID. | Yes, when you use Azure AD authentication with a service principal |
| servicePrincipalKey | Specify the application's key. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes, when you use Azure AD authentication with a service principal |
| tenant | Specify the tenant information (domain name or tenant ID) under which your application resides. You can retrieve it by hovering over the upper-right corner of the Azure portal. | Yes, when you use Azure AD authentication with a service principal |
| connectVia | The integration runtime to be used to connect to the data store. You can use the Azure integration runtime or a self-hosted integration runtime (if your data store is located in a private network). If not specified, the default Azure integration runtime is used. | No |

For different authentication types, refer to the following sections on prerequisites and JSON samples, respectively:

Tip

If you hit an error with error code "UserErrorFailedToConnectToSqlServer" and a message like "The session limit for the database is XXX and has been reached.", add Pooling=false to your connection string and try again.
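For example, a SQL authentication connection string with pooling disabled might look like the following sketch (server, database, and credential values are placeholders):

```json
"connectionString": "Server=tcp:<servername>.database.chinacloudapi.cn,1433;Database=<databasename>;User ID=<username>@<servername>;Password=<password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30;Pooling=false"
```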

SQL authentication

Linked service example that uses SQL authentication

{
    "name": "AzureSqlDWLinkedService",
    "properties": {
        "type": "AzureSqlDW",
        "typeProperties": {
            "connectionString": "Server=tcp:<servername>.database.chinacloudapi.cn,1433;Database=<databasename>;User ID=<username>@<servername>;Password=<password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Password in Azure Key Vault:

{
    "name": "AzureSqlDWLinkedService",
    "properties": {
        "type": "AzureSqlDW",
        "typeProperties": {
            "connectionString": "Server=tcp:<servername>.database.chinacloudapi.cn,1433;Database=<databasename>;User ID=<username>@<servername>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30",
            "password": { 
                "type": "AzureKeyVaultSecret", 
                "store": { 
                    "referenceName": "<Azure Key Vault linked service name>", 
                    "type": "LinkedServiceReference" 
                }, 
                "secretName": "<secretName>" 
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Service principal authentication

To use service principal-based Azure AD application token authentication, follow these steps:

  1. Create an Azure Active Directory application from the Azure portal. Make note of the application name and the following values that define the linked service:

    • Application ID
    • Application key
    • Tenant ID
  2. Provision an Azure Active Directory administrator for your Azure SQL server on the Azure portal if you haven't already done so. The Azure AD administrator can be an Azure AD user or an Azure AD group. If you grant the group with managed identity an admin role, skip steps 3 and 4. The administrator will have full access to the database.

  3. Create contained database users for the service principal. Connect to the data warehouse from or to which you want to copy data by using tools like SSMS, with an Azure AD identity that has at least ALTER ANY USER permission. Run the following T-SQL:

    CREATE USER [your application name] FROM EXTERNAL PROVIDER;
    
  4. Grant the service principal needed permissions as you normally do for SQL users or others. Run the following code, or refer to more options here. If you want to use PolyBase to load the data, learn the required database permission.

    EXEC sp_addrolemember db_owner, [your application name];
    
  5. Configure an Azure Synapse Analytics linked service in Azure Data Factory.

Linked service example that uses service principal authentication

{
    "name": "AzureSqlDWLinkedService",
    "properties": {
        "type": "AzureSqlDW",
        "typeProperties": {
            "connectionString": "Server=tcp:<servername>.database.chinacloudapi.cn,1433;Database=<databasename>;Connection Timeout=30",
            "servicePrincipalId": "<service principal id>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<service principal key>"
            },
            "tenant": "<tenant info, e.g. microsoft.partner.onmschina.cn>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Managed identities for Azure resources authentication

A data factory can be associated with a managed identity for Azure resources that represents the specific factory. You can use this managed identity for Azure Synapse Analytics authentication. The designated factory can access and copy data from or to your data warehouse by using this identity.

To use managed identity authentication, follow these steps:

  1. Provision an Azure Active Directory administrator for your Azure SQL server on the Azure portal if you haven't already done so. The Azure AD administrator can be an Azure AD user or an Azure AD group. If you grant the group with managed identity an admin role, skip steps 3 and 4. The administrator will have full access to the database.

  2. Create contained database users for the Data Factory managed identity. Connect to the data warehouse from or to which you want to copy data by using tools like SSMS, with an Azure AD identity that has at least ALTER ANY USER permission. Run the following T-SQL.

    CREATE USER [your Data Factory name] FROM EXTERNAL PROVIDER;
    
  3. Grant the Data Factory managed identity needed permissions as you normally do for SQL users and others. Run the following code, or refer to more options here. If you want to use PolyBase to load the data, learn the required database permission.

    EXEC sp_addrolemember db_owner, [your Data Factory name];
    
  4. Configure an Azure Synapse Analytics linked service in Azure Data Factory.

Example:

{
    "name": "AzureSqlDWLinkedService",
    "properties": {
        "type": "AzureSqlDW",
        "typeProperties": {
            "connectionString": "Server=tcp:<servername>.database.chinacloudapi.cn,1433;Database=<databasename>;Connection Timeout=30"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Dataset properties

For a full list of sections and properties available for defining datasets, see the Datasets article.

The following properties are supported for the Azure Synapse Analytics dataset:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the dataset must be set to AzureSqlDWTable. | Yes |
| schema | Name of the schema. | No for source, Yes for sink |
| table | Name of the table/view. | No for source, Yes for sink |
| tableName | Name of the table/view with schema. This property is supported for backward compatibility. For new workloads, use schema and table. | No for source, Yes for sink |

Dataset properties example

{
    "name": "AzureSQLDWDataset",
    "properties":
    {
        "type": "AzureSqlDWTable",
        "linkedServiceName": {
            "referenceName": "<Azure Synapse Analytics linked service name>",
            "type": "LinkedServiceReference"
        },
        "schema": [ < physical schema, optional, retrievable during authoring > ],
        "typeProperties": {
            "schema": "<schema_name>",
            "table": "<table_name>"
        }
    }
}

Copy Activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the Azure Synapse Analytics source and sink.

Azure Synapse Analytics as the source

To copy data from Azure Synapse Analytics, set the type property in the Copy Activity source to SqlDWSource. The following properties are supported in the Copy Activity source section:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the Copy Activity source must be set to SqlDWSource. | Yes |
| sqlReaderQuery | Use the custom SQL query to read data. Example: select * from MyTable. | No |
| sqlReaderStoredProcedureName | The name of the stored procedure that reads data from the source table. The last SQL statement must be a SELECT statement in the stored procedure. | No |
| storedProcedureParameters | Parameters for the stored procedure. Allowed values are name or value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters. | No |
| isolationLevel | Specifies the transaction locking behavior for the SQL source. The allowed values are ReadCommitted (default), ReadUncommitted, RepeatableRead, Serializable, and Snapshot. Refer to this doc for more details. | No |

Example: using a SQL query

"activities":[
    {
        "name": "CopyFromAzureSQLDW",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Azure SQL DW input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SqlDWSource",
                "sqlReaderQuery": "SELECT * FROM MyTable"
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

Example: using a stored procedure

"activities":[
    {
        "name": "CopyFromAzureSQLDW",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Azure SQL DW input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SqlDWSource",
                "sqlReaderStoredProcedureName": "CopyTestSrcStoredProcedureWithParameters",
                "storedProcedureParameters": {
                    "stringData": { "value": "str3" },
                    "identifier": { "value": "$$Text.Format('{0:yyyy}', <datetime parameter>)", "type": "Int"}
                }
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

Sample stored procedure:

CREATE PROCEDURE CopyTestSrcStoredProcedureWithParameters
(
    @stringData varchar(20),
    @identifier int
)
AS
SET NOCOUNT ON;
BEGIN
    SELECT *
    FROM dbo.UnitTestSrcTable
    WHERE dbo.UnitTestSrcTable.stringData != @stringData
    AND dbo.UnitTestSrcTable.identifier != @identifier
END
GO

Azure Synapse Analytics as the sink

Azure Data Factory supports three ways to load data into SQL Data Warehouse.

(Diagram: SQL Data Warehouse sink copy options.)

The fastest and most scalable way to load data is through PolyBase or the COPY statement (preview).

To copy data to Azure SQL Data Warehouse, set the sink type in Copy Activity to SqlDWSink. The following properties are supported in the Copy Activity sink section:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the Copy Activity sink must be set to SqlDWSink. | Yes |
| allowPolyBase | Indicates whether to use PolyBase to load data into SQL Data Warehouse. allowCopyCommand and allowPolyBase can't both be true. See the Use PolyBase to load data into Azure SQL Data Warehouse section for constraints and details. Allowed values are True and False (default). | No. Apply when using PolyBase. |
| polyBaseSettings | A group of properties that can be specified when the allowPolyBase property is set to true. | No. Apply when using PolyBase. |
| allowCopyCommand | Indicates whether to use the COPY statement (preview) to load data into SQL Data Warehouse. allowCopyCommand and allowPolyBase can't both be true. See the Use COPY statement to load data into Azure SQL Data Warehouse section for constraints and details. Allowed values are True and False (default). | No. Apply when using COPY. |
| copyCommandSettings | A group of properties that can be specified when the allowCopyCommand property is set to TRUE. | No. Apply when using COPY. |
| writeBatchSize | Number of rows to insert into the SQL table per batch. The allowed value is integer (number of rows). By default, Data Factory dynamically determines the appropriate batch size based on the row size. | No. Apply when using bulk insert. |
| writeBatchTimeout | Wait time for the batch insert operation to finish before it times out. The allowed value is timespan. Example: "00:30:00" (30 minutes). | No. Apply when using bulk insert. |
| preCopyScript | Specify a SQL query for Copy Activity to run before writing data into Azure SQL Data Warehouse in each run. Use this property to clean up preloaded data. | No |
| tableOption | Specifies whether to automatically create the sink table, if it doesn't exist, based on the source schema. Auto table creation isn't supported when staged copy is configured in the copy activity. Allowed values are none (default) and autoCreate. | No |
| disableMetricsCollection | Data Factory collects metrics such as SQL Data Warehouse DWUs for copy performance optimization and recommendations. If you're concerned with this behavior, specify true to turn it off. | No (default is false) |

SQL Data Warehouse sink example

"sink": {
    "type": "SqlDWSink",
    "allowPolyBase": true,
    "polyBaseSettings":
    {
        "rejectType": "percentage",
        "rejectValue": 10.0,
        "rejectSampleValue": 100,
        "useTypeDefault": true
    }
}
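For comparison, a minimal sketch of a sink that loads through the COPY statement (preview) instead of PolyBase; remember that allowCopyCommand and allowPolyBase can't both be true:

```json
"sink": {
    "type": "SqlDWSink",
    "allowCopyCommand": true
}
```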

Use PolyBase to load data into Azure SQL Data Warehouse

Using PolyBase is an efficient way to load a large amount of data into Azure Synapse Analytics with high throughput. You'll see a large gain in throughput by using PolyBase instead of the default BULKINSERT mechanism.

  • If your source data is in Azure Blob or Azure Data Lake Storage Gen2, and the format is PolyBase compatible, you can use the copy activity to directly invoke PolyBase to let Azure SQL Data Warehouse pull the data from the source. For details, see Direct copy by using PolyBase.
  • If your source data store and format isn't originally supported by PolyBase, use the Staged copy by using PolyBase feature instead. The staged copy feature also provides you better throughput. It automatically converts the data into a PolyBase-compatible format, stores the data in Azure Blob storage, and then calls PolyBase to load the data into SQL Data Warehouse.

The following PolyBase settings are supported under polyBaseSettings in the copy activity:

| Property | Description | Required |
| --- | --- | --- |
| rejectValue | Specifies the number or percentage of rows that can be rejected before the query fails. Learn more about PolyBase's reject options in the Arguments section of CREATE EXTERNAL TABLE (Transact-SQL). Allowed values are 0 (default), 1, 2, etc. | No |
| rejectType | Specifies whether the rejectValue option is a literal value or a percentage. Allowed values are Value (default) and Percentage. | No |
| rejectSampleValue | Determines the number of rows to retrieve before PolyBase recalculates the percentage of rejected rows. Allowed values are 1, 2, etc. | Yes, if the rejectType is percentage |
| useTypeDefault | Specifies how to handle missing values in delimited text files when PolyBase retrieves data from the text file. Learn more about this property from the Arguments section in CREATE EXTERNAL FILE FORMAT (Transact-SQL). Allowed values are True and False (default). | No |

Direct copy by using PolyBase

SQL Data Warehouse PolyBase directly supports Azure Blob and Azure Data Lake Storage Gen2. If your source data meets the criteria described in this section, use PolyBase to copy directly from the source data store to Azure Synapse Analytics. Otherwise, use Staged copy by using PolyBase.

If the requirements aren't met, Azure Data Factory checks the settings and automatically falls back to the BULKINSERT mechanism for the data movement.

  1. The source linked service uses the following types and authentication methods:

    | Supported source data store type | Supported source authentication type |
    | --- | --- |
    | Azure Blob | Account key authentication, managed identity authentication |
    | Azure Data Lake Storage Gen2 | Account key authentication, managed identity authentication |

    Important

    If your Azure Storage is configured with a VNet service endpoint, you must use managed identity authentication; refer to Impact of using VNet Service Endpoints with Azure storage. Learn the required configurations in Data Factory from the Azure Blob - managed identity authentication and Azure Data Lake Storage Gen2 - managed identity authentication sections, respectively.

  2. The source data format is Parquet, ORC, or Delimited text, with the following configurations:

    1. The folder path doesn't contain a wildcard filter.
    2. The file name is empty or points to a single file. If you specify a wildcard file name in the copy activity, it can only be * or *.*.
    3. rowDelimiter is default, \n, \r\n, or \r.
    4. nullValue is left as default or set to empty string (""), and treatEmptyAsNull is left as default or set to true.
    5. encodingName is left as default or set to utf-8.
    6. quoteChar, escapeChar, and skipLineCount aren't specified. PolyBase supports skipping the header row, which can be configured as firstRowAsHeader in ADF.
    7. compression can be no compression, GZip, or Deflate.
  3. If your source is a folder, recursive in the copy activity must be set to true.

  4. wildcardFolderPath, wildcardFilename, modifiedDateTimeStart, modifiedDateTimeEnd, and additionalColumns aren't specified.

Note

If your source is a folder, note that PolyBase retrieves files from the folder and all of its subfolders, and it doesn't retrieve data from files whose file name begins with an underscore (_) or a period (.), as documented here - LOCATION argument.

"activities":[
    {
        "name": "CopyFromAzureBlobToSQLDataWarehouseViaPolyBase",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "ParquetDataset",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "AzureSQLDWDataset",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "ParquetSource",
                "storeSettings":{
                    "type": "AzureBlobStorageReadSettings",
                    "recursive": true
                }
            },
            "sink": {
                "type": "SqlDWSink",
                "allowPolyBase": true
            }
        }
    }
]

Staged copy by using PolyBase

When your source data isn't natively compatible with PolyBase, enable data copying via an interim staging Azure Blob storage instance (it can't be Azure Premium Storage). In this case, Azure Data Factory automatically converts the data to meet the data format requirements of PolyBase. Then it invokes PolyBase to load the data into SQL Data Warehouse. Finally, it cleans up your temporary data from the blob storage. See Staged copy for details about copying data via a staging Azure Blob storage instance.

To use this feature, create an Azure Blob storage linked service that refers to the Azure storage account with the interim blob storage. Then specify the enableStaging and stagingSettings properties for the Copy Activity as shown in the following code.

Important

If your staging Azure Storage is configured with a VNet service endpoint, you must use managed identity authentication; refer to Impact of using VNet Service Endpoints with Azure storage. Learn the required configurations in Data Factory from Azure Blob - managed identity authentication.

"activities":[
    {
        "name": "CopyFromSQLServerToSQLDataWarehouseViaPolyBase",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "SQLServerDataset",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "AzureSQLDWDataset",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SqlSource"
            },
            "sink": {
                "type": "SqlDWSink",
                "allowPolyBase": true
            },
            "enableStaging": true,
            "stagingSettings": {
                "linkedServiceName": {
                    "referenceName": "MyStagingBlob",
                    "type": "LinkedServiceReference"
                }
            }
        }
    }
]

Best practices for using PolyBase

The following sections provide best practices in addition to those mentioned in Best practices for Azure Synapse Analytics.

Required database permission

To use PolyBase, the user that loads data into SQL Data Warehouse must have "CONTROL" permission on the target database. One way to achieve that is to add the user as a member of the db_owner role. Learn how to do that in the SQL Data Warehouse overview.
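As a sketch, either of the following T-SQL statements satisfies the requirement; the database and user names are placeholders:

```sql
-- Option 1: grant CONTROL on the target database directly.
GRANT CONTROL ON DATABASE::[<databasename>] TO [<loading user>];

-- Option 2: add the loading user to the db_owner role,
-- which includes CONTROL permission.
EXEC sp_addrolemember 'db_owner', '<loading user>';
```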

Row size and data type limits

PolyBase loads are limited to rows smaller than 1 MB. It cannot be used to load to VARCHAR(MAX), NVARCHAR(MAX), or VARBINARY(MAX). For more information, see SQL Data Warehouse service capacity limits.

When your source data has rows greater than 1 MB, you might want to vertically split the source tables into several small ones. Make sure that the largest size of each row doesn't exceed the limit. The smaller tables can then be loaded by using PolyBase and merged together in Azure Synapse Analytics.

Alternatively, for data with such wide columns, you can load the data through ADF without PolyBase by turning off the "allow PolyBase" setting.
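As a minimal sketch, turning off PolyBase in the copy activity sink falls back to bulk insert; the writeBatchSize value shown here is illustrative, not a recommendation:

```json
"sink": {
    "type": "SqlDWSink",
    "allowPolyBase": false,
    "writeBatchSize": 10000
}
```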

SQL Data Warehouse resource class

To achieve the best possible throughput, assign a larger resource class to the user that loads data into SQL Data Warehouse via PolyBase.

PolyBase troubleshooting

Loading to Decimal column

If your source data is in text format or in other non-PolyBase compatible stores (using staged copy and PolyBase), and it contains empty values to be loaded into a SQL Data Warehouse Decimal column, you may hit the following error:

ErrorCode=FailedDbOperation, ......HadoopSqlException: Error converting data type VARCHAR to DECIMAL.....Detailed Message=Empty string can't be converted to DECIMAL.....

The solution is to unselect the "Use type default" option (set it to false) in the copy activity sink -> PolyBase settings. "USE_TYPE_DEFAULT" is a PolyBase native configuration, which specifies how to handle missing values in delimited text files when PolyBase retrieves data from the text file.
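In the activity JSON, that setting lives under polyBaseSettings on the sink; a sketch of the relevant fragment (the reject settings shown are the common defaults, included only for context) looks like this:

```json
"sink": {
    "type": "SqlDWSink",
    "allowPolyBase": true,
    "polyBaseSettings": {
        "rejectType": "value",
        "rejectValue": 0,
        "useTypeDefault": false
    }
}
```

With useTypeDefault set to false, PolyBase stores missing values from the delimited text as NULL instead of substituting the column's type default, which avoids the empty-string-to-DECIMAL conversion error above.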

tableName in Azure Synapse Analytics

The following table gives examples of how to specify the tableName property in the JSON dataset. It shows several combinations of schema and table names.

| DB Schema | Table name | tableName JSON property |
| --------- | ---------- | ----------------------- |
| dbo | MyTable | MyTable or dbo.MyTable or [dbo].[MyTable] |
| dbo1 | MyTable | dbo1.MyTable or [dbo1].[MyTable] |
| dbo | My.Table | [My.Table] or [dbo].[My.Table] |
| dbo1 | My.Table | [dbo1].[My.Table] |

If you see the following error, the problem might be the value you specified for the tableName property. See the preceding table for the correct way to specify values for the tableName JSON property.
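For context, a sketch of a dataset using one of the bracketed forms from the table (the dataset and linked service names here are placeholders for illustration):

```json
{
    "name": "AzureSQLDWDataset",
    "properties": {
        "type": "AzureSqlDWTable",
        "linkedServiceName": {
            "referenceName": "MyAzureSqlDWLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "[dbo].[My.Table]"
        }
    }
}
```

Bracketing both the schema and the table name is required whenever the table name itself contains a period, as in this example.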

Type=System.Data.SqlClient.SqlException,Message=Invalid object name 'stg.Account_test'.,Source=.Net SqlClient Data Provider

Columns with default values

Currently, the PolyBase feature in Data Factory accepts only the same number of columns as in the target table. An example is a table with four columns where one of them is defined with a default value. The input data still needs to have four columns. A three-column input dataset yields an error similar to the following message:

All columns of the table must be specified in the INSERT BULK statement.

The NULL value is a special form of the default value. If the column is nullable, the input data in the blob for that column might be empty. But it can't be missing from the input dataset. PolyBase inserts NULL for missing values in Azure Synapse Analytics.

Use COPY statement to load data into Azure SQL Data Warehouse (preview)

SQL Data Warehouse COPY statement (preview) directly supports loading data from Azure Blob and Azure Data Lake Storage Gen2. If your source data meets the criteria described in this section, you can choose to use the COPY statement in ADF to load data into Azure SQL Data Warehouse. Azure Data Factory checks the settings and fails the copy activity run if the criteria are not met.

Note

Currently, Data Factory only supports copying from the COPY statement compatible sources mentioned below.

Using the COPY statement supports the following configuration:

  1. The source linked service and format are of the following types and authentication methods:

    | Supported source data store type | Supported format | Supported source authentication type |
    | --- | --- | --- |
    | Azure Blob | Delimited text | Account key authentication, shared access signature authentication, service principal authentication, managed identity authentication |
    | Azure Blob | Parquet | Account key authentication, shared access signature authentication |
    | Azure Blob | ORC | Account key authentication, shared access signature authentication |
    | Azure Data Lake Storage Gen2 | Delimited text, Parquet, ORC | Account key authentication, service principal authentication, managed identity authentication |

    Important

    If your Azure Storage is configured with VNet service endpoint, you must use managed identity authentication - refer to Impact of using VNet Service Endpoints with Azure storage. Learn the required configurations in Data Factory from the Azure Blob - managed identity authentication and Azure Data Lake Storage Gen2 - managed identity authentication sections respectively.

  2. Format settings are as follows:

    1. For Parquet: compression can be no compression, Snappy, or GZip.
    2. For ORC: compression can be no compression, zlib, or Snappy.
    3. For Delimited text:
      1. rowDelimiter is explicitly set as a single character or "\r\n"; the default value is not supported.
      2. nullValue is left as default or set to empty string ("").
      3. encodingName is left as default or set to utf-8 or utf-16.
      4. escapeChar must be the same as quoteChar, and is not empty.
      5. skipLineCount is left as default or set to 0.
      6. compression can be no compression or GZip.
  3. If your source is a folder, recursive in copy activity must be set to true.

  4. wildcardFolderPath, wildcardFilename, modifiedDateTimeStart, modifiedDateTimeEnd, and additionalColumns are not specified.
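Putting the delimited-text criteria above together, a COPY-compatible source dataset could be sketched like this (the dataset name, linked service reference, container, and folder path are placeholders for illustration):

```json
{
    "name": "BlobDelimitedTextDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "MySourceBlobStorage",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "input",
                "folderPath": "data"
            },
            "columnDelimiter": ",",
            "rowDelimiter": "\n",
            "quoteChar": "\"",
            "escapeChar": "\"",
            "firstRowAsHeader": true,
            "nullValue": ""
        }
    }
}
```

Note the explicit single-character rowDelimiter, the matching quoteChar and escapeChar, and the empty-string nullValue, all required by the criteria listed above.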

The following COPY statement settings are supported under allowCopyCommand in copy activity:

| Property | Description | Required |
| --- | --- | --- |
| defaultValues | Specifies the default values for each target column in SQL DW. The default values in the property overwrite the DEFAULT constraint set in the data warehouse, and identity columns cannot have a default value. | No |
| additionalOptions | Additional options that will be passed to the SQL DW COPY statement directly in the "With" clause. Quote the value as needed to align with the COPY statement requirements. | No |
"activities":[
    {
        "name": "CopyFromAzureBlobToSQLDataWarehouseViaCOPY",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "ParquetDataset",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "AzureSQLDWDataset",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "ParquetSource",
                "storeSettings":{
                    "type": "AzureBlobStorageReadSettings",
                    "recursive": true
                }
            },
            "sink": {
                "type": "SqlDWSink",
                "allowCopyCommand": true, 
                "copyCommandSettings": {
                    "defaultValues": [ 
                        { 
                            "columnName": "col_string", 
                            "defaultValue": "DefaultStringValue" 
                        }
                    ],
                    "additionalOptions": { 
                        "MAXERRORS": "10000", 
                        "DATEFORMAT": "'ymd'" 
                    }
                }
            },
            "enableSkipIncompatibleRow": true
        }
    }
]

Lookup activity properties

To learn details about the properties, check Lookup activity.

GetMetadata activity properties

To learn details about the properties, check GetMetadata activity.


Data type mapping for Azure Synapse Analytics

When you copy data from or to Azure Synapse Analytics, the following mappings are used from Azure Synapse Analytics data types to Azure Data Factory interim data types. See schema and data type mappings to learn how Copy Activity maps the source schema and data type to the sink.

Tip

Refer to the Table data types in Azure Synapse Analytics article on SQL DW supported data types and the workarounds for unsupported ones.

| Azure Synapse Analytics data type | Data Factory interim data type |
| --- | --- |
| bigint | Int64 |
| binary | Byte[] |
| bit | Boolean |
| char | String, Char[] |
| date | DateTime |
| datetime | DateTime |
| datetime2 | DateTime |
| datetimeoffset | DateTimeOffset |
| decimal | Decimal |
| FILESTREAM attribute (varbinary(max)) | Byte[] |
| float | Double |
| image | Byte[] |
| int | Int32 |
| money | Decimal |
| nchar | String, Char[] |
| numeric | Decimal |
| nvarchar | String, Char[] |
| real | Single |
| rowversion | Byte[] |
| smalldatetime | DateTime |
| smallint | Int16 |
| smallmoney | Decimal |
| time | TimeSpan |
| tinyint | Byte |
| uniqueidentifier | Guid |
| varbinary | Byte[] |
| varchar | String, Char[] |

Next steps

For a list of data stores supported as sources and sinks by Copy Activity in Azure Data Factory, see supported data stores and formats.