Copy and transform data in Azure SQL Managed Instance by using Azure Data Factory

APPLIES TO: Azure Data Factory, Azure Synapse Analytics

This article outlines how to use Copy Activity in Azure Data Factory to copy data from and to Azure SQL Managed Instance, and how to use Data Flow to transform data in Azure SQL Managed Instance. To learn about Azure Data Factory, read the introductory article.

Supported capabilities

This SQL Managed Instance connector is supported for the following activities:

For Copy activity, this SQL Managed Instance connector supports these functions:

  • Copying data by using SQL authentication and Azure Active Directory (Azure AD) application token authentication with a service principal or managed identities for Azure resources.
  • As a source, retrieving data by using a SQL query or a stored procedure. You can also choose to parallel copy from a SQL MI source; see the Parallel copy from SQL MI section for details.
  • As a sink, automatically creating a destination table, if it doesn't exist, based on the source schema; appending data to a table or invoking a stored procedure with custom logic during the copy.

Note

SQL Managed Instance Always Encrypted isn't currently supported by this connector. To work around this limitation, you can use a generic ODBC connector and a SQL Server ODBC driver via a self-hosted integration runtime. Learn more from the Using Always Encrypted section.

Prerequisites

To access the SQL Managed Instance public endpoint, you can use an Azure Data Factory managed Azure integration runtime. Make sure that you enable the public endpoint and also allow public endpoint traffic on the network security group so that Azure Data Factory can connect to your database. For more information, see this guidance.

To access the SQL Managed Instance private endpoint, set up a self-hosted integration runtime that can access the database. If you provision the self-hosted integration runtime in the same virtual network as your managed instance, make sure that your integration runtime machine is in a different subnet than your managed instance. If you provision your self-hosted integration runtime in a different virtual network than your managed instance, you can use either virtual network peering or a virtual-network-to-virtual-network connection. For more information, see Connect your application to SQL Managed Instance.

Get started

To perform the Copy activity with a pipeline, you can use one of the following tools or SDKs:

The following sections provide details about properties that are used to define Azure Data Factory entities specific to the SQL Managed Instance connector.

Linked service properties

The following properties are supported for the SQL Managed Instance linked service:

  • type — The type property must be set to AzureSqlMI. Required: Yes.
  • connectionString — This property specifies the connectionString information that's needed to connect to SQL Managed Instance by using SQL authentication. For more information, see the following examples. The default port is 1433. If you're using SQL Managed Instance with a public endpoint, explicitly specify port 3342. You can also put a password in Azure Key Vault. If you use SQL authentication, pull the password configuration out of the connection string. For more information, see the JSON example following this list and Store credentials in Azure Key Vault. Required: Yes.
  • servicePrincipalId — Specify the application's client ID. Required: Yes, when you use Azure AD authentication with a service principal.
  • servicePrincipalKey — Specify the application's key. Mark this field as SecureString to store it securely in Azure Data Factory, or reference a secret stored in Azure Key Vault. Required: Yes, when you use Azure AD authentication with a service principal.
  • tenant — Specify the tenant information, like the domain name or tenant ID, under which your application resides. Retrieve it by hovering the mouse in the upper-right corner of the Azure portal. Required: Yes, when you use Azure AD authentication with a service principal.
  • azureCloudType — For service principal authentication, specify the type of Azure cloud environment to which your Azure AD application is registered. The allowed value is AzureChina. By default, the data factory's cloud environment is used. Required: No.
  • connectVia — This integration runtime is used to connect to the data store. You can use a self-hosted integration runtime or an Azure integration runtime if your managed instance has a public endpoint and allows Azure Data Factory to access it. If not specified, the default Azure integration runtime is used. Required: Yes.
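When you connect over the public endpoint, the port must appear explicitly in the connection string. A minimal sketch, assuming an Azure China instance (the `.public` host suffix, DNS zone, and all bracketed values are placeholders to adjust for your environment):

```json
"connectionString": "Data Source=<instance-name>.public.<dns-zone>.database.chinacloudapi.cn,3342;Initial Catalog=<databasename>;Integrated Security=False;User ID=<username>;Password=<password>;"
```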

For different authentication types, refer to the following sections on prerequisites and JSON samples, respectively:

SQL authentication

Example 1: Use SQL authentication

{
    "name": "AzureSqlMILinkedService",
    "properties": {
        "type": "AzureSqlMI",
        "typeProperties": {
            "connectionString": "Data Source=<hostname,port>;Initial Catalog=<databasename>;Integrated Security=False;User ID=<username>;Password=<password>;"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Example 2: Use SQL authentication with a password in Azure Key Vault

{
    "name": "AzureSqlMILinkedService",
    "properties": {
        "type": "AzureSqlMI",
        "typeProperties": {
            "connectionString": "Data Source=<hostname,port>;Initial Catalog=<databasename>;Integrated Security=False;User ID=<username>;",
            "password": { 
                "type": "AzureKeyVaultSecret", 
                "store": { 
                    "referenceName": "<Azure Key Vault linked service name>", 
                    "type": "LinkedServiceReference" 
                }, 
                "secretName": "<secretName>" 
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Service principal authentication

To use service principal-based Azure AD application token authentication, follow these steps:

  1. Follow the steps to Provision an Azure Active Directory administrator for your managed instance.

  2. Create an Azure Active Directory application from the Azure portal. Make note of the application name and the following values that define the linked service:

    • Application ID
    • Application key
    • Tenant ID
  3. Create logins for the service principal. In SQL Server Management Studio (SSMS), connect to your managed instance using a SQL Server account that is a sysadmin. In the master database, run the following T-SQL:

    CREATE LOGIN [your application name] FROM EXTERNAL PROVIDER
    
  4. Create contained database users for the service principal. Connect to the database from or to which you want to copy data, and run the following T-SQL:

    CREATE USER [your application name] FROM EXTERNAL PROVIDER
    
  5. Grant the service principal the needed permissions, as you normally do for SQL users and others. Run the following code. For more options, see this document.

    ALTER ROLE [role name e.g. db_owner] ADD MEMBER [your application name]
    
  6. Configure a SQL Managed Instance linked service in Azure Data Factory.

Example: Use service principal authentication

{
    "name": "AzureSqlDbLinkedService",
    "properties": {
        "type": "AzureSqlMI",
        "typeProperties": {
            "connectionString": "Data Source=<hostname,port>;Initial Catalog=<databasename>;",
            "servicePrincipalId": "<service principal id>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<service principal key>"
            },
            "tenant": "<tenant info, e.g. microsoft.partner.onmschina.cn>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Managed identities for Azure resources authentication

A data factory can be associated with a managed identity for Azure resources that represents the specific data factory. You can use this managed identity for SQL Managed Instance authentication. The designated factory can access and copy data from or to your database by using this identity.

To use managed identity authentication, follow these steps:

  1. Follow the steps to Provision an Azure Active Directory administrator for your managed instance.

  2. Create logins for the Azure Data Factory managed identity. In SQL Server Management Studio (SSMS), connect to your managed instance using a SQL Server account that is a sysadmin. In the master database, run the following T-SQL:

    CREATE LOGIN [your Data Factory name] FROM EXTERNAL PROVIDER
    
  3. Create contained database users for the Azure Data Factory managed identity. Connect to the database from or to which you want to copy data, and run the following T-SQL:

    CREATE USER [your Data Factory name] FROM EXTERNAL PROVIDER
    
  4. Grant the Data Factory managed identity the needed permissions, as you normally do for SQL users and others. Run the following code. For more options, see this document.

    ALTER ROLE [role name e.g. db_owner] ADD MEMBER [your Data Factory name]
    
  5. Configure a SQL Managed Instance linked service in Azure Data Factory.

Example: Use managed identity authentication

{
    "name": "AzureSqlDbLinkedService",
    "properties": {
        "type": "AzureSqlMI",
        "typeProperties": {
            "connectionString": "Data Source=<hostname,port>;Initial Catalog=<databasename>;"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Dataset properties

For a full list of sections and properties available for defining datasets, see the datasets article. This section provides a list of properties supported by the SQL Managed Instance dataset.

To copy data to and from SQL Managed Instance, the following properties are supported:

  • type — The type property of the dataset must be set to AzureSqlMITable. Required: Yes.
  • schema — Name of the schema. Required: No for source, Yes for sink.
  • table — Name of the table/view. Required: No for source, Yes for sink.
  • tableName — Name of the table/view with schema. This property is supported for backward compatibility. For new workloads, use schema and table. Required: No for source, Yes for sink.

Example

{
    "name": "AzureSqlMIDataset",
    "properties":
    {
        "type": "AzureSqlMITable",
        "linkedServiceName": {
            "referenceName": "<SQL Managed Instance linked service name>",
            "type": "LinkedServiceReference"
        },
        "schema": [ < physical schema, optional, retrievable during authoring > ],
        "typeProperties": {
            "schema": "<schema_name>",
            "table": "<table_name>"
        }
    }
}

Copy activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the SQL Managed Instance source and sink.

SQL Managed Instance as a source

Tip

To load data from SQL MI efficiently by using data partitioning, learn more from Parallel copy from SQL MI.

To copy data from SQL Managed Instance, the following properties are supported in the copy activity source section:

  • type — The type property of the copy activity source must be set to SqlMISource. Required: Yes.
  • sqlReaderQuery — This property uses the custom SQL query to read data. An example is select * from MyTable. Required: No.
  • sqlReaderStoredProcedureName — This property is the name of the stored procedure that reads data from the source table. The last SQL statement must be a SELECT statement in the stored procedure. Required: No.
  • storedProcedureParameters — These parameters are for the stored procedure. Allowed values are name or value pairs. The names and casing of the parameters must match the names and casing of the stored procedure parameters. Required: No.
  • isolationLevel — Specifies the transaction locking behavior for the SQL source. The allowed values are: ReadCommitted, ReadUncommitted, RepeatableRead, Serializable, and Snapshot. If not specified, the database's default isolation level is used. Refer to this doc for more details. Required: No.
  • partitionOptions — Specifies the data partitioning options used to load data from SQL MI. Allowed values are: None (default), PhysicalPartitionsOfTable, and DynamicRange. When a partition option is enabled (that is, not None), the degree of parallelism used to concurrently load data from SQL MI is controlled by the parallelCopies setting on the copy activity. Required: No.
  • partitionSettings — Specifies the group of settings for data partitioning. Applies when the partition option isn't None. Required: No.
  Under partitionSettings:
  • partitionColumnName — Specify the name of the source column in integer or date/datetime type (int, smallint, bigint, date, smalldatetime, datetime, datetime2, or datetimeoffset) that will be used by range partitioning for parallel copy. If not specified, the index or the primary key of the table is auto-detected and used as the partition column. Applies when the partition option is DynamicRange. If you use a query to retrieve the source data, hook ?AdfDynamicRangePartitionCondition into the WHERE clause. For an example, see the Parallel copy from SQL MI section. Required: No.
  • partitionUpperBound — The maximum value of the partition column for partition range splitting. This value is used to decide the partition stride, not to filter the rows in the table. All rows in the table or query result will be partitioned and copied. If not specified, the copy activity auto-detects the value. Applies when the partition option is DynamicRange. For an example, see the Parallel copy from SQL MI section. Required: No.
  • partitionLowerBound — The minimum value of the partition column for partition range splitting. This value is used to decide the partition stride, not to filter the rows in the table. All rows in the table or query result will be partitioned and copied. If not specified, the copy activity auto-detects the value. Applies when the partition option is DynamicRange. For an example, see the Parallel copy from SQL MI section. Required: No.
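As a sketch of where these properties sit, a source definition combining a reader query with a non-default isolation level might look like the following (the query and the chosen level are placeholder examples, not a recommendation):

```json
"source": {
    "type": "SqlMISource",
    "sqlReaderQuery": "SELECT * FROM MyTable",
    "isolationLevel": "ReadUncommitted"
}
```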

Note the following points:

  • If sqlReaderQuery is specified for SqlMISource, the copy activity runs this query against the SQL Managed Instance source to get the data. You can also specify a stored procedure by specifying sqlReaderStoredProcedureName and storedProcedureParameters if the stored procedure takes parameters.
  • When using a stored procedure in the source to retrieve data, note that if your stored procedure is designed to return a different schema when a different parameter value is passed in, you may encounter a failure or see an unexpected result when importing schema from the UI or when copying data to the SQL database with auto table creation.

Example: Use a SQL query

"activities":[
    {
        "name": "CopyFromAzureSqlMI",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<SQL Managed Instance input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SqlMISource",
                "sqlReaderQuery": "SELECT * FROM MyTable"
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

Example: Use a stored procedure

"activities":[
    {
        "name": "CopyFromAzureSqlMI",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<SQL Managed Instance input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SqlMISource",
                "sqlReaderStoredProcedureName": "CopyTestSrcStoredProcedureWithParameters",
                "storedProcedureParameters": {
                    "stringData": { "value": "str3" },
                    "identifier": { "value": "$$Text.Format('{0:yyyy}', <datetime parameter>)", "type": "Int"}
                }
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

The stored procedure definition

CREATE PROCEDURE CopyTestSrcStoredProcedureWithParameters
(
    @stringData varchar(20),
    @identifier int
)
AS
SET NOCOUNT ON;
BEGIN
    -- Parameters must be referenced with the @ prefix; without it,
    -- stringData and identifier would resolve to the columns themselves.
    SELECT *
    FROM dbo.UnitTestSrcTable
    WHERE dbo.UnitTestSrcTable.stringData != @stringData
    AND dbo.UnitTestSrcTable.identifier != @identifier
END
GO

SQL Managed Instance as a sink

Tip

Learn more about the supported write behaviors, configurations, and best practices from Best practice for loading data into SQL Managed Instance.

To copy data to SQL Managed Instance, the following properties are supported in the copy activity sink section:

  • type — The type property of the copy activity sink must be set to SqlMISink. Required: Yes.
  • preCopyScript — This property specifies a SQL query for the copy activity to run before writing data into SQL Managed Instance. It's invoked only once per copy run. You can use this property to clean up preloaded data. Required: No.
  • tableOption — Specifies whether to automatically create the sink table, if it doesn't exist, based on the source schema. Auto table creation is not supported when the sink specifies a stored procedure. Allowed values are: none (default) and autoCreate. Required: No.
  • sqlWriterStoredProcedureName — The name of the stored procedure that defines how to apply source data into a target table. This stored procedure is invoked per batch. For operations that run only once and have nothing to do with source data, for example, delete or truncate, use the preCopyScript property. See the example from Invoke a stored procedure from a SQL sink. Required: No.
  • storedProcedureTableTypeParameterName — The parameter name of the table type specified in the stored procedure. Required: No.
  • sqlWriterTableType — The table type name to be used in the stored procedure. The copy activity makes the data being moved available in a temp table with this table type. Stored procedure code can then merge the data that's being copied with existing data. Required: No.
  • storedProcedureParameters — Parameters for the stored procedure. Allowed values are name and value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters. Required: No.
  • writeBatchSize — Number of rows to insert into the SQL table per batch. Allowed values are integers for the number of rows. By default, Azure Data Factory dynamically determines the appropriate batch size based on the row size. Required: No.
  • writeBatchTimeout — This property specifies the wait time for the batch insert operation to complete before it times out. Allowed values are for the timespan. An example is "00:30:00," which is 30 minutes. Required: No.
  • maxConcurrentConnections — The upper limit of concurrent connections established to the data store during the activity run. Specify a value only when you want to limit concurrent connections. Required: No.

Example 1: Append data

"activities":[
    {
        "name": "CopyToAzureSqlMI",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<SQL Managed Instance output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "<source type>"
            },
            "sink": {
                "type": "SqlMISink",
                "tableOption": "autoCreate",
                "writeBatchSize": 100000
            }
        }
    }
]

Example 2: Invoke a stored procedure during copy

Learn more details from Invoke a stored procedure from a SQL MI sink.

"activities":[
    {
        "name": "CopyToAzureSqlMI",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<SQL Managed Instance output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "<source type>"
            },
            "sink": {
                "type": "SqlMISink",
                "sqlWriterStoredProcedureName": "CopyTestStoredProcedureWithParameters",
                "storedProcedureTableTypeParameterName": "MyTable",
                "sqlWriterTableType": "MyTableType",
                "storedProcedureParameters": {
                    "identifier": { "value": "1", "type": "Int" },
                    "stringData": { "value": "str1" }
                }
            }
        }
    }
]
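The JSON above assumes that the table type and stored procedure already exist in the target database. A minimal sketch of what they might look like, assuming hypothetical columns identifier and stringData and a hypothetical target table dbo.MyTargetTable (adjust all of these to your actual schema):

```sql
CREATE TYPE [dbo].[MyTableType] AS TABLE
(
    [identifier] int NOT NULL,
    [stringData] varchar(50) NULL
)
GO

CREATE PROCEDURE CopyTestStoredProcedureWithParameters
(
    @MyTable [dbo].[MyTableType] READONLY,
    @identifier int,
    @stringData varchar(50)
)
AS
BEGIN
    -- The copy activity passes each batch through the table type
    -- parameter; merge it with the existing rows in the target table.
    MERGE [dbo].[MyTargetTable] AS target
    USING @MyTable AS source
    ON (target.identifier = source.identifier)
    WHEN MATCHED THEN
        UPDATE SET target.stringData = source.stringData
    WHEN NOT MATCHED THEN
        INSERT (identifier, stringData)
        VALUES (source.identifier, source.stringData);
END
GO
```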

Parallel copy from SQL MI

The Azure SQL Managed Instance connector in the copy activity provides built-in data partitioning to copy data in parallel. You can find data partitioning options on the Source tab of the copy activity.

(Screenshot: partition options on the Source tab.)

When you enable partitioned copy, the copy activity runs parallel queries against your SQL MI source to load data by partitions. The parallel degree is controlled by the parallelCopies setting on the copy activity. For example, if you set parallelCopies to four, Data Factory concurrently generates and runs four queries based on your specified partition option and settings, and each query retrieves a portion of data from your SQL MI.

We suggest that you enable parallel copy with data partitioning, especially when you load a large amount of data from your SQL MI. The following are suggested configurations for different scenarios. When copying data into a file-based data store, it's recommended to write to a folder as multiple files (only specify the folder name); in that case, the performance is better than writing to a single file.

  • Scenario: Full load from a large table, with physical partitions.
    Suggested settings: Partition option: Physical partitions of table. During execution, Data Factory automatically detects the physical partitions and copies data by partitions. To check whether your table has a physical partition, you can refer to this query.
  • Scenario: Full load from a large table, without physical partitions, but with an integer or datetime column for data partitioning.
    Suggested settings: Partition option: Dynamic range partition. Partition column (optional): Specify the column used to partition data. If not specified, the index or primary key column is used. Partition upper bound and partition lower bound (optional): Specify if you want to determine the partition stride. This is not for filtering the rows in the table; all rows in the table will be partitioned and copied. If not specified, the copy activity auto-detects the values. For example, if your partition column "ID" has values ranging from 1 to 100, and you set the lower bound as 20 and the upper bound as 80, with parallel copy as 4, Data Factory retrieves data by 4 partitions: IDs in range <=20, [21, 50], [51, 80], and >=81, respectively.
  • Scenario: Load a large amount of data by using a custom query, without physical partitions, but with an integer or date/datetime column for data partitioning.
    Suggested settings: Partition option: Dynamic range partition. Query: SELECT * FROM <TableName> WHERE ?AdfDynamicRangePartitionCondition AND <your_additional_where_clause>. Partition column: Specify the column used to partition data. Partition upper bound and partition lower bound (optional): Specify if you want to determine the partition stride. This is not for filtering the rows in the table; all rows in the query result will be partitioned and copied. If not specified, the copy activity auto-detects the value. During execution, Data Factory replaces ?AdfDynamicRangePartitionCondition with the actual column name and value ranges for each partition, and sends it to SQL MI. For example, if your partition column "ID" has values ranging from 1 to 100, and you set the lower bound as 20 and the upper bound as 80, with parallel copy as 4, Data Factory retrieves data by 4 partitions: IDs in range <=20, [21, 50], [51, 80], and >=81, respectively.

Here are more sample queries for different scenarios:
1. Query the whole table:
SELECT * FROM <TableName> WHERE ?AdfDynamicRangePartitionCondition
2. Query from a table with column selection and additional where-clause filters:
SELECT <column_list> FROM <TableName> WHERE ?AdfDynamicRangePartitionCondition AND <your_additional_where_clause>
3. Query with subqueries:
SELECT <column_list> FROM (<your_sub_query>) AS T WHERE ?AdfDynamicRangePartitionCondition AND <your_additional_where_clause>
4. Query with partition in subquery:
SELECT <column_list> FROM (SELECT <your_sub_query_column_list> FROM <TableName> WHERE ?AdfDynamicRangePartitionCondition) AS T

Best practices to load data with the partition option:

  1. Choose a distinctive column as the partition column (such as a primary key or unique key) to avoid data skew.
  2. If the table has built-in partitions, use the partition option "Physical partitions of table" to get better performance.
  3. If you use the Azure Integration Runtime to copy data, you can set larger "Data Integration Units (DIU)" values (>4) to utilize more computing resources. Check the applicable scenarios for DIU settings.
  4. "Degree of copy parallelism" controls the partition count; setting this number too large can sometimes hurt performance. A good starting point is (DIU or number of self-hosted IR nodes) * (2 to 4).
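The rule of thumb in item 4 can be written down directly. The helper below is only an illustrative sketch; the function name and the hard 2-4 factor cap are this example's choices, not ADF settings:

```python
def recommended_parallelism(compute_units: int, factor: int = 2) -> int:
    """Degree of copy parallelism per the rule of thumb above:
    (DIU or number of self-hosted IR nodes) * (2 to 4)."""
    if not 2 <= factor <= 4:
        raise ValueError("factor should stay within the suggested 2-4 range")
    return compute_units * factor

print(recommended_parallelism(4))      # 4 DIUs, conservative factor -> 8
print(recommended_parallelism(3, 4))   # 3 self-hosted IR nodes -> 12
```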

Example: full load from a large table with physical partitions

"source": {
    "type": "SqlMISource",
    "partitionOption": "PhysicalPartitionsOfTable"
}

Example: query with dynamic range partition

"source": {
    "type": "SqlMISource",
    "query": "SELECT * FROM <TableName> WHERE ?AdfDynamicRangePartitionCondition AND <your_additional_where_clause>",
    "partitionOption": "DynamicRange",
    "partitionSettings": {
        "partitionColumnName": "<partition_column_name>",
        "partitionUpperBound": "<upper_value_of_partition_column (optional) to decide the partition stride, not as data filter>",
        "partitionLowerBound": "<lower_value_of_partition_column (optional) to decide the partition stride, not as data filter>"
    }
}

Sample query to check physical partitions

SELECT DISTINCT s.name AS SchemaName, t.name AS TableName, pf.name AS PartitionFunctionName, c.name AS ColumnName, iif(pf.name is null, 'no', 'yes') AS HasPartition
FROM sys.tables AS t
LEFT JOIN sys.objects AS o ON t.object_id = o.object_id
LEFT JOIN sys.schemas AS s ON o.schema_id = s.schema_id
LEFT JOIN sys.indexes AS i ON t.object_id = i.object_id 
LEFT JOIN sys.index_columns AS ic ON ic.partition_ordinal > 0 AND ic.index_id = i.index_id AND ic.object_id = t.object_id 
LEFT JOIN sys.columns AS c ON c.object_id = ic.object_id AND c.column_id = ic.column_id 
LEFT JOIN sys.partition_schemes ps ON i.data_space_id = ps.data_space_id 
LEFT JOIN sys.partition_functions pf ON pf.function_id = ps.function_id 
WHERE s.name='[your schema]' AND t.name = '[your table name]'

If the table has physical partitions, "HasPartition" shows as "yes", like the following.

SQL query result

Best practices for loading data into SQL Managed Instance

When you copy data into SQL Managed Instance, you might require different write behavior:

  • Append: My source data has only new records.
  • Upsert: My source data has both inserts and updates.
  • Overwrite: I want to reload the entire dimension table each time.
  • Write with custom logic: I need extra processing before the final insertion into the destination table.

See the respective sections below for how to configure each behavior in Azure Data Factory, along with best practices.

Append data

Appending data is the default behavior of this SQL Managed Instance sink connector. Azure Data Factory does a bulk insert to write to your table efficiently. You can configure the source and sink accordingly in the copy activity.

Upsert data

Option 1: When you have a large amount of data to copy, you can bulk load all records into a staging table by using the copy activity, then run a stored procedure activity to apply a MERGE or INSERT/UPDATE statement in one shot.

The copy activity currently doesn't natively support loading data into a database temporary table. There is an advanced way to set that up with a combination of multiple activities; refer to Optimize SQL Database Bulk Upsert scenarios. The following shows a sample that uses a permanent table for staging.

As an example, in Azure Data Factory you can create a pipeline with a Copy activity chained with a Stored Procedure activity. The former copies data from your source store into an Azure SQL Managed Instance staging table, for example one named UpsertStagingTable, specified as the table name in the dataset. The latter then invokes a stored procedure to merge source data from the staging table into the target table and clean up the staging table.

Upsert

In your database, define a stored procedure with MERGE logic, like the following example, which the preceding stored procedure activity points to. Assume that the target is the Marketing table with three columns: ProfileID, State, and Category. Do the upsert based on the ProfileID column.

CREATE PROCEDURE [dbo].[spMergeData]
AS
BEGIN
    MERGE [dbo].[Marketing] AS target
    USING UpsertStagingTable AS source
    ON (target.[ProfileID] = source.[ProfileID])
    WHEN MATCHED THEN
        UPDATE SET State = source.State
    WHEN NOT MATCHED THEN
        INSERT ([ProfileID], [State], [Category])
        VALUES (source.ProfileID, source.State, source.Category);

    TRUNCATE TABLE UpsertStagingTable;
END
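To make the MERGE semantics concrete, here is a small Python model of the same upsert: match on ProfileID, update State when matched, insert the whole row otherwise, then empty the staging set. It is only an illustration of the logic, not how ADF or SQL Server executes it:

```python
def upsert(target: dict, staging: list) -> dict:
    """Apply a MERGE-style upsert of staging rows into target, keyed on ProfileID."""
    for row in staging:
        key = row["ProfileID"]
        if key in target:
            # WHEN MATCHED: only State is updated, mirroring the procedure above.
            target[key]["State"] = row["State"]
        else:
            # WHEN NOT MATCHED: insert the full row.
            target[key] = dict(row)
    staging.clear()  # mirrors TRUNCATE TABLE UpsertStagingTable
    return target

marketing = {"p1": {"ProfileID": "p1", "State": "WA", "Category": "A"}}
stage = [
    {"ProfileID": "p1", "State": "CA", "Category": "B"},  # matched -> update
    {"ProfileID": "p2", "State": "NY", "Category": "A"},  # not matched -> insert
]
upsert(marketing, stage)
print(marketing["p1"]["State"], marketing["p2"]["State"])  # CA NY
```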

Option 2: You can choose to invoke a stored procedure within the copy activity. This approach runs each batch of rows from the source table (as governed by the writeBatchSize property) through the stored procedure, instead of using bulk insert, which is the default approach in the copy activity.

Overwrite the entire table

You can configure the preCopyScript property in a copy activity sink. In this case, for each copy activity run, Azure Data Factory runs the script first and then runs the copy to insert the data. For example, to overwrite the entire table with the latest data, specify a script that first deletes all the records before the new data is bulk loaded from the source.
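A sink configured this way might look like the following sketch; the table name dbo.DimCustomer is a hypothetical placeholder, and preCopyScript is the only property this fragment adds beyond the sink samples elsewhere in this article:

```json
"sink": {
    "type": "SqlMISink",
    "preCopyScript": "TRUNCATE TABLE dbo.DimCustomer"
}
```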

Write data with custom logic

The steps to write data with custom logic are similar to those described in the Upsert data section. When you need to apply extra processing before the final insertion of source data into the destination table, you can either load into a staging table and then invoke a stored procedure activity, or invoke a stored procedure in the copy activity sink to apply the data.

Invoke a stored procedure from a SQL sink

When you copy data into SQL Managed Instance, you can also configure and invoke a user-specified stored procedure with additional parameters on each batch of the source table. The stored procedure feature takes advantage of table-valued parameters.

You can use a stored procedure when built-in copy mechanisms don't serve the purpose, for example, when you want to apply extra processing before the final insertion of source data into the destination table. Some examples of extra processing are merging columns, looking up additional values, and inserting into more than one table.

The following sample shows how to use a stored procedure to do an upsert into a table in the SQL Server database. Assume that the input data and the sink Marketing table each have three columns: ProfileID, State, and Category. Do the upsert based on the ProfileID column, and apply it only for a specific category called "ProductA".

  1. In your database, define the table type with the same name as sqlWriterTableType. The schema of the table type is the same as the schema returned by your input data.

    CREATE TYPE [dbo].[MarketingType] AS TABLE(
        [ProfileID] [varchar](256) NOT NULL,
        [State] [varchar](256) NOT NULL,
        [Category] [varchar](256) NOT NULL
    )
    
  2. In your database, define the stored procedure with the same name as sqlWriterStoredProcedureName. It handles input data from your specified source and merges it into the output table. The parameter name of the table type in the stored procedure is the same as the tableName defined in the dataset.

    CREATE PROCEDURE spOverwriteMarketing @Marketing [dbo].[MarketingType] READONLY, @category varchar(256)
    AS
    BEGIN
    MERGE [dbo].[Marketing] AS target
    USING @Marketing AS source
    ON (target.ProfileID = source.ProfileID and target.Category = @category)
    WHEN MATCHED THEN
        UPDATE SET State = source.State
    WHEN NOT MATCHED THEN
        INSERT (ProfileID, State, Category)
        VALUES (source.ProfileID, source.State, source.Category);
    END
    
  3. In Azure Data Factory, define the SQL MI sink section in the copy activity as follows:

    "sink": {
        "type": "SqlMISink",
        "sqlWriterStoredProcedureName": "spOverwriteMarketing",
        "storedProcedureTableTypeParameterName": "Marketing",
        "sqlWriterTableType": "MarketingType",
        "storedProcedureParameters": {
            "category": {
                "value": "ProductA"
            }
        }
    }
    

Mapping data flow properties

When transforming data in mapping data flow, you can read from and write to tables in Azure SQL Managed Instance. For more information, see the source transformation and sink transformation in mapping data flows.

Note

The Azure SQL Managed Instance connector in mapping data flow is currently available as a public preview. You can connect to the SQL Managed Instance public endpoint, but not yet to a private endpoint.

Source transformation

The table below lists the properties supported by the Azure SQL Managed Instance source. You can edit these properties in the Source options tab.

| Name | Description | Required | Allowed values | Data flow script property |
| ---- | ----------- | -------- | -------------- | ------------------------- |
| Table | If you select Table as input, data flow fetches all the data from the table specified in the dataset. | No | - | - |
| Query | If you select Query as input, specify a SQL query to fetch data from the source, which overrides any table you specify in the dataset. Using queries is a great way to reduce rows for testing or lookups.<br/>The Order By clause is not supported, but you can set a full SELECT FROM statement. You can also use user-defined table functions. `select * from udfGetData()` is a UDF in SQL that returns a table that you can use in data flow.<br/>Query example: `Select * from MyTable where customerId > 1000 and customerId < 2000` | No | String | query |
| Batch size | Specify a batch size to chunk large data into reads. | No | Integer | batchSize |
| Isolation level | Choose one of the following isolation levels:<br/>- Read Committed<br/>- Read Uncommitted (default)<br/>- Repeatable Read<br/>- Serializable<br/>- None (ignore isolation level) | No | READ_COMMITTED<br/>READ_UNCOMMITTED<br/>REPEATABLE_READ<br/>SERIALIZABLE<br/>NONE | isolationLevel |

Azure SQL Managed Instance source script example

When you use Azure SQL Managed Instance as the source type, the associated data flow script is:

source(allowSchemaDrift: true,
    validateSchema: false,
    isolationLevel: 'READ_UNCOMMITTED',
    query: 'select * from MYTABLE',
    format: 'query') ~> SQLMISource

Sink transformation

The table below lists the properties supported by the Azure SQL Managed Instance sink. You can edit these properties in the Sink options tab.

| Name | Description | Required | Allowed values | Data flow script property |
| ---- | ----------- | -------- | -------------- | ------------------------- |
| Update method | Specify what operations are allowed on your database destination. The default is to only allow inserts.<br/>To update, upsert, or delete rows, an Alter row transformation is required to tag rows for those actions. | Yes | true or false | deletable<br/>insertable<br/>updateable<br/>upsertable |
| Key columns | For updates, upserts, and deletes, key column(s) must be set to determine which row to alter.<br/>The column name that you pick as the key is used as part of the subsequent update, upsert, or delete. Therefore, you must pick a column that exists in the sink mapping. | No | Array | keys |
| Skip writing key columns | If you don't want to write the value to the key column, select "Skip writing key columns". | No | true or false | skipKeyWrites |
| Table action | Determines whether to recreate or remove all rows from the destination table prior to writing.<br/>- None: No action is taken on the table.<br/>- Recreate: The table gets dropped and recreated. Required if creating a new table dynamically.<br/>- Truncate: All rows from the target table get removed. | No | true or false | recreate<br/>truncate |
| Batch size | Specify how many rows are written in each batch. Larger batch sizes improve compression and memory optimization, but risk out-of-memory exceptions when caching data. | No | Integer | batchSize |
| Pre and post SQL scripts | Specify multi-line SQL scripts that execute before (pre-processing) and after (post-processing) data is written to your sink database. | No | String | preSQLs<br/>postSQLs |

Azure SQL Managed Instance sink script example

When you use Azure SQL Managed Instance as the sink type, the associated data flow script is:

IncomingStream sink(allowSchemaDrift: true,
    validateSchema: false,
    deletable:false,
    insertable:true,
    updateable:true,
    upsertable:true,
    keys:['keyColumn'],
    format: 'table',
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> SQLMISink

Lookup activity properties

To learn details about the properties, check Lookup activity.

GetMetadata activity properties

To learn details about the properties, check GetMetadata activity.

Data type mapping for SQL Managed Instance

When data is copied to and from SQL Managed Instance by using the copy activity, the following mappings are used from SQL Managed Instance data types to Azure Data Factory interim data types. To learn how the copy activity maps from the source schema and data type to the sink, see Schema and data type mappings.

| SQL Managed Instance data type | Azure Data Factory interim data type |
| ------------------------------ | ------------------------------------ |
| bigint | Int64 |
| binary | Byte[] |
| bit | Boolean |
| char | String, Char[] |
| date | DateTime |
| datetime | DateTime |
| datetime2 | DateTime |
| datetimeoffset | DateTimeOffset |
| decimal | Decimal |
| FILESTREAM attribute (varbinary(max)) | Byte[] |
| float | Double |
| image | Byte[] |
| int | Int32 |
| money | Decimal |
| nchar | String, Char[] |
| ntext | String, Char[] |
| numeric | Decimal |
| nvarchar | String, Char[] |
| real | Single |
| rowversion | Byte[] |
| smalldatetime | DateTime |
| smallint | Int16 |
| smallmoney | Decimal |
| sql_variant | Object |
| text | String, Char[] |
| time | TimeSpan |
| timestamp | Byte[] |
| tinyint | Int16 |
| uniqueidentifier | Guid |
| varbinary | Byte[] |
| varchar | String, Char[] |
| xml | String |

Note

For data types that map to the Decimal interim type, the copy activity currently supports a precision of up to 28. If you have data that requires a precision larger than 28, consider converting it to a string in a SQL query.
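The 28-digit ceiling can be reproduced with Python's decimal module. The sketch below shows a 32-digit value losing its trailing digits when rounded to 28 significant digits, which is why casting such columns to strings in the source query preserves them:

```python
from decimal import Decimal, getcontext

getcontext().prec = 28                                # mimic the interim type's precision cap
wide = Decimal("1234567890123456789012345678901.5")   # 32 significant digits
narrowed = +wide                                      # unary plus applies the context precision
print(len(narrowed.as_tuple().digits))                # 28 -- the trailing digits are gone
print(narrowed == wide)                               # False: precision was lost
# Workaround: SELECT CAST(wide_col AS VARCHAR(40)) in the source query keeps full fidelity.
```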

Using Always Encrypted

When you copy data from or to Azure SQL Managed Instance with Always Encrypted, use the generic ODBC connector and the SQL Server ODBC driver via a self-hosted integration runtime. This Azure SQL Managed Instance connector doesn't support Always Encrypted now.

More specifically:

  1. Set up a self-hosted integration runtime if you don't have one. See the Self-hosted Integration Runtime article for details.

  2. Download the 64-bit ODBC driver for SQL Server from here, and install it on the integration runtime machine. Learn more about how this driver works from Using Always Encrypted with the ODBC Driver for SQL Server.

  3. Create a linked service of ODBC type to connect to your SQL database; refer to the following samples:

    • To use SQL authentication: Specify the ODBC connection string as below, and select Basic authentication to set the user name and password.

      Driver={ODBC Driver 17 for SQL Server};Server=<serverName>;Database=<databaseName>;ColumnEncryption=Enabled;KeyStoreAuthentication=KeyVaultClientSecret;KeyStorePrincipalId=<servicePrincipalId>;KeyStoreSecret=<servicePrincipalKey>
      
    • If you run the self-hosted integration runtime on an Azure virtual machine, you can use Managed Identity authentication with the Azure VM's identity:

      1. Follow the same prerequisites to create a database user for the managed identity and grant it the proper role in your database.
      2. In the linked service, specify the ODBC connection string as below, and select Anonymous authentication, because the connection string itself indicates Authentication=ActiveDirectoryMsi.
      Driver={ODBC Driver 17 for SQL Server};Server=<serverName>;Database=<databaseName>;ColumnEncryption=Enabled;KeyStoreAuthentication=KeyVaultClientSecret;KeyStorePrincipalId=<servicePrincipalId>;KeyStoreSecret=<servicePrincipalKey>;Authentication=ActiveDirectoryMsi;
      
  4. Create the dataset and copy activity with the ODBC type accordingly. Learn more from the ODBC connector article.

Next steps

For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see Supported data stores.