使用 Azure 数据工厂从 Netezza 复制数据Copy data from Netezza by using Azure Data Factory

适用于:是 Azure 数据工厂是 Azure Synapse Analytics(预览版)APPLIES TO: yesAzure Data Factory yesAzure Synapse Analytics (Preview)

本文概述了如何使用 Azure 数据工厂中的复制活动从 Netezza 复制数据。This article outlines how to use Copy Activity in Azure Data Factory to copy data from Netezza. 本文是根据总体概述复制活动的 Azure 数据工厂中的复制活动编写的。The article builds on Copy Activity in Azure Data Factory, which presents a general overview of Copy Activity.

提示

对于从 Netezza 迁移到 Azure 的数据迁移方案,请从使用 Azure 数据工厂将数据从本地 Netezza 服务器迁移到 Azure 了解更多信息。For data migration scenario from Netezza to Azure, learn more from Use Azure Data Factory to migrate data from on-premises Netezza server to Azure.

支持的功能Supported capabilities

以下活动支持此 Netezza 连接器:This Netezza connector is supported for the following activities:

可以将数据从 Netezza 复制到任何支持的接收器数据存储。You can copy data from Netezza to any supported sink data store. 有关复制活动支持作为源和接收器的数据存储的列表,请参阅支持的数据存储和格式For a list of data stores that Copy Activity supports as sources and sinks, see Supported data stores and formats.

Netezza 连接器支持从源进行并行复制。Netezza connector supports parallel copying from source. 有关详细信息,请参阅从 Netezza 进行并行复制部分。See the Parallel copy from Netezza section for details.

Azure 数据工厂提供内置驱动程序以启用连接。Azure Data Factory provides a built-in driver to enable connectivity. 无需要手动安装任何驱动程序即可使用此连接器。You don't need to manually install any driver to use this connector.

先决条件Prerequisites

如果数据存储是以下方式之一配置的,则需要设置自承载集成运行时才能连接到此数据存储:If your data store is configured in one of the following ways, you need to set up a self-hosted integration runtime to connect to the data store:

  • 数据存储位于本地网络内部、Azure 虚拟网络内部或 Amazon 虚拟私有云内。The data store is located inside an on-premises network, inside an Azure virtual network, or inside Amazon Virtual Private Cloud.
  • 数据存储是一种托管的云数据服务,只有在防火墙规则中列入允许列表的 IP 才能访问该服务。The data store is a managed cloud data service where the access is restricted to IPs that are whitelisted in the firewall rules.

入门Get started

可以通过使用 .NET SDK、Python SDK、Azure PowerShell、REST API 或 Azure 资源管理器模板创建使用复制活动的管道。You can create a pipeline that uses a copy activity by using the .NET SDK, the Python SDK, Azure PowerShell, the REST API, or an Azure Resource Manager template. 有关如何创建包含复制活动的管道的分步说明,请参阅复制活动教程See the Copy Activity tutorial for step-by-step instructions on how to create a pipeline that has a copy activity.

对于特定于 Netezza 连接器的数据工厂实体,以下部分提供有关用于定义这些实体的属性的详细信息。The following sections provide details about properties you can use to define Data Factory entities that are specific to the Netezza connector.

链接服务属性Linked service properties

Netezza 链接服务支持以下属性:The following properties are supported for the Netezza linked service:

属性Property 说明Description 必须Required
typetype “type”属性必须设置为“Netezza” 。The type property must be set to Netezza. Yes
connectionStringconnectionString 用于连接到 Netezza 的 ODBC 连接字符串。An ODBC connection string to connect to Netezza.
还可以将密码放在 Azure 密钥保管库中,并从连接字符串中拉取 pwd 配置。You can also put password in Azure Key Vault and pull the pwd configuration out of the connection string. 有关更多详细信息,请参阅以下示例和在 Azure 密钥保管库中存储凭据一文。Refer to the following samples and Store credentials in Azure Key Vault article with more details.
Yes
connectViaconnectVia 用于连接到数据存储的 Integration RuntimeThe Integration Runtime to use to connect to the data store. 先决条件部分了解更多信息。Learn more from Prerequisites section. 如果未指定,则使用默认 Azure Integration Runtime。If not specified, the default Azure Integration Runtime is used. No

典型的连接字符串为 Server=<server>;Port=<port>;Database=<database>;UID=<user name>;PWD=<password>A typical connection string is Server=<server>;Port=<port>;Database=<database>;UID=<user name>;PWD=<password>. 下表介绍了更多可以设置的属性:The following table describes more properties that you can set:

属性Property 说明Description 必须Required
SecurityLevelSecurityLevel 驱动程序用于连接到数据存储的安全级别。The level of security that the driver uses for the connection to the data store. 驱动程序支持使用 SSL 3 进行单向身份验证的 SSL 连接。The driver supports SSL connections with one-way authentication using SSL version 3.
示例:SecurityLevel=preferredSecuredExample: SecurityLevel=preferredSecured. 支持的值是:Supported values are:
- 仅限未受保护 (onlyUnSecured) :该驱动程序不使用 SSL。- Only unsecured (onlyUnSecured): The driver doesn't use SSL.
- 首选未受保护 (preferredUnSecured)(默认) :如果该服务器提供选择,则驱动程序不使用 SSL。- Preferred unsecured (preferredUnSecured) (default): If the server provides a choice, the driver doesn't use SSL.
- 首选受保护 (preferredSecured) :如果服务器提供选择,则驱动程序使用 SSL。- Preferred secured (preferredSecured): If the server provides a choice, the driver uses SSL.
- 仅限受保护 (onlySecured) :除非有 SSL 连接,否则驱动程序不会连接。- Only secured (onlySecured): The driver doesn't connect unless an SSL connection is available.
No
CaCertFileCaCertFile 服务器使用的 SSL 证书的完整路径。The full path to the SSL certificate that's used by the server. 示例: CaCertFile=<cert path>;Example: CaCertFile=<cert path>; 是,如果启用了 SSLYes, if SSL is enabled

示例Example

{
    "name": "NetezzaLinkedService",
    "properties": {
        "type": "Netezza",
        "typeProperties": {
            "connectionString": "Server=<server>;Port=<port>;Database=<database>;UID=<user name>;PWD=<password>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

示例:在 Azure 密钥保管库中存储密码Example: store password in Azure Key Vault

{
    "name": "NetezzaLinkedService",
    "properties": {
        "type": "Netezza",
        "typeProperties": {
            "connectionString": "Server=<server>;Port=<port>;Database=<database>;UID=<user name>;",
            "pwd": { 
                "type": "AzureKeyVaultSecret", 
                "store": { 
                    "referenceName": "<Azure Key Vault linked service name>", 
                    "type": "LinkedServiceReference" 
                }, 
                "secretName": "<secretName>" 
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

数据集属性Dataset properties

本部分提供 Netezza 数据集支持的属性列表。This section provides a list of properties that the Netezza dataset supports.

若要查看可用于定义数据集的各部分和属性的完整列表,请参阅数据集For a full list of sections and properties that are available for defining datasets, see Datasets.

若要从 Netezza 复制数据,请将数据集的 type 属性设置为“NetezzaTable” 。To copy data from Netezza, set the type property of the dataset to NetezzaTable. 支持以下属性:The following properties are supported:

属性Property 说明Description 必须Required
typetype 数据集的 type 属性必须设置为:NetezzaTableThe type property of the dataset must be set to: NetezzaTable Yes
架构schema 架构的名称。Name of the schema. 否(如果指定了活动源中的“query”)No (if "query" in activity source is specified)
table 表的名称。Name of the table. 否(如果指定了活动源中的“query”)No (if "query" in activity source is specified)
tableNametableName 具有架构的表的名称。Name of the table with schema. 支持此属性是为了向后兼容。This property is supported for backward compatibility. 对于新的工作负荷,请使用 schematableUse schema and table for new workload. 否(如果指定了活动源中的“query”)No (if "query" in activity source is specified)

示例Example

{
    "name": "NetezzaDataset",
    "properties": {
        "type": "NetezzaTable",
        "linkedServiceName": {
            "referenceName": "<Netezza linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {}
    }
}

复制活动属性Copy Activity properties

本部分提供 Netezza 源支持的属性列表。This section provides a list of properties that the Netezza source supports.

有关可用于定义活动的各个部分和属性的完整列表,请参阅管道For a full list of sections and properties that are available for defining activities, see Pipelines.

以 Netezza 作为源Netezza as source

提示

若要详细了解如何使用数据分区从 Netezza 高效加载数据,请参阅从 Netezza 进行并行复制部分。To load data from Netezza efficiently by using data partitioning, learn more from Parallel copy from Netezza section.

若要从 Netezza 复制数据,请将复制活动中的 source 类型设置为“NetezzaSource” 。To copy data from Netezza, set the source type in Copy Activity to NetezzaSource. 复制活动 source 节支持以下属性:The following properties are supported in the Copy Activity source section:

属性Property 说明Description 必须Required
typetype 必须将复制活动源的 type 属性设置为“NetezzaSource” 。The type property of the Copy Activity source must be set to NetezzaSource. Yes
查询query 使用自定义 SQL 查询读取数据。Use the custom SQL query to read data. 示例: "SELECT * FROM MyTable"Example: "SELECT * FROM MyTable" 否(如果指定了数据集中的“tableName”)No (if "tableName" in dataset is specified)
partitionOptionspartitionOptions 指定用于从 Netezza 加载数据的数据分区选项。Specifies the data partitioning options used to load data from Netezza.
允许值为:None(默认值)、DataSliceDynamicRangeAllow values are: None (default), DataSlice, and DynamicRange.
启用分区选项(即,该选项不为 None)时,用于从 Netezza 数据库并行加载数据的并行度由复制活动上的 parallelCopies 设置控制。When a partition option is enabled (that is, not None), the degree of parallelism to concurrently load data from a Netezza database is controlled by parallelCopies setting on the copy activity.
No
partitionSettingspartitionSettings 指定数据分区的设置组。Specify the group of the settings for data partitioning.
当分区选项不是 None 时适用。Apply when partition option isn't None.
No
partitionColumnNamepartitionColumnName 指定并行复制范围分区将使用的源列(整数类型)的名称。Specify the name of the source column in integer type that will be used by range partitioning for parallel copy. 如果未指定,系统会自动检测表的主键并将其用作分区列。If not specified, the primary key of the table is autodetected and used as the partition column.
当分区选项是 DynamicRange 时适用。Apply when the partition option is DynamicRange. 如果使用查询来检索源数据,请在 WHERE 子句中挂接 ?AdfRangePartitionColumnNameIf you use a query to retrieve the source data, hook ?AdfRangePartitionColumnName in WHERE clause. 请参阅从 Netezza 进行并行复制部分的示例。See example in Parallel copy from Netezza section.
No
partitionUpperBoundpartitionUpperBound 要从中复制数据的分区列的最大值。The maximum value of the partition column to copy data out.
当分区选项为 DynamicRange 时适用。Apply when partition option is DynamicRange. 如果使用查询来检索源数据,请在 WHERE 子句中挂接 ?AdfRangePartitionUpboundIf you use query to retrieve source data, hook ?AdfRangePartitionUpbound in the WHERE clause. 如需示例,请参阅从 Netezza 进行并行复制部分。For an example, see the Parallel copy from Netezza section.
No
partitionLowerBoundpartitionLowerBound 要从中复制数据的分区列的最小值。The minimum value of the partition column to copy data out.
当分区选项是 DynamicRange 时适用。Apply when the partition option is DynamicRange. 如果使用查询来检索源数据,请在 WHERE 子句中挂接 ?AdfRangePartitionLowboundIf you use a query to retrieve the source data, hook ?AdfRangePartitionLowbound in the WHERE clause. 如需示例,请参阅从 Netezza 进行并行复制部分。For an example, see the Parallel copy from Netezza section.
No

示例:Example:

"activities":[
    {
        "name": "CopyFromNetezza",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Netezza input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "NetezzaSource",
                "query": "SELECT * FROM MyTable"
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

从 Netezza 进行并行复制Parallel copy from Netezza

数据工厂 Netezza 连接器提供内置的数据分区,用于从 Netezza 并行复制数据。The Data Factory Netezza connector provides built-in data partitioning to copy data from Netezza in parallel. 可以在复制活动的“源”表中找到数据分区选项。You can find data partitioning options on the Source table of the copy activity.

分区选项的屏幕截图

启用分区复制时,数据工厂将对 Netezza 源运行并行查询,以按分区加载数据。When you enable partitioned copy, Data Factory runs parallel queries against your Netezza source to load data by partitions. 可通过复制活动中的 parallelCopies 设置控制并行度。The parallel degree is controlled by the parallelCopies setting on the copy activity. 例如,如果将 parallelCopies 设置为 4,则数据工厂会根据指定的分区选项和设置并行生成并运行 4 个查询,每个查询从 Netezza 数据库检索一部分数据。For example, if you set parallelCopies to four, Data Factory concurrently generates and runs four queries based on your specified partition option and settings, and each query retrieves a portion of data from your Netezza database.

建议同时启用并行复制和数据分区,尤其是从 Netezza 数据库加载大量数据时。You are suggested to enable parallel copy with data partitioning especially when you load large amount of data from your Netezza database. 下面是适用于不同方案的建议配置。The following are suggested configurations for different scenarios. 将数据复制到基于文件的数据存储中时,建议将数据作为多个文件写入文件夹(仅指定文件夹名称),在这种情况下,性能优于写入单个文件。When copying data into file-based data store, it's recommanded to write to a folder as multiple files (only specify folder name), in which case the performance is better than writing to a single file.

方案Scenario 建议的设置Suggested settings
从大型表进行完整加载。Full load from large table. 分区选项:数据切片。Partition option: Data Slice.

在执行期间,数据工厂自动根据 Netezza 的内置数据切片将数据分区,并按分区复制数据。During execution, Data Factory automatically partitions the data based on Netezza's built-in data slices, and copies data by partitions.
使用自定义查询加载大量数据。Load large amount of data by using a custom query. 分区选项:数据切片。Partition option: Data Slice.
查询:SELECT * FROM <TABLENAME> WHERE mod(datasliceid, ?AdfPartitionCount) = ?AdfDataSliceCondition AND <your_additional_where_clause>Query: SELECT * FROM <TABLENAME> WHERE mod(datasliceid, ?AdfPartitionCount) = ?AdfDataSliceCondition AND <your_additional_where_clause>.
执行期间,数据工厂将 ?AdfPartitionCount(并行复制数在复制活动中设置)和 ?AdfDataSliceCondition 替换为数据切片分区逻辑,并将其发送到 Netezza。During execution, Data Factory replaces ?AdfPartitionCount (with parallel copy number set on copy activity) and ?AdfDataSliceCondition with the data slice partition logic, and sends to Netezza.
使用自定义查询加载大量数据,某个整数列包含均匀分布的范围分区值。Load large amount of data by using a custom query, having an integer column with evenly distributed value for range partitioning. 分区选项:动态范围分区。Partition options: Dynamic range partition.
查询SELECT * FROM <TABLENAME> WHERE ?AdfRangePartitionColumnName <= ?AdfRangePartitionUpbound AND ?AdfRangePartitionColumnName >= ?AdfRangePartitionLowbound AND <your_additional_where_clause>Query: SELECT * FROM <TABLENAME> WHERE ?AdfRangePartitionColumnName <= ?AdfRangePartitionUpbound AND ?AdfRangePartitionColumnName >= ?AdfRangePartitionLowbound AND <your_additional_where_clause>.
分区列:指定用于对数据进行分区的列。Partition column: Specify the column used to partition data. 可以针对整数数据类型的列进行分区。You can partition against the column with integer data type.
分区上限和分区下限 :指定是否要对分区列进行筛选,以便仅检索介于下限和上限之间的数据。Partition upper bound and partition lower bound: Specify if you want to filter against the partition column to retrieve data only between the lower and upper range.

在执行期间,数据工厂会将 ?AdfRangePartitionColumnName?AdfRangePartitionUpbound?AdfRangePartitionLowbound 替换为每个分区的实际列名和值范围,并将其发送到 Netezza。During execution, Data Factory replaces ?AdfRangePartitionColumnName, ?AdfRangePartitionUpbound, and ?AdfRangePartitionLowbound with the actual column name and value ranges for each partition, and sends to Netezza.
例如,如果为分区列“ID”设置了下限 1、上限 80,并将并行复制设置为 4,则数据工厂会按 4 个分区检索数据。For example, if your partition column "ID" set with the lower bound as 1 and the upper bound as 80, with parallel copy set as 4, Data Factory retrieves data by 4 partitions. 其 ID 分别介于 [1, 20]、[21, 40]、[41, 60] 和 [61, 80] 之间。Their IDs are between [1,20], [21, 40], [41, 60], and [61, 80], respectively.

示例:使用数据切片分区进行查询Example: query with data slice partition

"source": {
    "type": "NetezzaSource",
    "query": "SELECT * FROM <TABLENAME> WHERE mod(datasliceid, ?AdfPartitionCount) = ?AdfDataSliceCondition AND <your_additional_where_clause>",
    "partitionOption": "DataSlice"
}

示例:使用动态范围分区进行查询Example: query with dynamic range partition

"source": {
    "type": "NetezzaSource",
    "query": "SELECT * FROM <TABLENAME> WHERE ?AdfRangePartitionColumnName <= ?AdfRangePartitionUpbound AND ?AdfRangePartitionColumnName >= ?AdfRangePartitionLowbound AND <your_additional_where_clause>",
    "partitionOption": "DynamicRange",
    "partitionSettings": {
        "partitionColumnName": "<dynamic_range_partition_column_name>",
        "partitionUpperBound": "<upper_value_of_partition_column>",
        "partitionLowerBound": "<lower_value_of_partition_column>"
    }
}

Lookup 活动属性Lookup activity properties

若要了解有关属性的详细信息,请查看 Lookup 活动To learn details about the properties, check Lookup activity.

后续步骤Next steps

有关 Azure 数据工厂中复制活动支持用作源和接收器的数据存储的列表,请参阅支持的数据存储和格式For a list of data stores that Copy Activity supports as sources and sinks in Azure Data Factory, see Supported data stores and formats.