使用 Azure 数据工厂从/向 Oracle 复制数据Copy data from and to Oracle by using Azure Data Factory

适用于:是 Azure 数据工厂是 Azure Synapse Analytics(预览版)APPLIES TO: yesAzure Data Factory yesAzure Synapse Analytics (Preview)

本文概述如何在 Azure 数据工厂中使用复制活动从/向 Oracle 数据库复制数据。This article outlines how to use the copy activity in Azure Data Factory to copy data from and to an Oracle database. 本文是在复制活动概述的基础上编写的。It builds on the copy activity overview.

支持的功能Supported capabilities

以下活动支持此 Oracle 连接器:This Oracle connector is supported for the following activities:

可以将数据从 Oracle 数据库复制到任何支持的接收器数据存储。You can copy data from an Oracle database to any supported sink data store. 还可以将数据从任何支持的源数据存储复制到 Oracle 数据库。You also can copy data from any supported source data store to an Oracle database. 有关复制活动支持作为源或接收器的数据存储列表,请参阅支持的数据存储表。For a list of data stores that are supported as sources or sinks by the copy activity, see the Supported data stores table.

具体而言,此 Oracle 连接器支持:Specifically, this Oracle connector supports:

  • 以下版本的 Oracle 数据库:The following versions of an Oracle database:
    • Oracle 19c R1 (19.1) 和更高版本Oracle 19c R1 (19.1) and higher
    • Oracle 18c R1 (18.1) 和更高版本Oracle 18c R1 (18.1) and higher
    • Oracle 12c R1 (12.1) 和更高版本Oracle 12c R1 (12.1) and higher
    • Oracle 11g R1 (11.1) 和更高版本Oracle 11g R1 (11.1) and higher
    • Oracle 10g R1 (10.1) 和更高版本Oracle 10g R1 (10.1) and higher
    • Oracle 9i R2 (9.2) 和更高版本Oracle 9i R2 (9.2) and higher
    • Oracle 8i R3 (8.1.7) 和更高版本Oracle 8i R3 (8.1.7) and higher
    • Oracle Database Cloud Exadata ServiceOracle Database Cloud Exadata Service
  • 从 Oracle 源进行并行复制。Parallel copying from an Oracle source. 有关详细信息,请参阅从 Oracle 进行并行复制部分。See the Parallel copy from Oracle section for details.

备注

不支持 Oracle 代理服务器。Oracle proxy server isn't supported.

先决条件Prerequisites

如果数据存储是以下方式之一配置的,则需要设置自承载集成运行时才能连接到此数据存储:If your data store is configured in one of the following ways, you need to set up a self-hosted integration runtime to connect to the data store:

  • 数据存储位于本地网络内部、Azure 虚拟网络内部或 Amazon 虚拟私有云内。The data store is located inside an on-premises network, inside an Azure virtual network, or inside Amazon Virtual Private Cloud.
  • 数据存储是一种托管的云数据服务,只有在防火墙规则中列入允许列表的 IP 才能访问该服务。The data store is a managed cloud data service where the access is restricted to IPs that are whitelisted in the firewall rules.

集成运行时提供内置的 Oracle 驱动程序。The integration runtime provides a built-in Oracle driver. 因此,在从/向 Oracle 复制数据时不需要手动安装驱动程序。Therefore, you don't need to manually install a driver when you copy data from and to Oracle.

入门Get started

若要使用管道执行复制活动,可以使用以下工具或 SDK 之一:To perform the Copy activity with a pipeline, you can use one of the following tools or SDKs:

对于特定于 Oracle 连接器的数据工厂实体,以下部分提供有关用于定义这些实体的属性的详细信息。The following sections provide details about properties that are used to define Data Factory entities specific to the Oracle connector.

链接服务属性Linked service properties

Oracle 链接服务支持以下属性:The Oracle linked service supports the following properties:

属性Property 说明Description 必需Required
typetype type 属性必须设置为 OracleThe type property must be set to Oracle. Yes
connectionStringconnectionString 指定连接到 Oracle 数据库实例所需的信息。Specifies the information needed to connect to the Oracle Database instance.
还可以将密码放在 Azure Key Vault 中,并从连接字符串中拉取 password 配置。You can also put a password in Azure Key Vault, and pull the password configuration out of the connection string. 有关更多详细信息,请参阅以下示例和在 Azure Key Vault 中存储凭据Refer to the following samples and Store credentials in Azure Key Vault with more details.

支持的连接类型:可以使用 Oracle SIDOracle 服务名称来标识数据库:Supported connection type: You can use Oracle SID or Oracle Service Name to identify your database:
- 如果使用 SID:Host=<host>;Port=<port>;Sid=<sid>;User Id=<username>;Password=<password>;- If you use SID: Host=<host>;Port=<port>;Sid=<sid>;User Id=<username>;Password=<password>;
- 如果使用服务名称:Host=<host>;Port=<port>;ServiceName=<servicename>;User Id=<username>;Password=<password>;- If you use Service Name: Host=<host>;Port=<port>;ServiceName=<servicename>;User Id=<username>;Password=<password>;
对于“高级 Oracle 本机连接”选项,可以选择在 Oracle 服务器上的 TNSNAMES.ORA 文件中添加一个条目,在“ADF Oracle 链接服务”中,选择使用“Oracle 服务名称”连接类型并配置相应的服务名称。For advanced Oracle native connection options, you can choose to add an entry in TNSNAMES.ORA file on the Oracle server, and in ADF Oracle linked service, choose to use Oracle Service Name connection type and configure the corresponding service name.
Yes
connectViaconnectVia 用于连接到数据存储的集成运行时The integration runtime to be used to connect to the data store. 先决条件部分了解更多信息。Learn more from Prerequisites section. 如果未指定,则使用默认 Azure Integration Runtime。If not specified, the default Azure Integration Runtime is used. No

提示

如果遇到错误“ORA-01025:UPI 参数超出范围”,且 Oracle 版本为 8i,请将 WireProtocolMode=1 添加到连接字符串。If you get an error, "ORA-01025: UPI parameter out of range", and your Oracle version is 8i, add WireProtocolMode=1 to your connection string. 然后重试。Then try again.

如果有多个 Oracle 实例用于故障转移方案,则可以创建 Oracle 链接服务并填充主要主机、端口、用户名、密码等,并添加新的“其他连接属性”,其属性名称为 AlternateServers,值为 (HostName=<secondary host>:PortNumber=<secondary port>:ServiceName=<secondary service name>) - 不要遗漏括号,并注意作为分隔符的冒号 (:)。If you have multiple Oracle instances for failover scenario, you can create Oracle linked service and fill in the primary host, port, user name, password, etc., and add a new "Additional connection properties" with property name as AlternateServers and value as (HostName=<secondary host>:PortNumber=<secondary port>:ServiceName=<secondary service name>) - do not miss the brackets and pay attention to the colons (:) as separator. 例如,以下备用服务器值定义用于连接故障转移的两台备用数据库服务器:(HostName=AccountingOracleServer:PortNumber=1521:SID=Accounting,HostName=255.201.11.24:PortNumber=1522:ServiceName=ABackup.NA.MyCompany)As an example, the following value of alternate servers defines two alternate database servers for connection failover: (HostName=AccountingOracleServer:PortNumber=1521:SID=Accounting,HostName=255.201.11.24:PortNumber=1522:ServiceName=ABackup.NA.MyCompany).

可以根据自己的情况在连接字符串中设置更多连接属性:More connection properties you can set in connection string per your case:

属性Property 说明Description 允许的值Allowed values
ArraySizeArraySize 连接器在单个网络往返中可以提取的字节数。The number of bytes the connector can fetch in a single network round trip. 例如,ArraySize=‭10485760‬E.g., ArraySize=‭10485760‬.

较大的值可减少在网络中提取数据的次数,从而提高吞吐量。Larger values increase throughput by reducing the number of times to fetch data across the network. 较小的值会增加响应时间,因为等待服务器传输数据的延迟较小。Smaller values increase response time, as there is less of a delay waiting for the server to transmit data.
1 到 4294967296 (4 GB) 之间的一个整数。An integer from 1 to 4294967296 (4 GB). 默认值为 60000Default value is 60000. 值 1 不定义字节数,而指示仅为一行数据分配空间。The value 1 does not define the number of bytes, but indicates allocating space for exactly one row of data.

若要在 Oracle 连接上启用加密,你有两种选择:To enable encryption on Oracle connection, you have two options:

  • 若要使用三重 DES 加密 (3DES) 和高级加密标准 (AES) ,请在 Oracle 服务器端,转到“Oracle 高级安全性 (OAS)”并配置加密设置。To use Triple-DES Encryption (3DES) and Advanced Encryption Standard (AES), on the Oracle server side, go to Oracle Advanced Security (OAS) and configure the encryption settings. 有关详细信息,请参阅 Oracle 文档For details, see this Oracle documentation. Oracle 应用程序开发框架 (ADF) 连接器会自动协商加密方法,以便在建立与 Oracle 的连接时使用在 OAS 中配置的加密方法。The Oracle Application Development Framework (ADF) connector automatically negotiates the encryption method to use the one you configure in OAS when establishing a connection to Oracle.

  • 若要使用 TLS ,请执行以下操作:To use TLS:

    1. 获取 TLS/SSL 证书信息。Get the TLS/SSL certificate info. 获取 TLS/SSL 证书的可辨别编码规则 (DER) 编码证书信息,并将输出 (----- Begin Certificate …Get the Distinguished Encoding Rules (DER)-encoded certificate information of your TLS/SSL cert, and save the output (----- Begin Certificate … End Certificate -----) 另存为文本文件。End Certificate -----) as a text file.

      openssl x509 -inform DER -in [Full Path to the DER Certificate including the name of the DER Certificate] -text
      

      示例: 从 DERcert.cer 提取证书信息;然后,将输出保存到 cert.txt。Example: Extract cert info from DERcert.cer, and then save the output to cert.txt.

      openssl x509 -inform DER -in DERcert.cer -text
      Output:
      -----BEGIN CERTIFICATE-----
      XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      XXXXXXXXX
      -----END CERTIFICATE-----
      
    2. 生成 keystoretruststoreBuild the keystore or truststore. 以下命令使用或不使用 PKCS-12 格式的密码来创建 truststore 文件。The following command creates the truststore file, with or without a password, in PKCS-12 format.

      openssl pkcs12 -in [Path to the file created in the previous step] -out [Path and name of TrustStore] -passout pass:[Keystore PWD] -nokeys -export
      

      示例: 使用密码创建名为 MyTrustStoreFile 的 PKCS12 truststore 文件。Example: Create a PKCS12 truststore file, named MyTrustStoreFile, with a password.

      openssl pkcs12 -in cert.txt -out MyTrustStoreFile -passout pass:ThePWD -nokeys -export  
      
    3. truststore 文件放在自承载 IR 计算机上。Place the truststore file on the self-hosted IR machine. 例如,将该文件放在 C:\MyTrustStoreFile。For example, place the file at C:\MyTrustStoreFile.

    4. 在 Azure 数据工厂中,使用 EncryptionMethod=1 和相应的 TrustStore/TrustStorePassword 值配置 Oracle 连接字符串。In Azure Data Factory, configure the Oracle connection string with EncryptionMethod=1 and the corresponding TrustStore/TrustStorePasswordvalue. 例如,Host=<host>;Port=<port>;Sid=<sid>;User Id=<username>;Password=<password>;EncryptionMethod=1;TrustStore=C:\\MyTrustStoreFile;TrustStorePassword=<trust_store_password>For example, Host=<host>;Port=<port>;Sid=<sid>;User Id=<username>;Password=<password>;EncryptionMethod=1;TrustStore=C:\\MyTrustStoreFile;TrustStorePassword=<trust_store_password>.

示例:Example:

{
    "name": "OracleLinkedService",
    "properties": {
        "type": "Oracle",
        "typeProperties": {
            "connectionString": "Host=<host>;Port=<port>;Sid=<sid>;User Id=<username>;Password=<password>;"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

示例:在 Azure 密钥保管库中存储密码Example: store password in Azure Key Vault

{
    "name": "OracleLinkedService",
    "properties": {
        "type": "Oracle",
        "typeProperties": {
            "connectionString": "Host=<host>;Port=<port>;Sid=<sid>;User Id=<username>;",
            "password": { 
                "type": "AzureKeyVaultSecret", 
                "store": { 
                    "referenceName": "<Azure Key Vault linked service name>", 
                    "type": "LinkedServiceReference" 
                }, 
                "secretName": "<secretName>" 
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

数据集属性Dataset properties

本部分提供 Oracle 数据集支持的属性列表。This section provides a list of properties supported by the Oracle dataset. 有关可用于定义数据集的各个部分和属性的完整列表,请参阅数据集For a full list of sections and properties available for defining datasets, see Datasets.

若要从/向 Oracle 复制数据,请将数据集的 type 属性设置为 OracleTableTo copy data from and to Oracle, set the type property of the dataset to OracleTable. 支持以下属性。The following properties are supported.

属性Property 描述Description 必须Required
typetype 数据集的 type 属性必须设置为 OracleTableThe type property of the dataset must be set to OracleTable. Yes
架构schema 架构的名称。Name of the schema. 对于源为“No”,对于接收器为“Yes”No for source, Yes for sink
table 表/视图的名称。Name of the table/view. 对于源为“No”,对于接收器为“Yes”No for source, Yes for sink
tableNametableName 具有架构的表/视图的名称。Name of the table/view with schema. 此属性支持后向兼容性。This property is supported for backward compatibility. 对于新的工作负荷,请使用 schematableFor new workload, use schema and table. 对于源为“No”,对于接收器为“Yes”No for source, Yes for sink

示例:Example:

{
    "name": "OracleDataset",
    "properties":
    {
        "type": "OracleTable",
        "schema": [],
        "typeProperties": {
            "schema": "<schema_name>",
            "table": "<table_name>"
        },
        "linkedServiceName": {
            "referenceName": "<Oracle linked service name>",
            "type": "LinkedServiceReference"
        }
    }
}

复制活动属性Copy activity properties

本部分提供 Oracle 源和接收器支持的属性列表。This section provides a list of properties supported by the Oracle source and sink. 有关可用于定义活动的各个部分和属性的完整列表,请参阅管道For a full list of sections and properties available for defining activities, see Pipelines.

Oracle 作为源Oracle as source

提示

有关如何使用数据分区从 Oracle 有效加载数据的详细信息,请参阅从 Oracle 进行并行复制部分。To load data from Oracle efficiently by using data partitioning, learn more from Parallel copy from Oracle.

要从 Oracle 复制数据,请将复制活动中的源类型设置为 OracleSourceTo copy data from Oracle, set the source type in the copy activity to OracleSource. 复制活动的 source 节支持以下属性。The following properties are supported in the copy activity source section.

属性Property 描述Description 必须Required
typetype 复制活动 source 的 type 属性必须设置为 OracleSourceThe type property of the copy activity source must be set to OracleSource. Yes
oracleReaderQueryoracleReaderQuery 使用自定义 SQL 查询读取数据。Use the custom SQL query to read data. 例如 "SELECT * FROM MyTable"An example is "SELECT * FROM MyTable".
启用分区加载时,需要在查询中挂接任何相应的内置分区参数。When you enable partitioned load, you need to hook any corresponding built-in partition parameters in your query. 有关示例,请参阅从 Oracle 进行并行复制部分。For examples, see the Parallel copy from Oracle section.
No
partitionOptionspartitionOptions 指定用于从 Oracle 加载数据的数据分区选项。Specifies the data partitioning options used to load data from Oracle.
允许值包括:None(默认值)、PhysicalPartitionsOfTableDynamicRangeAllowed values are: None (default), PhysicalPartitionsOfTable ,and DynamicRange.
启用分区选项(即,该选项不为 None)时,用于从 Oracle 数据库并行加载数据的并行度由复制活动上的 parallelCopies 设置控制。When a partition option is enabled (that is, not None), the degree of parallelism to concurrently load data from an Oracle database is controlled by the parallelCopies setting on the copy activity.
No
partitionSettingspartitionSettings 指定数据分区的设置组。Specify the group of the settings for data partitioning.
当分区选项不是 None 时适用。Apply when the partition option isn't None.
No
partitionNamespartitionNames 需要复制的物理分区的列表。The list of physical partitions that needs to be copied.
当分区选项是 PhysicalPartitionsOfTable 时适用。Apply when the partition option is PhysicalPartitionsOfTable. 如果使用查询来检索源数据,请在 WHERE 子句中挂接 ?AdfTabularPartitionNameIf you use a query to retrieve the source data, hook ?AdfTabularPartitionName in the WHERE clause. 有关示例,请参阅从 Oracle 进行并行复制部分。For an example, see the Parallel copy from Oracle section.
No
partitionColumnNamepartitionColumnName 指定并行复制范围分区使用的源列(整数类型)的名称。Specify the name of the source column in integer type that will be used by range partitioning for parallel copy. 如果未指定,系统会自动检测表的主键并将其用作分区列。If not specified, the primary key of the table is auto-detected and used as the partition column.
当分区选项是 DynamicRange 时适用。Apply when the partition option is DynamicRange. 如果使用查询来检索源数据,请在 WHERE 子句中挂接 ?AdfRangePartitionColumnNameIf you use a query to retrieve the source data, hook ?AdfRangePartitionColumnName in the WHERE clause. 有关示例,请参阅从 Oracle 进行并行复制部分。For an example, see the Parallel copy from Oracle section.
No
partitionUpperBoundpartitionUpperBound 要从中复制数据的分区列的最大值。The maximum value of the partition column to copy data out.
当分区选项是 DynamicRange 时适用。Apply when the partition option is DynamicRange. 如果使用查询来检索源数据,请在 WHERE 子句中挂接 ?AdfRangePartitionUpboundIf you use a query to retrieve the source data, hook ?AdfRangePartitionUpbound in the WHERE clause. 有关示例,请参阅从 Oracle 进行并行复制部分。For an example, see the Parallel copy from Oracle section.
No
partitionLowerBoundpartitionLowerBound 要从中复制数据的分区列的最小值。The minimum value of the partition column to copy data out.
当分区选项是 DynamicRange 时适用。Apply when the partition option is DynamicRange. 如果使用查询来检索源数据,请在 WHERE 子句中挂接 ?AdfRangePartitionLowboundIf you use a query to retrieve the source data, hook ?AdfRangePartitionLowbound in the WHERE clause. 有关示例,请参阅从 Oracle 进行并行复制部分。For an example, see the Parallel copy from Oracle section.
No

示例:使用基本查询但不使用分区复制数据Example: copy data by using a basic query without partition

"activities":[
    {
        "name": "CopyFromOracle",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Oracle input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "OracleSource",
                "oracleReaderQuery": "SELECT * FROM MyTable"
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

Oracle 作为接收器Oracle as sink

若要向 Oracle 复制数据,请将复制活动中的接收器类型设置为 OracleSinkTo copy data to Oracle, set the sink type in the copy activity to OracleSink. 复制活动 sink 节支持以下属性。The following properties are supported in the copy activity sink section.

属性Property 描述Description 必须Required
typetype 复制活动接收器的 type 属性必须设置为 OracleSinkThe type property of the copy activity sink must be set to OracleSink. Yes
writeBatchSizewriteBatchSize 缓冲区大小达到 writeBatchSize 时将数据插入 SQL 表。Inserts data into the SQL table when the buffer size reaches writeBatchSize.
允许的值为 Integer(行数)。Allowed values are Integer (number of rows).
否(默认值为 10,000)No (default is 10,000)
writeBatchTimeoutwriteBatchTimeout 超时前等待批插入操作完成的时间。The wait time for the batch insert operation to complete before it times out.
允许的值为 Timespan。Allowed values are Timespan. 示例为 00:30:00(30 分钟)。An example is 00:30:00 (30 minutes).
No
preCopyScriptpreCopyScript 指定每次运行时,复制活动将数据写入到 Oracle 之前要运行的 SQL 查询。Specify a SQL query for the copy activity to run before writing data into Oracle in each run. 可以使用此属性清除预加载的数据。You can use this property to clean up the preloaded data. No

示例:Example:

"activities":[
    {
        "name": "CopyToOracle",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<Oracle output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "<source type>"
            },
            "sink": {
                "type": "OracleSink"
            }
        }
    }
]

从 Oracle 进行并行复制Parallel copy from Oracle

数据工厂 Oracle 连接器提供内置的数据分区,用于从 Oracle 并行复制数据。The Data Factory Oracle connector provides built-in data partitioning to copy data from Oracle in parallel. 可以在复制活动的“源”表中找到数据分区选项。 You can find data partitioning options on the Source tab of the copy activity.

分区选项的屏幕截图

启用分区复制时,数据工厂将对 Oracle 源运行并行查询,以按分区加载数据。When you enable partitioned copy, Data Factory runs parallel queries against your Oracle source to load data by partitions. 可通过复制活动中的 parallelCopies 设置控制并行度。The parallel degree is controlled by the parallelCopies setting on the copy activity. 例如,如果将 parallelCopies 设置为 4,则数据工厂会根据指定的分区选项和设置并行生成并运行 4 个查询,每个查询从 Oracle 数据库检索一部分数据。For example, if you set parallelCopies to four, Data Factory concurrently generates and runs four queries based on your specified partition option and settings, and each query retrieves a portion of data from your Oracle database.

建议同时启用并行复制和数据分区,尤其是从 Oracle 数据库加载大量数据时。You are suggested to enable parallel copy with data partitioning especially when you load large amount of data from your Oracle database. 下面是适用于不同方案的建议配置。The following are suggested configurations for different scenarios. 将数据复制到基于文件的数据存储中时,建议将数据作为多个文件写入文件夹(仅指定文件夹名称),在这种情况下,性能优于写入单个文件。When copying data into file-based data store, it's recommanded to write to a folder as multiple files (only specify folder name), in which case the performance is better than writing to a single file.

方案Scenario 建议的设置Suggested settings
从包含物理分区的大型表进行完整加载。Full load from large table, with physical partitions. 分区选项:表的物理分区。Partition option: Physical partitions of table.

在执行期间,数据工厂将自动检测物理分区并按分区复制数据。During execution, Data Factory automatically detects the physical partitions, and copies data by partitions.
从不包含物理分区但包含用于数据分区的整数列的大型表进行完整加载。Full load from large table, without physical partitions, while with an integer column for data partitioning. 分区选项:动态范围分区。Partition options: Dynamic range partition.
分区列:指定用于对数据进行分区的列。Partition column: Specify the column used to partition data. 如果未指定,将使用主键列。If not specified, the primary key column is used.
使用自定义查询从包含物理分区的表加载大量数据。Load a large amount of data by using a custom query, with physical partitions. 分区选项:表的物理分区。Partition option: Physical partitions of table.
查询SELECT * FROM <TABLENAME> PARTITION("?AdfTabularPartitionName") WHERE <your_additional_where_clause>Query: SELECT * FROM <TABLENAME> PARTITION("?AdfTabularPartitionName") WHERE <your_additional_where_clause>.
分区名称:指定要从中复制数据的分区名称。Partition name: Specify the partition name(s) to copy data from. 如果未指定,数据工厂将自动检测在 Oracle 数据集中指定的表的物理分区。If not specified, Data Factory automatically detects the physical partitions on the table you specified in the Oracle dataset.

在执行期间,数据工厂会将 ?AdfTabularPartitionName 替换为实际分区名称并发送到 Oracle。During execution, Data Factory replaces ?AdfTabularPartitionName with the actual partition name, and sends to Oracle.
使用自定义查询从不包含物理分区但包含用于数据分区的整数列的表加载大量数据。Load a large amount of data by using a custom query, without physical partitions, while with an integer column for data partitioning. 分区选项:动态范围分区。Partition options: Dynamic range partition.
查询SELECT * FROM <TABLENAME> WHERE ?AdfRangePartitionColumnName <= ?AdfRangePartitionUpbound AND ?AdfRangePartitionColumnName >= ?AdfRangePartitionLowbound AND <your_additional_where_clause>Query: SELECT * FROM <TABLENAME> WHERE ?AdfRangePartitionColumnName <= ?AdfRangePartitionUpbound AND ?AdfRangePartitionColumnName >= ?AdfRangePartitionLowbound AND <your_additional_where_clause>.
分区列:指定用于对数据进行分区的列。Partition column: Specify the column used to partition data. 可以针对整数数据类型的列进行分区。You can partition against the column with integer data type.
分区上限分区下限:指定是否要对分区列进行筛选,以便仅检索介于下限和上限之间的数据。Partition upper bound and partition lower bound: Specify if you want to filter against partition column to retrieve data only between the lower and upper range.

在执行期间,数据工厂会将 ?AdfRangePartitionColumnName?AdfRangePartitionUpbound?AdfRangePartitionLowbound 替换为每个分区的实际列名和值范围,并将其发送到 Oracle。During execution, Data Factory replaces ?AdfRangePartitionColumnName, ?AdfRangePartitionUpbound, and ?AdfRangePartitionLowbound with the actual column name and value ranges for each partition, and sends to Oracle.
例如,如果为分区列“ID”设置了下限 1、上限 80,并将并行复制设置为 4,则数据工厂会按 4 个分区检索数据。For example, if your partition column "ID" is set with the lower bound as 1 and the upper bound as 80, with parallel copy set as 4, Data Factory retrieves data by 4 partitions. 其 ID 分别介于 [1, 20]、[21, 40]、[41, 60] 和 [61, 80] 之间。Their IDs are between [1,20], [21, 40], [41, 60], and [61, 80], respectively.

示例:使用物理分区进行查询Example: query with physical partition

"source": {
    "type": "OracleSource",
    "query": "SELECT * FROM <TABLENAME> PARTITION(\"?AdfTabularPartitionName\") WHERE <your_additional_where_clause>",
    "partitionOption": "PhysicalPartitionsOfTable",
    "partitionSettings": {
        "partitionNames": [
            "<partitionA_name>",
            "<partitionB_name>"
        ]
    }
}

示例:使用动态范围分区进行查询Example: query with dynamic range partition

"source": {
    "type": "OracleSource",
    "query": "SELECT * FROM <TABLENAME> WHERE ?AdfRangePartitionColumnName <= ?AdfRangePartitionUpbound AND ?AdfRangePartitionColumnName >= ?AdfRangePartitionLowbound AND <your_additional_where_clause>",
    "partitionOption": "DynamicRange",
    "partitionSettings": {
        "partitionColumnName": "<partition_column_name>",
        "partitionUpperBound": "<upper_value_of_partition_column>",
        "partitionLowerBound": "<lower_value_of_partition_column>"
    }
}

Oracle 的数据类型映射Data type mapping for Oracle

从/向 Oracle 复制数据时,将应用以下映射。When you copy data from and to Oracle, the following mappings apply. 若要了解复制活动如何将源架构和数据类型映射到接收器,请参阅架构和数据类型映射To learn about how the copy activity maps the source schema and data type to the sink, see Schema and data type mappings.

Oracle 数据类型Oracle data type 数据工厂临时数据类型Data Factory interim data type
BFILEBFILE Byte[]Byte[]
BLOBBLOB Byte[]Byte[]
(仅支持 Oracle 10g 和更高版本)(only supported on Oracle 10g and higher)
CHARCHAR StringString
CLOBCLOB StringString
DATEDATE DateTimeDateTime
FLOATFLOAT Decimal、String(如果精度 > 28)Decimal, String (if precision > 28)
INTEGERINTEGER Decimal、String(如果精度 > 28)Decimal, String (if precision > 28)
LONGLONG StringString
LONG RAWLONG RAW Byte[]Byte[]
NCHARNCHAR StringString
NCLOBNCLOB StringString
NUMBERNUMBER Decimal、String(如果精度 > 28)Decimal, String (if precision > 28)
NVARCHAR2NVARCHAR2 StringString
RAWRAW Byte[]Byte[]
ROWIDROWID StringString
TIMESTAMPTIMESTAMP DateTimeDateTime
TIMESTAMP WITH LOCAL TIME ZONETIMESTAMP WITH LOCAL TIME ZONE StringString
TIMESTAMP WITH TIME ZONETIMESTAMP WITH TIME ZONE StringString
UNSIGNED INTEGERUNSIGNED INTEGER NumberNumber
VARCHAR2VARCHAR2 StringString
XMLXML StringString

备注

不支持数据类型 INTERVAL YEAR TO MONTH 和 INTERVAL DAY TO SECOND。The data types INTERVAL YEAR TO MONTH and INTERVAL DAY TO SECOND aren't supported.

查找活动属性Lookup activity properties

若要了解有关属性的详细信息,请查看 Lookup 活动To learn details about the properties, check Lookup activity.

后续步骤Next steps

有关数据工厂中复制活动支持作为源和接收器的数据存储的列表,请参阅支持的数据存储For a list of data stores supported as sources and sinks by the copy activity in Data Factory, see Supported data stores.