Copy data from SAP HANA using Azure Data Factory

Applies to: Azure Data Factory and Azure Synapse Analytics

This article outlines how to use the Copy activity in Azure Data Factory to copy data from an SAP HANA database. It builds on the Copy activity overview article, which presents a general overview of the Copy activity.

Tip

To learn about ADF's overall support for SAP data integration scenarios, see the whitepaper SAP data integration using Azure Data Factory, which provides a detailed introduction, comparison, and guidance for each SAP connector.

Supported capabilities

This SAP HANA connector is supported for the following activities:

  • Copy activity with supported source/sink matrix
  • Lookup activity

You can copy data from an SAP HANA database to any supported sink data store. For a list of data stores supported as sources/sinks by the copy activity, see the Supported data stores table.

Specifically, this SAP HANA connector supports:

  • Copying data from any version of SAP HANA database.
  • Copying data from HANA information models (such as Analytic and Calculation views) and from Row/Column tables.
  • Copying data by using Basic or Windows authentication.
  • Parallel copying from an SAP HANA source. See the Parallel copy from SAP HANA section for details.

Tip

To copy data into an SAP HANA data store, use the generic ODBC connector. See the SAP HANA sink section for details. Note that the linked services for the SAP HANA connector and the ODBC connector have different types and therefore cannot be reused.

Prerequisites

To use this SAP HANA connector, you need to:

  • Set up a self-hosted integration runtime. See the Self-hosted Integration Runtime article for details.
  • Install the SAP HANA ODBC driver on the integration runtime machine. You can download the SAP HANA ODBC driver from the SAP Software Download Center. Search with the keyword SAP HANA CLIENT for Windows.

Getting started

To perform the Copy activity with a pipeline, you can use one of the following tools or SDKs:

  • The Copy Data tool
  • The Azure portal
  • The .NET SDK
  • The Python SDK
  • Azure PowerShell
  • The REST API
  • The Azure Resource Manager template

The following sections provide details about properties that are used to define Data Factory entities specific to the SAP HANA connector.

Linked service properties

The following properties are supported for the SAP HANA linked service:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property must be set to: SapHana | Yes |
| connectionString | Specify the information needed to connect to SAP HANA by using either basic authentication or Windows authentication. Refer to the following samples.<br/><br/>In the connection string, server/port is mandatory (the default port is 30015), and username and password are mandatory when using basic authentication. For additional advanced settings, refer to SAP HANA ODBC Connection Properties.<br/><br/>You can also put the password in Azure Key Vault and pull the password configuration out of the connection string. Refer to the Store credentials in Azure Key Vault article for more details. | Yes |
| userName | Specify the user name when using Windows authentication. Example: user@domain.com | No |
| password | Specify the password for the user account. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | No |
| connectVia | The integration runtime to be used to connect to the data store. A self-hosted integration runtime is required, as mentioned in Prerequisites. | Yes |

Example: use basic authentication

{
    "name": "SapHanaLinkedService",
    "properties": {
        "type": "SapHana",
        "typeProperties": {
            "connectionString": "SERVERNODE=<server>:<port (optional)>;UID=<userName>;PWD=<Password>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Example: use Windows authentication

{
    "name": "SapHanaLinkedService",
    "properties": {
        "type": "SapHana",
        "typeProperties": {
            "connectionString": "SERVERNODE=<server>:<port (optional)>;",
            "userName": "<username>", 
            "password": { 
                "type": "SecureString", 
                "value": "<password>" 
            } 
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
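
Example: use basic authentication with the password stored in Azure Key Vault

The password row in the table above notes that the secret can also be referenced from Azure Key Vault. The following is a minimal sketch of that option, assuming an Azure Key Vault linked service already exists; the key vault linked service and secret names are illustrative, so refer to the Store credentials in Azure Key Vault article for the authoritative payload.

{
    "name": "SapHanaLinkedService",
    "properties": {
        "type": "SapHana",
        "typeProperties": {
            "connectionString": "SERVERNODE=<server>:<port (optional)>;UID=<userName>;",
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "<Azure Key Vault linked service name>",
                    "type": "LinkedServiceReference"
                },
                "secretName": "<name of the secret that stores the password>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}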

If you were using an SAP HANA linked service with the following payload, it's still supported as-is, but you're suggested to use the new one going forward.

Example:

{
    "name": "SapHanaLinkedService",
    "properties": {
        "type": "SapHana",
        "typeProperties": {
            "server": "<server>:<port (optional)>",
            "authenticationType": "Basic",
            "userName": "<username>",
            "password": {
                "type": "SecureString",
                "value": "<password>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Dataset properties

For a full list of sections and properties available for defining datasets, see the Datasets article. This section provides a list of properties supported by the SAP HANA dataset.

To copy data from SAP HANA, the following properties are supported:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the dataset must be set to: SapHanaTable | Yes |
| schema | Name of the schema in the SAP HANA database. | No (if "query" in activity source is specified) |
| table | Name of the table in the SAP HANA database. | No (if "query" in activity source is specified) |

Example:

{
    "name": "SAPHANADataset",
    "properties": {
        "type": "SapHanaTable",
        "typeProperties": {
            "schema": "<schema name>",
            "table": "<table name>"
        },
        "schema": [],
        "linkedServiceName": {
            "referenceName": "<SAP HANA linked service name>",
            "type": "LinkedServiceReference"
        }
    }
}

If you were using a RelationalTable typed dataset, it's still supported as-is, but you're suggested to use the new one going forward.
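
For reference, a legacy payload looks roughly like the following minimal sketch (the placeholder names are illustrative):

{
    "name": "SAPHANADataset",
    "properties": {
        "type": "RelationalTable",
        "typeProperties": {
            "tableName": "<table name>"
        },
        "linkedServiceName": {
            "referenceName": "<SAP HANA linked service name>",
            "type": "LinkedServiceReference"
        }
    }
}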

Copy activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the SAP HANA source.

SAP HANA as source

Tip

To ingest data from SAP HANA efficiently by using data partitioning, learn more from the Parallel copy from SAP HANA section.

To copy data from SAP HANA, the following properties are supported in the copy activity source section:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the copy activity source must be set to: SapHanaSource | Yes |
| query | Specifies the SQL query to read data from the SAP HANA instance. | Yes |
| partitionOptions | Specifies the data partitioning options used to ingest data from SAP HANA. Allowed values are: None (default), PhysicalPartitionsOfTable, and SapHanaDynamicRange. Learn more from the Parallel copy from SAP HANA section. PhysicalPartitionsOfTable can only be used when copying data from a table, not with a query.<br/><br/>When a partition option is enabled (that is, not None), the degree of parallelism to concurrently load data from SAP HANA is controlled by the parallelCopies setting on the copy activity. | No |
| partitionSettings | Specify the group of settings for data partitioning.<br/>Apply when the partition option is SapHanaDynamicRange. | No |
| partitionColumnName | Specify the name of the source column that will be used by partition for parallel copy. If not specified, the index or the primary key of the table is auto-detected and used as the partition column.<br/>Apply when the partition option is SapHanaDynamicRange. If you use a query to retrieve the source data, hook ?AdfHanaDynamicRangePartitionCondition into the WHERE clause. See the example in the Parallel copy from SAP HANA section. | Yes when using the SapHanaDynamicRange partition |
| packetSize | Specifies the network packet size (in kilobytes) used to split data into multiple blocks. If you have a large amount of data to copy, increasing the packet size can increase the reading speed from SAP HANA in most cases. Performance testing is recommended when adjusting the packet size. | No.<br/>Default value is 2048 (2 MB). |

Example:

"activities":[
    {
        "name": "CopyFromSAPHANA",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<SAP HANA input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SapHanaSource",
                "query": "<SQL query for SAP HANA>"
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]
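
The packetSize property from the table above slots into the same source definition. A minimal sketch follows, assuming a packet size of 8192 KB (8 MB) suits your data volume; the value is illustrative and should be validated with performance testing:

"source": {
    "type": "SapHanaSource",
    "query": "<SQL query for SAP HANA>",
    "packetSize": 8192
}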

If you were using a RelationalSource typed copy source, it's still supported as-is, but you're suggested to use the new one going forward.
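
For reference, the legacy source payload looks roughly like this minimal sketch:

"source": {
    "type": "RelationalSource",
    "query": "<SQL query for SAP HANA>"
}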

Parallel copy from SAP HANA

The Data Factory SAP HANA connector provides built-in data partitioning to copy data from SAP HANA in parallel. You can find data partitioning options on the Source tab of the copy activity.

(Screenshot of partition options.)

When you enable partitioned copy, Data Factory runs parallel queries against your SAP HANA source to retrieve data by partitions. The parallel degree is controlled by the parallelCopies setting on the copy activity. For example, if you set parallelCopies to four, Data Factory concurrently generates and runs four queries based on your specified partition option and settings, and each query retrieves a portion of data from your SAP HANA.

You are suggested to enable parallel copy with data partitioning, especially when you ingest large amounts of data from your SAP HANA. The following are suggested configurations for different scenarios. When copying data into a file-based data store, it's recommended to write to a folder as multiple files (only specify the folder name), in which case the performance is better than writing to a single file.

| Scenario | Suggested settings |
| --- | --- |
| Full load from a large table. | Partition option: Physical partitions of table.<br/><br/>During execution, Data Factory automatically detects the physical partition type of the specified SAP HANA table and chooses the corresponding partition strategy:<br/>- Range partitioning: Get the partition column and partition ranges defined for the table, then copy the data by range.<br/>- Hash partitioning: Use the hash partition key as the partition column, then partition and copy the data based on ADF-calculated ranges.<br/>- Round-robin partitioning or no partition: Use the primary key as the partition column, then partition and copy the data based on ADF-calculated ranges. |
| Load a large amount of data by using a custom query. | Partition option: Dynamic range partition.<br/>Query: SELECT * FROM &lt;TABLENAME&gt; WHERE ?AdfHanaDynamicRangePartitionCondition AND &lt;your_additional_where_clause&gt;.<br/>Partition column: Specify the column used to apply the dynamic range partition.<br/><br/>During execution, Data Factory first calculates the value ranges of the specified partition column by evenly distributing the rows into a number of buckets according to the number of distinct partition column values and the ADF parallel copy setting, then replaces ?AdfHanaDynamicRangePartitionCondition with a filter on the partition column value range for each partition, and sends it to SAP HANA.<br/><br/>If you want to use multiple columns as the partition column, you can concatenate the values of each column as one column in the query and specify it as the partition column in ADF, like SELECT * FROM (SELECT *, CONCAT(&lt;KeyColumn1&gt;, &lt;KeyColumn2&gt;) AS PARTITIONCOLUMN FROM &lt;TABLENAME&gt;) WHERE ?AdfHanaDynamicRangePartitionCondition. |

Example: query with physical partitions of a table

"source": {
    "type": "SapHanaSource",
    "partitionOption": "PhysicalPartitionsOfTable"
}
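
As described earlier, the degree of parallelism is governed by the parallelCopies setting, which sits beside the source in the copy activity's typeProperties. A minimal sketch, assuming four parallel queries fit your workload (the value 4 is illustrative):

"typeProperties": {
    "source": {
        "type": "SapHanaSource",
        "partitionOption": "PhysicalPartitionsOfTable"
    },
    "sink": {
        "type": "<sink type>"
    },
    "parallelCopies": 4
}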

Example: query with dynamic range partition

"source": {
    "type": "SapHanaSource",
    "query": "SELECT * FROM <TABLENAME> WHERE ?AdfHanaDynamicRangePartitionCondition AND <your_additional_where_clause>",
    "partitionOption": "SapHanaDynamicRange",
    "partitionSettings": {
        "partitionColumnName": "<Partition_column_name>"
    }
}
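
And a sketch of the multiple-column variant from the scenario table above, concatenating two key columns into a single derived column (PARTITIONCOLUMN here is an illustrative name) and pointing the partition settings at it:

"source": {
    "type": "SapHanaSource",
    "query": "SELECT * FROM (SELECT *, CONCAT(<KeyColumn1>, <KeyColumn2>) AS PARTITIONCOLUMN FROM <TABLENAME>) WHERE ?AdfHanaDynamicRangePartitionCondition",
    "partitionOption": "SapHanaDynamicRange",
    "partitionSettings": {
        "partitionColumnName": "PARTITIONCOLUMN"
    }
}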

Data type mapping for SAP HANA

When copying data from SAP HANA, the following mappings are used from SAP HANA data types to Azure Data Factory interim data types. See Schema and data type mappings to learn how the copy activity maps the source schema and data type to the sink.

| SAP HANA data type | Data Factory interim data type |
| --- | --- |
| ALPHANUM | String |
| BIGINT | Int64 |
| BINARY | Byte[] |
| BINTEXT | String |
| BLOB | Byte[] |
| BOOL | Byte |
| CLOB | String |
| DATE | DateTime |
| DECIMAL | Decimal |
| DOUBLE | Double |
| FLOAT | Double |
| INTEGER | Int32 |
| NCLOB | String |
| NVARCHAR | String |
| REAL | Single |
| SECONDDATE | DateTime |
| SHORTTEXT | String |
| SMALLDECIMAL | Decimal |
| SMALLINT | Int16 |
| STGEOMETRYTYPE | Byte[] |
| STPOINTTYPE | Byte[] |
| TEXT | String |
| TIME | TimeSpan |
| TINYINT | Byte |
| VARCHAR | String |
| TIMESTAMP | DateTime |
| VARBINARY | Byte[] |

SAP HANA sink

Currently, the SAP HANA connector is not supported as a sink, but you can use the generic ODBC connector with the SAP HANA driver to write data into SAP HANA.

Follow the Prerequisites to set up the self-hosted integration runtime and install the SAP HANA ODBC driver first. Create an ODBC linked service to connect to your SAP HANA data store as shown in the following example, then create a dataset and copy activity sink with the ODBC type accordingly. Learn more from the ODBC connector article.

{
    "name": "SAPHANAViaODBCLinkedService",
    "properties": {
        "type": "Odbc",
        "typeProperties": {
            "connectionString": "Driver={HDBODBC};servernode=<HANA server>.clouddatahub-int.net:30015",
            "authenticationType": "Basic",
            "userName": "<username>",
            "password": {
                "type": "SecureString",
                "value": "<password>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
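
To round out the sink setup, the following is a minimal sketch of the ODBC-typed dataset and copy activity sink that pair with the linked service above. The dataset name, table name, and writeBatchSize value are illustrative assumptions; see the ODBC connector article for the full property set.

{
    "name": "SAPHANAViaODBCDataset",
    "properties": {
        "type": "OdbcTable",
        "typeProperties": {
            "tableName": "<table name>"
        },
        "linkedServiceName": {
            "referenceName": "SAPHANAViaODBCLinkedService",
            "type": "LinkedServiceReference"
        }
    }
}

And in the copy activity:

"sink": {
    "type": "OdbcSink",
    "writeBatchSize": 10000
}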

Lookup activity properties

To learn details about the properties, check the Lookup activity article.

Next steps

For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see Supported data stores.