Copy data from SAP Business Warehouse via Open Hub using Azure Data Factory

APPLIES TO: Azure Data Factory, Azure Synapse Analytics (Preview)

This article outlines how to use the Copy Activity in Azure Data Factory to copy data from an SAP Business Warehouse (BW) via Open Hub. It builds on the copy activity overview article that presents a general overview of the copy activity.

Tip

To learn about ADF's overall support for SAP data integration scenarios, see the SAP data integration using Azure Data Factory whitepaper, which provides a detailed introduction, comparison, and guidance.

Supported capabilities

This SAP Business Warehouse via Open Hub connector is supported for the following activities:

You can copy data from SAP Business Warehouse via Open Hub to any supported sink data store. For a list of data stores that are supported as sources/sinks by the copy activity, see the Supported data stores table.

Specifically, this SAP Business Warehouse Open Hub connector supports:

  • SAP Business Warehouse version 7.01 or higher (in a recent SAP Support Package Stack released after the year 2015).
  • Copying data via an Open Hub Destination local table, which underneath can be a DSO, InfoCube, MultiProvider, DataSource, etc.
  • Copying data using basic authentication.
  • Connecting to an Application Server.

SAP BW Open Hub Integration

SAP BW Open Hub Service is an efficient way to extract data from SAP BW. The following diagram shows one of the typical flows customers have in their SAP system, in which data flows from SAP ECC -> PSA -> DSO -> Cube.

SAP BW Open Hub Destination (OHD) defines the target to which the SAP data is relayed. Any object supported by the SAP Data Transfer Process (DTP) can be used as an open hub data source, for example, a DSO, InfoCube, DataSource, etc. The Open Hub Destination type - where the relayed data is stored - can be database tables (local or remote) and flat files. This SAP BW Open Hub connector supports copying data from an OHD local table in BW. If you are using other types, you can connect directly to the database or file system using other connectors.

SAP BW Open Hub

Delta extraction flow

The ADF SAP BW Open Hub connector offers two optional properties, excludeLastRequest and baseRequestId, which can be used to handle delta load from Open Hub.

  • excludeLastRequest: Whether to exclude the records of the last request. Default value is true.
  • baseRequestId: The ID of the request for delta loading. Once it is set, only data with a requestId larger than the value of this property will be retrieved.
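The interplay of these two options can be sketched as follows. This is an illustrative model of the selection logic only, not ADF's actual implementation; the function name and sample request IDs are hypothetical.

```python
# Illustrative sketch (not ADF source code): how excludeLastRequest and
# baseRequestId conceptually narrow which Open Hub requests are read.
def select_request_ids(all_request_ids, exclude_last_request=True, base_request_id=None):
    """Return the request IDs a copy run would read, given the two options."""
    ids = sorted(set(all_request_ids))
    if base_request_id is not None:
        # Only requests with an ID strictly greater than baseRequestId are retrieved.
        ids = [i for i in ids if i > base_request_id]
    if exclude_last_request and ids:
        # Drop the newest request, which may still be being written by a running DTP.
        ids = ids[:-1]
    return ids

print(select_request_ids([101, 102, 103, 104], base_request_id=101))
# -> [102, 103]
```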

Overall, the extraction from SAP InfoProviders to Azure Data Factory (ADF) consists of two steps:

  1. SAP BW Data Transfer Process (DTP): this step copies the data from an SAP BW InfoProvider to an SAP BW Open Hub table.

  2. ADF data copy: in this step, the Open Hub table is read by the ADF connector.

Delta extraction flow

In the first step, a DTP is executed. Each execution creates a new SAP request ID. The request ID is stored in the Open Hub table and is then used by the ADF connector to identify the delta. The two steps run asynchronously: the DTP is triggered by SAP, and the ADF data copy is triggered through ADF.

By default, ADF does not read the latest delta from the Open Hub table (the "exclude last request" option is true). As a result, the data in ADF is not 100% up to date with the data in the Open Hub table (the last delta is missing). In return, this procedure ensures that no rows get lost because of the asynchronous extraction. It works fine even when ADF is reading the Open Hub table while the DTP is still writing into the same table.

You typically store the max request ID copied by ADF in the last run in a staging data store (such as Azure Blob in the above diagram). Therefore, the same request is not read a second time by ADF in the subsequent run. Meanwhile, note that the data is not automatically deleted from the Open Hub table.
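The high-watermark pattern described above can be sketched as follows. This is a conceptual illustration, not ADF tooling: a local JSON file stands in for the staging store (such as Azure Blob), and the sample rows stand in for the output of a copy run.

```python
# Conceptual sketch of the request-ID watermark pattern. A local JSON file
# stands in for the staging data store; the sample rows are hypothetical.
import json
import os

STATE_FILE = "bw_openhub_state.json"

def read_last_request_id(path=STATE_FILE):
    """Return the watermark from the previous run, or None on the first run."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)["lastRequestId"]

def write_last_request_id(request_id, path=STATE_FILE):
    """Persist the max request ID copied in this run."""
    with open(path, "w") as f:
        json.dump({"lastRequestId": request_id}, f)

# One run: copy rows with requestId > watermark, then advance the watermark
# so the same requests are not read again in the next run.
copied_rows = [  # stand-in for rows returned by the copy activity
    {"requestId": 102, "value": "a"},
    {"requestId": 103, "value": "b"},
]
if copied_rows:
    write_last_request_id(max(r["requestId"] for r in copied_rows))
print(read_last_request_id())  # 103
```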

For proper delta handling, it is not allowed to have request IDs from different DTPs in the same Open Hub table. Therefore, you must not create more than one DTP for each Open Hub Destination (OHD). When you need both Full and Delta extraction from the same InfoProvider, you should create two OHDs for the same InfoProvider.

Prerequisites

To use this SAP Business Warehouse Open Hub connector, you need to:

  • Set up a Self-hosted Integration Runtime with version 3.13 or above. See the Self-hosted Integration Runtime article for details.

  • Download the 64-bit SAP .NET Connector 3.0 from SAP's website, and install it on the Self-hosted IR machine. When installing, in the optional setup steps window, make sure you select the Install Assemblies to GAC option as shown in the following image.

    Install SAP .NET Connector

  • The SAP user being used in the Data Factory BW connector needs to have the following permissions:

    • Authorization for RFC and SAP BW.
    • Permissions for the "Execute" activity of authorization object S_SDSAUTH.
  • Create the SAP Open Hub Destination with type Database Table and the "Technical Key" option checked. It is also recommended to leave "Deleting Data from Table" unchecked, although it is not required. Use a DTP (execute it directly or integrate it into an existing process chain) to land data from the source object (such as a cube) you have chosen into the Open Hub Destination table.

Getting started

Tip

For a walkthrough of using the SAP BW Open Hub connector, see Load data from SAP Business Warehouse (BW) by using Azure Data Factory.

To perform the Copy activity with a pipeline, you can use one of the following tools or SDKs:

The following sections provide details about properties that are used to define Data Factory entities specific to the SAP Business Warehouse Open Hub connector.

Linked service properties

The following properties are supported for the SAP Business Warehouse Open Hub linked service:

Property | Description | Required
-------- | ----------- | --------
type | The type property must be set to: SapOpenHub | Yes
server | Name of the server on which the SAP BW instance resides. | Yes
systemNumber | System number of the SAP BW system. Allowed value: two-digit decimal number represented as a string. | Yes
messageServer | The host name of the SAP message server. Used to connect to an SAP message server. | No
messageServerService | The service name or port number of the message server. Used to connect to an SAP message server. | No
systemId | The ID of the SAP system where the table is located. Used to connect to an SAP message server. | No
logonGroup | The logon group for the SAP system. Used to connect to an SAP message server. | No
clientId | Client ID of the client in the SAP BW system. Allowed value: three-digit decimal number represented as a string. | Yes
language | Language that the SAP system uses. | No (default value is EN)
userName | Name of the user who has access to the SAP server. | Yes
password | Password for the user. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes
connectVia | The Integration Runtime to be used to connect to the data store. A Self-hosted Integration Runtime is required, as mentioned in Prerequisites. | Yes

Example:

{
    "name": "SapBwOpenHubLinkedService",
    "properties": {
        "type": "SapOpenHub",
        "typeProperties": {
            "server": "<server name>",
            "systemNumber": "<system number>",
            "clientId": "<client id>",
            "userName": "<SAP user>",
            "password": {
                "type": "SecureString",
                "value": "<Password for SAP user>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Dataset properties

For a full list of sections and properties available for defining datasets, see the Datasets article. This section provides a list of properties supported by the SAP BW Open Hub dataset.

To copy data from and to SAP BW Open Hub, set the type property of the dataset to SapOpenHubTable. The following properties are supported.

Property | Description | Required
-------- | ----------- | --------
type | The type property must be set to SapOpenHubTable. | Yes
openHubDestinationName | The name of the Open Hub Destination to copy data from. | Yes

If you were setting excludeLastRequest and baseRequestId in the dataset, they are still supported as-is, but you are encouraged to use the new model in the activity source going forward.

Example:

{
    "name": "SAPBWOpenHubDataset",
    "properties": {
        "type": "SapOpenHubTable",
        "typeProperties": {
            "openHubDestinationName": "<open hub destination name>"
        },
        "schema": [],
        "linkedServiceName": {
            "referenceName": "<SAP BW Open Hub linked service name>",
            "type": "LinkedServiceReference"
        }
    }
}

Copy activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the SAP BW Open Hub source.

SAP BW Open Hub as source

To copy data from SAP BW Open Hub, the following properties are supported in the copy activity source section:

Property | Description | Required
-------- | ----------- | --------
type | The type property of the copy activity source must be set to SapOpenHubSource. | Yes
excludeLastRequest | Whether to exclude the records of the last request. | No (default is true)
baseRequestId | The ID of the request for delta loading. Once it is set, only data with a requestId larger than the value of this property will be retrieved. | No

Tip

If your Open Hub table only contains data generated by a single request ID - for example, you always do a full load and overwrite the existing data in the table, or you only ran the DTP once for testing - remember to uncheck the excludeLastRequest option so that the data can be copied out.

To speed up the data loading, you can set parallelCopies on the copy activity to load data from SAP BW Open Hub in parallel. For example, if you set parallelCopies to four, Data Factory concurrently executes four RFC calls, and each RFC call retrieves a portion of data from your SAP BW Open Hub table, partitioned by the DTP request ID and package ID. This applies when the number of unique DTP request ID + package ID combinations is bigger than the value of parallelCopies. When copying data into a file-based data store, it's also recommended to write to a folder as multiple files (only specify the folder name), in which case the performance is better than writing to a single file.

Example:

"activities":[
    {
        "name": "CopyFromSAPBWOpenHub",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<SAP BW Open Hub input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SapOpenHubSource",
                "excludeLastRequest": true
            },
            "sink": {
                "type": "<sink type>"
            },
            "parallelCopies": 4
        }
    }
]
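Building on the example above, an incremental run would typically combine both source options. The snippet below is a sketch of only the source section; the baseRequestId placeholder would be filled in from the watermark stored in your staging store (for example, via a pipeline parameter).

```json
"source": {
    "type": "SapOpenHubSource",
    "excludeLastRequest": true,
    "baseRequestId": "<max request ID copied in the last run>"
}
```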

Data type mapping for SAP BW Open Hub

When copying data from SAP BW Open Hub, the following mappings are used from SAP BW data types to Azure Data Factory interim data types. See Schema and data type mappings to learn how the copy activity maps the source schema and data type to the sink.

SAP ABAP Type | Data Factory interim data type
------------- | ------------------------------
C (String) | String
I (Integer) | Int32
F (Float) | Double
D (Date) | String
T (Time) | String
P (BCD Packed, Currency, Decimal, Qty) | Decimal
N (Numc) | String
X (Binary and Raw) | String
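Because ABAP D (Date) and T (Time) values arrive as strings, downstream processing often needs to parse them. ABAP dates are formatted as YYYYMMDD and times as HHMMSS; a minimal parsing sketch (the function names are illustrative, not part of any ADF API):

```python
# Minimal sketch for parsing ABAP date/time strings after they land as
# ADF String columns. Assumes the standard internal formats YYYYMMDD / HHMMSS.
from datetime import datetime, date, time

def parse_abap_date(value: str) -> date:
    return datetime.strptime(value, "%Y%m%d").date()

def parse_abap_time(value: str) -> time:
    return datetime.strptime(value, "%H%M%S").time()

print(parse_abap_date("20190321"))  # 2019-03-21
print(parse_abap_time("134505"))    # 13:45:05
```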

Lookup activity properties

To learn details about the properties, check Lookup activity.

Troubleshooting tips

Symptoms: If you are running SAP BW on HANA and observe that only a subset of the data (1 million rows) is copied over by the ADF copy activity, the possible cause is that the "SAP HANA Execution" option is enabled in your DTP, in which case ADF can only retrieve the first batch of data.

Resolution: Disable the "SAP HANA Execution" option in the DTP, reprocess the data, then try executing the copy activity again.

Next steps

For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see supported data stores.