使用 Azure 数据工厂从 SAP Business Warehouse 复制数据Copy data from SAP Business Warehouse by using Azure Data Factory

适用于:是 Azure 数据工厂是 Azure Synapse Analytics(预览版)APPLIES TO: yesAzure Data Factory yesAzure Synapse Analytics (Preview)

本文介绍如何使用 Azure 数据工厂通过 Open Hub 将 SAP Business Warehouse (BW) 中的数据复制到 Azure Data Lake Storage Gen2。This article shows how to use Azure Data Factory to copy data from SAP Business Warehouse (BW) via Open Hub to Azure Data Lake Storage Gen2. 可以使用类似的过程将数据复制到其他受支持的接收器数据存储You can use a similar process to copy data to other supported sink data stores.

Tip

有关如何从 SAP BW 复制数据的一般信息,包括 SAP BW Open Hub 集成和增量提取流程,请参阅使用 Azure 数据工厂通过 Open Hub 复制 SAP Business Warehouse 中的数据For general information about copying data from SAP BW, including SAP BW Open Hub integration and delta extraction flow, see Copy data from SAP Business Warehouse via Open Hub by using Azure Data Factory.

先决条件Prerequisites

  • Azure 数据工厂:如果你没有数据工厂,请遵循相应的步骤创建数据工厂Azure Data Factory: If you don't have one, follow the steps to create a data factory.

  • 目标类型为“数据库表”的 SAP BW Open Hub 目标 (OHD) :若要创建 OHD 或检查你的 OHD 是否已针对数据工厂集成正确进行配置,请参阅本文的 SAP BW Open Hub 目标配置部分。SAP BW Open Hub Destination (OHD) with destination type "Database Table": To create an OHD or to check that your OHD is configured correctly for Data Factory integration, see the SAP BW Open Hub Destination configurations section of this article.

  • SAP BW 用户需要以下权限The SAP BW user needs the following permissions:

    • 远程函数调用 (RFC) 和 SAP BW 的授权。Authorization for Remote Function Calls (RFC) and SAP BW.
    • S_SDSAUTH 授权对象拥有“执行”活动权限。Permissions to the “Execute” activity of the S_SDSAUTH authorization object.
  • 一个自承载集成运行时 (IR) 和 SAP .NET 连接器 3.0A self-hosted integration runtime (IR) with SAP .NET connector 3.0. 遵循以下安装步骤:Follow these setup steps:

    1. 安装并注册自承载集成运行时 3.13 或更高版本。Install and register the self-hosted integration runtime, version 3.13 or later. (本文稍后会介绍此过程。)(This process is described later in this article.)

    2. 从 SAP 网站下载 64 位 SAP Connector for Microsoft .NET 3.0,并将其安装到自承载 IR 所在的同一台计算机上。Download the 64-bit SAP Connector for Microsoft .NET 3.0 from SAP's website, and install it on the same computer as the self-hosted IR. 在安装期间,请务必在“可选安装步骤”对话框中选择“将程序集安装到 GAC”,如下图所示: During installation, make sure that you select Install Assemblies to GAC in the Optional setup steps dialog box, as the following image shows:

      “安装 SAP. NET 连接器”对话框

从 SAP BW Open Hub 执行完整复制Do a full copy from SAP BW Open Hub

在 Azure 门户中转到你的数据工厂。In the Azure portal, go to your data factory. 选择“创作和监视”,在单独的选项卡中打开数据工厂 UI。 Select Author & Monitor to open the Data Factory UI in a separate tab.

  1. 在“开始”页上选择“复制数据”,打开“复制数据”工具。 On the Let's get started page, select Copy Data to open the Copy Data tool.

  2. 在“属性”页上指定一个任务名称,然后选择“下一步”。 On the Properties page, specify a Task name, and then select Next.

  3. 在“源数据存储”页上,单击“+创建新连接”。 On the Source data store page, select +Create new connection. 从连接器库中选择“SAP BW Open Hub”,然后选择“继续”。 Select SAP BW Open Hub from the connector gallery, and then select Continue. 若要筛选连接器,可在搜索框中键入 SAPTo filter the connectors, you can type SAP in the search box.

  4. 在“指定 SAP BW Open Hub 连接”页上,执行以下步骤创建新连接。 On the Specify SAP BW Open Hub connection page, follow these steps to create a new connection.

    创建 SAP BW Open Hub 链接服务页

    1. 在“通过集成运行时进行连接”列表中,选择一个现有的自承载 IR。 From the Connect via integration runtime list, select an existing self-hosted IR. 如果你没有自承载 IR,请创建一个。Or, choose to create one if you don't have one yet.

      若要创建新的自承载 IR,请依次选择“+新建”、“自承载”。 To create a new self-hosted IR, select +New, and then select Self-hosted. 输入名称,然后选择“下一步”。 Enter a Name, and then select Next. 选择“快速安装”以在当前计算机上安装,或遵循提供的“手动设置”步骤。 Select Express setup to install on the current computer, or follow the Manual setup steps that are provided.

      先决条件中所述,请确保在运行自承载 IR 的同一台计算机上安装 SAP Connector for Microsoft .NET 3.0。As mentioned in Prerequisites, make sure that you have SAP Connector for Microsoft .NET 3.0 installed on the same computer where the self-hosted IR is running.

    2. 填写 SAP BW 的“服务器名称”、“系统编号”、“客户端 ID”、“语言”(如果不是“EN”)、“用户名”和“密码”。 Fill in the SAP BW Server name, System number, Client ID, Language (if other than EN), User name, and Password.

    3. 选择“测试连接”以验证设置,然后选择“完成”。 Select Test connection to validate the settings, and then select Finish.

    4. 现已创建新的连接。A new connection is created. 选择“下一步”。Select Next.

  5. 在“选择 Open Hub 目标”页上,浏览 SAP BW 中可用的 Open Hub 目标。 On the Select Open Hub Destinations page, browse the Open Hub Destinations that are available in your SAP BW. 选择要从中复制数据的 OHD,然后选择“下一步”。 Select the OHD to copy data from, and then select Next.

    选择 SAP BW Open Hub 目标表

  6. 如果需要,请指定筛选器。Specify a filter, if you need one. 如果你的 OHD 仅包含采用单个请求 ID 的单个数据传输进程 (DTP) 执行中的数据,或者你确定 DTP 已完成并想要复制数据,请清除“排除最后一个请求”复选框。 If your OHD only contains data from a single data-transfer process (DTP) execution with a single request ID, or you're sure that your DTP is finished and you want to copy the data, clear the Exclude Last Request check box.

    请在本文的 SAP BW Open Hub 目标配置部分详细了解这些设置。Learn more about these settings in the SAP BW Open Hub Destination configurations section of this article. 选择“验证”以仔细检查要返回的数据。 Select Validate to double-check what data will be returned. 然后,选择“下一步” 。Then select Next.

    配置 SAP BW Open Hub 筛选器

  7. 在“目标数据存储”页上,选择“+创建新连接” > “Azure Data Lake Storage Gen2” > “继续”。 On the Destination data store page, select +Create new connection > Azure Data Lake Storage Gen2 > Continue.

  8. 在“指定 Azure Data Lake Storage 连接”页上,遵循这些步骤来创建连接。 On the Specify Azure Data Lake Storage connection page, follow these steps to create a connection.

    创建 ADLS Gen2 链接服务页

    1. 在“名称”下拉列表中,选择支持 Data Lake Storage Gen2 的帐户。 Select your Data Lake Storage Gen2-capable account from the Name drop-down list.
    2. 选择“完成” 以创建连接。Select Finish to create the connection. 然后,选择“下一步” 。Then select Next.
  9. 在“选择输出文件或文件夹”页上,输入 copyfromopenhub 作为输出文件夹名称。 On the Choose the output file or folder page, enter copyfromopenhub as the output folder name. 然后,选择“下一步” 。Then select Next.

    选择输出文件夹页

  10. 在“文件格式设置”页上,选择“下一步”以使用默认设置。 On the File format setting page, select Next to use the default settings.

    指定接收器格式页

  11. 在“设置”页上,展开“性能设置”。 On the Settings page, expand Performance settings. 为“复制并行度”输入一个值(例如 5),表示从 SAP BW 加载数据的并行度。 Enter a value for Degree of copy parallelism such as 5 to load from SAP BW in parallel. 然后,选择“下一步” 。Then select Next.

    配置复制设置

  12. 在“摘要”页上复查设置。 On the Summary page, review the settings. 然后,选择“下一步” 。Then select Next.

  13. 在“部署”页上,选择“监视”以监视管道。 On the Deployment page, select Monitor to monitor the pipeline.

    “部署”页

  14. 请注意,页面左侧的“监视”选项卡已自动选择。 Notice that the Monitor tab on the left side of the page is automatically selected. “操作”列中包含用于查看活动运行详细信息以及用于重新运行管道的链接。 The Actions column includes links to view activity-run details and to rerun the pipeline.

    管道监视视图

  15. 若要查看与管道运行关联的活动运行,请选择“操作”列中的“查看活动运行”。 To view activity runs that are associated with the pipeline run, select View Activity Runs in the Actions column. 该管道只包含一个活动(复制活动),因此只显示了一个条目。There's only one activity (copy activity) in the pipeline, so you see only one entry. 若要切换回到管道运行视图,请选择顶部的“管道”链接。 To switch back to the pipeline-runs view, select the Pipelines link at the top. 选择“刷新”可刷新列表。 Select Refresh to refresh the list.

    活动监视屏幕

  16. 若要监视每个复制活动的执行详细信息,请在活动监视视图中选择“操作”下面的“详细信息”链接(眼镜图标) 。To monitor the execution details for each copy activity, select the Details link, which is an eyeglasses icon below Actions in the activity-monitoring view. 提供的详细信息包括从源复制到接收器的数据量、数据吞吐量、执行步骤和持续时间,以及使用的配置。Available details include the data volume copied from the source to the sink, data throughput, execution steps and duration, and configurations used.

    活动监视详细信息

  17. 若要查看“最大请求 ID”,请返回到活动监视视图,并选择“操作”下的“输出”。 To view the maximum Request ID, go back to the activity-monitoring view and select Output under Actions.

    活动输出屏幕

    活动输出详细信息视图

从 SAP BW Open Hub 执行增量复制Incremental copy from SAP BW Open Hub

Tip

请参阅 SAP BW Open Hub 连接器增量提取流程,了解数据工厂中的 SAP BW Open Hub 连接器如何从 SAP BW 复制增量数据。See SAP BW Open Hub connector delta extraction flow to learn how the SAP BW Open Hub connector in Data Factory copies incremental data from SAP BW. 此文还可帮助你了解基本的连接器配置。This article can also help you understand basic connector configuration.

现在,让我们继续配置从 SAP BW Open Hub 执行的增量复制。Now, let's continue to configure incremental copy from SAP BW Open Hub.

增量复制使用基于请求 ID 的“高水印”机制。Incremental copy uses a "high-watermark" mechanism that's based on the request ID. 该 ID 是由 DTP 在 SAP BW Open Hub 目标中自动生成的。That ID is automatically generated in SAP BW Open Hub Destination by the DTP. 下图演示了此工作流:The following diagram shows this workflow:

增量复制工作流程图

在数据工厂的“开始”页上,选择“从模板创建管道”以使用内置模板。 On the data factory Let's get started page, select Create pipeline from template to use the built-in template.

  1. 搜索 SAP BW,找到并选择“从 SAP BW 增量复制到 Azure Data Lake Storage Gen2”模板。 Search for SAP BW to find and select the Incremental copy from SAP BW to Azure Data Lake Storage Gen2 template. 此模板将数据复制到 Azure Data Lake Storage Gen2。This template copies data into Azure Data Lake Storage Gen2. 可以使用类似的工作流将数据复制到其他接收器类型。You can use a similar workflow to copy to other sink types.

  2. 在模板的主页上,选择或创建以下三个连接,然后选择窗口右下角的“使用此模板”。 On the template's main page, select or create the following three connections, and then select Use this template in the lower-right corner of the window.

    • Azure Blob 存储:在本演练中,我们将使用 Azure Blob 存储来存储高水印,即最大复制请求 ID。 Azure Blob storage: In this walkthrough, we use Azure Blob storage to store the high watermark, which is the max copied request ID.
    • SAP BW Open Hub:这是要从中复制数据的源。SAP BW Open Hub: This is the source to copy data from. 有关详细配置,请参考前面的完整复制演练。Refer to the previous full-copy walkthrough for detailed configuration.
    • Azure Data Lake Storage Gen2:这是要将数据复制到的接收器。Azure Data Lake Storage Gen2: This is the sink to copy data to. 有关详细配置,请参考前面的完整复制演练。Refer to the previous full-copy walkthrough for detailed configuration.

    从 SAP BW 模板执行增量复制

  3. 此模板生成包含以下三个活动的管道,并在成功时将这些管道链接到一起:“查找”、“复制数据”和“Web”。 This template generates a pipeline with the following three activities and makes them chained on-success: Lookup, Copy Data, and Web.

    转到管道的“参数”选项卡。 你将看到需要提供的所有配置。Go to the pipeline Parameters tab. You see all the configurations that you need to provide.

    从 SAP BW 配置执行增量复制

    • SAPOpenHubDestinationName:指定要从中复制数据的 Open Hub 表名称。SAPOpenHubDestinationName: Specify the Open Hub table name to copy data from.

    • Data_Destination_Container:指定要将数据复制到的目标 Azure Data Lake Storage Gen2 容器。Data_Destination_Container: Specify the destination Azure Data Lake Storage Gen2 container to copy data to. 如果该容器不存在,则数据工厂的复制活动将在执行期间创建一个容器。If the container doesn't exist, the Data Factory copy activity creates one during execution.

    • Data_Destination_Directory:指定要将数据复制到的 Azure Data Lake Storage Gen2 容器下的文件夹路径。Data_Destination_Directory: Specify the folder path under the Azure Data Lake Storage Gen2 container to copy data to. 如果该路径不存在,则数据工厂的复制活动将在执行期间创建一个路径。If the path doesn't exist, the Data Factory copy activity creates a path during execution.

    • HighWatermarkBlobContainer:指定用于存储高水印值的容器。HighWatermarkBlobContainer: Specify the container to store the high-watermark value.

    • HighWatermarkBlobDirectory:指定用于存储高水印值的容器下的文件夹路径。HighWatermarkBlobDirectory: Specify the folder path under the container to store the high-watermark value.

    • HighWatermarkBlobName:指定用于存储高水印值的 Blob 名称,例如 requestIdCache.txtHighWatermarkBlobName: Specify the blob name to store the high watermark value, such as requestIdCache.txt. 在 Blob 存储中,转到 HighWatermarkBlobContainer+HighWatermarkBlobDirectory+HighWatermarkBlobName 的相应路径,例如“container/path/requestIdCache.txt” 。In Blob storage, go to the corresponding path of HighWatermarkBlobContainer+HighWatermarkBlobDirectory+HighWatermarkBlobName, such as container/path/requestIdCache.txt. 创建包含内容 0 的 Blob。Create a blob with content 0.

      Blob 内容

    • LogicAppURL:在此模板中,我们将使用 WebActivity 来调用 Azure 逻辑应用,以便在 Blob 存储中设置高水印值。LogicAppURL: In this template, we use WebActivity to call Azure Logic Apps to set the high-watermark value in Blob storage. 或者,可以使用 Azure SQL 数据库来存储该值。Or, you can use Azure SQL Database to store it. 使用存储过程活动来更新该值。Use a stored procedure activity to update the value.

      必须先创建逻辑应用,如下图所示。You must first create a logic app, as the following image shows. 然后,粘贴 HTTP POST URLThen, paste in the HTTP POST URL.

      逻辑应用配置

      1. 转到 Azure 门户。Go to the Azure portal. 选择一个新的“逻辑应用”服务。 Select a new Logic Apps service. 选择“+空白逻辑应用”转到“逻辑应用设计器”。 Select +Blank Logic App to go to Logic Apps Designer.

      2. 创建“收到 HTTP 请求时”触发器。 Create a trigger of When an HTTP request is received. 按如下所示指定 HTTP 请求正文:Specify the HTTP request body as follows:

        {
           "properties": {
              "sapOpenHubMaxRequestId": {
                 "type": "string"
              }
           },
           "type": "object"
        }
        
      3. 添加“创建 Blob”操作。 Add a Create blob action. 对于“文件夹路径”和“Blob 名称”,请使用前面在 HighWatermarkBlobContainer+HighWatermarkBlobDirectory 和 HighWatermarkBlobName 中配置的相同值。 For Folder path and Blob name, use the same values that you configured previously in HighWatermarkBlobContainer+HighWatermarkBlobDirectory and HighWatermarkBlobName.

      4. 选择“保存” 。Select Save. 然后,复制“HTTP POST URL”的值以便在数据工厂管道中使用。 Then, copy the value of HTTP POST URL to use in the Data Factory pipeline.

  4. 提供数据工厂的管道参数后,选择“调试” > “完成”以调用一个运行来验证配置。 After you provide the Data Factory pipeline parameters, select Debug > Finish to invoke a run to validate the configuration. 或者,选择“发布”以发布所有更改,然后选择“添加触发器”以执行运行。 Or, select Publish to publish all the changes, and then select Add trigger to execute a run.

SAP BW Open Hub 目标配置SAP BW Open Hub Destination configurations

本部分介绍如何在 SAP BW 端进行配置,以使用数据工厂中的 SAP BW Open Hub 连接器来复制数据。This section introduces configuration of the SAP BW side to use the SAP BW Open Hub connector in Data Factory to copy data.

在 SAP BW 中配置增量提取Configure delta extraction in SAP BW

如果需要历史副本和增量副本,或者只需要增量副本,请在 SAP BW 中配置增量提取。If you need both historical copy and incremental copy or only incremental copy, configure delta extraction in SAP BW.

  1. 创建 Open Hub 目标。Create the Open Hub Destination. 可以在 SAP Transaction RSA1 中创建 OHD,这会自动创建所需的转换和数据传输过程。You can create the OHD in SAP Transaction RSA1, which automatically creates the required transformation and data-transfer process. 使用以下设置:Use the following settings:

    • 对象类型:可以使用任何对象类型。ObjectType: You can use any object type. 此处我们使用 InfoCube 作为示例。Here, we use InfoCube as an example.
    • 目标类型:选择“数据库表”。 Destination Type: Select Database Table.
    • 表键:选择“技术键”。 Key of the Table: Select Technical Key.
    • 提取:选择“保留数据并将记录插入到表中”。 Extraction: Select Keep Data and Insert Records into Table.

    “创建 SAP BW OHD 增量提取”对话框

    “创建 SAP BW OHD 增量提取”对话框 2

    可能提高 DTP 的并行运行 SAP 工作进程数:You might increase the number of parallel running SAP work processes for the DTP:

    create-sap-bw-ohd-delta3

  2. 在进程链中计划 DTP。Schedule the DTP in process chains.

    仅当所需的行未压缩时,针对多维数据集的增量 DTP 才能正常工作。A delta DTP for a cube only works if the necessary rows haven't been compressed. 在对 Open Hub 表运行 DTP 之前,请确保 BW 多维数据集压缩未运行。Make sure that BW cube compression isn't running before the DTP to the Open Hub table. 为此,最简单的方法是将 DTP 集成到现有的进程链中。The easiest way to do this is to integrate the DTP into your existing process chains. 在以下示例中,DTP(针对 OHD)已插入到进程链中的“调整”(聚合汇总)与“折叠”(多维数据集压缩)步骤之间。 In the following example, the DTP (to the OHD) is inserted into the process chain between the Adjust (aggregate rollup) and Collapse (cube compression) steps.

    创建 SAP BW 进程链的流程图

在 SAP BW 中配置完整提取Configure full extraction in SAP BW

除了增量提取以外,还可以配置同一 SAP BW InfoProvider 的完整提取。In addition to delta extraction, you might want a full extraction of the same SAP BW InfoProvider. 若要执行完整复制而不是增量复制,或者要重新同步增量提取,则你通常会这样做。This usually applies if you want to do full copy but not incremental, or you want to resync delta extraction.

不能对同一个 OHD 使用多个 DTP。You can't have more than one DTP for the same OHD. 因此,在执行增量提取之前,必须先创建一个附加的 OHD。So, you must create an additional OHD before delta extraction.

创建 SAP BW OHD 完整提取

对于满负载 OHD,请选择不同于增量提取的选项:For a full load OHD, choose different options than for delta extraction:

  • 在 OHD 中:将“提取”选项设置为“删除数据并插入记录”。 In OHD: Set the Extraction option to Delete Data and Insert Records. 否则,在 BW 进程链中重复 DTP 时,会多次提取数据。Otherwise, data will be extracted many times when you repeat the DTP in a BW process chain.

  • 在 DTP 中:将“提取模式”设置为“完整”。 In the DTP: Set Extraction Mode to Full. 在创建 OHD 后,必须紧接着将自动创建的 DTP 从“增量”更改为“完整”,如下图所示: You must change the automatically created DTP from Delta to Full immediately after the OHD is created, as this image shows:

    用于创建配置了“完整”提取的 SAP BW OHD 的对话框

  • 在数据工厂的 BW Open Hub 连接器中:禁用“排除最后一个请求”。 In the BW Open Hub connector of Data Factory: Turn off Exclude last request. 否则不会提取任何内容。Otherwise, nothing will be extracted.

你通常会手动运行完整 DTP。You typically run the full DTP manually. 或者,可为完整 DTP 创建一个进程链。Or, you can create a process chain for the full DTP. 它通常是一个与现有进程链不相关的独立链。It's typically a separate chain that's independent of your existing process chains. 对于任一做法,请确保在使用数据工厂复制活动开始提取之前已完成 DTP。 In either case, make sure that the DTP is finished before you start the extraction by using Data Factory copy. 否则,只会复制部分数据。Otherwise, only partial data will be copied.

首次运行增量提取Run delta extraction the first time

从技术上讲,首次增量提取属于完整提取。 The first delta extraction is technically a full extraction. 默认情况下,SAP BW Open Hub 连接器在复制数据时会排除最后一个请求。By default, the SAP BW Open Hub connector excludes the last request when it copies data. 对于首次增量提取,在后续 DTP 在具有不同请求 ID 的表中生成增量数据之前,数据工厂复制活动不会提取任何数据。For the first delta extraction, no data is extracted by the Data Factory copy activity until a subsequent DTP generates delta data in the table with a separate request ID. 可通过两种方式来避免这种情况:There are two ways to avoid this scenario:

  • 针对首次增量提取禁用“排除最后一个请求”选项。 Turn off the Exclude last request option for the first delta extraction. 在首次启动增量提取之前,确保已完成首次增量 DTP。Make sure that the first delta DTP is finished before you start the delta extraction the first time.
  • 使用下一部分所述的重新同步增量提取的过程。Use the procedure for resyncing the delta extraction, as described in the next section.

重新同步增量提取Resync delta extraction

出现以下情况时,会更改 SAP BW 多维数据集中的数据,但增量 DTP 将忽略此类情况:The following scenarios change the data in SAP BW cubes but are not considered by the delta DTP:

  • SAP BW 选择性删除行(使用任何筛选条件)SAP BW selective deletion (of rows by using any filter condition)
  • SAP BW 请求删除(错误的请求)SAP BW request deletion (of faulty requests)

SAP Open Hub 目标不是受数据市场控制的数据目标(自 2015 年开始已包含在所有 SAP BW 支持包中)。An SAP Open Hub Destination isn't a data-mart-controlled data target (in all SAP BW support packages since 2015). 因此,无需更改 OHD 中的数据即可删除多维数据集中的数据。So, you can delete data from a cube without changing the data in the OHD. 然后,必须将多维数据集的数据与数据工厂重新同步:You must then resync the data of the cube with Data Factory:

  1. 在数据工厂中运行完整提取(使用 SAP 中的完整 DTP)。Run a full extraction in Data Factory (by using a full DTP in SAP).
  2. 删除增量 DTP 的 Open Hub 表中的所有行。Delete all rows in the Open Hub table for the delta DTP.
  3. 将增量 DTP 的状态设置为“已提取”。 Set the status of the delta DTP to Fetched.

然后,所有后续增量 DTP 和数据工厂增量提取都会按预期方式工作。After this, all subsequent delta DTPs and Data Factory delta extractions work as expected.

若要将增量 DTP 的状态设置为“已提取”,可以使用以下选项来手动运行增量 DTP: To set the status of the delta DTP to Fetched, you can use the following option to run the delta DTP manually:

*No Data Transfer; Delta Status in Source: Fetched*

后续步骤Next steps

了解 SAP BW Open Hub 连接器支持:Learn about SAP BW Open Hub connector support: