自承载 IR 配置为 Azure 数据工厂中 Azure-SSIS IR 的代理Configure a self-hosted IR as a proxy for an Azure-SSIS IR in Azure Data Factory

适用于:是 Azure 数据工厂否 Azure Synapse Analytics(预览版)APPLIES TO: yesAzure Data Factory noAzure Synapse Analytics (Preview)

本文介绍如何在将某个自承载集成运行时(自承载 IR)配置为代理的情况下,在 Azure 数据工厂中的 Azure-SSIS Integration Runtime (Azure-SSIS IR) 上运行 SQL Server Integration Services (SSIS) 包。This article describes how to run SQL Server Integration Services (SSIS) packages on an Azure-SSIS Integration Runtime (Azure-SSIS IR) in Azure Data Factory with a self-hosted integration runtime (self-hosted IR) configured as a proxy.

使用此功能可在本地访问数据,而无需将 Azure-SSIS IR 加入虚拟网络With this feature, you can access data on-premises without having to join your Azure-SSIS IR to a virtual network. 当企业网络的配置过于复杂,或者采用过于严格的策略,以致你很难在此网络中注入 Azure-SSIS IR 时,此功能将很有用。The feature is useful when your corporate network has a configuration too complex or a policy too restrictive for you to inject your Azure-SSIS IR into it.

此功能可将 SSIS 数据流任务分解为两个暂存任务(如果适用):This feature breaks down your SSIS data flow task into two staging tasks whenever applicable:

  • 本地暂存任务:此任务运行连接到自承载 IR 上本地数据存储的数据流组件。On-premises staging task: This task runs your data flow component that connects to an on-premises data store on your self-hosted IR. 它在本地数据存储与 Azure Blob 存储中的暂存区域之间来回移动数据。It moves data from the on-premises data store into a staging area in your Azure Blob storage or vice versa.
  • 云暂存任务:此任务运行无法连接到 Azure-SSIS IR 上本地数据存储的数据流组件。Cloud staging task: This task runs your data flow component that doesn't connect to an on-premises data store on your Azure-SSIS IR. 它在 Azure Blob 存储中的暂存区域与云数据存储之间来回移动数据。It moves data from the staging area in your Azure Blob storage to a cloud data store or vice versa.

如果数据流任务将数据从本地移动到云,则第一个和第二个暂存任务将分别为本地暂存任务和云暂存任务。If your data flow task moves data from on premises to cloud, then the first and second staging tasks will be on-premises and cloud staging tasks, respectively. 如果数据流任务将数据从云移动到本地,则第一个和第二个暂存任务将分别为云暂存任务和本地暂存任务。If your data flow task moves data from cloud to on premises, then the first and second staging tasks will be cloud and on-premises staging tasks, respectively. 如果数据流任务将数据从本地移动到本地,则第一个和第二个暂存任务将皆为本地暂存任务。If your data flow task moves data from on premises to on premises, then the first and second staging tasks will be both on-premises staging tasks. 如果数据流任务将数据从云移动到云,则此功能不适用。If your data flow task moves data from cloud to cloud, then this feature isn't applicable.

举例而言,此功能的其他优势和功能使你能够在 Azure-SSIS IR 尚不支持的区域中设置自承载 IR,并在数据源的防火墙中允许自承载 IR 的公共静态 IP 地址。Other benefits and capabilities of this feature allow you to, for example, set up your self-hosted IR in regions that are not yet supported by an Azure-SSIS IR, and allow the public static IP address of your self-hosted IR on the firewall of your data sources.

准备自承载 IRPrepare the self-hosted IR

若要使用此功能,请先创建一个数据工厂,然后在其中设置 Azure-SSIS IR。To use this feature, you first create a data factory and set up an Azure-SSIS IR in it. 如果尚未执行此操作,请按照设置 Azure-SSIS IR 中的说明操作。If you have not already done so, follow the instructions in Set up an Azure-SSIS IR.

然后,在设置了 Azure-SSIS IR 的同一数据工厂中设置自承载 IR。You then set up your self-hosted IR in the same data factory where your Azure-SSIS IR is set up. 为此,请参阅创建自承载 IRTo do so, see Create a self-hosted IR.

最后,按如下所述,在本地计算机或 Azure 虚拟机 (VM) 上下载并安装最新版本的自承载 IR 以及其他驱动程序和运行时:Finally, you download and install the latest version of the self-hosted IR, as well as the additional drivers and runtime, on your on-premises machine or Azure virtual machine (VM), as follows:

  • 下载并安装最新版本的自承载 IRDownload and install the latest version of the self-hosted IR.

  • 如果使用包中的对象链接与嵌入数据库 (OLEDB)/开放式数据库连接 (ODBC) 连接器,请在安装了自承载 IR 的同一台计算机上下载并安装相关驱动程序(如果尚未这样做)。If you use Object Linking and Embedding Database (OLEDB)/Open Database Connectivity (ODBC) connectors in your packages, download and install the relevant drivers on the same machine where your self-hosted IR is installed, if you haven't done so already.

    如果使用早期版本的用于 SQL Server 的 OLEDB 驱动程序 (SQL Server Native Client [SQLNCLI]),请下载 64 位版本If you use the earlier version of the OLEDB driver for SQL Server (SQL Server Native Client [SQLNCLI]), download the 64-bit version.

    如果使用最新版本的用于 SQL Server 的 OLEDB 驱动程序 (MSOLEDBSQL),请下载 64 位版本If you use the latest version of OLEDB driver for SQL Server (MSOLEDBSQL), download the 64-bit version.

    如果使用用于其他数据库系统(例如 PostgreSQL、MySQL、Oracle 等)的 OLEDB/ODBC 驱动程序,可以从其网站下载 64 位版本。If you use OLEDB/ODBC drivers for other database systems, such as PostgreSQL, MySQL, Oracle, and so on, you can download the 64-bit versions from their websites.

  • 如果尚未这样做,请在安装了自承载 IR 的同一台计算机上下载并安装 64 位版本的 Visual C++ (VC) 运行时If you haven't done so already, download and install the 64-bit version of Visual C++ (VC) runtime on the same machine where your self-hosted IR is installed.

准备用于暂存的 Azure Blob 存储链接服务Prepare the Azure Blob storage-linked service for staging

如果尚未这样做,请在设置了 Azure-SSIS IR 的同一数据工厂中创建一个 Azure Blob 存储链接服务。If you haven't already done so, create an Azure Blob storage-linked service in the same data factory where your Azure-SSIS IR is set up. 为此,请参阅创建 Azure 数据工厂链接服务To do so, see Create an Azure data factory-linked service. 确保执行以下操作:Be sure to do the following:

  • 对于“数据存储”,请选择“Azure Blob 存储”。 For Data Store, select Azure Blob Storage.

  • 对于“通过集成运行时连接” ,请选择“AutoResolveIntegrationRuntime” (而非你的 Azure-SSIS IR 或自承载 IR),因为我们使用默认的 Azure IR 来获取 Azure Blob 存储的访问凭据。For Connect via integration runtime, select AutoResolveIntegrationRuntime (not your Azure-SSIS IR nor your self-hosted IR), because we use the default Azure IR to fetch access credentials for your Azure Blob Storage.

  • 对于“身份验证方法”,请选择“帐户密钥”、“SAS URI”或“服务主体”。 For Authentication method, select Account key, SAS URI, or Service Principal.

    提示

    如果选择“服务主体” 方法,请至少为你的服务主体授予“存储 Blob 数据参与者” ** 角色。If you select the Service Principal method, grant your service principal at least a Storage Blob Data Contributor role. 有关详细信息,请参阅  Azure Blob 存储连接器For more information, refer to Azure Blob storage connector.

准备用于暂存的 Azure Blob 存储链接服务

配置使用自承载 IR 作为代理的 Azure-SSIS IR 配置Configure an Azure-SSIS IR with your self-hosted IR as a proxy

准备好用于暂存的自承载 IR 和 Azure Blob 存储链接服务后,接下来可以在数据工厂门户或应用中,配置使用自承载 IR 作为代理的新的或现有 Azure-SSIS IR。Having prepared your self-hosted IR and Azure Blob storage-linked service for staging, you can now configure your new or existing Azure-SSIS IR with the self-hosted IR as a proxy in your data factory portal or app. 不过,在执行此操作之前,如果现有的 Azure-SSIS IR 已运行,请停止再重启它。Before you do so, though, if your existing Azure-SSIS IR is already running, stop it and then restart it.

  1. 在“集成运行时设置”窗格中,选择“下一步”跳过“常规设置”和“SQL 设置”部分。 In the Integration runtime setup pane, skip past the General Settings and SQL Settings sections by selecting Next.

  2. 在“高级设置”部分执行以下操作: In the Advanced settings section, do the following:

    1. 选中“将自承载集成运行时设置为 Azure-SSIS Integration Runtime 的代理”复选框。 Select the Set up Self-Hosted Integration Runtime as a proxy for your Azure-SSIS Integration Runtime check box.

    2. 在“自承载集成运行时”下拉列表中,选择现有的自承载 IR 作为 Azure-SSIS IR 的代理。 In the Self-Hosted Integration Runtime drop-down list, select your existing self-hosted IR as a proxy for the Azure-SSIS IR.

    3. 在“暂存存储链接服务”下拉列表中,选择现有的 Azure Blob 存储链接服务,或创建新的服务用于暂存。 In the Staging storage linked service drop-down list, select your existing Azure Blob storage-linked service or create a new one for staging.

    4. 在“暂存路径”框中,指定所选 Azure Blob 存储帐户中的某个 Blob 容器,或将其留空以使用默认容器进行暂存。 In the Staging path box, specify a blob container in your selected Azure Blob storage account or leave it empty to use a default one for staging.

    5. 选择“继续”。 Select Continue.

    自承载 IR 的高级设置

还可以使用 PowerShell 来配置使用自承载 IR 作为代理的新的或现有 Azure-SSIS IR。You can also configure your new or existing Azure-SSIS IR with the self-hosted IR as a proxy by using PowerShell.

$ResourceGroupName = "[your Azure resource group name]"
$DataFactoryName = "[your data factory name]"
$AzureSSISName = "[your Azure-SSIS IR name]"
# Self-hosted integration runtime info - This can be configured as a proxy for on-premises data access 
$DataProxyIntegrationRuntimeName = "" # OPTIONAL to configure a proxy for on-premises data access 
$DataProxyStagingLinkedServiceName = "" # OPTIONAL to configure a proxy for on-premises data access 
$DataProxyStagingPath = "" # OPTIONAL to configure a proxy for on-premises data access 

# Add self-hosted integration runtime parameters if you configure a proxy for on-premises data accesss
if(![string]::IsNullOrEmpty($DataProxyIntegrationRuntimeName) -and ![string]::IsNullOrEmpty($DataProxyStagingLinkedServiceName))
{
    Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $ResourceGroupName `
        -DataFactoryName $DataFactoryName `
        -Name $AzureSSISName `
        -DataProxyIntegrationRuntimeName $DataProxyIntegrationRuntimeName `
        -DataProxyStagingLinkedServiceName $DataProxyStagingLinkedServiceName

    if(![string]::IsNullOrEmpty($DataProxyStagingPath))
    {
        Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $ResourceGroupName `
            -DataFactoryName $DataFactoryName `
            -Name $AzureSSISName `
            -DataProxyStagingPath $DataProxyStagingPath
    }
}
Start-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $DataFactoryName `
    -Name $AzureSSISName `
    -Force

使 SSIS 包能够通过代理进行连接Enable SSIS packages to connect by proxy

使用最新 SSDT 作为 Visual Studio 的 SSIS 项目扩展或独立安装程序,可以发现支持的数据流组件的连接管理器中已添加一个新的 ConnectByProxy 属性。By using the latest SSDT as either the SSIS Projects extension for Visual Studio or a standalone installer, you can find a new ConnectByProxy property that has been added in the connection managers for supported data flow components.

使用可在本地访问数据的组件设计包含数据流任务的新包时,可以通过在相关连接管理器的“属性”窗格中,将此属性设置为 True 来启用此属性。When you design new packages containing data flow tasks with components that access data on premises, you can enable this property by setting it to True in the Properties pane of the relevant connection managers.

启用 ConnectByProxy 属性

还可以在运行现有包时启用此属性,而无需逐个手动更改其设置。You can also enable this property when you run existing packages, without having to manually change them one by one. 有两个选项:There are two options:

  • 选项 A:打开、重新生成并重新部署包含这些包的项目。这些包中包含要在 Azure-SSIS IR 上运行的最新 SSDT。Option A: Open, rebuild, and redeploy the project containing those packages with the latest SSDT to run on your Azure-SSIS IR. 然后,可以针对相关的连接管理器,通过将此属性设置为 True 来启用此属性。You can then enable the property by setting it to True for the relevant connection managers. 从 SSMS 运行包时,这些连接管理器将显示在“执行包”弹出窗口的“连接管理器”选项卡上。 When you're running packages from SSMS, these connection managers appear on the Connection Managers tab of the Execute Package pop-up window.

    启用 ConnectByProxy 属性 2

    在数据工厂管道中运行包时,对于执行 SSIS 包活动的“连接管理器”选项卡上显示的相关连接管理器,还可以通过将此属性设置为 True 来启用此属性。You can also enable the property by setting it to True for the relevant connection managers that appear on the Connection Managers tab of Execute SSIS Package activity when you're running packages in Data Factory pipelines.

    启用 ConnectByProxy 属性 3

  • 选择 B:重新部署包含这些包的项目,以在 SSIS IR 上运行。Option B: Redeploy the project containing those packages to run on your SSIS IR. 然后,可以在从 SSMS 运行包时,通过在“执行包”弹出窗口的“高级”选项卡上提供此属性的属性路径 \Package.Connections[YourConnectionManagerName].Properties[ConnectByProxy],并将其设置为 True 作为属性重写,来启用此属性。 You can then enable the property by providing its property path, \Package.Connections[YourConnectionManagerName].Properties[ConnectByProxy], and setting it to True as a property override on the Advanced tab of the Execute Package pop-up window when you're running packages from SSMS.

    启用 ConnectByProxy 属性 4

    在数据工厂管道中运行包时,还可以通过在“执行 SSIS 包活动”的“属性重写”选项卡上提供此属性的属性路径 \Package.Connections[YourConnectionManagerName].Properties[ConnectByProxy],并将其设置为 True 作为属性重写,来启用此属性。 You can also enable the property by providing its property path, \Package.Connections[YourConnectionManagerName].Properties[ConnectByProxy], and setting it to True as a property override on the Property Overrides tab of Execute SSIS Package activity when you're running packages in Data Factory pipelines.

    启用 ConnectByProxy 属性 5

调试本地和云暂存任务Debug the on-premises and cloud staging tasks

在自承载 IR 上,可以在 C:\ProgramData\SSISTelemetry 文件夹中找到运行时日志,并在 C:\ProgramData\SSISTelemetry\ExecutionLog 文件夹中找到本地暂存任务的执行日志 。On your self-hosted IR, you can find the runtime logs in the C:\ProgramData\SSISTelemetry folder and the execution logs of on-premises staging tasks in the C:\ProgramData\SSISTelemetry\ExecutionLog folder. 可以在 SSISDB 或指定的日志路径中找到云暂存任务的执行日志,具体取决于是否将包存储在 SSISDB 中。You can find the execution logs of cloud staging tasks in your SSISDB or specified logging paths, depending on whether you store your packages in SSISDB or not. 还可以在云暂存任务的执行日志中找到本地暂存任务的唯一 ID。You can also find the unique IDs of on-premises staging tasks in the execution logs of cloud staging tasks.

第一个暂存任务的唯一 ID

在本地暂存任务中使用 Windows 身份验证Use Windows authentication in on-premises staging tasks

如果自承载 IR 上的本地暂存任务需要 Windows 身份验证,请将 SSIS 包配置为使用相同的 Windows 身份验证If on-premises staging tasks on your self-hosted IR require Windows authentication, configure your SSIS packages to use the same Windows authentication.

将使用自承载 IR 服务帐户(默认为 NT SERVICE\DIAHostService)调用本地暂存任务,将使用 Windows 身份验证帐户访问数据存储。Your on-premises staging tasks will be invoked with the self-hosted IR service account (NT SERVICE\DIAHostService, by default), and your data stores will be accessed with the Windows authentication account. 需要向这两个帐户分配特定的安全策略。Both accounts require certain security policies to be assigned to them. 在自承载 IR 计算机上,转到“本地安全策略” > “本地策略” > “用户权限分配”,然后执行以下操作: On the self-hosted IR machine, go to Local Security Policy > Local Policies > User Rights Assignment, and then do the following:

  1. 向自承载 IR 服务帐户分配“调整进程的内存配额”和“替换进程级令牌”策略。 Assign the Adjust memory quotas for a process and Replace a process level token policies to the self-hosted IR service account. 使用默认服务帐户安装自承载 IR 时,系统应会自动进行此分配。This should occur automatically when you install your self-hosted IR with the default service account. 如果系统没有自动分配,请手动分配这些策略。If it doesn't, assign those policies manually. 如果使用其他服务帐户,则为其分配相同的策略。If you use a different service account, assign the same policies to it.

  2. 向 Windows 身份验证帐户分配“作为服务登录”策略。 Assign the Log on as a service policy to the Windows Authentication account.

本地暂存任务和云暂存任务的计费Billing for the on-premises and cloud staging tasks

在自承载 IR 上运行的本地暂存任务将单独计费,就像自承载 IR 上运行的任何数据移动活动的计费方式一样。The on-premises staging tasks that run on your self-hosted IR are billed separately, just as any data movement activities that run on a self-hosted IR are billed. 这是在 Azure 数据工厂数据管道定价一文中指定的。This is specified in the Azure Data Factory data pipeline pricing article.

Azure-SSIS IR 上运行的云暂存任务不单独计费,但是运行中的 Azure-SSIS IR 会计费,如 Azure-SSIS IR 定价一文中所述。The cloud staging tasks that run on your Azure-SSIS IR are not be billed separately, but your running Azure-SSIS IR is billed as specified in the Azure-SSIS IR pricing article.

启用 TLS 1.2Enabling TLS 1.2

如果需要在自承载 IR 上使用强加密/更安全的网络协议 (TLS 1.2) 并禁用较旧的 SSL/TLS 版本,则可下载并运行 main.cmd 脚本,该脚本可以在公共预览版容器的 CustomSetupScript/UserScenarios/TLS 1.2 文件夹中找到。If you need to use strong cryptography/more secure network protocol (TLS 1.2) and disable older SSL/TLS versions on your self-hosted IR, you can download and run the main.cmd script that can be found in the CustomSetupScript/UserScenarios/TLS 1.2 folder of our public preview container. 使用 Azure 存储资源管理器,可以通过输入以下 SAS URI 连接到公共预览版容器:Using Azure Storage Explorer, you can connect to our public preview container by entering the following SAS URI:

https://ssisazurefileshare.blob.core.chinacloudapi.cn/publicpreview?sp=rl&st=2020-03-25T04:00:00Z&se=2025-03-25T04:00:00Z&sv=2019-02-02&sr=c&sig=WAD3DATezJjhBCO3ezrQ7TUZ8syEUxZZtGIhhP6Pt4I%3D

当前限制Current limitations

  • 目前仅支持具有 OLEDB/ODBC/平面文件源或 OLEDB 目标的数据流任务。Only data flow tasks with OLEDB/ODBC/Flat File sources or OLEDB destination are currently supported.
  • 目前仅支持使用“帐户密钥”、“共享访问签名 (SAS) URI”或“服务主体”身份验证配置的 Azure Blob 存储链接服务。 Only Azure Blob storage-linked services that are configured with Account key, Shared Access Signature (SAS) URI, or Service Principal authentication are currently supported.
  • 目前不支持 OLEDB 源中的 ParameterMapping。ParameterMapping in OLEDB Source is currently unsupported. 作为变通方法,请使用“变量中的 SQL 命令”作为“访问模式”,并使用“表达式”在 SQL 命令中插入变量/参数。 As a workaround, please use SQL Command From Variable as the AccessMode and use Expression to insert your variables/parameters in a SQL command. 有关说明,请参阅 ParameterMappingSample.dtsx 包,它位于公共预览版容器的 SelfHostedIRProxy/Limitations 文件夹中。As an illustration, see the ParameterMappingSample.dtsx package that can be found in the SelfHostedIRProxy/Limitations folder of our public preview container. 使用 Azure 存储资源管理器,可以通过输入以上 SAS URI 连接到公共预览版容器。Using Azure Storage Explorer, you can connect to our public preview container by entering the above SAS URI.

后续步骤Next steps

将自承载 IR 配置为 Azure-SSIS IR 的代理后,可以部署并运行包,以便在数据工厂管道中以“执行 SSIS 包”活动的形式访问本地数据。After you've configured your self-hosted IR as a proxy for your Azure-SSIS IR, you can deploy and run your packages to access data on-premises as Execute SSIS Package activities in Data Factory pipelines. 有关操作方法,请参阅在数据工厂管道中以“执行 SSIS 包”活动的形式运行 SSIS 包To learn how, see Run SSIS packages as Execute SSIS Package activities in Data Factory pipelines.