Create and configure a self-hosted integration runtime

APPLIES TO: Azure Data Factory, Azure Synapse Analytics (Preview)

The integration runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide data-integration capabilities across different network environments. For details about IR, see Integration runtime overview.

A self-hosted integration runtime can run copy activities between a cloud data store and a data store in a private network. It can also dispatch transform activities against compute resources in an on-premises network or an Azure virtual network. Installing a self-hosted integration runtime requires an on-premises machine or a virtual machine inside a private network.

This article describes how you can create and configure a self-hosted IR.

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Setting up a self-hosted integration runtime

To create and set up a self-hosted integration runtime, use the following procedures.

Create a self-hosted IR via Azure PowerShell

  1. You can use Azure PowerShell to create the self-hosted IR. Here is an example:

    Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName -Type SelfHosted -Description "selfhosted IR description"
    
  2. Download and install the self-hosted integration runtime on a local machine.

  3. Retrieve the authentication key and register the self-hosted integration runtime with the key. Here is a PowerShell example:

    
    Get-AzDataFactoryV2IntegrationRuntimeKey -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName  
    
    

Create a self-hosted IR via Azure Data Factory UI

Use the following steps to create a self-hosted IR by using the Azure Data Factory UI.

  1. On the Let's get started page of the Azure Data Factory UI, select the Manage tab from the leftmost pane.

    The home page Manage button

  2. Select Integration runtimes on the left pane, and then select +New.

    Create an integration runtime

  3. On the Integration runtime setup page, select Azure, Self-Hosted, and then select Continue.

  4. On the following page, select Self-Hosted to create a self-hosted IR, and then select Continue.

  5. Enter a name for your IR, and select Create.

  6. On the Integration runtime setup page, select the link under Option 1 to open the express setup on your computer. Or follow the steps under Option 2 to set up manually. The following instructions are based on manual setup:

    Integration runtime setup

    1. Copy and paste the authentication key. Select Download and install integration runtime.

    2. Download the self-hosted integration runtime to a local Windows machine. Run the installer.

    3. On the Register Integration Runtime (Self-hosted) page, paste the key you saved earlier, and select Register.

      Register Integration Runtime

    4. On the New Integration Runtime (Self-hosted) Node page, select Finish.

  7. After the self-hosted integration runtime is registered successfully, you see the following window:

    Successful registration

Set up a self-hosted IR on an Azure VM via an Azure Resource Manager template

You can automate self-hosted IR setup on an Azure virtual machine by using the Create self host IR template. The template provides an easy way to have a fully functional self-hosted IR inside an Azure virtual network. The IR has high-availability and scalability features, as long as you set the node count to 2 or higher.

Set up an existing self-hosted IR via local PowerShell

You can use a command line to set up or manage an existing self-hosted IR. This approach is especially helpful for automating the installation and registration of self-hosted IR nodes.

Dmgcmd.exe is included in the self-hosted installer. It's typically located in the C:\Program Files\Microsoft Integration Runtime\4.0\Shared\ folder. This application supports various parameters and can be invoked from the command line, including from batch scripts for automation.

Use the application as follows:

    dmgcmd [ -RegisterNewNode "<AuthenticationKey>" -EnableRemoteAccess "<port>" ["<thumbprint>"] -EnableRemoteAccessInContainer "<port>" ["<thumbprint>"] -DisableRemoteAccess -Key "<AuthenticationKey>" -GenerateBackupFile "<filePath>" "<password>" -ImportBackupFile "<filePath>" "<password>" -Restart -Start -Stop -StartUpgradeService -StopUpgradeService -TurnOnAutoUpdate -TurnOffAutoUpdate -SwitchServiceAccount "<domain\user>" ["<password>"] -Loglevel <logLevel> ]

Here are details of the application's parameters and properties:

Property | Description | Required
RegisterNewNode "<AuthenticationKey>" | Register a self-hosted integration runtime node with the specified authentication key. | No
RegisterNewNode "<AuthenticationKey>" "<NodeName>" | Register a self-hosted integration runtime node with the specified authentication key and node name. | No
EnableRemoteAccess "<port>" ["<thumbprint>"] | Enable remote access on the current node to set up a high-availability cluster, or to set credentials directly against the self-hosted IR without going through Azure Data Factory. You do the latter by running the New-AzDataFactoryV2LinkedServiceEncryptedCredential cmdlet from a remote machine in the same network. | No
EnableRemoteAccessInContainer "<port>" ["<thumbprint>"] | Enable remote access to the current node when the node runs in a container. | No
DisableRemoteAccess | Disable remote access to the current node. Remote access is needed for multinode setup. The New-AzDataFactoryV2LinkedServiceEncryptedCredential PowerShell cmdlet still works when remote access is disabled, as long as the cmdlet is executed on the same machine as the self-hosted IR node. | No
Key "<AuthenticationKey>" | Overwrite or update the previous authentication key. Be careful with this action: your previous self-hosted IR node can go offline if the key belongs to a new integration runtime. | No
GenerateBackupFile "<filePath>" "<password>" | Generate a backup file for the current node. The backup file includes the node key and data-store credentials. | No
ImportBackupFile "<filePath>" "<password>" | Restore the node from a backup file. | No
Restart | Restart the self-hosted integration runtime host service. | No
Start | Start the self-hosted integration runtime host service. | No
Stop | Stop the self-hosted integration runtime host service. | No
StartUpgradeService | Start the self-hosted integration runtime upgrade service. | No
StopUpgradeService | Stop the self-hosted integration runtime upgrade service. | No
TurnOnAutoUpdate | Turn on the self-hosted integration runtime auto-update. | No
TurnOffAutoUpdate | Turn off the self-hosted integration runtime auto-update. | No
SwitchServiceAccount "<domain\user>" ["<password>"] | Set DIAHostService to run as a new account. Use the empty password "" for system accounts and virtual accounts. | No
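For unattended setup, the registration and remote-access rows of the table can be scripted. The following is a minimal Python sketch, not an official tool: it assumes dmgcmd.exe sits at the typical path given above, issues each option as a separate dmgcmd invocation (an assumption), and uses port 8060 only as an example value.

```python
import subprocess
import sys
from pathlib import PureWindowsPath

# Assumption: the typical install location described above.
DMGCMD = PureWindowsPath(
    r"C:\Program Files\Microsoft Integration Runtime\4.0\Shared\dmgcmd.exe"
)


def build_commands(auth_key, node_name=None, remote_port=None, thumbprint=None):
    """Return one argv list per dmgcmd call: register, then optionally enable remote access."""
    register = [str(DMGCMD), "-RegisterNewNode", auth_key]
    if node_name:
        register.append(node_name)  # RegisterNewNode "<AuthenticationKey>" "<NodeName>"
    commands = [register]
    if remote_port is not None:
        remote = [str(DMGCMD), "-EnableRemoteAccess", str(remote_port)]
        if thumbprint:
            remote.append(thumbprint)  # optional TLS/SSL certificate thumbprint
        commands.append(remote)
    return commands


def run_commands(commands):
    """Run each command in order and stop at the first non-zero exit code."""
    if sys.platform != "win32":
        raise OSError("dmgcmd.exe is available only on Windows")
    for argv in commands:
        # Passing a list lets subprocess quote the path that contains spaces.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in build_commands("<AuthenticationKey>", "node1", 8060):
        print(argv)
```

On the runtime machine you would call `run_commands(...)` from an elevated session, because installing and configuring the runtime requires administrator rights. Keeping command construction separate from execution makes the argument lists easy to review before anything runs.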

Command flow and data flow

When you move data between on-premises and the cloud, the activity uses a self-hosted integration runtime to transfer the data between an on-premises data source and the cloud.

Here is a high-level summary of the data-flow steps for copying with a self-hosted IR:

High-level data flow

  1. A data developer creates a self-hosted integration runtime within an Azure data factory by using a PowerShell cmdlet. Currently, the Azure portal doesn't support this feature.
  2. The data developer creates a linked service for an on-premises data store by specifying the self-hosted integration runtime instance that the service should use to connect to the data store.
  3. The self-hosted integration runtime node encrypts the credentials by using the Windows Data Protection Application Programming Interface (DPAPI) and saves the credentials locally. If multiple nodes are set up for high availability, the credentials are further synchronized across the other nodes. Each node encrypts the credentials by using DPAPI and stores them locally. Credential synchronization is transparent to the data developer and is handled by the self-hosted IR.
  4. Azure Data Factory communicates with the self-hosted integration runtime to schedule and manage jobs. Communication is via a control channel that uses a shared Azure Service Bus Relay connection. When an activity job needs to be run, Data Factory queues the request along with any credential information, in case the credentials aren't already stored on the self-hosted integration runtime. The self-hosted integration runtime starts the job after it polls the queue.
  5. The self-hosted integration runtime copies data between an on-premises store and cloud storage. The direction of the copy depends on how the copy activity is configured in the data pipeline. For this step, the self-hosted integration runtime communicates directly with cloud-based storage services such as Azure Blob storage over a secure HTTPS channel.
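To make steps 3 and 4 concrete, here is a toy Python model of the control channel. It is an illustration only: all class and function names are hypothetical, a plain dictionary stands in for the DPAPI-protected credential store, a local queue stands in for the Service Bus Relay, and letting the service ask the node whether it already holds a credential is a simplification.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import Optional


@dataclass
class JobRequest:
    activity: str
    linked_service: str
    credential: Optional[str] = None  # attached only if the node lacks it


@dataclass
class SelfHostedNode:
    # Hypothetical stand-in for the locally stored, DPAPI-encrypted credentials.
    credentials: dict = field(default_factory=dict)
    started: list = field(default_factory=list)

    def has_credential(self, linked_service: str) -> bool:
        return linked_service in self.credentials

    def poll(self, queue: Queue) -> None:
        """Drain the queue, cache any credential sent along, and start each job."""
        while not queue.empty():
            request = queue.get()
            if request.credential is not None:
                self.credentials[request.linked_service] = request.credential
            self.started.append(request.activity)


def queue_job(queue: Queue, node: SelfHostedNode, activity: str,
              linked_service: str, credential: str) -> None:
    """Service side: include the credential only when the node doesn't have it yet."""
    needs_credential = not node.has_credential(linked_service)
    queue.put(JobRequest(activity, linked_service,
                         credential if needs_credential else None))
```

The point of the model is the pull direction: the node polls for work, so the service never needs an inbound connection into the private network.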

Considerations for using a self-hosted IR

  • You can use a single self-hosted integration runtime for multiple on-premises data sources. You can also share it with another data factory within the same Azure Active Directory (Azure AD) tenant. For more information, see Sharing a self-hosted integration runtime.
  • You can install only one instance of a self-hosted integration runtime on any single machine. If you have two data factories that need to access on-premises data sources, either use the self-hosted IR sharing feature to share the self-hosted IR, or install the self-hosted IR on two on-premises computers, one for each data factory.
  • The self-hosted integration runtime doesn't need to be on the same machine as the data source. However, having the self-hosted integration runtime close to the data source reduces the time it takes to connect to the data source. We recommend that you install the self-hosted integration runtime on a machine other than the one that hosts the on-premises data source. When the two are on different machines, the self-hosted integration runtime doesn't compete with the data source for resources.
  • You can have multiple self-hosted integration runtimes on different machines that connect to the same on-premises data source. For example, if you have two self-hosted integration runtimes that serve two data factories, the same on-premises data source can be registered with both data factories.
  • If you already have a gateway installed on your computer to serve a Power BI scenario, install a separate self-hosted integration runtime for Data Factory on another machine.
  • Use a self-hosted integration runtime to support data integration within an Azure virtual network.
  • Treat your data source as an on-premises data source that is behind a firewall, even when you use Azure ExpressRoute. Use the self-hosted integration runtime to connect the service to the data source.
  • Use the self-hosted integration runtime even if the data store is in the cloud on an Azure Infrastructure as a Service (IaaS) virtual machine.
  • Tasks might fail in a self-hosted integration runtime that you installed on a Windows server for which FIPS-compliant encryption is enabled. To work around this problem, disable FIPS-compliant encryption on the server by changing the value of the following registry subkey from 1 (enabled) to 0 (disabled): HKLM\System\CurrentControlSet\Control\Lsa\FIPSAlgorithmPolicy\Enabled.
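Before deciding whether the FIPS workaround applies, you can read the current registry value. This is a minimal, read-only Python sketch using the standard-library winreg module, which exists only on Windows; it does not change the setting, and it returns None wherever the value can't be read.

```python
import sys

# Registry subkey named above, under HKEY_LOCAL_MACHINE.
FIPS_SUBKEY = r"System\CurrentControlSet\Control\Lsa\FIPSAlgorithmPolicy"


def fips_enabled():
    """Return True if Enabled is 1, False otherwise, or None if it can't be read."""
    if sys.platform != "win32":
        return None  # winreg is available only on Windows
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, FIPS_SUBKEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "Enabled")
    except OSError:  # key or value missing, or access denied
        return None
    return value == 1


if __name__ == "__main__":
    print("FIPS-compliant encryption enabled:", fips_enabled())
```

Changing the value is a server-wide security decision, so the sketch deliberately stops at reporting it.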

Prerequisites

  • The supported versions of Windows are:

    • Windows 7 Service Pack 1
    • Windows 8.1
    • Windows 10
    • Windows Server 2008 R2 SP1
    • Windows Server 2012
    • Windows Server 2012 R2
    • Windows Server 2016
    • Windows Server 2019

    Installation of the self-hosted integration runtime on a domain controller isn't supported.

  • .NET Framework 4.6.1 or later is required. If you're installing the self-hosted integration runtime on a Windows 7 machine, install .NET Framework 4.6.1 or later. See .NET Framework System Requirements for details.

  • The recommended minimum configuration for the self-hosted integration runtime machine is a 2-GHz processor with 4 cores, 8 GB of RAM, and 80 GB of available hard drive space.

  • If the host machine hibernates, the self-hosted integration runtime doesn't respond to data requests. Configure an appropriate power plan on the computer before you install the self-hosted integration runtime. If the machine is configured to hibernate, the self-hosted integration runtime installer displays a message.

  • You must be an administrator on the machine to successfully install and configure the self-hosted integration runtime.

  • Copy-activity runs happen with a specific frequency. Processor and RAM usage on the machine follows the same pattern with peak and idle times. Resource usage also depends heavily on the amount of data that is moved. When multiple copy jobs are in progress, you see resource usage go up during peak times.

  • Tasks might fail during extraction of data in Parquet, ORC, or Avro formats. For more on Parquet, see Parquet format in Azure Data Factory. File creation runs on the self-hosted integration runtime machine. To work as expected, file creation requires the following prerequisites:

Installation best practices

You can install the self-hosted integration runtime by downloading an MSI setup package from Microsoft Download Center. See the article Move data between on-premises and cloud for step-by-step instructions.

  • Configure a power plan on the host machine for the self-hosted integration runtime so that the machine doesn't hibernate. If the host machine hibernates, the self-hosted integration runtime goes offline.
  • Regularly back up the credentials associated with the self-hosted integration runtime.
  • To automate self-hosted IR setup operations, see Set up an existing self-hosted IR via local PowerShell.

Install and register a self-hosted IR from Microsoft Download Center

  1. Go to the Microsoft integration runtime download page.

  2. Select Download, select the 64-bit version, and select Next. The 32-bit version isn't supported.

  3. Run the MSI file directly, or save it to your hard drive and run it.

  4. On the Welcome window, select a language and select Next.

  5. Accept the Microsoft Software License Terms and select Next.

  6. Select the folder in which to install the self-hosted integration runtime, and select Next.

  7. On the Ready to install page, select Install.

  8. Select Finish to complete the installation.

  9. Get the authentication key by using PowerShell. Here's a PowerShell example for retrieving the authentication key:

    Get-AzDataFactoryV2IntegrationRuntimeKey -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName

  10. On the Register Integration Runtime (Self-hosted) window of Microsoft Integration Runtime Configuration Manager running on your machine, take the following steps:

    1. Paste the authentication key in the text area.

    2. Optionally, select Show authentication key to see the key text.

    3. Select Register.

High availability and scalability

You can associate a self-hosted integration runtime with multiple on-premises machines or virtual machines in Azure. These machines are called nodes. You can have up to four nodes associated with a self-hosted integration runtime. Running the runtime on several machines as nodes of one logical gateway has these benefits:

  • Higher availability of the self-hosted integration runtime, so that it's no longer the single point of failure in your big data solution or cloud data integration with Data Factory. This availability helps ensure continuity when you use up to four nodes.
  • Improved performance and throughput during data movement between on-premises and cloud data stores. Get more information on performance comparisons.

You can associate multiple nodes by installing the self-hosted integration runtime software from Download Center. Then, register it by using either of the authentication keys that were obtained from the New-AzDataFactoryV2IntegrationRuntimeKey cmdlet, as described in the tutorial.

Note

You don't need to create a new self-hosted integration runtime to associate each node. You can install the self-hosted integration runtime on another machine and register it by using the same authentication key.

Note

Before you add another node for high availability and scalability, ensure that the Remote access to intranet option is enabled on the first node. To do so, select Microsoft Integration Runtime Configuration Manager > Settings > Remote access to intranet.
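The node rules in this section (at most four nodes, either authentication key accepted, remote access on the first node before more nodes join) can be captured in a small Python model. This is a hypothetical illustration of the constraints described here, not an API of the runtime; treating two or more nodes as highly available follows the Resource Manager template guidance earlier in this article.

```python
MAX_NODES = 4  # limit stated in this article


class LogicalSelfHostedIR:
    """Hypothetical model of one self-hosted IR and its registered nodes."""

    def __init__(self, *auth_keys):
        self.auth_keys = set(auth_keys)  # e.g. the two keys issued for the IR
        self.nodes = []                  # node names in registration order
        self.remote_access = set()       # nodes with "Remote access to intranet" on

    def enable_remote_access(self, node):
        self.remote_access.add(node)

    def register(self, node, key):
        if key not in self.auth_keys:
            raise ValueError("key belongs to a different integration runtime")
        if len(self.nodes) >= MAX_NODES:
            raise RuntimeError("a self-hosted IR supports at most four nodes")
        if self.nodes and self.nodes[0] not in self.remote_access:
            raise RuntimeError("enable remote access on the first node first")
        self.nodes.append(node)

    @property
    def highly_available(self):
        return len(self.nodes) >= 2


ir = LogicalSelfHostedIR("authKey1-value", "authKey2-value")
ir.register("node1", "authKey1-value")
ir.enable_remote_access("node1")
ir.register("node2", "authKey2-value")  # either key works
print(ir.nodes, ir.highly_available)    # ['node1', 'node2'] True
```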

Scale considerations

Scale out

When processor usage is high and available memory is low on the self-hosted IR, add a new node to help scale out the load across machines. If activities fail because they time out or because the self-hosted IR node is offline, adding a node to the gateway can help.

Scale up

When the processor and available RAM aren't well utilized, but the execution of concurrent jobs reaches a node's limits, scale up by increasing the number of concurrent jobs that a node can run. You might also want to scale up when activities time out because the self-hosted IR is overloaded. As shown in the following image, you can increase the maximum capacity for a node:

Increase the number of concurrent jobs that can run on a node
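The scale-out and scale-up guidance above can be expressed as a simple decision rule. In this Python sketch, the 80 percent CPU and 20 percent free-memory thresholds are arbitrary assumptions chosen for illustration; the article doesn't specify numeric limits.

```python
def scaling_advice(cpu_percent, free_memory_percent, running_jobs,
                   concurrent_job_limit, high_cpu=80.0, low_free_memory=20.0):
    """Suggest 'scale out', 'scale up', or 'no change' for one self-hosted IR node.

    The default thresholds are illustrative assumptions, not documented limits.
    """
    busy = cpu_percent >= high_cpu and free_memory_percent <= low_free_memory
    if busy:
        return "scale out"  # add a node to spread load across machines
    underused = cpu_percent < high_cpu and free_memory_percent > low_free_memory
    if underused and running_jobs >= concurrent_job_limit:
        return "scale up"  # raise the node's concurrent-jobs limit
    return "no change"


print(scaling_advice(95, 10, 4, 8))  # scale out
print(scaling_advice(30, 60, 8, 8))  # scale up
print(scaling_advice(30, 60, 2, 8))  # no change
```

Checking the resource-starved case first mirrors the text: raising the job limit on a node that is already out of CPU and memory would make timeouts worse, not better.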

TLS/SSL certificate requirements

Here are the requirements for the TLS/SSL certificate that you use to secure communication between integration runtime nodes:

  • The certificate must be a publicly trusted X509 v3 certificate. We recommend that you use certificates that are issued by a public partner certification authority (CA).
  • Each integration runtime node must trust this certificate.
  • We don't recommend Subject Alternative Name (SAN) certificates because only the last SAN item is used. All other SAN items are ignored. For example, if you have a SAN certificate whose SANs are node1.domain.contoso.com and node2.domain.contoso.com, you can use this certificate only on a machine whose fully qualified domain name (FQDN) is node2.domain.contoso.com.
  • The certificate can use any key size supported by Windows Server 2012 R2 for TLS/SSL certificates.
  • Certificates that use CNG keys aren't supported.
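The SAN rule in the list above means that a SAN certificate fits a node only when the node's FQDN matches the last SAN entry. Here is a Python sketch of that check using the contoso names from the example. It assumes you have already extracted the SAN list from the certificate, compares names case-insensitively (DNS names are case-insensitive), and does not handle wildcard entries.

```python
def effective_san(subject_alt_names):
    """Only the last SAN entry is used; all earlier entries are ignored."""
    if not subject_alt_names:
        raise ValueError("not a SAN certificate: no SAN entries")
    return subject_alt_names[-1]


def certificate_fits_node(node_fqdn, subject_alt_names):
    """True if the node's FQDN equals the one SAN entry that is honored."""
    return node_fqdn.lower() == effective_san(subject_alt_names).lower()


sans = ["node1.domain.contoso.com", "node2.domain.contoso.com"]
print(certificate_fits_node("node1.domain.contoso.com", sans))  # False
print(certificate_fits_node("node2.domain.contoso.com", sans))  # True
```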

Note

This certificate is used:

  • To encrypt ports on a self-hosted IR node.
  • For node-to-node communication for state synchronization, which includes credentials synchronization of linked services across nodes.
  • When a PowerShell cmdlet is used for linked-service credential settings from within a local network.

We suggest that you use this certificate if your private network environment isn't secure or if you want to secure the communication between nodes within your private network.

Data movement in transit from a self-hosted IR to other data stores always happens within an encrypted channel, regardless of whether this certificate is set.

Create a shared self-hosted integration runtime in Azure Data Factory

You can reuse an existing self-hosted integration runtime infrastructure that you already set up in a data factory. This reuse lets you create a linked self-hosted integration runtime in a different data factory by referencing an existing shared self-hosted IR.

Terminology

  • Shared IR: An original self-hosted IR that runs on a physical infrastructure.
  • Linked IR: An IR that references another shared IR. The linked IR is a logical IR and uses the infrastructure of another shared self-hosted IR.

Methods to share a self-hosted integration runtime

To share a self-hosted integration runtime with multiple data factories, see Create a shared self-hosted integration runtime for details.

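Monitoring

Shared IR

Options for finding a shared integration runtime

Monitoring a shared integration runtime

Linked IR

Options for finding a linked integration runtime

Monitoring a linked integration runtime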

Known limitations of self-hosted IR sharing

  • The data factory in which a linked IR is created must have a managed identity. By default, data factories created in the Azure portal or by PowerShell cmdlets have an implicitly created managed identity. But when a data factory is created through an Azure Resource Manager template or SDK, you must set the Identity property explicitly. This setting ensures that Resource Manager creates a data factory that contains a managed identity.

  • The Data Factory .NET SDK that supports this feature must be version 1.1.0 or later.

  • To grant permission, you need the Owner role or the inherited Owner role in the data factory where the shared IR exists.

  • The sharing feature works only for data factories within the same Azure AD tenant.

  • For Azure AD guest users, the search functionality in the UI, which lists all data factories by using a search keyword, doesn't work. But as long as the guest user is the owner of the data factory, you can share the IR without the search functionality. Enter the managed identity of the data factory that needs to share the IR in the Assign Permission box, and then select Add in the Data Factory UI.

    Note

    This feature is available only in Data Factory V2.
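When you create the data factory from a Resource Manager template, the identity property is what gives it a managed identity. This Python sketch builds such a template as a dictionary and prints it as JSON. The factory name and region are placeholders, and apiVersion 2018-06-01 is the commonly used Data Factory V2 version; verify it against current documentation before deploying.

```python
import json


def factory_template(factory_name, location):
    """Minimal ARM template for a Data Factory V2 instance with a managed identity."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.DataFactory/factories",
                "apiVersion": "2018-06-01",
                "name": factory_name,
                "location": location,
                # Set explicitly: a template-created factory otherwise has no
                # managed identity and can't host a linked self-hosted IR.
                "identity": {"type": "SystemAssigned"},
                "properties": {},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(factory_template("<factoryName>", "<region>"), indent=2))
```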

Notification area icons and notifications

If you move your cursor over the icon or message in the notification area, you can see details about the state of the self-hosted integration runtime.

Notifications in the notification area

Ports and firewalls

There are two firewalls to consider:

  • The corporate firewall that runs on the central router of the organization
  • The Windows firewall that is configured as a daemon on the local machine where the self-hosted integration runtime is installed

Firewalls

At the corporate firewall level, you need to configure the following domains and outbound ports:

Domain names | Outbound ports | Description
*.servicebus.chinacloudapi.cn | 443 | Required by the self-hosted integration runtime to connect to data movement services in Azure Data Factory.
*.frontend.datamovement.azure.cn | 443 | Required by the self-hosted integration runtime to connect to the Data Factory service.
download.microsoft.com | 443 | Required by the self-hosted integration runtime for downloading updates. If you have disabled auto-update, you can skip configuring this domain.
*.core.chinacloudapi.cn | 443 | Used by the self-hosted integration runtime to connect to the Azure storage account when you use the staged copy feature.
*.database.chinacloudapi.cn | 1433 | Required only when you copy from or to Azure SQL Database or Azure SQL Data Warehouse; optional otherwise. Use the staged-copy feature to copy data to SQL Database or SQL Data Warehouse without opening port 1433.
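To check the corporate-firewall rules from the runtime machine, you can list the endpoints that apply to your scenario and test outbound TCP connections. The endpoint list in this Python sketch is taken from the table above (Azure China cloud names). Wildcard entries can't be probed directly, so you substitute a concrete host name, such as your own storage account's; any such host name is a placeholder you supply.

```python
import socket

# Always-required endpoints from the table above.
BASE_ENDPOINTS = [
    ("*.servicebus.chinacloudapi.cn", 443),
    ("*.frontend.datamovement.azure.cn", 443),
]


def required_endpoints(auto_update=True, staged_copy=False, direct_sql=False):
    """Return (domain pattern, port) pairs for the chosen features."""
    endpoints = list(BASE_ENDPOINTS)
    if auto_update:
        endpoints.append(("download.microsoft.com", 443))
    if staged_copy:
        endpoints.append(("*.core.chinacloudapi.cn", 443))
    if direct_sql:
        endpoints.append(("*.database.chinacloudapi.cn", 1433))
    return endpoints


def can_connect(host, port, timeout=5.0):
    """Try to open and close a TCP connection; needs a concrete host, not a wildcard."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refusal, or timeout
        return False


if __name__ == "__main__":
    for pattern, port in required_endpoints(staged_copy=True):
        print(pattern, port)
    # Example probe (placeholder host): can_connect("<account>.blob.core.chinacloudapi.cn", 443)
```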

At the Windows firewall level or machine level, these outbound ports are normally enabled. If they aren't, you can configure the domains and ports on the self-hosted integration runtime machine.

Note

Based on your sources and sinks, you might need to allow additional domains and outbound ports in your corporate firewall or Windows firewall.

For some cloud databases, such as Azure SQL Database, you might need to allow the IP addresses of self-hosted integration runtime machines in their firewall configuration.

Copy data from a source to a sink

Ensure that firewall rules are properly enabled on the corporate firewall, on the Windows firewall of the self-hosted integration runtime machine, and on the data store itself. These rules let the self-hosted integration runtime connect to both the source and the sink. Enable rules for each data store that is involved in the copy operation.

For example, to copy from an on-premises data store to an Azure SQL Database sink or an Azure Synapse Analytics (formerly SQL Data Warehouse) sink, take the following steps:

  1. Allow outbound TCP communication on port 1433 in both the Windows firewall and the corporate firewall.
  2. Configure the firewall settings of the SQL Database server to add the IP address of the self-hosted integration runtime machine to the list of allowed IP addresses.
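
Azure SQL Database server-level firewall rules are defined as inclusive ranges from a start IP address to an end IP address. The following Python sketch checks which rule, if any, admits a given client address, which can help when you verify step 2.

- The rule names and addresses are hypothetical and are taken from documentation-reserved ranges.
- The address to check is the public address that the server sees. This differs from the machine's private address when the machine sits behind NAT or a proxy.

```python
import ipaddress
from typing import Iterable, Optional, Tuple

# (rule name, start IP, end IP), mirroring the inclusive start/end range
# of an Azure SQL server-level firewall rule.
Rule = Tuple[str, str, str]


def matching_rule(client_ip: str, rules: Iterable[Rule]) -> Optional[str]:
    """Return the name of the first rule whose range contains client_ip, or None."""
    ip = ipaddress.IPv4Address(client_ip)
    for name, start, end in rules:
        # Both ends of the range are inclusive.
        if ipaddress.IPv4Address(start) <= ip <= ipaddress.IPv4Address(end):
            return name
    return None


# Hypothetical rules using documentation-reserved address ranges.
RULES = [
    ("self-hosted-ir", "203.0.113.10", "203.0.113.10"),
    ("office-range", "198.51.100.0", "198.51.100.255"),
]
```

For example, `matching_rule("203.0.113.10", RULES)` returns `"self-hosted-ir"`. An address outside every range returns `None`, which means the server would reject connections from it.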

Note

If your firewall doesn't allow outbound port 1433, the self-hosted integration runtime can't access the SQL database directly. In this case, you can use staged copy to SQL Database and Azure Synapse Analytics. In this scenario, only HTTPS (port 443) is required for data movement.

Proxy server considerations

If your corporate network environment uses a proxy server to access the internet, configure the self-hosted integration runtime to use the appropriate proxy settings. You can set the proxy during the initial registration phase.

Specify the proxy

When configured, the self-hosted integration runtime uses the proxy server to connect to cloud-service sources and destinations that use the HTTP or HTTPS protocol. This is why you select the Change link during initial setup.

Set the proxy

There are three configuration options:

  • Do not use proxy: The self-hosted integration runtime doesn't explicitly use any proxy to connect to cloud services.
  • Use system proxy: The self-hosted integration runtime uses the proxy setting that is configured in diahost.exe.config and diawp.exe.config. If these files specify no proxy configuration, the self-hosted integration runtime connects to the cloud service directly without going through a proxy.
  • Use custom proxy: Configure the HTTP proxy setting to use for the self-hosted integration runtime instead of the configurations in diahost.exe.config and diawp.exe.config. Address and Port values are required. User Name and Password values are optional, depending on your proxy's authentication setting. All settings are encrypted with Windows DPAPI on the self-hosted integration runtime and stored locally on the machine.
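
The three options can be summarized as a selection rule. The following Python sketch is a conceptual model of the behavior described above, not the integration runtime's implementation.

- `ProxyMode` and `effective_proxy` are hypothetical names.
- The optional User Name and Password values are left out for brevity.

```python
from enum import Enum
from typing import Optional


class ProxyMode(Enum):
    NONE = "Do not use proxy"
    SYSTEM = "Use system proxy"
    CUSTOM = "Use custom proxy"


def effective_proxy(mode: ProxyMode,
                    system_proxy: Optional[str] = None,
                    custom_address: Optional[str] = None,
                    custom_port: Optional[int] = None) -> Optional[str]:
    """Return the proxy URL used to reach cloud services, or None for a direct connection.

    system_proxy stands for the proxy configured in diahost.exe.config and
    diawp.exe.config (None if those files specify no proxy).
    """
    if mode is ProxyMode.NONE:
        return None
    if mode is ProxyMode.SYSTEM:
        # No proxy in the config files means a direct connection.
        return system_proxy
    # Custom proxy: Address and Port are required (credentials are not modeled here).
    if not custom_address or custom_port is None:
        raise ValueError("Address and Port are required for a custom proxy")
    return f"http://{custom_address}:{custom_port}/"
```

The model makes one point explicit: choosing Use system proxy does not guarantee that a proxy is used. If the config files are empty, the result is a direct connection.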

The integration runtime host service restarts automatically after you save the updated proxy settings.

After you register the self-hosted integration runtime, use Microsoft Integration Runtime Configuration Manager if you want to view or update the proxy settings.

  1. Open Microsoft Integration Runtime Configuration Manager.
  2. Select the Settings tab.
  3. Under HTTP Proxy, select the Change link to open the Set HTTP Proxy dialog box.
  4. Select Next. A warning then asks for your permission to save the proxy setting and restart the integration runtime host service.

You can use the Configuration Manager tool to view and update the HTTP proxy.

View and update the proxy

Note

If you set up a proxy server with NTLM authentication, the integration runtime host service runs under the domain account. If you later change the password for the domain account, remember to update the configuration settings for the service and restart it. Because of this requirement, we suggest that you access the proxy server with a dedicated domain account whose password doesn't need to be updated frequently.

Configure proxy server settings

If you select the Use system proxy option for the HTTP proxy, the self-hosted integration runtime uses the proxy settings in diahost.exe.config and diawp.exe.config. When these files specify no proxy, the self-hosted integration runtime connects to the cloud service directly without going through a proxy. The following procedure explains how to update the diahost.exe.config file:

  1. In File Explorer, make a safe copy of C:\Program Files\Microsoft Integration Runtime\4.0\Shared\diahost.exe.config as a backup of the original file.

  2. Open Notepad running as administrator.

  3. In Notepad, open the text file C:\Program Files\Microsoft Integration Runtime\4.0\Shared\diahost.exe.config.

  4. Find the default system.net tag as shown in the following code:

    <system.net>
        <defaultProxy useDefaultCredentials="true" />
    </system.net>
    

    You can then add proxy server details as shown in the following example:

    <system.net>
        <defaultProxy enabled="true">
              <proxy bypassonlocal="true" proxyaddress="http://proxy.domain.org:8888/" />
        </defaultProxy>
    </system.net>
    

    The proxy tag allows additional properties to specify required settings such as scriptLocation. See <proxy> Element (Network Settings) for syntax.

    <proxy autoDetect="true|false|unspecified" bypassonlocal="true|false|unspecified" proxyaddress="uriString" scriptLocation="uriString" usesystemdefault="true|false|unspecified" />
    
  5. Save the configuration file in its original location. Then restart the self-hosted integration runtime host service so that it picks up the changes.

    To restart the service, use the Services applet from Control Panel. Or, in Integration Runtime Configuration Manager, select the Stop Service button, and then select Start Service.

    If the service doesn't start, you likely added incorrect XML tag syntax to the application configuration file that you edited.

Important

Don't forget to update both diahost.exe.config and diawp.exe.config.
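
Both files must carry the same setting, so scripting the change can help keep them in sync. The following Python sketch applies the same edit as the XML example above to diahost.exe.config and diawp.exe.config.

- It assumes the default install folder shown in the steps and sets only the bypassonlocal and proxyaddress attributes.
- It backs up each file first.
- It replaces any existing defaultProxy element.
- The function names `set_default_proxy` and `update_config_files` are hypothetical.
- Because the files live under Program Files, run it as administrator.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# Default install folder from the steps above; adjust if the runtime is installed elsewhere.
SHARED_DIR = Path(r"C:\Program Files\Microsoft Integration Runtime\4.0\Shared")
CONFIG_FILES = ["diahost.exe.config", "diawp.exe.config"]


def set_default_proxy(xml_text: str, proxy_address: str, bypass_on_local: bool = True) -> str:
    """Return xml_text with <system.net>/<defaultProxy> replaced by an explicit proxy."""
    # Keep XML comments (Python 3.8+); a plain ET.fromstring call would drop them.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    system_net = next((c for c in root if c.tag == "system.net"), None)
    if system_net is None:
        system_net = ET.SubElement(root, "system.net")
    # Remove the existing element, e.g. <defaultProxy useDefaultCredentials="true" />.
    for old in [c for c in system_net if c.tag == "defaultProxy"]:
        system_net.remove(old)
    default_proxy = ET.SubElement(system_net, "defaultProxy", enabled="true")
    ET.SubElement(default_proxy, "proxy",
                  bypassonlocal="true" if bypass_on_local else "false",
                  proxyaddress=proxy_address)
    return ET.tostring(root, encoding="unicode")


def update_config_files(proxy_address: str, shared_dir: Path = SHARED_DIR) -> None:
    """Back up and rewrite both configuration files with the same proxy setting."""
    for name in CONFIG_FILES:
        path = shared_dir / name
        shutil.copy2(path, path.with_name(name + ".bak"))  # safe copy of the original
        text = path.read_text(encoding="utf-8-sig")  # tolerate a UTF-8 byte-order mark
        updated = set_default_proxy(text, proxy_address)
        path.write_text('<?xml version="1.0" encoding="utf-8"?>\n' + updated, encoding="utf-8")
```

For example, run `update_config_files("http://proxy.domain.org:8888/")`, and then restart the self-hosted integration runtime host service as described in step 5. Re-serializing the XML can change whitespace and attribute quoting. The .bak copies preserve the originals if you need to roll back.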

You also need to make sure that Azure is on your company's allow list. You can download the list of valid Azure IP addresses from the Microsoft Download Center.

If you see error messages like the following ones, the likely reason is improper configuration of the firewall or proxy server. Such a configuration prevents the self-hosted integration runtime from connecting to Data Factory to authenticate itself. To make sure that your firewall and proxy server are properly configured, refer to the previous section.

  • When you try to register the self-hosted integration runtime, you receive the following error message: "Failed to register this Integration Runtime node! Confirm that the Authentication key is valid and the integration service host service is running on this machine."

  • When you open Integration Runtime Configuration Manager, you see a status of Disconnected or Connecting. When you view Windows event logs, under Event Viewer > Application and Services Logs > Microsoft Integration Runtime, you see error messages like this one:

    Unable to connect to the remote server
    A component of Integration Runtime has become unresponsive and restarts automatically. Component name: Integration Runtime (Self-hosted).
    

Enable remote access from an intranet

If you use PowerShell to encrypt credentials from a networked machine other than the one where you installed the self-hosted integration runtime, you can enable the Remote Access from Intranet option. If you run PowerShell to encrypt credentials on the machine where you installed the self-hosted integration runtime, you can't enable Remote Access from Intranet.

Enable Remote Access from Intranet before you add another node for high availability and scalability.

When you run self-hosted integration runtime setup version 3.3 or later, the installer disables Remote Access from Intranet on the self-hosted integration runtime machine by default.

When you use a firewall from a partner or another vendor, you can manually open port 8060 or the user-configured port. If you have a firewall problem while setting up the self-hosted integration runtime, use the following command to install the self-hosted integration runtime without configuring the firewall:

msiexec /q /i IntegrationRuntime.msi NOFIREWALL=1

If you choose not to open port 8060 on the self-hosted integration runtime machine, use mechanisms other than the Setting Credentials application to configure data-store credentials. For example, you can use the New-AzDataFactoryV2LinkedServiceEncryptCredential PowerShell cmdlet.

Next steps

For step-by-step instructions, see Tutorial: Copy on-premises data to cloud.