Integration runtime in Azure Data Factory

APPLIES TO: Azure Data Factory, Azure Synapse Analytics (Preview)

The integration runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide the following data integration capabilities across different network environments:

  • Data movement: Copy data between data stores in a public network and data stores in a private network (on-premises or virtual private network). It provides support for built-in connectors, format conversion, column mapping, and performant and scalable data transfer.
  • Activity dispatch: Dispatch and monitor transformation activities running on a variety of compute services such as Azure HDInsight, Azure SQL Database, SQL Server, and more.
  • SSIS package execution: Natively execute SQL Server Integration Services (SSIS) packages in a managed Azure compute environment.

In Data Factory, an activity defines the action to be performed. A linked service defines a target data store or a compute service. An integration runtime provides the bridge between the activity and linked services. It's referenced by the linked service or activity, and provides the compute environment where the activity either runs or is dispatched from. This way, the activity can be performed in the region closest to the target data store or compute service, in the most performant way, while meeting security and compliance needs.
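As a minimal sketch of how a linked service references an integration runtime, the snippet below builds a linked-service payload whose `connectVia` block names the IR. The helper name, the linked-service name `OnPremSqlServer`, the IR name `MySelfHostedIR`, and the placeholder connection string are all hypothetical and used only for illustration.

```python
import json


def linked_service_definition(name, service_type, type_properties, ir_name):
    """Build a linked-service payload that references an integration runtime."""
    return {
        "name": name,
        "properties": {
            "type": service_type,
            "typeProperties": type_properties,
            # connectVia names the integration runtime that bridges
            # activities to this data store or compute service.
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


# Hypothetical names, for illustration only.
example = linked_service_definition(
    name="OnPremSqlServer",
    service_type="SqlServer",
    type_properties={"connectionString": "<connection string>"},
    ir_name="MySelfHostedIR",
)
print(json.dumps(example, indent=2))
```

Any activity that uses this linked service would then run on, or be dispatched from, the referenced integration runtime.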

Integration runtimes can be created in the Azure Data Factory UX through the management hub, as well as from any activities or datasets that reference them.

Integration runtime types

Data Factory offers three types of integration runtime (IR). Choose the type that best serves the data integration capabilities and network environment you need. The three types are:

  • Azure
  • Self-hosted
  • Azure-SSIS

The following table describes the capabilities and network support for each of the integration runtime types:

IR type      | Public network                    | Private network
Azure        | Data movement, Activity dispatch  | Data movement, Activity dispatch
Self-hosted  | Data movement, Activity dispatch  | Data movement, Activity dispatch
Azure-SSIS   | SSIS package execution            | SSIS package execution

Azure integration runtime

An Azure integration runtime can:

  • Run Data Flows in Azure.
  • Run copy activity between cloud data stores.
  • Dispatch the following transformation activities in a public network: HDInsight Hive activity, HDInsight Pig activity, HDInsight MapReduce activity, HDInsight Spark activity, HDInsight Streaming activity, Machine Learning Batch Execution activity, Machine Learning Update Resource activity, Stored Procedure activity, .NET custom activity, Web activity, Lookup activity, and Get Metadata activity.

Azure IR network environment

The Azure integration runtime supports connecting to data stores and compute services with publicly accessible endpoints. Use a self-hosted integration runtime for an Azure Virtual Network environment.

Azure IR compute resource and scaling

The Azure integration runtime provides fully managed, serverless compute in Azure. You don't have to worry about infrastructure provisioning, software installation, patching, or capacity scaling. In addition, you pay only for the duration of actual utilization.

The Azure integration runtime provides the native compute to move data between cloud data stores in a secure, reliable, and high-performance manner. You can set how many data integration units to use on the copy activity, and the compute size of the Azure IR is elastically scaled up accordingly, without your having to explicitly adjust the size of the Azure integration runtime.
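The sketch below illustrates setting the number of data integration units on a copy activity payload through the `dataIntegrationUnits` property. The helper name, the activity name, and the `BlobSource`/`BlobSink` type names are illustrative assumptions, and a real activity would also need dataset inputs and outputs, which are omitted here.

```python
def copy_activity_with_diu(name, source_type, sink_type, diu):
    """Build a copy-activity fragment that requests a number of data integration units."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(diu, bool) or not isinstance(diu, int) or diu <= 0:
        raise ValueError("diu must be a positive integer")
    return {
        "name": name,
        "type": "Copy",
        "typeProperties": {
            "source": {"type": source_type},
            "sink": {"type": sink_type},
            # The Azure IR scales its compute to match this setting;
            # the IR itself is not resized explicitly.
            "dataIntegrationUnits": diu,
        },
    }


# Hypothetical activity name and illustrative source/sink types.
activity = copy_activity_with_diu("CopyBlobToBlob", "BlobSource", "BlobSink", 32)
```

The scaling decision stays on the activity: the same Azure IR can serve one copy that requests few units and another that requests many.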

Activity dispatch is a lightweight operation that routes the activity to the target compute service, so there's no need to scale up the compute size for this scenario.

For information about creating and configuring an Azure IR, see "How to create and configure Azure IR" under the how-to guides.

Self-hosted integration runtime

A self-hosted IR is capable of:

  • Running copy activity between a cloud data store and a data store in a private network.
  • Dispatching the following transformation activities against compute resources on-premises or in an Azure Virtual Network: HDInsight Hive activity (BYOC, Bring Your Own Cluster), HDInsight Pig activity (BYOC), HDInsight MapReduce activity (BYOC), HDInsight Spark activity (BYOC), HDInsight Streaming activity (BYOC), Machine Learning Batch Execution activity, Machine Learning Update Resource activity, Stored Procedure activity, Custom activity (runs on Azure Batch), Lookup activity, and Get Metadata activity.


Use a self-hosted integration runtime to support data stores that require a bring-your-own driver, such as SAP HANA and MySQL. For more information, see supported data stores.


Java Runtime Environment (JRE) is a dependency of the self-hosted IR. Make sure that JRE is installed on the same host.

Self-hosted IR network environment

If you want to perform data integration securely in a private network environment that doesn't have a direct line of sight from the public cloud environment, you can install a self-hosted IR in an on-premises environment behind your corporate firewall, or inside a virtual private network. The self-hosted integration runtime makes only outbound HTTP-based connections to the open internet.

Self-hosted IR compute resource and scaling

Install the self-hosted IR on an on-premises machine or on a virtual machine inside a private network. Currently, the self-hosted IR is supported only on a Windows operating system.

For high availability and scalability, you can scale out the self-hosted IR by associating the logical instance with multiple on-premises machines in active-active mode. For more information, see the "How to create and configure self-hosted IR" article under the how-to guides.

Azure-SSIS integration runtime

To lift and shift existing SSIS workloads, you can create an Azure-SSIS IR to natively execute SSIS packages.

Azure-SSIS IR network environment

An Azure-SSIS IR can be provisioned in either a public network or a private network. On-premises data access is supported by joining the Azure-SSIS IR to a virtual network that is connected to your on-premises network.

Azure-SSIS IR compute resource and scaling

The Azure-SSIS IR is a fully managed cluster of Azure VMs dedicated to running your SSIS packages. You can bring your own Azure SQL Database or SQL Managed Instance to host the catalog of SSIS projects/packages (SSISDB). You can scale up the compute power by specifying the node size, and scale it out by specifying the number of nodes in the cluster. You can manage the cost of running your Azure-SSIS integration runtime by stopping and starting it as you see fit.
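To make the scale-up and scale-out settings concrete, here is a rough sketch of an Azure-SSIS IR definition as a Python dictionary. The property names (`computeProperties`, `nodeSize`, `numberOfNodes`, `ssisProperties`, `catalogInfo`, `catalogServerEndpoint`) reflect the resource shape as best recalled and should be checked against the current resource reference. The helper name, IR name, region, node size, and server endpoint are hypothetical example values.

```python
def azure_ssis_ir_definition(name, location, node_size, number_of_nodes,
                             catalog_server_endpoint):
    """Build an illustrative Azure-SSIS IR payload (property names are assumptions)."""
    if number_of_nodes < 1:
        raise ValueError("number_of_nodes must be at least 1")
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": location,
                    "nodeSize": node_size,             # scale up: larger VM size
                    "numberOfNodes": number_of_nodes,  # scale out: more nodes
                },
                "ssisProperties": {
                    # Your own Azure SQL Database or SQL Managed Instance
                    # that hosts SSISDB.
                    "catalogInfo": {
                        "catalogServerEndpoint": catalog_server_endpoint,
                    },
                },
            },
        },
    }


# Hypothetical values, for illustration only.
ssis_ir = azure_ssis_ir_definition(
    name="MyAzureSsisIR",
    location="eastus",
    node_size="Standard_D4_v3",
    number_of_nodes=2,
    catalog_server_endpoint="<server name>.database.windows.net",
)
```

Changing `node_size` corresponds to scaling up, and changing `number_of_nodes` corresponds to scaling out.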

For more information, see the "How to create and configure Azure-SSIS IR" article under the how-to guides. Once the Azure-SSIS IR is created, you can deploy and manage your existing SSIS packages with little to no change using familiar tools such as SQL Server Data Tools (SSDT) and SQL Server Management Studio (SSMS), just as you would with SSIS on-premises.

For more information about the Azure-SSIS runtime, see the following articles:

  • Tutorial: Deploy SSIS packages to Azure. This article provides step-by-step instructions to create an Azure-SSIS IR that uses an Azure SQL Database to host the SSIS catalog.
  • How to: Create an Azure-SSIS integration runtime. This article expands on the tutorial and provides instructions on using SQL Managed Instance and joining the IR to a virtual network.
  • Monitor an Azure-SSIS IR. This article shows you how to retrieve information about an Azure-SSIS IR and describes the statuses in the returned information.
  • Manage an Azure-SSIS IR. This article shows you how to stop, start, or remove an Azure-SSIS IR. It also shows you how to scale out your Azure-SSIS IR by adding more nodes to the IR.
  • Join an Azure-SSIS IR to a virtual network. This article provides conceptual information about joining an Azure-SSIS IR to an Azure virtual network. It also provides steps to configure the virtual network in the Azure portal so that the Azure-SSIS IR can join it.

Integration runtime location

Relationship between factory location and IR location

When customers create a data factory instance, they need to specify the location for the data factory. The Data Factory location is where the metadata of the data factory is stored and where the triggering of the pipeline is initiated from. Metadata for the factory is stored only in the region the customer chooses and is not stored in other regions.

Meanwhile, a data factory can access data stores and compute services in other Azure regions to move data between data stores or process data using compute services. This behavior is realized through the globally available IR to ensure data compliance, efficiency, and reduced network egress costs.

The IR location defines the location of its back-end compute, which is essentially the location where data movement, activity dispatching, and SSIS package execution are performed. The IR location can be different from the location of the data factory it belongs to.

Azure IR location

You can set a specific location for an Azure IR, in which case activity execution or dispatch happens in that region.

If you choose to use the auto-resolve Azure IR in a public network, which is the default, the following applies:

  • For copy activity, ADF makes a best effort to automatically detect your sink data store's location, then uses the IR in the same region if available, or the closest one in the same geography. If the sink data store's region is not detectable, the IR in the data factory region is used instead.

  • For Lookup/GetMetadata/Delete activity execution (also known as pipeline activities), transformation activity dispatching (also known as external activities), and authoring operations (test connection, browse folder list and table list, preview data), ADF uses the IR in the data factory region.

If you enable Managed Virtual Network for the auto-resolve Azure IR, ADF uses the IR in the data factory region.
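The auto-resolve rules above can be summarized as a small decision function. This is a sketch of the described behavior only, not the service's implementation. The function name, the `ir_regions` set, and the `closest_in_geography` lookup are hypothetical inputs, and falling back to the factory region when no nearby IR is known is an added assumption.

```python
def resolve_auto_ir_region(activity_kind, factory_region, sink_region=None,
                           ir_regions=frozenset(), closest_in_geography=None,
                           managed_vnet=False):
    """Return the region where an auto-resolve Azure IR would run, per the rules above."""
    # With Managed Virtual Network enabled, the factory region is always used.
    if managed_vnet:
        return factory_region
    # Pipeline activities, external activity dispatch, and authoring
    # operations use the IR in the data factory region.
    if activity_kind != "copy":
        return factory_region
    # Copy activity: the sink region could not be detected.
    if sink_region is None:
        return factory_region
    # Copy activity: an IR is available in the sink's own region.
    if sink_region in ir_regions:
        return sink_region
    # Copy activity: otherwise use the closest IR in the same geography.
    nearest = (closest_in_geography or {}).get(sink_region)
    # Assumption: fall back to the factory region if nothing is known.
    return nearest if nearest is not None else factory_region
```

For example, with a factory in `eastus` and a sink detected in `westeurope` where an IR is available, a copy activity resolves to `westeurope`, while a Lookup activity in the same pipeline resolves to `eastus`.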

You can see which IR location took effect during activity execution in the pipeline activity monitoring view in the UI, or in the activity monitoring payload.

Self-hosted IR location

The self-hosted IR is logically registered to the data factory, and you provide the compute that supports its functionality. Therefore, there is no explicit location property for the self-hosted IR.

When used to perform data movement, the self-hosted IR extracts data from the source and writes it to the destination.

Azure-SSIS IR location

Selecting the right location for your Azure-SSIS IR is essential to achieving high performance in your extract-transform-load (ETL) workflows.

  • The location of your Azure-SSIS IR does not need to be the same as the location of your data factory, but it should be the same as the location of your own Azure SQL Database or SQL Managed Instance where SSISDB is to be hosted. This way, your Azure-SSIS integration runtime can easily access SSISDB without incurring excessive traffic between different locations.
  • If you do not have an existing Azure SQL Database or SQL Managed Instance, but you have on-premises data sources/destinations, you should create a new Azure SQL Database or SQL Managed Instance in the same location as a virtual network connected to your on-premises network. This way, you can create your Azure-SSIS IR using the new Azure SQL Database or SQL Managed Instance and join that virtual network, all in the same location, effectively minimizing data movement across different locations.
  • If the location of your existing Azure SQL Database or SQL Managed Instance is not the same as the location of a virtual network connected to your on-premises network, first create your Azure-SSIS IR using the existing Azure SQL Database or SQL Managed Instance and join another virtual network in the same location, and then configure a virtual network to virtual network connection between the different locations.

The following diagram shows the location settings of Data Factory and its integration runtimes:


Determining which IR to use

Copy activity

The Copy activity requires source and sink linked services to define the direction of data flow. The following logic determines which integration runtime instance is used to perform the copy:

  • Copying between two cloud data sources: when both source and sink linked services use an Azure IR, ADF uses the regional Azure IR if you specified one, or automatically determines the location of the Azure IR if you chose the auto-resolve IR (the default), as described in the Integration runtime location section.
  • Copying between a cloud data source and a data source in a private network: if either the source or the sink linked service points to a self-hosted IR, the copy activity is executed on that self-hosted integration runtime.
  • Copying between two data sources in a private network: both the source and sink linked services must point to the same instance of the integration runtime, and that integration runtime is used to execute the copy activity.
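The three cases above can be expressed as a short selection function. This is a sketch of the documented logic under simplifying assumptions. The `IntegrationRuntime` class, the `pick_copy_ir` function, and the IR names are hypothetical and are not part of any Data Factory SDK.

```python
from dataclasses import dataclass

AZURE = "azure"
SELF_HOSTED = "self-hosted"
# Label returned when the copy runs on an Azure IR.
AZURE_IR_LABEL = "Azure IR (regional if specified, otherwise auto-resolved)"


@dataclass(frozen=True)
class IntegrationRuntime:
    name: str
    kind: str  # AZURE or SELF_HOSTED


def pick_copy_ir(source_ir, sink_ir):
    """Return the name (or label) of the IR that executes the copy."""
    self_hosted = [ir for ir in (source_ir, sink_ir) if ir.kind == SELF_HOSTED]
    # Cloud to cloud: both linked services use an Azure IR.
    if not self_hosted:
        return AZURE_IR_LABEL
    # Cloud to private network: the single self-hosted IR runs the copy.
    if len(self_hosted) == 1:
        return self_hosted[0].name
    # Private to private: both sides must point to the same IR instance.
    if source_ir != sink_ir:
        raise ValueError("source and sink must use the same self-hosted IR")
    return source_ir.name


cloud = IntegrationRuntime("AutoResolveIR", AZURE)
on_prem = IntegrationRuntime("OnPremIR", SELF_HOSTED)
print(pick_copy_ir(cloud, on_prem))  # prints "OnPremIR"
```

Raising an error in the mismatched private-to-private case mirrors the requirement that both linked services point to the same integration runtime instance.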

Lookup and GetMetadata activity

The Lookup and GetMetadata activities are executed on the integration runtime associated with the data store linked service.

External transformation activity

Each external transformation activity that uses an external compute engine has a target compute linked service, which points to an integration runtime. This integration runtime instance determines the location from which that external hand-coded transformation activity is dispatched.

Next steps

See the following articles: