Compute environments supported by Azure Data Factory

This article explains the different compute environments that you can use to process or transform data. It also provides details about the configurations (on-demand versus bring your own) that Data Factory supports when you configure linked services that link these compute environments to an Azure data factory.

The following table lists the compute environments supported by Data Factory and the activities that can run on them.

| Compute environment | Activities |
| --- | --- |
| On-demand HDInsight cluster or your own HDInsight cluster | Hive, Pig, Spark, MapReduce, Hadoop Streaming |
| Azure Batch | Custom |
| Azure SQL, Azure SQL Data Warehouse, SQL Server | Stored Procedure |

On-demand HDInsight compute environment

In this type of configuration, the computing environment is fully managed by the Azure Data Factory service. The Data Factory service automatically creates it before a job is submitted to process data, and removes it when the job is completed. You can create a linked service for the on-demand compute environment, configure it, and control granular settings for job execution, cluster management, and bootstrapping actions.

Azure HDInsight on-demand linked service

The Azure Data Factory service can automatically create an on-demand HDInsight cluster to process data. The cluster is created in the same region as the storage account (the linkedServiceName property in the JSON) associated with the cluster. The storage account must be a general-purpose standard Azure storage account.

Note the following important points about the on-demand HDInsight linked service:

  • The on-demand HDInsight cluster is created under your Azure subscription. You can see the cluster in the Azure portal when it is up and running.
  • Logs for jobs that run on an on-demand HDInsight cluster are copied to the storage account associated with the HDInsight cluster. The clusterUserName, clusterPassword, clusterSshUserName, and clusterSshPassword values defined in your linked service definition are used to log in to the cluster for in-depth troubleshooting during the lifecycle of the cluster.
  • You are charged only for the time when the HDInsight cluster is up and running jobs.
  • You can use a Script Action with the Azure HDInsight on-demand linked service.

Important

It typically takes 20 minutes or more to provision an Azure HDInsight cluster on demand.

Example

The following JSON defines a Linux-based on-demand HDInsight linked service. The Data Factory service automatically creates a Linux-based HDInsight cluster to process the required activity.

{
  "name": "HDInsightOnDemandLinkedService",
  "properties": {
    "type": "HDInsightOnDemand",
    "typeProperties": {
      "clusterType": "hadoop",
      "clusterSize": 1,
      "timeToLive": "00:15:00",
      "hostSubscriptionId": "<subscription ID>",
      "servicePrincipalId": "<service principal ID>",
      "servicePrincipalKey": {
        "value": "<service principal key>",
        "type": "SecureString"
      },
      "tenant": "<tenent id>",
      "clusterResourceGroup": "<resource group name>",
      "version": "3.6",
      "osType": "Linux",
      "linkedServiceName": {
        "referenceName": "AzureStorageLinkedService",
        "type": "LinkedServiceReference"
      }
    },
    "connectVia": {
      "referenceName": "<name of Integration Runtime>",
      "type": "IntegrationRuntimeReference"
    }
  }
}

Important

The HDInsight cluster creates a default container in the blob storage you specify in the JSON (linkedServiceName). HDInsight does not delete this container when the cluster is deleted. This behavior is by design. With an on-demand HDInsight linked service, an HDInsight cluster is created every time a slice needs to be processed, unless there is an existing live cluster (timeToLive); the cluster is deleted when the processing is done.

As more activities run, you see many containers in your Azure blob storage. If you do not need them for troubleshooting of the jobs, you may want to delete them to reduce the storage cost. The names of these containers follow the pattern adf**yourdatafactoryname**-**linkedservicename**-datetimestamp. Use tools such as Microsoft Azure Storage Explorer to delete containers in your Azure blob storage.

Properties

| Property | Description | Required |
| --- | --- | --- |
| type | The type property should be set to HDInsightOnDemand. | Yes |
| clusterSize | Number of worker/data nodes in the cluster. The HDInsight cluster is created with 2 head nodes along with the number of worker nodes you specify for this property. The nodes are of size Standard_D3, which has 4 cores, so a 4-worker-node cluster takes 24 cores (4*4 = 16 cores for worker nodes, plus 2*4 = 8 cores for head nodes). See Set up clusters in HDInsight with Hadoop, Spark, Kafka, and more for details. | Yes |
| linkedServiceName | Azure Storage linked service to be used by the on-demand cluster for storing and processing data. The HDInsight cluster is created in the same region as this Azure Storage account. Azure HDInsight limits the total number of cores you can use in each Azure region it supports. Make sure you have enough core quota in that Azure region to meet the required clusterSize. For details, refer to Set up clusters in HDInsight with Hadoop, Spark, Kafka, and more. | Yes |
| clusterResourceGroup | The HDInsight cluster is created in this resource group. | Yes |
| timetolive | The allowed idle time for the on-demand HDInsight cluster. Specifies how long the on-demand HDInsight cluster stays alive after completion of an activity run if there are no other active jobs in the cluster. The minimal allowed value is 5 minutes (00:05:00). For example, if an activity run takes 6 minutes and timetolive is set to 5 minutes, the cluster stays alive for 5 minutes after the 6 minutes of processing the activity run. If another activity run is executed within this 5-minute idle window, it is processed by the same cluster. Creating an on-demand HDInsight cluster is an expensive operation (it can take a while), so use this setting as needed to improve the performance of a data factory by reusing an on-demand HDInsight cluster. If you set the timetolive value to 0, the cluster is deleted as soon as the activity run completes. If you set a high value, the cluster may stay idle so that you can log on for troubleshooting, but it could result in high costs. Therefore, it is important to set an appropriate value based on your needs. If the timetolive property value is appropriately set, multiple pipelines can share the instance of the on-demand HDInsight cluster. | Yes |
| clusterType | The type of the HDInsight cluster to be created. Allowed values are "hadoop" and "spark". If not specified, the default value is hadoop. An Enterprise Security Package enabled cluster cannot be created on demand; instead, use an existing cluster (bring your own compute). | No |
| version | Version of the HDInsight cluster. If not specified, the current HDInsight default version is used. | No |
| hostSubscriptionId | The Azure subscription ID used to create the HDInsight cluster. If not specified, the subscription ID of your Azure login context is used. | No |
| clusterNamePrefix | The prefix of the HDI cluster name; a timestamp is automatically appended at the end of the cluster name. | No |
| sparkVersion | The version of Spark if the cluster type is "spark". | No |
| additionalLinkedServiceNames | Specifies additional storage accounts for the HDInsight linked service so that the Data Factory service can register them on your behalf. These storage accounts must be in the same region as the HDInsight cluster, which is created in the same region as the storage account specified by linkedServiceName. | No |
| osType | Type of operating system. Allowed values are Linux and Windows (for HDInsight 3.3 only). The default is Linux. | No |
| hcatalogLinkedServiceName | The name of the Azure SQL linked service that points to the HCatalog database. The on-demand HDInsight cluster is created by using the Azure SQL database as the metastore. | No |
| connectVia | The integration runtime to be used to dispatch the activities to this HDInsight linked service. The on-demand HDInsight linked service only supports the Azure integration runtime. If not specified, the default Azure integration runtime is used. | No |
| clusterUserName | The username to access the cluster. | No |
| clusterPassword | The password, as a secure string, to access the cluster. | No |
| clusterSshUserName | The username to SSH remotely connect to the cluster's nodes (for Linux). | No |
| clusterSshPassword | The password, as a secure string, to SSH remotely connect to the cluster's nodes (for Linux). | No |
| scriptActions | Specifies scripts for HDInsight cluster customizations during on-demand cluster creation. Currently, Azure Data Factory's UI authoring tool supports specifying only one script action, but you can get around this limitation in the JSON by specifying multiple script actions, as sketched below. | No |
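For illustration, a minimal sketch of how multiple script actions could be declared inside typeProperties in the JSON. The script names and URIs are placeholder assumptions, and the fields follow the ScriptAction shape (name, uri, roles, parameters); verify the exact schema against the current Data Factory reference before relying on it.

"scriptActions": [
    {
        "name": "installCustomLibs",
        "uri": "https://<yourstorageaccount>.blob.core.chinacloudapi.cn/scripts/install-libs.sh",
        "roles": "headnode",
        "parameters": ""
    },
    {
        "name": "tuneWorkers",
        "uri": "https://<yourstorageaccount>.blob.core.chinacloudapi.cn/scripts/tune-workers.sh",
        "roles": "workernode",
        "parameters": "--mode default"
    }
]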

Important

HDInsight supports multiple Hadoop cluster versions that can be deployed. Each version choice creates a specific version of the Hortonworks Data Platform (HDP) distribution and a set of components contained within that distribution. The list of supported HDInsight versions keeps being updated to provide the latest Hadoop ecosystem components and fixes. Always refer to the latest information about supported HDInsight versions and OS types to ensure you are using a supported version of HDInsight.

Important

Currently, HDInsight linked services do not support HBase, Interactive Query (Hive LLAP), or Storm.

additionalLinkedServiceNames JSON example

"additionalLinkedServiceNames": [{
    "referenceName": "MyStorageLinkedService2",
    "type": "LinkedServiceReference"          
}]

Service principal authentication

The on-demand HDInsight linked service requires service principal authentication to create HDInsight clusters on your behalf. To use service principal authentication, register an application entity in Azure Active Directory (Azure AD) and grant it the Contributor role for the subscription or the resource group in which the HDInsight cluster is created. For detailed steps, see Use portal to create an Azure Active Directory application and service principal that can access resources. Make note of the following values, which you use to define the linked service:

  • Application ID
  • Application key
  • Tenant ID

Use service principal authentication by specifying the following properties:

| Property | Description | Required |
| --- | --- | --- |
| servicePrincipalId | Specify the application's client ID. | Yes |
| servicePrincipalKey | Specify the application's key. | Yes |
| tenant | Specify the tenant information (domain name or tenant ID) under which your application resides. You can retrieve it by hovering the mouse over the upper-right corner of the Azure portal. | Yes |

Advanced Properties

You can also specify the following properties for granular configuration of the on-demand HDInsight cluster.

| Property | Description | Required |
| --- | --- | --- |
| coreConfiguration | Specifies the core configuration parameters (as in core-site.xml) for the HDInsight cluster to be created. | No |
| hBaseConfiguration | Specifies the HBase configuration parameters (hbase-site.xml) for the HDInsight cluster. | No |
| hdfsConfiguration | Specifies the HDFS configuration parameters (hdfs-site.xml) for the HDInsight cluster. | No |
| hiveConfiguration | Specifies the Hive configuration parameters (hive-site.xml) for the HDInsight cluster. | No |
| mapReduceConfiguration | Specifies the MapReduce configuration parameters (mapred-site.xml) for the HDInsight cluster. | No |
| oozieConfiguration | Specifies the Oozie configuration parameters (oozie-site.xml) for the HDInsight cluster. | No |
| stormConfiguration | Specifies the Storm configuration parameters (storm-site.xml) for the HDInsight cluster. | No |
| yarnConfiguration | Specifies the YARN configuration parameters (yarn-site.xml) for the HDInsight cluster. | No |

Example – On-demand HDInsight cluster configuration with advanced properties

{
    "name": " HDInsightOnDemandLinkedService",
    "properties": {
      "type": "HDInsightOnDemand",
      "typeProperties": {
          "clusterSize": 16,
          "timeToLive": "01:30:00",
          "hostSubscriptionId": "<subscription ID>",
          "servicePrincipalId": "<service principal ID>",
          "servicePrincipalKey": {
            "value": "<service principal key>",
            "type": "SecureString"
          },
          "tenant": "<tenent id>",
          "clusterResourceGroup": "<resource group name>",
          "version": "3.6",
          "osType": "Linux",
          "linkedServiceName": {
              "referenceName": "AzureStorageLinkedService",
              "type": "LinkedServiceReference"
            },
            "coreConfiguration": {
                "templeton.mapper.memory.mb": "5000"
            },
            "hiveConfiguration": {
                "templeton.mapper.memory.mb": "5000"
            },
            "mapReduceConfiguration": {
                "mapreduce.reduce.java.opts": "-Xmx4000m",
                "mapreduce.map.java.opts": "-Xmx4000m",
                "mapreduce.map.memory.mb": "5000",
                "mapreduce.reduce.memory.mb": "5000",
                "mapreduce.job.reduce.slowstart.completedmaps": "0.8"
            },
            "yarnConfiguration": {
                "yarn.app.mapreduce.am.resource.mb": "5000",
                "mapreduce.map.memory.mb": "5000"
            },
            "additionalLinkedServiceNames": [{
                "referenceName": "MyStorageLinkedService2",
                "type": "LinkedServiceReference"          
            }]
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Node sizes

You can specify the sizes of head, data, and ZooKeeper nodes by using the following properties:

| Property | Description | Required |
| --- | --- | --- |
| headNodeSize | Specifies the size of the head node. The default value is Standard_D3. See the Specifying node sizes section for details. | No |
| dataNodeSize | Specifies the size of the data node. The default value is Standard_D3. | No |
| zookeeperNodeSize | Specifies the size of the ZooKeeper node. The default value is Standard_D3. | No |

Specifying node sizes

See the Sizes of Virtual Machines article for the string values you need to specify for the properties mentioned in the previous section. The values need to conform to the CMDLETs & APIs referenced in that article. As you can see in the article, the data node of the Large (default) size has 7 GB of memory, which may not be good enough for your scenario.

If you want to create D4-sized head nodes and worker nodes, specify Standard_D4 as the value for the headNodeSize and dataNodeSize properties.

"headNodeSize": "Standard_D4",    
"dataNodeSize": "Standard_D4",

If you specify a wrong value for these properties, you may receive the following error: Failed to create cluster. Exception: Unable to complete the cluster create operation. Operation failed with code '400'. Cluster left behind state: 'Error'. Message: 'PreClusterCreationValidationFailure'. When you receive this error, ensure that you are using the CMDLET & API names from the table in the Sizes of Virtual Machines article.

Bring your own compute environment

In this type of configuration, users can register an existing computing environment as a linked service in Data Factory. The computing environment is managed by the user, and the Data Factory service uses it to execute the activities.

This type of configuration is supported for the following compute environments:

  • Azure HDInsight
  • Azure Batch
  • Azure SQL DB, Azure SQL DW, SQL Server

Azure HDInsight linked service

You can create an Azure HDInsight linked service to register your own HDInsight cluster with Data Factory.

Example

{
    "name": "HDInsightLinkedService",
    "properties": {
      "type": "HDInsight",
      "typeProperties": {
        "clusterUri": " https://<hdinsightclustername>.azurehdinsight.cn/",
        "userName": "username",
        "password": {
            "value": "passwordvalue",
            "type": "SecureString"
          },
        "linkedServiceName": {
              "referenceName": "AzureStorageLinkedService",
              "type": "LinkedServiceReference"
        }
      },
      "connectVia": {
        "referenceName": "<name of Integration Runtime>",
        "type": "IntegrationRuntimeReference"
      }
    }
  }

Properties

| Property | Description | Required |
| --- | --- | --- |
| type | The type property should be set to HDInsight. | Yes |
| clusterUri | The URI of the HDInsight cluster. | Yes |
| username | Specify the name of the user to be used to connect to the existing HDInsight cluster. | Yes |
| password | Specify the password for the user account. | Yes |
| linkedServiceName | Name of the Azure Storage linked service that refers to the Azure blob storage used by the HDInsight cluster. Currently, you cannot specify an Azure Data Lake Store linked service for this property. | Yes |
| isEspEnabled | Specify 'true' if the HDInsight cluster is Enterprise Security Package (ESP) enabled. Default is 'false'. | No |
| connectVia | The integration runtime to be used to dispatch the activities to this linked service. You can use the Azure integration runtime or a self-hosted integration runtime. If not specified, the default Azure integration runtime is used. For an Enterprise Security Package (ESP) enabled HDInsight cluster, use a self-hosted integration runtime that has line of sight to the cluster, or deploy it inside the same virtual network as the ESP HDInsight cluster. | No |
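As an illustration, here is a hedged sketch of a linked service for an ESP-enabled cluster, assuming a self-hosted integration runtime named SelfHostedIR that has line of sight to the cluster; all names and credentials are placeholders, not values from this article:

{
    "name": "HDInsightEspLinkedService",
    "properties": {
        "type": "HDInsight",
        "typeProperties": {
            "clusterUri": "https://<hdinsightclustername>.azurehdinsight.cn/",
            "userName": "<domain user>",
            "password": {
                "value": "<password>",
                "type": "SecureString"
            },
            "isEspEnabled": true,
            "linkedServiceName": {
                "referenceName": "AzureStorageLinkedService",
                "type": "LinkedServiceReference"
            }
        },
        "connectVia": {
            "referenceName": "SelfHostedIR",
            "type": "IntegrationRuntimeReference"
        }
    }
}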

Important

HDInsight supports multiple Hadoop cluster versions that can be deployed. Each version choice creates a specific version of the Hortonworks Data Platform (HDP) distribution and a set of components contained within that distribution. The list of supported HDInsight versions keeps being updated to provide the latest Hadoop ecosystem components and fixes. Always refer to the latest information about supported HDInsight versions and OS types to ensure you are using a supported version of HDInsight.

Important

Currently, HDInsight linked services do not support HBase, Interactive Query (Hive LLAP), or Storm.

Azure Batch linked service

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

You can create an Azure Batch linked service to register a Batch pool of virtual machines (VMs) to a data factory. You can run a Custom activity by using Azure Batch.

If you are new to the Azure Batch service, see the Azure Batch documentation for an introduction.

Example

{
    "name": "AzureBatchLinkedService",
    "properties": {
      "type": "AzureBatch",
      "typeProperties": {
        "accountName": "batchaccount",
        "accessKey": {
          "type": "SecureString",
          "value": "access key"
        },
        "batchUri": "https://batchaccount.region.batch.azure.cn",
        "poolName": "poolname",
        "linkedServiceName": {
          "referenceName": "StorageLinkedService",
          "type": "LinkedServiceReference"
        }
      },
      "connectVia": {
        "referenceName": "<name of Integration Runtime>",
        "type": "IntegrationRuntimeReference"
      }
    }
  }

Properties

| Property | Description | Required |
| --- | --- | --- |
| type | The type property should be set to AzureBatch. | Yes |
| accountName | Name of the Azure Batch account. | Yes |
| accessKey | Access key for the Azure Batch account. | Yes |
| batchUri | URL to your Azure Batch account, in the format https://batchaccountname.region.batch.azure.cn. | Yes |
| poolName | Name of the pool of virtual machines. | Yes |
| linkedServiceName | Name of the Azure Storage linked service associated with this Azure Batch linked service. This linked service is used for staging the files required to run the activity. | Yes |
| connectVia | The integration runtime to be used to dispatch the activities to this linked service. You can use the Azure integration runtime or a self-hosted integration runtime. If not specified, the default Azure integration runtime is used. | No |
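For context, a minimal sketch of a Custom activity that runs a command on the Batch pool through this linked service; the command, folder path, and the resourceLinkedService/linkedServiceName values are placeholder assumptions, not a definitive pipeline:

{
    "name": "MyCustomActivity",
    "type": "Custom",
    "linkedServiceName": {
        "referenceName": "AzureBatchLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "command": "helloworld.exe",
        "folderPath": "customactv2/helloworld",
        "resourceLinkedService": {
            "referenceName": "StorageLinkedService",
            "type": "LinkedServiceReference"
        }
    }
}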

Azure SQL Database linked service

You create an Azure SQL linked service and use it with the Stored Procedure activity to invoke a stored procedure from a Data Factory pipeline. See the Azure SQL Connector article for details about this linked service.
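For illustration, a minimal sketch of a Stored Procedure activity that calls a procedure through such a linked service; the linked service name, procedure name, and parameter are placeholders, and the same activity shape applies to the SQL Data Warehouse and SQL Server linked services described below:

{
    "name": "RunStoredProcedure",
    "type": "SqlServerStoredProcedure",
    "linkedServiceName": {
        "referenceName": "AzureSqlLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "storedProcedureName": "usp_sample",
        "storedProcedureParameters": {
            "identifier": { "value": "1", "type": "Int" }
        }
    }
}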

Azure SQL Data Warehouse linked service

You create an Azure SQL Data Warehouse linked service and use it with the Stored Procedure activity to invoke a stored procedure from a Data Factory pipeline. See the Azure SQL Data Warehouse Connector article for details about this linked service.

SQL Server linked service

You create a SQL Server linked service and use it with the Stored Procedure activity to invoke a stored procedure from a Data Factory pipeline. See the SQL Server connector article for details about this linked service.

Azure Function linked service

You create an Azure Function linked service and use it with the Azure Function activity to run Azure Functions in a Data Factory pipeline. The return type of the Azure function has to be a valid JObject. (Keep in mind that JArray is not a JObject.) Any return type other than JObject fails and raises the user error Response Content is not a valid JObject.

| Property | Description | Required |
| --- | --- | --- |
| type | The type property must be set to AzureFunction. | Yes |
| function app url | URL for the Azure Function App, in the format https://<accountname>.chinacloudsites.cn. This URL is the value under the URL section when viewing your Function App in the Azure portal. | Yes |
| function key | Access key for the Azure Function. Click the Manage section for the respective function, and copy either the Function Key or the Host key. Find out more here: Azure Functions HTTP triggers and bindings. | Yes |
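Here is a sketch of the corresponding linked service JSON, following the same conventions as the earlier examples; the account name and key are placeholders, and the typeProperties names (functionAppUrl, functionKey) are assumed to map to the display names in the property table above:

{
    "name": "AzureFunctionLinkedService",
    "properties": {
        "type": "AzureFunction",
        "typeProperties": {
            "functionAppUrl": "https://<accountname>.chinacloudsites.cn",
            "functionKey": {
                "type": "SecureString",
                "value": "<function key>"
            }
        }
    }
}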

Next steps

For a list of the transformation activities supported by Azure Data Factory, see Transform data.