Transform data using Hadoop Pig activity in Azure Data Factory

The HDInsight Pig activity in a Data Factory pipeline runs Pig queries on your own HDInsight cluster or on an on-demand HDInsight cluster. This article builds on the data transformation activities article, which presents a general overview of data transformation and the supported transformation activities.

If you are new to Azure Data Factory, read Introduction to Azure Data Factory and complete the Tutorial: transform data before reading this article.

Syntax

{
    "name": "Pig Activity",
    "description": "description",
    "type": "HDInsightPig",
    "linkedServiceName": {
        "referenceName": "MyHDInsightLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "scriptLinkedService": {
            "referenceName": "MyAzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "scriptPath": "MyAzureStorage\\PigScripts\\MyPigScript.pig",
        "getDebugInfo": "Failure",
        "arguments": [
            "SampleHadoopJobArgument1"
        ],
        "defines": {
            "param1": "param1Value"
        }
    }   
}

Syntax details

| Property | Description | Required |
| --- | --- | --- |
| name | Name of the activity. | Yes |
| description | Text describing what the activity is used for. | No |
| type | For the Pig activity, the activity type is HDInsightPig. | Yes |
| linkedServiceName | Reference to the HDInsight cluster registered as a linked service in Data Factory. To learn about this linked service, see the Compute linked services article. | Yes |
| scriptLinkedService | Reference to an Azure Storage linked service used to store the Pig script to be executed. If you don't specify this linked service, the Azure Storage linked service defined in the HDInsight linked service is used. | No |
| scriptPath | Path to the script file stored in the Azure Storage referenced by scriptLinkedService. The file name is case-sensitive. | No |
| getDebugInfo | Specifies when the log files are copied to the Azure Storage used by the HDInsight cluster or specified by scriptLinkedService. Allowed values: None, Always, or Failure. Default value: None. | No |
| arguments | Specifies an array of arguments for a Hadoop job. The arguments are passed as command-line arguments to each task. | No |
| defines | Specifies parameters as key/value pairs for referencing within the Pig script. | No |
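
The rules in the table above can be checked mechanically before a pipeline is deployed. The following Python sketch is illustrative only: the helper name `validate_pig_activity` and the sample values are assumptions made for this example and are not part of Data Factory or any Azure SDK. It checks only what the table states (required properties, the activity type, the allowed `getDebugInfo` values, and the shapes of `arguments` and `defines`).

```python
# Rules transcribed from the property table; validate_pig_activity is a
# hypothetical helper, not a Data Factory or Azure SDK function.
REQUIRED = ("name", "type", "linkedServiceName")
DEBUG_INFO_VALUES = {"None", "Always", "Failure"}


def validate_pig_activity(activity: dict) -> list:
    """Return a list of problems found in an HDInsightPig activity definition."""
    problems = []
    for key in REQUIRED:
        if key not in activity:
            problems.append(f"missing required property: {key}")
    # Only report a wrong type when one is present; absence is reported above.
    if activity.get("type", "HDInsightPig") != "HDInsightPig":
        problems.append("type must be HDInsightPig")
    props = activity.get("typeProperties", {})
    debug = props.get("getDebugInfo", "None")  # the table lists None as the default
    if debug not in DEBUG_INFO_VALUES:
        problems.append(f"getDebugInfo must be one of {sorted(DEBUG_INFO_VALUES)}")
    if not isinstance(props.get("arguments", []), list):
        problems.append("arguments must be an array")
    if not isinstance(props.get("defines", {}), dict):
        problems.append("defines must be an object of key/value pairs")
    return problems


# Sample definition mirroring the syntax section (values are placeholders).
sample = {
    "name": "Pig Activity",
    "type": "HDInsightPig",
    "linkedServiceName": {
        "referenceName": "MyHDInsightLinkedService",
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "scriptPath": "MyAzureStorage\\PigScripts\\MyPigScript.pig",
        "getDebugInfo": "Failure",
        "arguments": ["SampleHadoopJobArgument1"],
        "defines": {"param1": "param1Value"},
    },
}

print(validate_pig_activity(sample))
```

An empty list means the definition satisfies the checks above. The sketch does not contact Azure, so it cannot confirm that the referenced linked services or the script file exist.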

Next steps

See the following articles that explain how to transform data in other ways: