Azure 数据工厂中的查找活动Lookup activity in Azure Data Factory

适用于:是 Azure 数据工厂是 Azure Synapse Analytics(预览版)APPLIES TO: yesAzure Data Factory yesAzure Synapse Analytics (Preview)

查找活动可以从任何 Azure 数据工厂支持的数据源检索数据集。Lookup activity can retrieve a dataset from any of the Azure Data Factory-supported data sources. 在以下方案中使用它:Use it in the following scenario:

  • 动态确定哪些对象在后续活动中工作,而不是针对对象名称进行硬编码。Dynamically determine which objects to operate on in a subsequent activity, instead of hard coding the object name. 一些对象示例包括文件和表。Some object examples are files and tables.

查找活动读取并返回配置文件或表的内容。Lookup activity reads and returns the content of a configuration file or table. 它还返回执行查询或存储过程的结果。It also returns the result of executing a query or stored procedure. 查找活动的输出可用于后续复制或转换活动(如果它是单一实例值)。The output from Lookup activity can be used in a subsequent copy or transformation activity if it's a singleton value. 该输出可用于 ForEach 活动(如果它是属性数组)。The output can be used in a ForEach activity if it's an array of attributes.

支持的功能Supported capabilities

查找活动支持以下数据源。The following data sources are supported for Lookup activity. 查找活动可返回的最大行数是 5,000,最大大小为 2 MB。The largest number of rows that can be returned by Lookup activity is 5,000, up to 2 MB in size. 目前,查找活动在超时前的最长持续时间为 1 小时。Currently, the longest duration for Lookup activity before timeout is one hour.

CategoryCategory 数据存储Data store
AzureAzure Azure Blob 存储Azure Blob storage
  Azure Cosmos DB (SQL API)Azure Cosmos DB (SQL API)
  Azure 数据资源管理器Azure Data Explorer
  Azure Data Lake Storage Gen2Azure Data Lake Storage Gen2
  Azure Database for MariaDBAzure Database for MariaDB
  Azure Database for MySQLAzure Database for MySQL
  Azure Database for PostgreSQLAzure Database for PostgreSQL
  Azure 文件Azure Files
  Azure SQL 数据库Azure SQL Database
  Azure SQL 托管实例Azure SQL Managed Instance
  Azure Synapse Analytics(以前称为 SQL 数据仓库)Azure Synapse Analytics (formerly SQL Data Warehouse)
  Azure 表存储Azure Table storage
DatabaseDatabase Amazon RedshiftAmazon Redshift
  DB2DB2
  DrillDrill
  Google BigQueryGoogle BigQuery
  GreenplumGreenplum
  HBaseHBase
  HiveHive
  Apache ImpalaApache Impala
  InformixInformix
  MariaDBMariaDB
  Microsoft AccessMicrosoft Access
  MySQLMySQL
  NetezzaNetezza
  OracleOracle
  PhoenixPhoenix
  PostgreSQLPostgreSQL
  Presto(预览)Presto (Preview)
  SAP Business Warehouse Open HubSAP Business Warehouse Open Hub
  通过 MDX 实现的 SAP Business WarehouseSAP Business Warehouse via MDX
  SAP HANASAP HANA
  SAP TableSAP Table
  SnowflakeSnowflake
  SparkSpark
  SQL ServerSQL Server
  SybaseSybase
  TeradataTeradata
  VerticaVertica
NoSQLNoSQL CassandraCassandra
  Couchbase(预览)Couchbase (Preview)
文件File Amazon S3Amazon S3
  文件系统File System
  FTPFTP
  Google Cloud StorageGoogle Cloud Storage
  HDFSHDFS
  SFTPSFTP
通用协议Generic protocol 泛型 HTTPGeneric HTTP
  泛型 ODataGeneric OData
  泛型 ODBCGeneric ODBC
服务和应用Services and apps Amazon Marketplace Web ServiceAmazon Marketplace Web Service
  Common Data Service for AppsCommon Data Service for Apps
  Concur(预览)Concur (Preview)
  Dynamics 365Dynamics 365
  Dynamics AXDynamics AX
  Dynamics CRMDynamics CRM
  Google AdWordsGoogle AdWords
  HubSpot(预览)HubSpot (Preview)
  JiraJira
  Magento(预览)Magento (Preview)
  Marketo(预览)Marketo (Preview)
  Oracle Eloqua(预览)Oracle Eloqua (Preview)
  Oracle Responsys(预览)Oracle Responsys (Preview)
  Oracle 服务云(预览)Oracle Service Cloud (Preview)
  PayPal(预览)PayPal (Preview)
  QuickBooks(预览)QuickBooks (Preview)
  SalesforceSalesforce
  Salesforce 服务云Salesforce Service Cloud
  Salesforce Marketing CloudSalesforce Marketing Cloud
  SAP Cloud for Customer (C4C)SAP Cloud for Customer (C4C)
  SAP ECCSAP ECC
  ServiceNowServiceNow
  Shopify(预览)Shopify (Preview)
  SharePoint Online 列表SharePoint Online List
  Square(预览)Square (Preview)
  Web 表(HTML 表)Web Table (HTML table)
  XeroXero
  Zoho(预览)Zoho (Preview)

Note

连接器标记为“预览”意味着,可以试用它并向我们提供反馈。Any connector marked as Preview means that you can try it out and give us feedback. 若要在解决方案中使用预览版连接器的依赖项,请联系 Azure 客户支持If you want to take a dependency on preview connectors in your solution, please contact Azure support.

语法Syntax

{
    "name": "LookupActivity",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "<source type>"
            <additional source specific properties (optional)>
        },
        "dataset": { 
            "referenceName": "<source dataset name>",
            "type": "DatasetReference"
        },
        "firstRowOnly": false
    }
}

Type 属性Type properties

名称Name 说明Description 类型Type 必需?Required?
datasetdataset 为查找提供数据集引用。Provides the dataset reference for the lookup. 从每篇相应的连接器文章的“数据集属性”部分中获取详细信息。Get details from the Dataset properties section in each corresponding connector article. 键/值对Key/value pair Yes
sourcesource 包含特定于数据集的源属性,与复制活动源相同。Contains dataset-specific source properties, the same as the Copy Activity source. 从每篇相应的连接器文章的“复制活动属性”部分中获取详细信息。Get details from the Copy Activity properties section in each corresponding connector article. 键/值对Key/value pair Yes
firstRowOnlyfirstRowOnly 指示仅返回第一行还是返回所有行。Indicates whether to return only the first row or all rows. BooleanBoolean 不是。No. 默认为 trueThe default is true.

Note

  • 不支持 ByteArray 类型的源列。Source columns with ByteArray type aren't supported.
  • 数据集定义不支持结构Structure isn't supported in dataset definitions. 对于文本格式化文件,使用标头行提供列名。For text-format files, use the header row to provide the column name.
  • 如果查找源是 JSON 文件,则不支持用于重塑 JSON 对象的 jsonPathDefinition 设置。If your lookup source is a JSON file, the jsonPathDefinition setting for reshaping the JSON object isn't supported. 将检索整个对象。The entire objects will be retrieved.

在后续活动中使用查找活动结果Use the Lookup activity result in a subsequent activity

查找结果会返回到活动运行结果的 output 节。The lookup result is returned in the output section of the activity run result.

  • firstRowOnly 设置为 true(默认值)时,输出格式如以下代码所示。When firstRowOnly is set to true (default), the output format is as shown in the following code. 查找结果位于固定的 firstRow 键下。The lookup result is under a fixed firstRow key. 若要在后续活动中使用该结果,请使用 @{activity('MyLookupActivity').output.firstRow.TableName} 模式。To use the result in subsequent activity, use the pattern of @{activity('MyLookupActivity').output.firstRow.TableName}.

    {
        "firstRow":
        {
            "Id": "1",
            "TableName" : "Table1"
        }
    }
    
  • firstRowOnly 设置为 false,输出格式如以下代码所示。When firstRowOnly is set to false, the output format is as shown in the following code. count 字段指示返回的记录数。A count field indicates how many records are returned. 详细值显示在固定的 value 数组下。Detailed values are displayed under a fixed value array. 在这种情况下,查找活动后跟 Foreach 活动In such a case, the Lookup activity is followed by a Foreach activity. 使用 @activity('MyLookupActivity').output.value 模式将 value 数组传递给 ForEach 活动 items 字段。You pass the value array to the ForEach activity items field by using the pattern of @activity('MyLookupActivity').output.value. 若要访问 value 数组中的元素,请使用以下语法:@{activity('lookupActivity').output.value[zero based index].propertyname}To access elements in the value array, use the following syntax: @{activity('lookupActivity').output.value[zero based index].propertyname}. 示例为 @{activity('lookupActivity').output.value[0].tablename}An example is @{activity('lookupActivity').output.value[0].tablename}.

    {
        "count": "2",
        "value": [
            {
                "Id": "1",
                "TableName" : "Table1"
            },
            {
                "Id": "2",
                "TableName" : "Table2"
            }
        ]
    } 
    

复制活动示例Copy Activity example

在本示例中,复制活动将 Azure SQL 数据库实例的 SQL 表中的数据复制到 Azure Blob 存储。In this example, Copy Activity copies data from a SQL table in your Azure SQL Database instance to Azure Blob storage. SQL 表的名称存储在 Blob 存储的 JSON 文件中。The name of the SQL table is stored in a JSON file in Blob storage. 查找活动在运行时查找表名。The Lookup activity looks up the table name at runtime. 使用此方法动态修改 JSON。JSON is modified dynamically by using this approach. 不需要重新部署管道或数据集。You don't need to redeploy pipelines or datasets.

本示例演示如何只查找第一行。This example demonstrates lookup for the first row only. 若要查找所有行并将结果与 ForEach 活动链接,请参阅使用 Azure 数据工厂批量复制多个表中的示例。For lookup for all rows and to chain the results with ForEach activity, see the samples in Copy multiple tables in bulk by using Azure Data Factory.

管道Pipeline

此管道包含两个活动:查找和复制。This pipeline contains two activities: Lookup and Copy.

  • 查找活动配置为使用 LookupDataset,该项引用 Azure Blob 存储中的一个位置。The Lookup activity is configured to use LookupDataset, which refers to a location in Azure Blob storage. 查找活动在此位置从 JSON 文件读取 SQL 表名称。The Lookup activity reads the name of the SQL table from a JSON file in this location.
  • 复制活动使用查找活动的输出,即 SQL 表的名称。Copy Activity uses the output of the Lookup activity, which is the name of the SQL table. SourceDataset 中的 tableName 属性配置为使用查找活动的输出。The tableName property in the SourceDataset is configured to use the output from the Lookup activity. 复制活动将数据从 SQL 表复制到 Azure Blob 存储中的一个位置。Copy Activity copies data from the SQL table to a location in Azure Blob storage. 该位置由 SinkDataset 属性指定。The location is specified by the SinkDataset property.
{
    "name": "LookupPipelineDemo",
    "properties": {
        "activities": [
            {
                "name": "LookupActivity",
                "type": "Lookup",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource"
                    },
                    "dataset": { 
                        "referenceName": "LookupDataset", 
                        "type": "DatasetReference" 
                    }
                }
            },
            {
                "name": "CopyActivity",
                "type": "Copy",
                "typeProperties": {
                    "source": { 
                        "type": "SqlSource", 
                        "sqlReaderQuery": "select * from @{activity('LookupActivity').output.firstRow.tableName}" 
                    },
                    "sink": { 
                        "type": "BlobSink" 
                    }
                },                
                "dependsOn": [ 
                    { 
                        "activity": "LookupActivity", 
                        "dependencyConditions": [ "Succeeded" ] 
                    }
                 ],
                "inputs": [ 
                    { 
                        "referenceName": "SourceDataset", 
                        "type": "DatasetReference" 
                    } 
                ],
                "outputs": [ 
                    { 
                        "referenceName": "SinkDataset", 
                        "type": "DatasetReference" 
                    } 
                ]
            }
        ]
    }
}

查找数据集Lookup dataset

查找数据集是指由 AzureStorageLinkedService 类型指定的 Azure 存储查找文件夹中的 sourcetable.json 文件。The lookup dataset is the sourcetable.json file in the Azure Storage lookup folder specified by the AzureStorageLinkedService type.

{
    "name": "LookupDataset",
    "properties": {
        "type": "AzureBlob",
        "typeProperties": {
            "folderPath": "lookup",
            "fileName": "sourcetable.json",
            "format": {
                "type": "JsonFormat",
                "filePattern": "SetOfObjects"
            }
        },
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        }
    }
}

复制活动的数据集Source dataset for Copy Activity

数据集使用查找活动的输出,即 SQL 表名称。The source dataset uses the output of the Lookup activity, which is the name of the SQL table. 复制活动将数据从此 SQL 表复制到 Azure Blob 存储中的一个位置。Copy Activity copies data from this SQL table to a location in Azure Blob storage. 该位置由接收器数据集指定。The location is specified by the sink dataset.

{
    "name": "SourceDataset",
    "properties": {
        "type": "AzureSqlTable",
        "typeProperties":{
            "tableName": "@{activity('LookupActivity').output.firstRow.tableName}"
        },
        "linkedServiceName": {
            "referenceName": "AzureSqlLinkedService",
            "type": "LinkedServiceReference"
        }
    }
}

复制活动的接收器数据集Sink dataset for Copy Activity

复制活动将数据从 SQL 表复制到 Azure 存储中 csv 文件夹下的 filebylookup.csv 文件。Copy Activity copies data from the SQL table to the filebylookup.csv file in the csv folder in Azure Storage. 该文件由 AzureStorageLinkedService 属性指定。The file is specified by the AzureStorageLinkedService property.

{
    "name": "SinkDataset",
    "properties": {
        "type": "AzureBlob",
        "typeProperties": {
            "folderPath": "csv",
            "fileName": "filebylookup.csv",
            "format": {
                "type": "TextFormat"                                                                    
            }
        },
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        }
    }
}

Azure 存储链接服务Azure Storage linked service

此存储帐户包含 JSON 文件和 SQL 表名称。This storage account contains the JSON file with the names of the SQL tables.

{
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<StorageAccountName>;AccountKey=<StorageAccountKey>;EndpointSuffix=core.chinacloudapi.cn"
        }
    },
        "name": "AzureStorageLinkedService"
}

Azure SQL 数据库链接服务Azure SQL Database linked service

此 Azure SQL 数据库实例包含要复制到 Blob 存储的数据。This Azure SQL Database instance contains the data to be copied to Blob storage.

{
    "name": "AzureSqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "description": "",
        "typeProperties": {
                        "connectionString": "Server=<server>;Initial Catalog=<database>;User ID=<user>;Password=<password>;"
        }
    }
}

sourcetable.jsonsourcetable.json

对象集Set of objects

{
  "Id": "1",
  "tableName": "Table1"
}
{
   "Id": "2",
  "tableName": "Table2"
}

对象数组Array of objects

[ 
    {
        "Id": "1",
        "tableName": "Table1"
    },
    {
        "Id": "2",
        "tableName": "Table2"
    }
]

限制和解决方法Limitations and workarounds

以下是 Lookup 活动的一些限制以及建议的解决方法。Here are some limitations of the Lookup activity and suggested workarounds.

限制Limitation 解决方法Workaround
Lookup 活动最多有 5,000 行,最大大小为 2 MB。The Lookup activity has a maximum of 5,000 rows, and a maximum size of 2 MB. 设计一个外部管道对内部管道进行迭代的两级管道,该管道会检索不超过最大行数或大小的数据。Design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size.

后续步骤Next steps

查看数据工厂支持的其他控制流活动:See other control flow activities supported by Data Factory: