Get Metadata activity in Azure Data Factory

APPLIES TO: Azure Data Factory, Azure Synapse Analytics (Preview)

You can use the Get Metadata activity to retrieve the metadata of any data in Azure Data Factory. You can use this activity in the following scenarios:

  • Validate the metadata of any data.
  • Trigger a pipeline when data is ready/available.

The following functionality is available in the control flow:

  • You can use the output from the Get Metadata activity in conditional expressions to perform validation.
  • You can trigger a pipeline when a condition is satisfied via Do Until looping.
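
The Do Until pattern can be sketched locally as a polling loop. This is only an illustration of the control-flow idea, not Data Factory code: `wait_until_exists`, `get_metadata`, `max_attempts`, and `delay_seconds` are hypothetical names, and the sketch assumes each check returns an output dictionary carrying an `exists` flag.

```python
import time
from typing import Any, Callable, Dict


def wait_until_exists(
    get_metadata: Callable[[], Dict[str, Any]],
    max_attempts: int = 5,
    delay_seconds: float = 0.0,
) -> bool:
    """Poll a metadata source until it reports exists == True.

    get_metadata is a hypothetical stand-in for one Get Metadata run;
    it is not a Data Factory API.
    """
    for attempt in range(max_attempts):
        output = get_metadata()
        if output.get("exists") is True:
            return True
        # Wait before the next check, but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return False


# Simulated outputs: the data becomes available on the third check.
_outputs = iter([{"exists": False}, {"exists": False}, {"exists": True}])
ready = wait_until_exists(lambda: next(_outputs))
print(ready)
```

In a real pipeline the loop condition would be an expression over the activity output rather than a Python callable; the sketch only shows the shape of the logic.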

Capabilities

The Get Metadata activity takes a dataset as input and returns metadata information as output. The following connectors and their retrievable metadata are currently supported. The returned metadata can be at most 2 MB.

Note

If you run the Get Metadata activity on a self-hosted integration runtime, the latest capabilities are supported on version 3.6 or later.

Supported connectors

File storage

| Connector/Metadata | itemName (file/folder) | itemType (file/folder) | size (file) | created (file/folder) | lastModified (file/folder) | childItems (folder) | contentMD5 (file) | structure (file) | columnCount (file) | exists (file/folder) |
| Amazon S3 | √/√ | √/√ | √ | x/x | √/√* | √ | x | √ | √ | √/√* |
| Google Cloud Storage | √/√ | √/√ | √ | x/x | √/√* | √ | x | √ | √ | √/√* |
| Azure Blob storage | √/√ | √/√ | √ | x/x | √/√* | √ | √ | √ | √ | √/√ |
| Azure Data Lake Storage Gen2 | √/√ | √/√ | √ | x/x | √/√ | √ | x | √ | √ | √/√ |
| Azure Files | √/√ | √/√ | √ | √/√ | √/√ | √ | x | √ | √ | √/√ |
| File system | √/√ | √/√ | √ | √/√ | √/√ | √ | x | √ | √ | √/√ |
| SFTP | √/√ | √/√ | √ | x/x | √/√ | √ | x | √ | √ | √/√ |
| FTP | √/√ | √/√ | √ | x/x | x/x | √ | x | √ | √ | √/√ |

  • When you use the Get Metadata activity against a folder, make sure you have LIST/EXECUTE permission on that folder.
  • For Amazon S3 and Google Cloud Storage, lastModified applies to the bucket and the key but not to the virtual folder, and exists applies to the bucket and the key but not to the prefix or virtual folder.
  • For Azure Blob storage, lastModified applies to the container and the blob but not to the virtual folder.
  • The lastModified filter currently applies to child items but not to the specified folder or file itself.
  • Wildcard filters on folders and files are not supported for the Get Metadata activity.
  • structure and columnCount are not supported when getting metadata from Binary, JSON, or XML files.

Relational database

| Connector/Metadata | structure | columnCount | exists |
| Azure SQL Database | √ | √ | √ |
| Azure SQL Managed Instance | √ | √ | √ |
| Azure Synapse Analytics | √ | √ | √ |
| SQL Server | √ | √ | √ |

Metadata options

You can specify the following metadata types in the Get Metadata activity field list to retrieve the corresponding information:

| Metadata type | Description |
| itemName | Name of the file or folder. |
| itemType | Type of the file or folder. Returned value is File or Folder. |
| size | Size of the file, in bytes. Applicable only to files. |
| created | Created datetime of the file or folder. |
| lastModified | Last modified datetime of the file or folder. |
| childItems | List of subfolders and files in the given folder. Applicable only to folders. Returned value is a list of the name and type of each child item. |
| contentMD5 | MD5 of the file. Applicable only to files. |
| structure | Data structure of the file or relational database table. Returned value is a list of column names and column types. |
| columnCount | Number of columns in the file or relational table. |
| exists | Whether a file, folder, or table exists. If exists is specified in the Get Metadata field list, the activity won't fail even if the file, folder, or table doesn't exist. Instead, exists: false is returned in the output. |

Tip

When you want to validate that a file, folder, or table exists, specify exists in the Get Metadata activity field list. You can then check the exists: true/false result in the activity output. If exists isn't specified in the field list, the Get Metadata activity fails when the object isn't found.
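
The difference between the two behaviors can be modeled with a small in-memory stand-in. This is a sketch of the semantics described in the tip, not the service implementation: `STORE` and `get_metadata_model` are hypothetical names, and returning only `{"exists": False}` for a missing object is an assumption about the output shape.

```python
from typing import Any, Dict, List

# Hypothetical in-memory "store": path -> size in bytes.
STORE: Dict[str, int] = {"folder/file.json": 1024}


def get_metadata_model(path: str, field_list: List[str]) -> Dict[str, Any]:
    """Model the exists semantics: report a missing object or fail on it."""
    if path not in STORE:
        if "exists" in field_list:
            # With exists requested, a missing object is reported, not raised.
            # Assumption: no other fields are returned in this case.
            return {"exists": False}
        # Without exists, a missing object makes the lookup fail.
        raise FileNotFoundError(path)
    output: Dict[str, Any] = {}
    if "exists" in field_list:
        output["exists"] = True
    if "size" in field_list:
        output["size"] = STORE[path]
    return output


print(get_metadata_model("folder/missing.json", ["exists", "size"]))
print(get_metadata_model("folder/file.json", ["exists", "size"]))
```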

Note

When you get metadata from file stores and configure modifiedDatetimeStart or modifiedDatetimeEnd, the childItems in the output include only files in the given path that have a last modified time within the specified range. Items in subfolders are not included.
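
The filtering described in this note can be approximated locally as follows. This is a sketch under stated assumptions, not the service's logic: `filter_children` and `parse_utc` are hypothetical helpers, each child is assumed to carry its own `lastModified` value for the purpose of the model, and the range is treated as start-inclusive and end-exclusive, which the note does not specify.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def parse_utc(value: str) -> datetime:
    """Parse timestamps of the form 2017-02-23T06:17:09Z as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def filter_children(
    children: List[Dict[str, Any]],
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> List[str]:
    """Keep names of direct child files modified within [start, end)."""
    kept = []
    for child in children:
        # Only files directly in the given path are considered.
        if child["type"] != "File":
            continue
        modified = parse_utc(child["lastModified"])
        if start is not None and modified < parse_utc(start):
            continue
        # Assumption: the end bound is exclusive.
        if end is not None and modified >= parse_utc(end):
            continue
        kept.append(child["name"])
    return kept


# Hypothetical folder listing used only for illustration.
children = [
    {"name": "old.csv", "type": "File", "lastModified": "2017-02-20T00:00:00Z"},
    {"name": "new.csv", "type": "File", "lastModified": "2017-02-23T06:17:09Z"},
    {"name": "sub", "type": "Folder", "lastModified": "2017-02-23T06:17:09Z"},
]
recent = filter_children(children, "2017-02-22T00:00:00Z", "2017-02-24T00:00:00Z")
print(recent)
```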

Syntax

Get Metadata activity

{
    "name":"MyActivity",
    "type":"GetMetadata",
    "dependsOn":[

    ],
    "policy":{
        "timeout":"7.00:00:00",
        "retry":0,
        "retryIntervalInSeconds":30,
        "secureOutput":false,
        "secureInput":false
    },
    "userProperties":[

    ],
    "typeProperties":{
        "dataset":{
            "referenceName":"MyDataset",
            "type":"DatasetReference"
        },
        "fieldList":[
            "size",
            "lastModified",
            "structure"
        ],
        "storeSettings":{
            "type":"AzureBlobStorageReadSettings"
        },
        "formatSettings":{
            "type":"JsonReadSettings"
        }
    }
}

Dataset

{
    "name":"MyDataset",
    "properties":{
        "linkedServiceName":{
            "referenceName":"AzureStorageLinkedService",
            "type":"LinkedServiceReference"
        },
        "annotations":[

        ],
        "type":"Json",
        "typeProperties":{
            "location":{
                "type":"AzureBlobStorageLocation",
                "fileName":"file.json",
                "folderPath":"folder",
                "container":"container"
            }
        }
    }
}

Type properties

Currently, the Get Metadata activity can return the following types of metadata information:

| Property | Description | Required |
| fieldList | The types of metadata information required. For details on supported metadata, see the Metadata options section of this article. | Yes |
| dataset | The reference dataset whose metadata is to be retrieved by the Get Metadata activity. See the Capabilities section for information on supported connectors. Refer to the specific connector topics for dataset syntax details. | Yes |
| formatSettings | Applies when using a format-type dataset. | No |
| storeSettings | Applies when using a format-type dataset. | No |

Sample output

The Get Metadata results are shown in the activity output. The following two samples show a wide range of metadata options. To use the results in a subsequent activity, use this pattern: @{activity('MyGetMetadataActivity').output.itemName}.

Get a file's metadata

{
  "exists": true,
  "itemName": "test.csv",
  "itemType": "File",
  "size": 104857600,
  "lastModified": "2017-02-23T06:17:09Z",
  "created": "2017-02-23T06:17:09Z",
  "contentMD5": "cMauY+Kz5zDm3eWa9VpoyQ==",
  "structure": [
    {
        "name": "id",
        "type": "Int64"
    },
    {
        "name": "name",
        "type": "String"
    }
  ],
  "columnCount": 2
}
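
As one way to act on output like this, the following sketch validates a file's metadata against expectations before further processing. `validate_file_metadata`, `expected_columns`, and `max_size_bytes` are hypothetical names chosen for illustration; in a pipeline, the equivalent checks would be written as conditional expressions over the activity output.

```python
from typing import Any, Dict, List, Tuple


def validate_file_metadata(
    output: Dict[str, Any],
    expected_columns: List[Tuple[str, str]],
    max_size_bytes: int,
) -> List[str]:
    """Return a list of problems found; an empty list means the file passes."""
    problems = []
    if output.get("itemType") != "File":
        problems.append("not a file")
    if output.get("size", 0) > max_size_bytes:
        problems.append("file too large")
    # Compare column names and types in order.
    actual = [(c["name"], c["type"]) for c in output.get("structure", [])]
    if actual != expected_columns:
        problems.append("unexpected structure")
    if output.get("columnCount") != len(expected_columns):
        problems.append("unexpected column count")
    return problems


# Subset of the sample file output shown above.
sample = {
    "exists": True,
    "itemName": "test.csv",
    "itemType": "File",
    "size": 104857600,
    "structure": [
        {"name": "id", "type": "Int64"},
        {"name": "name", "type": "String"},
    ],
    "columnCount": 2,
}
expected = [("id", "Int64"), ("name", "String")]
problems = validate_file_metadata(sample, expected, 200 * 1024 * 1024)
print(problems)
```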

Get a folder's metadata

{
  "exists": true,
  "itemName": "testFolder",
  "itemType": "Folder",
  "lastModified": "2017-02-23T06:17:09Z",
  "created": "2017-02-23T06:17:09Z",
  "childItems": [
    {
      "name": "test.avro",
      "type": "File"
    },
    {
      "name": "folder hello",
      "type": "Folder"
    }
  ]
}
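
A common follow-up is to separate the child files from the child folders before iterating over them. The sketch below does this on a copy of the folder output above; `split_child_items` is a hypothetical helper, not part of Data Factory.

```python
from typing import Any, Dict, List, Tuple


def split_child_items(output: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (file names, folder names) from a folder's childItems list."""
    children = output.get("childItems", [])
    files = [c["name"] for c in children if c["type"] == "File"]
    folders = [c["name"] for c in children if c["type"] == "Folder"]
    return files, folders


# Subset of the sample folder output shown above.
folder_output = {
    "exists": True,
    "itemName": "testFolder",
    "itemType": "Folder",
    "childItems": [
        {"name": "test.avro", "type": "File"},
        {"name": "folder hello", "type": "Folder"},
    ],
}
files, folders = split_child_items(folder_output)
print(files, folders)
```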

Next steps

Learn about other control flow activities supported by Data Factory: