Copy data from or to Azure File Storage by using Azure Data Factory

Applies to: Azure Data Factory, Azure Synapse Analytics (Preview)

This article outlines how to copy data to and from Azure File Storage. To learn about Azure Data Factory, read the introductory article.

Supported capabilities

This Azure File Storage connector is supported for the following activities:

You can copy data from Azure File Storage to any supported sink data store, or copy data from any supported source data store to Azure File Storage. For a list of data stores that the Copy activity supports as sources and sinks, see Supported data stores and formats.

Specifically, this Azure File Storage connector supports:

Getting started

To perform the Copy activity with a pipeline, you can use one of the following tools or SDKs:

The following sections provide details about properties that are used to define Data Factory entities specific to Azure File Storage.

Linked service properties

This Azure File Storage connector supports the following authentication types. See the corresponding sections for details.

Note

If you were using the Azure File Storage linked service with the legacy model, which is shown as "Basic authentication" on the ADF authoring UI, it is still supported as-is, but we recommend that you use the new model going forward. The legacy model transfers data to and from storage over Server Message Block (SMB), while the new model uses the storage SDK, which has better throughput. To upgrade, edit your linked service to switch the authentication method to "Account key" or "SAS URI"; no change is needed on the dataset or copy activity.

Account key authentication

Data Factory supports the following properties for Azure File Storage account key authentication:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property must be set to: AzureFileStorage. | Yes |
| connectionString | Specify the information needed to connect to Azure File Storage. You can also put the account key in Azure Key Vault and pull the accountKey configuration out of the connection string. For more information, see the following samples and the Store credentials in Azure Key Vault article. | Yes |
| fileShare | Specify the file share. | Yes |
| snapshot | Specify the date of the file share snapshot if you want to copy from a snapshot. | No |
| connectVia | The integration runtime to be used to connect to the data store. You can use the Azure integration runtime or the self-hosted integration runtime (if your data store is located in a private network). If not specified, the default Azure integration runtime is used. | No |

Example:

{
    "name": "AzureFileStorageLinkedService",
    "properties": {
        "type": "AzureFileStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountName>;AccountKey=<accountKey>;EndpointSuffix=core.chinacloudapi.cn;",
            "fileShare": "<file share name>"
        },
        "connectVia": {
          "referenceName": "<name of Integration Runtime>",
          "type": "IntegrationRuntimeReference"
        }
    }
}
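The connection string above is a set of semicolon-separated key=value settings. As a rough illustration only (this helper is hypothetical, not part of Data Factory or any Azure SDK), splitting such a string in Python looks like:

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split an Azure storage connection string into its key=value settings."""
    settings = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue  # skip the empty piece after a trailing ';'
        # partition() splits at the FIRST '=', so base64 '=' padding in
        # AccountKey values is preserved.
        key, _, value = segment.partition("=")
        settings[key] = value
    return settings

conn = ("DefaultEndpointsProtocol=https;AccountName=myaccount;"
        "AccountKey=abc123==;EndpointSuffix=core.chinacloudapi.cn;")
parsed = parse_connection_string(conn)
print(parsed["AccountName"])  # myaccount
```

Splitting at the first "=" only matters because account keys are base64 and may end in "=" padding.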

Example: store the account key in Azure Key Vault

{
    "name": "AzureFileStorageLinkedService",
    "properties": {
        "type": "AzureFileStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;EndpointSuffix=core.chinacloudapi.cn;",
            "fileShare": "<file share name>",
            "accountKey": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "<Azure Key Vault linked service name>",
                    "type": "LinkedServiceReference"
                },
                "secretName": "<secretName>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }            
    }
}

Shared access signature authentication

A shared access signature provides delegated access to resources in your storage account. You can use a shared access signature to grant a client limited permissions to objects in your storage account for a specified time. For more information about shared access signatures, see Shared access signatures: Understand the shared access signature model.

Data Factory supports the following properties for using shared access signature authentication:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property must be set to: AzureFileStorage. | Yes |
| sasUri | Specify the shared access signature URI to the resources. Mark this field as SecureString to store it securely in Data Factory. You can also put the SAS token in Azure Key Vault to use auto-rotation and remove the token portion. For more information, see the following samples and Store credentials in Azure Key Vault. | Yes |
| fileShare | Specify the file share. | Yes |
| snapshot | Specify the date of the file share snapshot if you want to copy from a snapshot. | No |
| connectVia | The integration runtime to be used to connect to the data store. You can use the Azure integration runtime or the self-hosted integration runtime (if your data store is located in a private network). If not specified, the default Azure integration runtime is used. | No |

Example:

{
    "name": "AzureFileStorageLinkedService",
    "properties": {
        "type": "AzureFileStorage",
        "typeProperties": {
            "sasUri": {
                "type": "SecureString",
                "value": "<SAS URI of the resource e.g. https://<accountname>.file.core.chinacloudapi.cn/?sv=<storage version>&st=<start time>&se=<expire time>&sr=<resource>&sp=<permissions>&sip=<ip range>&spr=<protocol>&sig=<signature>>"
            },
            "fileShare": "<file share name>",
            "snapshot": "<snapshot version>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Example: store the SAS token in Azure Key Vault

{
    "name": "AzureFileStorageLinkedService",
    "properties": {
        "type": "AzureFileStorage",
        "typeProperties": {
            "sasUri": {
                "type": "SecureString",
                "value": "<SAS URI of the Azure Storage resource without token e.g. https://<accountname>.file.core.chinacloudapi.cn/>"
            },
            "sasToken": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "<Azure Key Vault linked service name>",
                    "type": "LinkedServiceReference"
                },
                "secretName": "<secretName with value of SAS token e.g. ?sv=<storage version>&st=<start time>&se=<expire time>&sr=<resource>&sp=<permissions>&sip=<ip range>&spr=<protocol>&sig=<signature>>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
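When the token portion lives in Key Vault, the effective SAS URI is simply the base resource URI with the retrieved token appended as the query string. A minimal sketch of that concatenation (the helper name is hypothetical; it handles a token with or without its leading "?"):

```python
def build_sas_uri(base_uri: str, sas_token: str) -> str:
    """Append a SAS token to a base resource URI as its query string."""
    token = sas_token.lstrip("?")          # tolerate a stored leading '?'
    separator = "&" if "?" in base_uri else "?"
    return base_uri + separator + token

uri = build_sas_uri("https://myaccount.file.core.chinacloudapi.cn/",
                    "?sv=2019-02-02&se=2024-01-01&sig=abc")
print(uri)
# https://myaccount.file.core.chinacloudapi.cn/?sv=2019-02-02&se=2024-01-01&sig=abc
```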

Legacy model

| Property | Description | Required |
| --- | --- | --- |
| type | The type property must be set to: AzureFileStorage. | Yes |
| host | Specifies the Azure File Storage endpoint as:<br>- Using UI: specify \\&lt;storage name&gt;.file.core.chinacloudapi.cn\&lt;file service name&gt;<br>- Using JSON: "host": "\\\\&lt;storage name&gt;.file.core.chinacloudapi.cn\\&lt;file service name&gt;". | Yes |
| userid | Specify the user to access the Azure File Storage as:<br>- Using UI: specify AZURE\&lt;storage name&gt;<br>- Using JSON: "userid": "AZURE\\&lt;storage name&gt;". | Yes |
| password | Specify the storage access key. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes |
| connectVia | The integration runtime to be used to connect to the data store. You can use the Azure integration runtime or the self-hosted integration runtime (if your data store is located in a private network). If not specified, the default Azure integration runtime is used. | No for source, Yes for sink |

Example:

{
    "name": "AzureFileStorageLinkedService",
    "properties": {
        "type": "AzureFileStorage",
        "typeProperties": {
            "host": "\\\\<storage name>.file.core.chinacloudapi.cn\\<file service name>",
            "userid": "AZURE\\<storage name>",
            "password": {
                "type": "SecureString",
                "value": "<storage access key>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
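Note the doubled backslashes in the JSON above: JSON escaping halves each pair, so \\\\ in the payload becomes \\ in the actual host value, and \\ becomes \. A quick standard-library check of that decoding:

```python
import json

# The host exactly as it appears in the linked service JSON payload
# (raw string so the backslashes below are literal JSON text).
payload = r'{"host": "\\\\mystorage.file.core.chinacloudapi.cn\\myshare"}'
decoded = json.loads(payload)
print(decoded["host"])  # \\mystorage.file.core.chinacloudapi.cn\myshare
```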

Dataset properties

For a full list of sections and properties available for defining datasets, see the Datasets article.

Azure Data Factory supports the following file formats. Refer to each article for format-based settings.

The following properties are supported for Azure File Storage under location settings in a format-based dataset:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property under location in the dataset must be set to AzureFileStorageLocation. | Yes |
| folderPath | The path to the folder. If you want to use a wildcard to filter folders, skip this setting and specify it in the activity source settings. | No |
| fileName | The file name under the given folderPath. If you want to use a wildcard to filter files, skip this setting and specify it in the activity source settings. | No |

Example:

{
    "name": "DelimitedTextDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "<Azure File Storage linked service name>",
            "type": "LinkedServiceReference"
        },
        "schema": [ < physical schema, optional, auto retrieved during authoring > ],
        "typeProperties": {
            "location": {
                "type": "AzureFileStorageLocation",
                "folderPath": "root/folder/subfolder"
            },
            "columnDelimiter": ",",
            "quoteChar": "\"",
            "firstRowAsHeader": true,
            "compressionCodec": "gzip"
        }
    }
}

Copy activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the Azure File Storage source and sink.

Azure File Storage as source

Azure Data Factory supports the following file formats. Refer to each article for format-based settings.

The following properties are supported for Azure File Storage under storeSettings settings in a format-based copy source:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property under storeSettings must be set to AzureFileStorageReadSettings. | Yes |
| Locate the files to copy: | | |
| OPTION 1: static path | Copy from the given folder/file path specified in the dataset. If you want to copy all files from a folder, additionally specify wildcardFileName as *. | |
| OPTION 2: file prefix<br>- prefix | Prefix for the file name under the given file share configured in the dataset, used to filter source files. Files whose names start with fileshare_in_linked_service/this_prefix are selected. This uses the service-side filter for Azure File Storage, which provides better performance than a wildcard filter. This feature is not supported when using the legacy linked service model. | No |
| OPTION 3: wildcard<br>- wildcardFolderPath | The folder path with wildcard characters to filter source folders.<br>Allowed wildcards are: * (matches zero or more characters) and ? (matches zero or a single character); use ^ to escape if your actual folder name has a wildcard or this escape character inside.<br>See more examples in Folder and file filter examples. | No |
| OPTION 3: wildcard<br>- wildcardFileName | The file name with wildcard characters under the given folderPath/wildcardFolderPath, used to filter source files.<br>Allowed wildcards are: * (matches zero or more characters) and ? (matches zero or a single character); use ^ to escape if your actual file name has a wildcard or this escape character inside. See more examples in Folder and file filter examples. | Yes |
| OPTION 4: a list of files<br>- fileListPath | Indicates to copy a given file set. Point to a text file that includes a list of files you want to copy, one file per line, as the relative path to the path configured in the dataset.<br>When using this option, do not specify a file name in the dataset. See more examples in File list examples. | No |
| Additional settings: | | |
| recursive | Indicates whether the data is read recursively from the subfolders or only from the specified folder. When recursive is set to true and the sink is a file-based store, an empty folder or subfolder isn't copied or created at the sink.<br>Allowed values are true (default) and false.<br>This property doesn't apply when you configure fileListPath. | No |
| deleteFilesAfterCompletion | Indicates whether the binary files are deleted from the source store after being moved successfully to the destination store. File deletion is per file, so when the copy activity fails, you will see that some files have already been copied to the destination and deleted from the source, while others still remain in the source store.<br>This property is valid only in the binary-file copy scenario. Default value: false. | No |
| modifiedDatetimeStart | Files are filtered based on the attribute Last Modified.<br>Files are selected if their last modified time is within the time range between modifiedDatetimeStart and modifiedDatetimeEnd. The time is applied to the UTC time zone in the format "2018-12-01T05:00:00Z".<br>The properties can be NULL, which means no file attribute filter is applied to the dataset. When modifiedDatetimeStart has a datetime value but modifiedDatetimeEnd is NULL, files whose last modified attribute is greater than or equal to the datetime value are selected. When modifiedDatetimeEnd has a datetime value but modifiedDatetimeStart is NULL, files whose last modified attribute is less than the datetime value are selected.<br>This property doesn't apply when you configure fileListPath. | No |
| modifiedDatetimeEnd | Same as above. | No |
| enablePartitionDiscovery | For files that are partitioned, specify whether to parse the partitions from the file path and add them as additional source columns.<br>Allowed values are false (default) and true. | No |
| partitionRootPath | When partition discovery is enabled, specify the absolute root path in order to read partitioned folders as data columns.<br>If it is not specified, by default:<br>- When you use a file path in the dataset or a list of files on the source, the partition root path is the path configured in the dataset.<br>- When you use a wildcard folder filter, the partition root path is the sub-path before the first wildcard.<br>For example, assuming you configure the path in the dataset as "root/folder/year=2020/month=08/day=27":<br>- If you specify the partition root path as "root/folder/year=2020", the copy activity generates two more columns, month and day, with the values "08" and "27" respectively, in addition to the columns inside the files.<br>- If the partition root path is not specified, no extra columns are generated. | No |
| maxConcurrentConnections | The number of connections used to connect to the storage store concurrently. Specify a value only when you want to limit concurrent connections to the data store. | No |
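The partitionRootPath behavior described above amounts to a simple rule: every name=value folder segment below the root path becomes an extra source column. A few lines of illustrative Python (not Data Factory code) make the rule concrete:

```python
def discover_partitions(file_path: str, partition_root: str) -> dict:
    """Extract name=value folder segments below the partition root path."""
    relative = file_path[len(partition_root):].strip("/")
    columns = {}
    for segment in relative.split("/"):
        if "=" in segment:
            name, _, value = segment.partition("=")
            columns[name] = value
    return columns

cols = discover_partitions("root/folder/year=2020/month=08/day=27",
                           "root/folder/year=2020")
print(cols)  # {'month': '08', 'day': '27'}
```

This reproduces the documented example: with root "root/folder/year=2020", the extra columns are month=08 and day=27.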

Example:

"activities":[
    {
        "name": "CopyFromAzureFileStorage",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Delimited text input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "DelimitedTextSource",
                "formatSettings":{
                    "type": "DelimitedTextReadSettings",
                    "skipLineCount": 10
                },
                "storeSettings":{
                    "type": "AzureFileStorageReadSettings",
                    "recursive": true,
                    "wildcardFolderPath": "myfolder*A",
                    "wildcardFileName": "*.csv"
                }
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]
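The modifiedDatetimeStart/modifiedDatetimeEnd semantics above (start inclusive, end exclusive, NULL meaning unbounded on that side) can be written as a small predicate. This Python sketch is illustrative only; the function name is hypothetical:

```python
from datetime import datetime, timezone
from typing import Optional

def is_selected(last_modified: datetime,
                start: Optional[datetime],
                end: Optional[datetime]) -> bool:
    """Apply the Last Modified filter: start is inclusive, end is exclusive;
    a NULL (None) bound leaves the range open on that side."""
    if start is not None and last_modified < start:
        return False
    if end is not None and last_modified >= end:
        return False
    return True

cutoff = datetime(2018, 12, 1, 5, 0, 0, tzinfo=timezone.utc)
print(is_selected(cutoff, cutoff, None))  # True: equal to start is selected
```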

Azure File Storage as sink

Azure Data Factory supports the following file formats. Refer to each article for format-based settings.

The following properties are supported for Azure File Storage under storeSettings settings in a format-based copy sink:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property under storeSettings must be set to AzureFileStorageWriteSettings. | Yes |
| copyBehavior | Defines the copy behavior when the source is files from a file-based data store.<br>Allowed values are:<br>- PreserveHierarchy (default): Preserves the file hierarchy in the target folder. The relative path of a source file to the source folder is identical to the relative path of the target file to the target folder.<br>- FlattenHierarchy: All files from the source folder are placed in the first level of the target folder. The target files have autogenerated names.<br>- MergeFiles: Merges all files from the source folder into one file. If the file name is specified, the merged file name is the specified name. Otherwise, it's an autogenerated file name. | No |
| maxConcurrentConnections | The number of connections used to connect to the data store concurrently. Specify a value only when you want to limit concurrent connections to the data store. | No |

Example:

"activities":[
    {
        "name": "CopyToAzureFileStorage",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<Parquet output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "<source type>"
            },
            "sink": {
                "type": "ParquetSink",
                "storeSettings":{
                    "type": "AzureFileStorageWriteSettings",
                    "copyBehavior": "PreserveHierarchy"
                }
            }
        }
    }
]

Folder and file filter examples

This section describes the resulting behavior of the folder path and file name with wildcard filters.

| folderPath | fileName | recursive | Source folder structure and filter result (files in **bold** are retrieved) |
| --- | --- | --- | --- |
| Folder* | (empty, use default) | false | FolderA<br>&nbsp;&nbsp;&nbsp;&nbsp;**File1.csv**<br>&nbsp;&nbsp;&nbsp;&nbsp;**File2.json**<br>&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File3.csv<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File4.json<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File5.csv<br>AnotherFolderB<br>&nbsp;&nbsp;&nbsp;&nbsp;File6.csv |
| Folder* | (empty, use default) | true | FolderA<br>&nbsp;&nbsp;&nbsp;&nbsp;**File1.csv**<br>&nbsp;&nbsp;&nbsp;&nbsp;**File2.json**<br>&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**File3.csv**<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**File4.json**<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**File5.csv**<br>AnotherFolderB<br>&nbsp;&nbsp;&nbsp;&nbsp;File6.csv |
| Folder* | *.csv | false | FolderA<br>&nbsp;&nbsp;&nbsp;&nbsp;**File1.csv**<br>&nbsp;&nbsp;&nbsp;&nbsp;File2.json<br>&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File3.csv<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File4.json<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File5.csv<br>AnotherFolderB<br>&nbsp;&nbsp;&nbsp;&nbsp;File6.csv |
| Folder* | *.csv | true | FolderA<br>&nbsp;&nbsp;&nbsp;&nbsp;**File1.csv**<br>&nbsp;&nbsp;&nbsp;&nbsp;File2.json<br>&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**File3.csv**<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File4.json<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**File5.csv**<br>AnotherFolderB<br>&nbsp;&nbsp;&nbsp;&nbsp;File6.csv |

File list examples

This section describes the resulting behavior of using a file list path in the copy activity source.

Assume that you have the following source folder structure and want to copy the files in bold:

| Sample source structure | Content in FileListToCopy.txt | ADF configuration |
| --- | --- | --- |
| root<br>&nbsp;&nbsp;&nbsp;&nbsp;FolderA<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**File1.csv**<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File2.json<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**File3.csv**<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File4.json<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**File5.csv**<br>&nbsp;&nbsp;&nbsp;&nbsp;Metadata<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;FileListToCopy.txt | File1.csv<br>Subfolder1/File3.csv<br>Subfolder1/File5.csv | In dataset:<br>- Folder path: root/FolderA<br><br>In copy activity source:<br>- File list path: root/Metadata/FileListToCopy.txt |

The file list path points to a text file in the same data store that includes a list of files you want to copy, one file per line, with the relative path to the path configured in the dataset.
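Resolving that list against the dataset folder path amounts to a simple join. An illustrative Python sketch (the function name is hypothetical):

```python
def resolve_file_list(dataset_folder: str, file_list_content: str) -> list:
    """Join each relative path in the file list with the dataset folder path."""
    paths = []
    for line in file_list_content.splitlines():
        relative = line.strip()
        if relative:  # ignore blank lines
            paths.append(dataset_folder.rstrip("/") + "/" + relative)
    return paths

listing = "File1.csv\nSubfolder1/File3.csv\nSubfolder1/File5.csv\n"
for p in resolve_file_list("root/FolderA", listing):
    print(p)
# root/FolderA/File1.csv
# root/FolderA/Subfolder1/File3.csv
# root/FolderA/Subfolder1/File5.csv
```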

recursive and copyBehavior examples

This section describes the resulting behavior of the copy operation for different combinations of recursive and copyBehavior values.

| recursive | copyBehavior | Source folder structure | Resulting target |
| --- | --- | --- | --- |
| true | preserveHierarchy | Folder1<br>&nbsp;&nbsp;&nbsp;&nbsp;File1<br>&nbsp;&nbsp;&nbsp;&nbsp;File2<br>&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File3<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File4<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File5 | The target folder Folder1 is created with the same structure as the source:<br><br>Folder1<br>&nbsp;&nbsp;&nbsp;&nbsp;File1<br>&nbsp;&nbsp;&nbsp;&nbsp;File2<br>&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File3<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File4<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File5 |
| true | flattenHierarchy | Folder1<br>&nbsp;&nbsp;&nbsp;&nbsp;File1<br>&nbsp;&nbsp;&nbsp;&nbsp;File2<br>&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File3<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File4<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File5 | The target Folder1 is created with the following structure:<br><br>Folder1<br>&nbsp;&nbsp;&nbsp;&nbsp;autogenerated name for File1<br>&nbsp;&nbsp;&nbsp;&nbsp;autogenerated name for File2<br>&nbsp;&nbsp;&nbsp;&nbsp;autogenerated name for File3<br>&nbsp;&nbsp;&nbsp;&nbsp;autogenerated name for File4<br>&nbsp;&nbsp;&nbsp;&nbsp;autogenerated name for File5 |
| true | mergeFiles | Folder1<br>&nbsp;&nbsp;&nbsp;&nbsp;File1<br>&nbsp;&nbsp;&nbsp;&nbsp;File2<br>&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File3<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File4<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File5 | The target Folder1 is created with the following structure:<br><br>Folder1<br>&nbsp;&nbsp;&nbsp;&nbsp;File1 + File2 + File3 + File4 + File5 contents are merged into one file with an autogenerated file name |
| false | preserveHierarchy | Folder1<br>&nbsp;&nbsp;&nbsp;&nbsp;File1<br>&nbsp;&nbsp;&nbsp;&nbsp;File2<br>&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File3<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File4<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File5 | The target folder Folder1 is created with the following structure:<br><br>Folder1<br>&nbsp;&nbsp;&nbsp;&nbsp;File1<br>&nbsp;&nbsp;&nbsp;&nbsp;File2<br><br>Subfolder1 with File3, File4, and File5 is not picked up. |
| false | flattenHierarchy | Folder1<br>&nbsp;&nbsp;&nbsp;&nbsp;File1<br>&nbsp;&nbsp;&nbsp;&nbsp;File2<br>&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File3<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File4<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File5 | The target folder Folder1 is created with the following structure:<br><br>Folder1<br>&nbsp;&nbsp;&nbsp;&nbsp;autogenerated name for File1<br>&nbsp;&nbsp;&nbsp;&nbsp;autogenerated name for File2<br><br>Subfolder1 with File3, File4, and File5 is not picked up. |
| false | mergeFiles | Folder1<br>&nbsp;&nbsp;&nbsp;&nbsp;File1<br>&nbsp;&nbsp;&nbsp;&nbsp;File2<br>&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File3<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File4<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File5 | The target folder Folder1 is created with the following structure:<br><br>Folder1<br>&nbsp;&nbsp;&nbsp;&nbsp;File1 + File2 contents are merged into one file with an autogenerated file name<br><br>Subfolder1 with File3, File4, and File5 is not picked up. |
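The combinations above can be summarized as a small planning function that, given recursive and copyBehavior, decides which files are copied and how they are named. This Python sketch is a simplified illustration only; the autogenerated names are stubbed placeholders, not the real naming scheme:

```python
def plan_targets(source_files, recursive: bool, copy_behavior: str):
    """Compute target names for the copied files, mirroring the table above.

    source_files: relative paths under the source folder, e.g. "Subfolder1/File3".
    """
    if not recursive:
        # Only top-level files are picked up; subfolders are skipped.
        source_files = [f for f in source_files if "/" not in f]
    if copy_behavior == "PreserveHierarchy":
        return source_files                        # same relative paths
    if copy_behavior == "FlattenHierarchy":
        return [f"auto_{i}" for i, _ in enumerate(source_files)]  # stub names
    if copy_behavior == "MergeFiles":
        return ["auto_merged"]                     # one merged file, stub name
    raise ValueError(copy_behavior)

files = ["File1", "File2", "Subfolder1/File3"]
print(plan_targets(files, recursive=False, copy_behavior="PreserveHierarchy"))
# ['File1', 'File2']
```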

Lookup activity properties

To learn details about the properties, check Lookup activity.

GetMetadata activity properties

To learn details about the properties, check GetMetadata activity.

Delete activity properties

To learn details about the properties, check Delete activity.

Legacy models

Note

The following models are still supported as-is for backward compatibility. We recommend that you use the new model mentioned in the previous sections going forward; the ADF authoring UI has switched to generating the new model.

Legacy dataset model

propertiesProperty 说明Description 必需Required
typetype 数据集的 type 属性必须设置为: FileShareThe type property of the dataset must be set to: FileShare Yes
folderPathfolderPath 文件夹路径。Path to the folder.

支持通配符筛选器,允许的通配符为:*(匹配零个或更多个字符)和 ?(匹配零个或单个字符);如果实际文件夹名中包含通配符或此转义字符,请使用 ^ 进行转义。Wildcard filter is supported, allowed wildcards are: * (matches zero or more characters) and ? (matches zero or single character); use ^ to escape if your actual folder name has wildcard or this escape char inside.

示例:“rootfolder/subfolder/”,请参阅文件夹和文件筛选器示例中的更多示例。Examples: rootfolder/subfolder/, see more examples in Folder and file filter examples.
Yes
fileNamefileName 指定“folderPath”下的文件的“名称或通配符筛选器”。Name or wildcard filter for the file(s) under the specified "folderPath". 如果没有为此属性指定任何值,则数据集会指向文件夹中的所有文件。If you don't specify a value for this property, the dataset points to all files in the folder.

对于筛选器,允许的通配符为:*(匹配零个或更多字符)和 ?(匹配零个或单个字符)。For filter, allowed wildcards are: * (matches zero or more characters) and ? (matches zero or single character).
- 示例 1:"fileName": "*.csv"- Example 1: "fileName": "*.csv"
- 示例 2:"fileName": "???20180427.txt"- Example 2: "fileName": "???20180427.txt"
如果实际文件名内具有通配符或此转义符,请使用 ^ 进行转义。Use ^ to escape if your actual file name has wildcard or this escape char inside.

如果没有为输出数据集指定 fileName,并且没有在活动接收器中指定 preserveHierarchy,则复制活动会自动生成采用以下模式的文件名称:“Data.[活动运行 ID GUID].[GUID (如果为 FlattenHierarchy)].[格式(如果已配置)].[压缩(如果已配置)]”,例如“Data.0a405f8a-93ff-4c6f-b3be-f69616f1df7a.txt.gz”;如果使用表名(而不是查询)从表格源进行复制,则名称模式为“[表名].[格式].[压缩(如果已配置)]”,例如“MyTable.csv”。When fileName isn't specified for an output dataset and preserveHierarchy isn't specified in the activity sink, the copy activity automatically generates the file name with the following pattern: " Data.[activity run ID GUID].[GUID if FlattenHierarchy].[format if configured].[compression if configured] ", for example "Data.0a405f8a-93ff-4c6f-b3be-f69616f1df7a.txt.gz"; if you copy from tabular source using table name instead of query, the name pattern is " [table name].[format].[compression if configured] ", for example "MyTable.csv".
No
modifiedDatetimeStart Files filter based on the attribute Last Modified. Files are selected if their last modified time falls within the range between modifiedDatetimeStart and modifiedDatetimeEnd. The time is applied in the UTC time zone, in the format "2018-12-01T05:00:00Z".

Be aware that enabling this setting affects the overall performance of data movement when you filter from huge numbers of files.

The properties can be NULL, which means no file attribute filter is applied to the dataset. When modifiedDatetimeStart has a datetime value but modifiedDatetimeEnd is NULL, files whose last modified attribute is greater than or equal to the datetime value are selected. When modifiedDatetimeEnd has a datetime value but modifiedDatetimeStart is NULL, files whose last modified attribute is less than the datetime value are selected.
No
modifiedDatetimeEnd Files filter based on the attribute Last Modified. Files are selected if their last modified time falls within the range between modifiedDatetimeStart and modifiedDatetimeEnd. The time is applied in the UTC time zone, in the format "2018-12-01T05:00:00Z".

Be aware that enabling this setting affects the overall performance of data movement when you filter from huge numbers of files.

The properties can be NULL, which means no file attribute filter is applied to the dataset. When modifiedDatetimeStart has a datetime value but modifiedDatetimeEnd is NULL, files whose last modified attribute is greater than or equal to the datetime value are selected. When modifiedDatetimeEnd has a datetime value but modifiedDatetimeStart is NULL, files whose last modified attribute is less than the datetime value are selected.
No
format If you want to copy files as-is between file-based stores (binary copy), skip the format section in both the input and output dataset definitions.

If you want to parse or generate files with a specific format, the following file format types are supported: TextFormat, JsonFormat, AvroFormat, OrcFormat, ParquetFormat. Set the type property under format to one of these values. For more information, see the Text Format, Json Format, Avro Format, Orc Format, and Parquet Format sections.
No (only for binary copy scenario)
compression Specify the type and level of compression for the data. For more information, see Supported file formats and compression codecs.
Supported types are: GZip, Deflate, BZip2, and ZipDeflate.
Supported levels are: Optimal and Fastest.
No
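The selection window described for modifiedDatetimeStart and modifiedDatetimeEnd above can be sketched as follows. This is an illustrative sketch of the documented semantics (start inclusive, end exclusive, NULL leaving that side open), not ADF's internal code; the helper name is hypothetical.

```python
# Hypothetical helper mirroring the documented modifiedDatetimeStart /
# modifiedDatetimeEnd semantics: start is inclusive, end is exclusive,
# and a NULL (None) bound leaves that side of the window open.
from datetime import datetime, timezone
from typing import Optional

def in_modified_window(last_modified: datetime,
                       start: Optional[datetime],
                       end: Optional[datetime]) -> bool:
    if start is not None and last_modified < start:
        return False
    if end is not None and last_modified >= end:
        return False
    return True

utc = timezone.utc
lm = datetime(2018, 12, 1, 5, 30, tzinfo=utc)
# Within the window [05:00, 06:00):
print(in_modified_window(lm, datetime(2018, 12, 1, 5, 0, tzinfo=utc),
                         datetime(2018, 12, 1, 6, 0, tzinfo=utc)))  # True
# Both bounds NULL: no filter is applied.
print(in_modified_window(lm, None, None))  # True
```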

Tip

To copy all files under a folder, specify folderPath only.
To copy a single file with a given name, specify folderPath with the folder part and fileName with the file name.
To copy a subset of files under a folder, specify folderPath with the folder part and fileName with a wildcard filter.
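The wildcard rules above can be sketched with Python's fnmatch, which has similar (though not identical) semantics. Note the caveat in the comment: fnmatch's ? matches exactly one character, while the docs describe ADF's ? as matching zero or a single character. The helper name is hypothetical.

```python
# Hypothetical sketch of fileName wildcard matching using fnmatch-style
# semantics. Caveat: fnmatch's '?' matches exactly one character, a slight
# simplification of the "zero or single character" rule described above.
from fnmatch import fnmatchcase

def matches_file_name_filter(name: str, pattern: str) -> bool:
    return fnmatchcase(name, pattern)

print(matches_file_name_filter("sales.csv", "*.csv"))                  # True
print(matches_file_name_filter("abc20180427.txt", "???20180427.txt"))  # True
print(matches_file_name_filter("report.txt", "*.csv"))                 # False
```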

Note

If you were using the "fileFilter" property for file filtering, it is still supported as-is, but you are encouraged to use the new filter capability added to "fileName" going forward.

Example:

{
    "name": "AzureFileStorageDataset",
    "properties": {
        "type": "FileShare",
        "linkedServiceName":{
            "referenceName": "<Azure File Storage linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "folderPath": "folder/subfolder/",
            "fileName": "*",
            "modifiedDatetimeStart": "2018-12-01T05:00:00Z",
            "modifiedDatetimeEnd": "2018-12-01T06:00:00Z",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ",",
                "rowDelimiter": "\n"
            },
            "compression": {
                "type": "GZip",
                "level": "Optimal"
            }
        }
    }
}
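The example above requests GZip compression at the Optimal level. The effect of a level choice can be illustrated with Python's gzip module; mapping "Optimal" to level 9 and "Fastest" to level 1 is an assumption made for illustration only, not ADF's actual mapping.

```python
# Illustrative only: emulate a compression type/level choice with gzip.
# The Optimal->9 / Fastest->1 mapping is an assumption, not ADF's mapping.
import gzip

def compress(data: bytes, level_name: str) -> bytes:
    level = {"Optimal": 9, "Fastest": 1}[level_name]
    return gzip.compress(data, compresslevel=level)

payload = b"col1,col2\n1,2\n" * 200
packed = compress(payload, "Optimal")
print(len(packed) < len(payload))  # True: repetitive CSV compresses well
```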

Legacy copy activity source model

Property Description Required
type The type property of the copy activity source must be set to FileSystemSource. Yes
recursive Indicates whether the data is read recursively from the subfolders or only from the specified folder. Note that when recursive is set to true and the sink is a file-based store, an empty folder or subfolder will not be copied or created at the sink.
Allowed values are: true (default) and false.
No
maxConcurrentConnections The number of connections that can be made to the storage store concurrently. Specify a value only when you want to limit concurrent connections to the data store. No
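The recursive flag's behavior can be sketched with a local folder walk: a recursive read descends into subfolders, while a non-recursive read lists only the specified folder. The helper and file layout below are hypothetical, not ADF code.

```python
# Hypothetical sketch of the 'recursive' source flag using a local folder.
import os
import pathlib
import tempfile

def list_files(root, recursive):
    if recursive:
        # Walk all subfolders; return paths relative to the root.
        return sorted(
            os.path.relpath(os.path.join(d, f), root).replace(os.sep, "/")
            for d, _, files in os.walk(root)
            for f in files
        )
    # Only files directly inside the specified folder.
    return sorted(f for f in os.listdir(root)
                  if os.path.isfile(os.path.join(root, f)))

root = tempfile.mkdtemp()
pathlib.Path(root, "a.csv").write_text("x")
os.mkdir(os.path.join(root, "sub"))
pathlib.Path(root, "sub", "b.csv").write_text("y")
print(list_files(root, recursive=False))  # ['a.csv']
print(list_files(root, recursive=True))   # ['a.csv', 'sub/b.csv']
```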

Example:

"activities":[
    {
        "name": "CopyFromAzureFileStorage",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Azure File Storage input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "FileSystemSource",
                "recursive": true
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

Legacy copy activity sink model

Property Description Required
type The type property of the copy activity sink must be set to FileSystemSink. Yes
copyBehavior Defines the copy behavior when the source is files from a file-based data store.

Allowed values are:
- PreserveHierarchy (default): preserves the file hierarchy in the target folder. The relative path of a source file to the source folder is identical to the relative path of the target file to the target folder.
- FlattenHierarchy: all files from the source folder are placed in the first level of the target folder. The target files have autogenerated names.
- MergeFiles: merges all files from the source folder into one file. If the file name is specified, the merged file name is the specified name; otherwise, the name is autogenerated.
No
maxConcurrentConnections The number of connections that can be made to the storage store concurrently. Specify a value only when you want to limit concurrent connections to the data store. No
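The three copyBehavior values map source paths to target paths in different ways. The sketch below illustrates the documented behaviors with a hypothetical helper; ADF generates its own names, and uuid here merely stands in for an autogenerated name.

```python
# Hypothetical illustration of copyBehavior path mapping (not ADF code).
import posixpath
import uuid

def target_paths(source_files, target_folder, behavior):
    if behavior == "PreserveHierarchy":
        # Relative path under the target matches the relative source path.
        return [posixpath.join(target_folder, rel) for rel in source_files]
    if behavior == "FlattenHierarchy":
        # Every file lands at the target's first level with a generated name.
        return [posixpath.join(target_folder, "Data." + str(uuid.uuid4()))
                for _ in source_files]
    if behavior == "MergeFiles":
        # All source files are merged into one generated file.
        return [posixpath.join(target_folder, "Data." + str(uuid.uuid4()))]
    raise ValueError("unknown copyBehavior: " + behavior)

files = ["sub1/a.csv", "sub2/b.csv"]
print(target_paths(files, "out", "PreserveHierarchy"))
# ['out/sub1/a.csv', 'out/sub2/b.csv']
```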

Example:

"activities":[
    {
        "name": "CopyToAzureFileStorage",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<Azure File Storage output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "<source type>"
            },
            "sink": {
                "type": "FileSystemSink",
                "copyBehavior": "PreserveHierarchy"
            }
        }
    }
]

Next steps

For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see supported data stores.