Copy data from SFTP server using Azure Data Factory

This article outlines how to copy data from an SFTP server. To learn about Azure Data Factory, read the introductory article.

Supported capabilities

This SFTP connector is supported for the following activities:

  • Copy activity
  • Lookup activity
  • GetMetadata activity
  • Delete activity

Specifically, this SFTP connector supports:

  • Copying files using Basic or SshPublicKey authentication.
  • Copying files as-is, or parsing files with the supported file formats and compression codecs.

Prerequisites

If your data store is configured in one of the following ways, you need to set up a Self-hosted Integration Runtime in order to connect to this data store:

  • The data store is located inside an on-premises network, inside Azure Virtual Network, or inside Amazon Virtual Private Cloud.
  • The data store is a managed cloud data service where access is restricted to IPs whitelisted in the firewall rules.

Get started

You can use one of the following tools or SDKs to use the copy activity with a pipeline. Select a link for step-by-step instructions:

The following sections provide details about properties that are used to define Data Factory entities specific to SFTP.

Linked service properties

The following properties are supported for the SFTP linked service:

| Property | Description | Required |
|:--- |:--- |:--- |
| type | The type property must be set to: Sftp. | Yes |
| host | Name or IP address of the SFTP server. | Yes |
| port | Port on which the SFTP server is listening. The allowed value is an integer; the default value is 22. | No |
| skipHostKeyValidation | Specify whether to skip host key validation. Allowed values are true and false (default). | No |
| hostKeyFingerprint | Specify the fingerprint of the host key. | Yes, if "skipHostKeyValidation" is set to false. |
| authenticationType | Specify the authentication type. Allowed values are Basic and SshPublicKey. Refer to the Using basic authentication and Using SSH public key authentication sections for more properties and JSON samples, respectively. | Yes |
| connectVia | The Integration Runtime to be used to connect to the data store. Learn more from the Prerequisites section. If not specified, the default Azure Integration Runtime is used. | No |
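The hostKeyFingerprint value in the samples below uses the classic colon-separated hex form. As a rough illustration (an assumption about the format, not the connector's own code), those hex pairs are typically the MD5 digest of the server's public key blob:

```python
import base64
import hashlib

def md5_fingerprint(public_key_blob: bytes) -> str:
    # Colon-separated MD5 digest, the classic SSH host key fingerprint form.
    digest = hashlib.md5(public_key_blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

# Hypothetical base64 key blob, e.g. the second field of an ssh-keyscan line.
blob = base64.b64decode("AAAAB3NzaC1yc2EAAAADAQABAAABAQC7")
print(md5_fingerprint(blob))
```

You can obtain the real blob for your server with a tool such as ssh-keyscan and compare the result against the value you configure.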

Using basic authentication

To use basic authentication, set the "authenticationType" property to Basic, and specify the following properties in addition to the generic SFTP connector properties introduced in the previous section:

| Property | Description | Required |
|:--- |:--- |:--- |
| userName | User who has access to the SFTP server. | Yes |
| password | Password for the user (userName). Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes |

Example:

{
    "name": "SftpLinkedService",
    "type": "linkedservices",
    "properties": {
        "type": "Sftp",
        "typeProperties": {
            "host": "<sftp server>",
            "port": 22,
            "skipHostKeyValidation": false,
            "hostKeyFingerPrint": "ssh-rsa 2048 xx:00:00:00:xx:00:x0:0x:0x:0x:0x:00:00:x0:x0:00",
            "authenticationType": "Basic",
            "userName": "<username>",
            "password": {
                "type": "SecureString",
                "value": "<password>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Using SSH public key authentication

To use SSH public key authentication, set the "authenticationType" property to SshPublicKey, and specify the following properties in addition to the generic SFTP connector properties introduced in the previous section:

| Property | Description | Required |
|:--- |:--- |:--- |
| userName | User who has access to the SFTP server. | Yes |
| privateKeyPath | Specify the absolute path to the private key file that the Integration Runtime can access. Applies only when the Self-hosted type of Integration Runtime is specified in "connectVia". | Specify either privateKeyPath or privateKeyContent. |
| privateKeyContent | Base64-encoded SSH private key content. The SSH private key should be in OpenSSH format. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Specify either privateKeyPath or privateKeyContent. |
| passPhrase | Specify the pass phrase/password to decrypt the private key if the key file is protected by a pass phrase. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes, if the private key file is protected by a pass phrase. |

Note

The SFTP connector supports RSA/DSA OpenSSH keys. Make sure your key file content starts with "-----BEGIN [RSA/DSA] PRIVATE KEY-----". If the private key file is in .ppk format, use the PuTTY tool to convert it from .ppk to OpenSSH format.
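Since privateKeyContent must be base64-encoded, preparing that value from an OpenSSH-format key file can be sketched as follows (a minimal illustration; the file path is hypothetical):

```python
import base64

def to_private_key_content(path: str) -> str:
    """Base64-encode an OpenSSH-format private key file so it can be
    supplied as the "privateKeyContent" SecureString value."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```

The resulting string is what you would store in Azure Key Vault or paste as the SecureString value in the linked service JSON.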

Example 1: SshPublicKey authentication using private key filePath

{
    "name": "SftpLinkedService",
    "type": "linkedservices",
    "properties": {
        "type": "Sftp",
        "typeProperties": {
            "host": "<sftp server>",
            "port": 22,
            "skipHostKeyValidation": true,
            "authenticationType": "SshPublicKey",
            "userName": "xxx",
            "privateKeyPath": "D:\\privatekey_openssh",
            "passPhrase": {
                "type": "SecureString",
                "value": "<pass phrase>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Example 2: SshPublicKey authentication using private key content

{
    "name": "SftpLinkedService",
    "type": "linkedservices",
    "properties": {
        "type": "Sftp",
        "typeProperties": {
            "host": "<sftp server>",
            "port": 22,
            "skipHostKeyValidation": true,
            "authenticationType": "SshPublicKey",
            "userName": "<username>",
            "privateKeyContent": {
                "type": "SecureString",
                "value": "<base64 string of the private key content>"
            },
            "passPhrase": {
                "type": "SecureString",
                "value": "<pass phrase>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Dataset properties

For a full list of sections and properties available for defining datasets, see the Datasets article.

Parquet, delimited text, JSON, Avro and binary format dataset

To copy data to and from Parquet, delimited text, JSON, Avro and binary format, refer to the Parquet format, Delimited text format, Avro format and Binary format articles on the format-based dataset and supported settings. The following properties are supported for SFTP under location settings in the format-based dataset:

| Property | Description | Required |
|:--- |:--- |:--- |
| type | The type property under location in the dataset must be set to SftpLocation. | Yes |
| folderPath | The path to the folder. If you want to use a wildcard to filter folders, skip this setting and specify it in the activity source settings. | No |
| fileName | The file name under the given folderPath. If you want to use a wildcard to filter files, skip this setting and specify it in the activity source settings. | No |

Note

The FileShare type dataset with Parquet/Text format mentioned in the next section is still supported as-is for the Copy/Lookup/GetMetadata activities for backward compatibility. You are suggested to use this new model going forward, and the ADF authoring UI has switched to generating these new types.

Example:

{
    "name": "DelimitedTextDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "<SFTP linked service name>",
            "type": "LinkedServiceReference"
        },
        "schema": [ < physical schema, optional, auto retrieved during authoring > ],
        "typeProperties": {
            "location": {
                "type": "SftpLocation",
                "folderPath": "root/folder/subfolder"
            },
            "columnDelimiter": ",",
            "quoteChar": "\"",
            "firstRowAsHeader": true,
            "compressionCodec": "gzip"
        }
    }
}

Other format dataset

To copy data from SFTP in ORC format, the following properties are supported:

| Property | Description | Required |
|:--- |:--- |:--- |
| type | The type property of the dataset must be set to: FileShare. | Yes |
| folderPath | Path to the folder. Wildcard filter is supported; allowed wildcards are * (matches zero or more characters) and ? (matches zero or a single character). Use ^ to escape if your actual folder name has a wildcard or this escape character inside. Example: "rootfolder/subfolder/"; see more examples in Folder and file filter examples. | Yes |
| fileName | Name or wildcard filter for the file(s) under the specified "folderPath". If you don't specify a value for this property, the dataset points to all files in the folder. For the filter, allowed wildcards are * (matches zero or more characters) and ? (matches zero or a single character), for example "fileName": "*.csv" or "fileName": "???20180427.txt". Use ^ to escape if your actual file name has a wildcard or this escape character inside. | No |
| modifiedDatetimeStart | Files are filtered based on the attribute Last Modified. Files are selected if their last modified time is within the time range between modifiedDatetimeStart and modifiedDatetimeEnd. The time is applied to the UTC time zone in the format "2018-12-01T05:00:00Z". Be aware that enabling this setting affects the overall performance of data movement when you filter among huge numbers of files. The properties can be NULL, which means no file attribute filter is applied to the dataset. When modifiedDatetimeStart has a datetime value but modifiedDatetimeEnd is NULL, files whose last modified attribute is greater than or equal to the datetime value are selected. When modifiedDatetimeEnd has a datetime value but modifiedDatetimeStart is NULL, files whose last modified attribute is less than the datetime value are selected. | No |
| modifiedDatetimeEnd | Same as above. | No |
| format | If you want to copy files as-is between file-based stores (binary copy), skip the format section in both the input and output dataset definitions. If you want to parse files with a specific format, the following file format types are supported: TextFormat, JsonFormat, AvroFormat, OrcFormat, ParquetFormat. Set the type property under format to one of these values. For more information, see the Text Format, Json Format, Avro Format, Orc Format, and Parquet Format sections. | No (only for binary copy scenario) |
| compression | Specify the type and level of compression for the data. For more information, see Supported file formats and compression codecs. Supported types are GZip, Deflate, BZip2, and ZipDeflate. Supported levels are Optimal and Fastest. | No |

Tip

  • To copy all files under a folder, specify folderPath only.
  • To copy a single file with a given name, specify folderPath with the folder part and fileName with the file name.
  • To copy a subset of files under a folder, specify folderPath with the folder part and fileName with a wildcard filter.
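The wildcard rules above (* for any run of characters, ? for zero or one character, ^ as the escape) can be sketched as a translation to a regular expression. This is only an illustration of the documented semantics, not ADF's actual matcher:

```python
import re

def wildcard_to_regex(pattern: str):
    # Translate the connector's wildcard syntax into an anchored regex:
    # '*' -> '.*', '?' -> '.?' (zero or one character, per the table above),
    # '^x' -> literal x, everything else escaped.
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "^" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".?")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out) + r"\Z")

print(bool(wildcard_to_regex("*.csv").match("File1.csv")))                  # True
print(bool(wildcard_to_regex("???20180427.txt").match("abc20180427.txt")))  # True
print(bool(wildcard_to_regex("a^*b").match("a*b")))                         # True: '*' matched literally
```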

Note

If you were using the "fileFilter" property for file filtering, it is still supported as-is; going forward, you are suggested to use the new filter capability added to "fileName".
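The modifiedDatetimeStart/modifiedDatetimeEnd semantics described in the table amount to a half-open interval check with NULL meaning "unbounded". A minimal sketch:

```python
from datetime import datetime, timezone
from typing import Optional

def is_selected(last_modified: datetime,
                start: Optional[datetime],
                end: Optional[datetime]) -> bool:
    # Start bound is inclusive (>=), end bound is exclusive (<);
    # a NULL bound leaves that side unconstrained.
    if start is not None and last_modified < start:
        return False
    if end is not None and last_modified >= end:
        return False
    return True

utc = timezone.utc
start = datetime(2018, 12, 1, 5, 0, 0, tzinfo=utc)  # "2018-12-01T05:00:00Z"
print(is_selected(datetime(2018, 12, 1, 5, 30, tzinfo=utc), start, None))  # True
print(is_selected(datetime(2018, 12, 1, 4, 59, tzinfo=utc), start, None))  # False
```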

Example:

{
    "name": "SFTPDataset",
    "type": "datasets",
    "properties": {
        "type": "FileShare",
        "linkedServiceName":{
            "referenceName": "<SFTP linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "folderPath": "folder/subfolder/",
            "fileName": "*",
            "modifiedDatetimeStart": "2018-12-01T05:00:00Z",
            "modifiedDatetimeEnd": "2018-12-01T06:00:00Z",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ",",
                "rowDelimiter": "\n"
            },
            "compression": {
                "type": "GZip",
                "level": "Optimal"
            }
        }
    }
}

Copy activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the SFTP source.

SFTP as source

Parquet, delimited text, JSON, Avro and binary format source

To copy data from Parquet, delimited text, JSON, Avro and binary format, refer to the Parquet format, Delimited text format, Avro format and Binary format articles on the format-based copy activity source and supported settings. The following properties are supported for SFTP under storeSettings settings in the format-based copy source:

| Property | Description | Required |
|:--- |:--- |:--- |
| type | The type property under storeSettings must be set to SftpReadSetting. | Yes |
| recursive | Indicates whether the data is read recursively from the subfolders or only from the specified folder. Note that when recursive is set to true and the sink is a file-based store, an empty folder or subfolder isn't copied or created at the sink. Allowed values are true (default) and false. | No |
| wildcardFolderPath | The folder path with wildcard characters to filter source folders. Allowed wildcards are * (matches zero or more characters) and ? (matches zero or a single character); use ^ to escape if your actual folder name has a wildcard or this escape character inside. See more examples in Folder and file filter examples. | No |
| wildcardFileName | The file name with wildcard characters under the given folderPath/wildcardFolderPath to filter source files. Allowed wildcards are * (matches zero or more characters) and ? (matches zero or a single character); use ^ to escape if your actual file name has a wildcard or this escape character inside. See more examples in Folder and file filter examples. | Yes, if fileName is not specified in the dataset |
| modifiedDatetimeStart | Files are filtered based on the attribute Last Modified. Files are selected if their last modified time is within the time range between modifiedDatetimeStart and modifiedDatetimeEnd. The time is applied to the UTC time zone in the format "2018-12-01T05:00:00Z". The properties can be NULL, which means no file attribute filter is applied to the dataset. When modifiedDatetimeStart has a datetime value but modifiedDatetimeEnd is NULL, files whose last modified attribute is greater than or equal to the datetime value are selected. When modifiedDatetimeEnd has a datetime value but modifiedDatetimeStart is NULL, files whose last modified attribute is less than the datetime value are selected. | No |
| modifiedDatetimeEnd | Same as above. | No |
| maxConcurrentConnections | The number of connections used to connect to the storage store concurrently. Specify a value only when you want to limit concurrent connections to the data store. | No |

Note

For the Parquet/delimited text format, the FileSystemSource type copy activity source mentioned in the next section is still supported as-is for backward compatibility. You are suggested to use this new model going forward, and the ADF authoring UI has switched to generating these new types.

Example:

"activities":[
    {
        "name": "CopyFromSFTP",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Delimited text input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "DelimitedTextSource",
                "formatSettings":{
                    "type": "DelimitedTextReadSetting",
                    "skipLineCount": 10
                },
                "storeSettings":{
                    "type": "SftpReadSetting",
                    "recursive": true,
                    "wildcardFolderPath": "myfolder*A",
                    "wildcardFileName": "*.csv"
                }
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

Other format source

To copy data from SFTP in ORC format, the following properties are supported in the copy activity source section:

| Property | Description | Required |
|:--- |:--- |:--- |
| type | The type property of the copy activity source must be set to: FileSystemSource. | Yes |
| recursive | Indicates whether the data is read recursively from the subfolders or only from the specified folder. Note that when recursive is set to true and the sink is a file-based store, an empty folder or subfolder will not be copied or created at the sink. Allowed values are true (default) and false. | No |
| maxConcurrentConnections | The number of connections used to connect to the storage store concurrently. Specify a value only when you want to limit concurrent connections to the data store. | No |

Example:

"activities":[
    {
        "name": "CopyFromSFTP",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<SFTP input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "FileSystemSource",
                "recursive": true
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

Folder and file filter examples

This section describes the resulting behavior of the folder path and file name with wildcard filters. In the examples below, files in **bold** are retrieved.

folderPath: Folder* | fileName: (empty, use default) | recursive: false

FolderA
    **File1.csv**
    **File2.json**
    Subfolder1
        File3.csv
        File4.json
        File5.csv
AnotherFolderB
    File6.csv

folderPath: Folder* | fileName: (empty, use default) | recursive: true

FolderA
    **File1.csv**
    **File2.json**
    Subfolder1
        **File3.csv**
        **File4.json**
        **File5.csv**
AnotherFolderB
    File6.csv

folderPath: Folder* | fileName: *.csv | recursive: false

FolderA
    **File1.csv**
    File2.json
    Subfolder1
        File3.csv
        File4.json
        File5.csv
AnotherFolderB
    File6.csv

folderPath: Folder* | fileName: *.csv | recursive: true

FolderA
    **File1.csv**
    File2.json
    Subfolder1
        **File3.csv**
        File4.json
        **File5.csv**
AnotherFolderB
    File6.csv
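The examples above can be reproduced with a small script over an in-memory folder tree. This is only an illustration of the documented recursive/non-recursive behavior (a hypothetical helper, not ADF code; fnmatch approximates the connector's wildcard rules for these patterns):

```python
import fnmatch
from typing import Dict, List

# Folder tree from the examples: each folder maps to its entries,
# where a trailing '/' marks a subfolder.
TREE: Dict[str, List[str]] = {
    "FolderA": ["File1.csv", "File2.json", "Subfolder1/"],
    "FolderA/Subfolder1": ["File3.csv", "File4.json", "File5.csv"],
    "AnotherFolderB": ["File6.csv"],
}

def list_files(folder_pattern: str, file_pattern: str, recursive: bool) -> List[str]:
    # Match top-level folders against folder_pattern, then walk entries;
    # subfolders are descended into only when recursive is True.
    matched = []
    stack = [f for f in TREE if "/" not in f and fnmatch.fnmatch(f, folder_pattern)]
    while stack:
        folder = stack.pop(0)
        for entry in TREE.get(folder, []):
            if entry.endswith("/"):
                if recursive:
                    stack.append(folder + "/" + entry.rstrip("/"))
            elif fnmatch.fnmatch(entry, file_pattern):
                matched.append(folder + "/" + entry)
    return matched

print(list_files("Folder*", "*.csv", False))  # ['FolderA/File1.csv']
print(list_files("Folder*", "*.csv", True))
```

Note that AnotherFolderB never matches the folder pattern Folder*, so File6.csv is never retrieved, in line with the table above.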

Lookup activity properties

To learn details about the properties, check Lookup activity.

GetMetadata activity properties

To learn details about the properties, check GetMetadata activity.

Delete activity properties

To learn details about the properties, check Delete activity.

Next steps

For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see supported data stores.