Binary format in Azure Data Factory

Applies to: Azure Data Factory, Azure Synapse Analytics (Preview)

Binary format is supported for the following connectors: Amazon S3, Azure Blob, Azure Data Lake Storage Gen2, Azure File Storage, File System, FTP, Google Cloud Storage, HDFS, HTTP, and SFTP.

You can use a Binary dataset in the Copy activity, GetMetadata activity, or Delete activity. When a Binary dataset is used, ADF does not parse the file content but treats it as-is.

Note

When using a Binary dataset in the copy activity, you can only copy from a Binary dataset to a Binary dataset.

Dataset properties

For a full list of sections and properties available for defining datasets, see the Datasets article. This section provides a list of properties supported by the Binary dataset.

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the dataset must be set to Binary. | Yes |
| location | Location settings of the file(s). Each file-based connector has its own location type and supported properties under location. See details in the connector article -> Dataset properties section. | Yes |
| compression | Group of properties to configure file compression. Configure this section when you want to do compression/decompression during activity execution. | No |
| type (under compression) | The compression codec used to read/write binary files. Allowed values are bzip2, gzip, deflate, ZipDeflate, or TarGzip. Note: when the copy activity decompresses ZipDeflate/TarGzip file(s) and writes to a file-based sink data store, files are extracted by default to the folder <path specified in dataset>/<folder named as source compressed file>/. Use preserveZipFileNameAsFolder/preserveCompressionFileNameAsFolder on the copy activity source to control whether the name of the compressed file(s) is preserved as folder structure. | No |
| level (under compression) | The compression ratio. Applies when the dataset is used in a Copy activity sink. Allowed values are Optimal or Fastest. Fastest: the compression operation should complete as quickly as possible, even if the resulting file is not optimally compressed. Optimal: the output should be optimally compressed, even if the operation takes longer to complete. For more information, see the Compression Level topic. | No |

Below is an example of a Binary dataset on Azure Blob Storage:

{
    "name": "BinaryDataset",
    "properties": {
        "type": "Binary",
        "linkedServiceName": {
            "referenceName": "<Azure Blob Storage linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "containername",
                "folderPath": "folder/subfolder"
            },
            "compression": {
                "type": "ZipDeflate"
            }
        }
    }
}
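
As a further illustration of the compression properties listed above, the following is a minimal sketch of a Binary dataset meant to be used as a copy sink, with gzip compression and the level property set. The dataset name and the folder path are placeholder assumptions and do not come from this article; the property names and allowed values are the ones described in the dataset properties table.

```json
{
    "name": "CompressedBinarySinkDataset",
    "properties": {
        "type": "Binary",
        "linkedServiceName": {
            "referenceName": "<Azure Blob Storage linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "containername",
                "folderPath": "output/compressed"
            },
            "compression": {
                "type": "gzip",
                "level": "Optimal"
            }
        }
    }
}
```

Because level applies only when the dataset is used in a Copy activity sink, setting it on a dataset that is only read from should have no effect.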

Copy activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the Binary source and sink.

Note

When using a Binary dataset in the copy activity, you can only copy from a Binary dataset to a Binary dataset.

Binary as source

The following properties are supported in the copy activity *source* section.

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the copy activity source must be set to BinarySource. | Yes |
| formatSettings | A group of properties. Refer to the Binary read settings table below. | No |
| storeSettings | A group of properties on how to read data from a data store. Each file-based connector has its own supported read settings under storeSettings. See details in the connector article -> Copy activity properties section. | No |

Supported binary read settings under formatSettings:

| Property | Description | Required |
| --- | --- | --- |
| type | The type of formatSettings must be set to BinaryReadSettings. | Yes |
| compressionProperties | A group of properties on how to decompress data for a given compression codec. | No |
| preserveZipFileNameAsFolder (under compressionProperties, with type set to ZipDeflateReadSettings) | Applies when the input dataset is configured with ZipDeflate compression. Indicates whether to preserve the source zip file name as folder structure during copy. When set to true (default), Data Factory writes unzipped files to <path specified in dataset>/<folder named as source zip file>/. When set to false, Data Factory writes unzipped files directly to <path specified in dataset>. Make sure you don't have duplicate file names in different source zip files, to avoid race conditions or unexpected behavior. | No |
| preserveCompressionFileNameAsFolder (under compressionProperties, with type set to TarGZipReadSettings) | Applies when the input dataset is configured with TarGzip compression. Indicates whether to preserve the source compressed file name as folder structure during copy. When set to true (default), Data Factory writes decompressed files to <path specified in dataset>/<folder named as source compressed file>/. When set to false, Data Factory writes decompressed files directly to <path specified in dataset>. Make sure you don't have duplicate file names in different source files, to avoid race conditions or unexpected behavior. | No |

"activities": [
    {
        "name": "CopyFromBinary",
        "type": "Copy",
        "typeProperties": {
            "source": {
                "type": "BinarySource",
                "storeSettings": {
                    "type": "AzureBlobStorageReadSettings",
                    "recursive": true,
                    "deleteFilesAfterCompletion": true
                },
                "formatSettings": {
                    "type": "BinaryReadSettings",
                    "compressionProperties": {
                        "type": "ZipDeflateReadSettings",
                        "preserveZipFileNameAsFolder": false
                    }
                }
            },
            ...
        }
        ...
    }
]
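
For comparison with the ZipDeflate example above, here is a hedged sketch of only the source section for a dataset configured with TarGzip compression. It is a fragment rather than a complete activity: the surrounding activity definition, inputs, and outputs are omitted, and the choice to leave preserveCompressionFileNameAsFolder at its default of true is purely illustrative.

```json
{
    "source": {
        "type": "BinarySource",
        "storeSettings": {
            "type": "AzureBlobStorageReadSettings",
            "recursive": true
        },
        "formatSettings": {
            "type": "BinaryReadSettings",
            "compressionProperties": {
                "type": "TarGZipReadSettings",
                "preserveCompressionFileNameAsFolder": true
            }
        }
    }
}
```

With this setting, decompressed files would be expected under <path specified in dataset>/<folder named as source compressed file>/, as described in the read settings table.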

Binary as sink

The following properties are supported in the copy activity *sink* section.

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the copy activity sink must be set to BinarySink. | Yes |
| storeSettings | A group of properties on how to write data to a data store. Each file-based connector has its own supported write settings under storeSettings. See details in the connector article -> Copy activity properties section. | No |
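
The sink properties above can be sketched as the following fragment, assuming the sink data store is Azure Blob Storage. The write settings type AzureBlobStorageWriteSettings and the copyBehavior property are documented in the Azure Blob Storage connector article, not in this one, so treat them as assumptions here and check that article for the settings your connector supports.

```json
{
    "sink": {
        "type": "BinarySink",
        "storeSettings": {
            "type": "AzureBlobStorageWriteSettings",
            "copyBehavior": "PreserveHierarchy"
        }
    }
}
```

This fragment belongs under typeProperties of a Copy activity, alongside a source section of type BinarySource.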

Next steps