Avro format in Azure Data Factory

Applies to: Azure Data Factory, Azure Synapse Analytics (Preview)

Follow this article when you want to parse Avro files or write data in Avro format.

Avro format is supported for the following connectors: Amazon S3, Azure Blob, Azure Data Lake Storage Gen2, Azure File Storage, File System, FTP, Google Cloud Storage, HDFS, HTTP, and SFTP.

Dataset properties

For a full list of sections and properties available for defining datasets, see the Datasets article. This section provides a list of properties supported by the Avro dataset.

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the dataset must be set to Avro. | Yes |
| location | Location settings of the file(s). Each file-based connector has its own location type and supported properties under location. See details in the connector article -> Dataset properties section. | Yes |
| avroCompressionCodec | The compression codec to use when writing to Avro files. When reading from Avro files, Data Factory automatically determines the compression codec based on the file metadata. Supported types are "none" (default), "deflate", and "snappy". Note that Copy activity currently does not support Snappy when reading or writing Avro files. | No |

Note

White space in column names is not supported for Avro files.
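A quick pre-flight check can catch offending column names before a write. The Python sketch below assumes the naming rule from the Apache Avro specification (a name starts with a letter or underscore and continues with letters, digits, or underscores), which is stricter than the white-space restriction stated here, so it also flags names that Data Factory might accept. The helpers `invalid_avro_columns` and `sanitize_column` are hypothetical and not part of Data Factory.

```python
import re

# Avro specification naming rule (assumed here as the stricter check):
# first character [A-Za-z_], remaining characters [A-Za-z0-9_].
# Any name containing white space fails this rule.
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_avro_columns(columns):
    """Return the column names that do not satisfy the Avro naming rule."""
    return [c for c in columns if not AVRO_NAME.fullmatch(c)]


def sanitize_column(name):
    """Hypothetical fix-up: replace disallowed characters (including spaces)
    with underscores, and add a leading underscore if the result is empty
    or starts with a digit."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


if __name__ == "__main__":
    cols = ["OrderId", "Customer Name", "2021Total", "unit_price"]
    print(invalid_avro_columns(cols))
    print([sanitize_column(c) for c in cols])
```

Renaming columns in the source query or in a mapping before the copy is usually simpler than fixing files afterwards; the sanitizer only shows one possible renaming convention.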

Below is an example of an Avro dataset on Azure Blob Storage:

{
    "name": "AvroDataset",
    "properties": {
        "type": "Avro",
        "linkedServiceName": {
            "referenceName": "<Azure Blob Storage linked service name>",
            "type": "LinkedServiceReference"
        },
        "schema": [ < physical schema, optional, retrievable during authoring > ],
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "containername",
                "folderPath": "folder/subfolder"
            },
            "avroCompressionCodec": "snappy"
        }
    }
}
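The dataset JSON can also be generated programmatically, for example when many similar datasets are needed. This is a minimal Python sketch, not an official SDK: `avro_blob_dataset` and its parameters are hypothetical, the keys mirror the Azure Blob Storage dataset example, the optional `schema` is omitted, and the `for_copy_activity` flag encodes the note that Copy activity does not support Snappy for Avro.

```python
import json

# Codec values listed in the dataset property table.
AVRO_CODECS = {"none", "deflate", "snappy"}


def avro_blob_dataset(name, linked_service, container, folder_path,
                      codec="none", for_copy_activity=True):
    """Build an Avro dataset definition on Azure Blob Storage as a dict.

    Hypothetical helper: the keys follow the dataset example, but the
    function itself is not part of any Azure library.
    """
    if codec not in AVRO_CODECS:
        raise ValueError(f"unsupported avroCompressionCodec: {codec!r}")
    if for_copy_activity and codec == "snappy":
        # Copy activity does not support Snappy when reading/writing Avro.
        raise ValueError("Copy activity does not support Snappy for Avro")
    return {
        "name": name,
        "properties": {
            "type": "Avro",
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "folderPath": folder_path,
                },
                "avroCompressionCodec": codec,
            },
        },
    }


if __name__ == "__main__":
    # "MyBlobStorage" is a placeholder linked service name.
    ds = avro_blob_dataset("AvroDataset", "MyBlobStorage", "containername",
                           "folder/subfolder", codec="deflate")
    print(json.dumps(ds, indent=4))
```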

Copy activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the Avro source and sink.

Avro as source

The following properties are supported in the copy activity *source* section.

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the copy activity source must be set to AvroSource. | Yes |
| storeSettings | A group of properties on how to read data from a data store. Each file-based connector has its own supported read settings under storeSettings. See details in the connector article -> Copy activity properties section. | No |

Avro as sink

The following properties are supported in the copy activity *sink* section.

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the copy activity sink must be set to AvroSink. | Yes |
| formatSettings | A group of properties. Refer to the Avro write settings table below. | No |
| storeSettings | A group of properties on how to write data to a data store. Each file-based connector has its own supported write settings under storeSettings. See details in the connector article -> Copy activity properties section. | No |
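To show where `source` and `sink` sit in a pipeline, here is a Python sketch that assembles a copy activity definition reading Avro and writing Avro. The helpers `dataset_ref` and `avro_copy_activity` are hypothetical, the dataset names are placeholders, and the storeSettings types `AzureBlobStorageReadSettings` and `AzureBlobStorageWriteSettings` assume an Azure Blob Storage connector; other connectors use their own types, as described in the connector articles.

```python
import json


def dataset_ref(name):
    """Reference an existing dataset by name."""
    return {"referenceName": name, "type": "DatasetReference"}


def avro_copy_activity(name, input_dataset, output_dataset, recursive=True):
    """Assemble a Copy activity with an Avro source and an Avro sink (sketch)."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [dataset_ref(input_dataset)],
        "outputs": [dataset_ref(output_dataset)],
        "typeProperties": {
            "source": {
                "type": "AvroSource",
                # Connector-specific read settings (assumes Azure Blob Storage).
                "storeSettings": {
                    "type": "AzureBlobStorageReadSettings",
                    "recursive": recursive,
                },
            },
            "sink": {
                "type": "AvroSink",
                # Connector-specific write settings (assumes Azure Blob Storage).
                "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
            },
        },
    }


if __name__ == "__main__":
    activity = avro_copy_activity("CopyAvroToAvro", "AvroInput", "AvroOutput")
    print(json.dumps(activity, indent=4))
```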

Supported Avro write settings under formatSettings:

| Property | Description | Required |
| --- | --- | --- |
| type | The type of formatSettings must be set to AvroWriteSettings. | Yes |
| maxRowsPerFile | When writing data into a folder, you can choose to write to multiple files and specify the maximum number of rows per file. | No |
| fileNamePrefix | Applicable when maxRowsPerFile is configured. Specifies the file name prefix when writing data to multiple files, resulting in this pattern: <fileNamePrefix>_00000.<fileExtension>. If not specified, the file name prefix is generated automatically. This property does not apply when the source is a file-based store or a partition-option-enabled data store. | No |
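The dependency of `fileNamePrefix` on `maxRowsPerFile` can be expressed as a small validation helper, together with a rough estimate of the file names to expect. This Python sketch is illustrative only: both helper names are hypothetical, and `expected_file_names` assumes the rows are split into ceil(rows / maxRowsPerFile) files numbered sequentially from 00000 with an `avro` extension, whereas the actual split is decided by the service at run time.

```python
import math


def avro_write_settings(max_rows_per_file=None, file_name_prefix=None):
    """Build the formatSettings object for an Avro sink (hypothetical helper)."""
    settings = {"type": "AvroWriteSettings"}
    if max_rows_per_file is not None:
        if max_rows_per_file <= 0:
            raise ValueError("maxRowsPerFile must be a positive integer")
        settings["maxRowsPerFile"] = max_rows_per_file
    if file_name_prefix is not None:
        # fileNamePrefix is only applicable when maxRowsPerFile is configured.
        if max_rows_per_file is None:
            raise ValueError("fileNamePrefix requires maxRowsPerFile")
        settings["fileNamePrefix"] = file_name_prefix
    return settings


def expected_file_names(total_rows, max_rows_per_file, prefix, extension="avro"):
    """Estimate output names following <fileNamePrefix>_00000.<fileExtension>.

    Assumes an even split and sequential, zero-padded five-digit numbering.
    """
    count = math.ceil(total_rows / max_rows_per_file)
    return [f"{prefix}_{i:05d}.{extension}" for i in range(count)]


if __name__ == "__main__":
    print(avro_write_settings(100000, "orders"))
    print(expected_file_names(250000, 100000, "orders"))
```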

Data type support

Copy activity

Avro complex data types (records, enums, arrays, maps, unions, and fixed) are not supported in Copy activity.
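Before routing a file through Copy activity, its Avro schema can be scanned for complex field types. This Python sketch works on a schema already parsed into a dict (for example with `json.loads`) and assumes the usual layout where the top-level record is the row container, so only its fields are checked. It applies the list above literally, flagging every union including the common nullable `["null", T]` pattern, and it does not resolve named-type references given as plain strings. The function name `unsupported_fields` is hypothetical.

```python
import json

# Complex types listed above; unions appear in a schema as JSON arrays.
COMPLEX_TYPES = {"record", "enum", "array", "map", "fixed"}


def unsupported_fields(schema):
    """Return (field name, type) pairs for top-level fields with complex types."""
    problems = []
    for field in schema.get("fields", []):
        ftype = field["type"]
        if isinstance(ftype, list):
            problems.append((field["name"], "union"))
        elif isinstance(ftype, dict) and ftype.get("type") in COMPLEX_TYPES:
            problems.append((field["name"], ftype["type"]))
        # Primitive names ("long", "string", ...) and primitives with a
        # logicalType (e.g. {"type": "int", "logicalType": "date"}) pass.
    return problems


if __name__ == "__main__":
    schema = json.loads("""
    {"type": "record", "name": "Order", "fields": [
        {"name": "id", "type": "long"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "note", "type": ["null", "string"]},
        {"name": "day", "type": {"type": "int", "logicalType": "date"}}
    ]}
    """)
    print(unsupported_fields(schema))
```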

Next steps