Azure 数据工厂中的 Excel 格式Excel format in Azure Data Factory

适用于: Azure 数据工厂 Azure Synapse Analytics(预览版)

如果要 分析 Excel 文件 ,请按此文的要求操作。Follow this article when you want to parse the Excel files . Azure 数据工厂同时支持“.xls”和“.xlsx”。Azure Data Factory supports both ".xls" and ".xlsx".

以下连接器支持 Excel 格式:Amazon S3Azure BlobAzure Data Lake Storage Gen2Azure 文件存储文件系统FTPGoogle 云存储HDFSHTTPSFTPExcel format is supported for the following connectors: Amazon S3, Azure Blob, Azure Data Lake Storage Gen2, Azure File Storage, File System, FTP, Google Cloud Storage, HDFS, HTTP, and SFTP. 它可以作为源,但不可作为接收器。It is supported as source but not sink.

数据集属性Dataset properties

有关可用于定义数据集的各部分和属性的完整列表,请参阅数据集一文。For a full list of sections and properties available for defining datasets, see the Datasets article. 本部分提供 Excel 数据集支持的属性列表。This section provides a list of properties supported by the Excel dataset.

属性Property 说明Description 必须Required
typetype 数据集的 type 属性必须设置为“Excel”。The type property of the dataset must be set to Excel . Yes
locationlocation 文件的位置设置。Location settings of the file(s). 每个基于文件的连接器在 location 下都有其自己的位置类型和支持的属性。Each file-based connector has its own location type and supported properties under location. Yes
sheetNamesheetName 要读取数据的 Excel 工作表名称。The Excel worksheet name to read data. Yes
rangerange 给定工作表中用于定位选择性数据的单元格范围,例如:The cell range in the given worksheet to locate the selective data, e.g.:
- 未指定:以表的形式从第一个非空行和列读取整个工作表- Not specified: reads the whole worksheet as a table from the first non-empty row and column
- A3:读取从给定单元格开始的表,动态检测下面的所有行和右侧的所有列- A3: reads a table starting from the given cell, dynamically detects all the rows below and all the columns to the right
- A3:H5:以表的形式读取此固定范围- A3:H5: reads this fixed range as a table
- A3:A3:读取此单个单元格- A3:A3: reads this single cell
No
firstRowAsHeaderfirstRowAsHeader 指定是否要将给定工作表/范围内的第一行视为带有列名的标题行。Specifies whether to treat the first row in the given worksheet/range as a header line with names of columns.
允许的值为 truetrue (默认值)。Allowed values are true and false (default).
No
nullValuenullValue 指定 null 值的字符串表示形式。Specifies the string representation of null value.
默认值为 空字符串The default value is empty string .
No
compressioncompression 用来配置文件压缩的属性组。Group of properties to configure file compression. 如果需要在活动执行期间进行压缩/解压缩,请配置此部分。Configure this section when you want to do compression/decompression during activity execution. No
typetype
(在 compression 下)( under compression )
用来读取/写入 JSON 文件的压缩编解码器。The compression codec used to read/write JSON files.
允许的值为 bzip2、gzip、deflate、ZipDeflate、TarGzip、snappy 或 lz4 。Allowed values are bzip2 , gzip , deflate , ZipDeflate , TarGzip , snappy , or lz4 . 默认设置是不压缩。Default is not compressed.
请注意,复制活动当前不支持“snappy”和“lz4”。Note currently Copy activity doesn't support "snappy" & "lz4".
注意 ,使用复制活动来解压缩 ZipDeflate 文件并写入到基于文件的接收器数据存储时,会将文件提取到以下文件夹:<path specified in dataset>/<folder named as source zip file>/Note when using copy activity to decompress ZipDeflate file(s) and write to file-based sink data store, files are extracted to the folder: <path specified in dataset>/<folder named as source zip file>/.
否。No.
levellevel
(在 compression 下)( under compression )
压缩率。The compression ratio.
允许的值为 OptimalFastestAllowed values are Optimal or Fastest .
- Fastest :尽快完成压缩操作,不过,无法以最佳方式压缩生成的文件。- Fastest: The compression operation should complete as quickly as possible, even if the resulting file is not optimally compressed.
- Optimal :以最佳方式完成压缩操作,不过,需要耗费更长的时间。- Optimal : The compression operation should be optimally compressed, even if the operation takes a longer time to complete. 有关详细信息,请参阅 Compression Level(压缩级别)主题。For more information, see Compression Level topic.
No

下面是 Azure Blob 存储上 Excel 数据集的示例:Below is an example of Excel dataset on Azure Blob Storage:

{
    "name": "ExcelDataset",
    "properties": {
        "type": "Excel",
        "linkedServiceName": {
            "referenceName": "<Azure Blob Storage linked service name>",
            "type": "LinkedServiceReference"
        },
        "schema": [ < physical schema, optional, retrievable during authoring > ],
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "containername",
                "folderPath": "folder/subfolder",
            },
            "sheetName": "MyWorksheet",
            "range": "A3:H5",
            "firstRowAsHeader": true
        }
    }
}

复制活动属性Copy activity properties

有关可用于定义活动的各部分和属性的完整列表,请参阅管道一文。For a full list of sections and properties available for defining activities, see the Pipelines article. 本部分提供 Excel 源支持的属性列表。This section provides a list of properties supported by the Excel source.

Excel 作为源Excel as source

复制活动的 *source* 节支持以下属性。The following properties are supported in the copy activity *source* section.

属性Property 说明Description 必须Required
typetype 复制活动源的 type 属性必须设置为“ExcelSource”。The type property of the copy activity source must be set to ExcelSource . Yes
storeSettingsstoreSettings 有关如何从数据存储读取数据的一组属性。A group of properties on how to read data from a data store. 每个基于文件的连接器在 storeSettings 下都有其自己支持的读取设置。Each file-based connector has its own supported read settings under storeSettings. No
"activities": [
    {
        "name": "CopyFromExcel",
        "type": "Copy",
        "typeProperties": {
            "source": {
                "type": "ExcelSource",
                "storeSettings": {
                    "type": "AzureBlobStorageReadSettings",
                    "recursive": true
                }
            },
            ...
        }
        ...
    }
]

后续步骤Next steps