Data formats supported by Azure Data Explorer for ingestion

Data ingestion is the process by which data is added to a table and made available for query in Azure Data Explorer. For all ingestion methods other than ingest-from-query, the data must be in one of the supported formats. The following table lists and describes the formats that Azure Data Explorer supports for data ingestion.

Format       Extension    Description
Avro         .avro        An Avro container file. The following codecs are supported: null, deflate (snappy is currently not supported).
ApacheAvro   .avro        Experimental native implementation of the Avro format, with support for logical types and the snappy compression codec.
CSV          .csv         A text file with comma-separated values (,). See RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files.
JSON         .json        A text file with JSON objects delimited by \n or \r\n. See JSON Lines (JSONL).
MultiJSON    .multijson   A text file with a JSON array of property bags (each representing a record), or any number of property bags delimited by whitespace, \n, or \r\n. Each property bag can span multiple lines. This format is preferred over JSON, unless the data is not made up of property bags.
ORC          .orc         An ORC file.
Parquet      .parquet     A Parquet file.
PSV          .psv         A text file with pipe-separated values (|).
RAW          .raw         A text file whose entire contents are a single string value.
SCsv         .scsv        A text file with semicolon-separated values (;).
SOHsv        .sohsv       A text file with SOH-separated values. (SOH is ASCII code point 1; this format is used by Hive on HDInsight.)
TSV          .tsv         A text file with tab-separated values (\t).
TSVE         .tsv         A text file with tab-separated values (\t). A backslash character (\) is used for escaping.
TXT          .txt         A text file with lines delimited by \n. Empty lines are skipped.
W3CLOGFILE   .log         Web log file format standardized by the W3C.
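
The difference between the JSON and MultiJSON rows is easiest to see with a small parser for each layout. The following is a minimal Python sketch using only the standard library; the function names `parse_json_lines` and `parse_multijson` are hypothetical, and the code illustrates the two file layouts described in the table rather than how Azure Data Explorer parses them. Skipping blank lines in the JSON case is an assumption of this sketch.

```python
import json
import re


def parse_json_lines(text):
    """Parse the JSON layout: one JSON object per line, lines ending in \\n or \\r\\n.

    Blank lines are skipped here; that is an assumption of this sketch.
    """
    return [json.loads(line) for line in re.split(r"\r?\n", text) if line.strip()]


def parse_multijson(text):
    """Parse the MultiJSON layout: either a JSON array of property bags, or any
    number of property bags separated by whitespace. A bag may span several lines.
    """
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it manually.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            records.extend(value)   # a JSON array of property bags
        else:
            records.append(value)   # a single property bag
    return records
```

A pretty-printed object that spans several lines is valid input for `parse_multijson` but makes `parse_json_lines` fail, which mirrors why the table recommends MultiJSON for property bags.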

Supported data compression formats

Blobs and files can be compressed with either of the following compression algorithms:

Compression  Extension
GZip         .gz
Zip          .zip

Indicate compression by appending the extension to the name of the blob or file.

For example:

  • MyData.csv.zip indicates a blob or file formatted as CSV and compressed with Zip (an archive or a single file).
  • MyData.json.gz indicates a blob or file formatted as JSON and compressed with GZip.
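
As an illustration of the naming convention, the following Python sketch compresses a local file with GZip and appends .gz to the existing name, so that MyData.json becomes MyData.json.gz. The helper name `gzip_with_extension` is hypothetical, and the sketch only prepares a local file; uploading or ingesting it is out of scope here.

```python
import gzip
import shutil
from pathlib import Path


def gzip_with_extension(source):
    """Compress `source` with GZip and keep the format extension in the name.

    "MyData.json" is written to "MyData.json.gz" in the same directory, so the
    format (.json) and the compression (.gz) both remain visible in the name.
    """
    source = Path(source)
    target = source.with_name(source.name + ".gz")
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```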

Blob or file names that include only the compression extension and no format extension (for example, MyData.zip) are also supported. In this case, the file format must be specified as an ingestion property because it cannot be inferred.

Note

Some compression formats keep track of the original file extension as part of the compressed stream. This extension is generally ignored when determining the file format. If the file format can't be determined from the (compressed) blob or file name, it must be specified through the format ingestion property.
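
The name-based inference described in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the service's actual logic: the function `infer_format` and its `explicit_format` parameter are hypothetical stand-ins for the format ingestion property, and because .avro and .tsv each appear twice in the format table, the sketch arbitrarily maps them to Avro and TSV.

```python
import os

COMPRESSION_EXTENSIONS = {".gz": "GZip", ".zip": "Zip"}

# Assumption: .avro and .tsv are each listed for two formats in the table;
# this sketch picks Avro and TSV for them.
FORMAT_EXTENSIONS = {
    ".avro": "Avro", ".csv": "CSV", ".json": "JSON", ".multijson": "MultiJSON",
    ".orc": "ORC", ".parquet": "Parquet", ".psv": "PSV", ".raw": "RAW",
    ".scsv": "SCsv", ".sohsv": "SOHsv", ".tsv": "TSV", ".txt": "TXT",
    ".log": "W3CLOGFILE",
}


def infer_format(name, explicit_format=None):
    """Return (data_format, compression) for a blob or file name.

    `explicit_format` plays the role of the format ingestion property and
    takes precedence over anything inferred from the name.
    """
    base, ext = os.path.splitext(name)
    compression = COMPRESSION_EXTENSIONS.get(ext.lower())
    if compression is not None:
        # Strip the compression extension and look at what precedes it.
        base, ext = os.path.splitext(base)
    data_format = explicit_format or FORMAT_EXTENSIONS.get(ext.lower())
    if data_format is None:
        raise ValueError(
            f"Cannot infer the data format of {name!r}; specify it explicitly."
        )
    return data_format, compression
```

With this sketch, `infer_format("MyData.zip")` raises an error, while `infer_format("MyData.zip", explicit_format="CSV")` succeeds, matching the rule that a name carrying only a compression extension needs the format supplied separately.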

Next steps