Export data to storage

Executes a query and writes the first result set to external storage, specified by a storage connection string.

Syntax

.export [async] [compressed] to OutputDataFormat ( StorageConnectionString [, ...] ) [with ( PropertyName = PropertyValue [, ...] )] <| Query

Arguments

  • async: If specified, the command runs in asynchronous mode. See Asynchronous mode below for details on the behavior in this mode.

  • compressed: If specified, the output storage artifacts are compressed as .gz files. To compress parquet files with snappy instead, see the compressionType property.

  • OutputDataFormat: Indicates the data format of the storage artifacts written by the command. Supported values are: csv, tsv, json, and parquet.

  • StorageConnectionString: Specifies one or more storage connection strings that indicate which storage to write the data to. (More than one storage connection string may be specified for scalable writes.) Each connection string must indicate the credentials to use when writing to storage. For example, when writing to Azure Blob Storage, the credentials can be the storage account key, or a shared access signature (SAS) with permissions to read, write, and list blobs.

Note

It is highly recommended to export data to storage that is located in the same region as the Kusto cluster itself. This also applies to data that is exported so that it can be transferred to a cloud service in another region. Writes should be done locally, while reads can happen remotely.

  • PropertyName/PropertyValue: Zero or more optional export properties:
Property Type Description
sizeLimit long The size limit in bytes of a single storage artifact being written (prior to compression). Allowed range is 100MB (default) to 1GB.
includeHeaders string For csv/tsv output, controls the generation of column headers. Can be one of none (default; no header lines emitted), all (emit a header line into every storage artifact), or firstFile (emit a header line into the first storage artifact only).
fileExtension string Indicates the "extension" part of the storage artifact (for example, .csv or .tsv). If compression is used, .gz is appended as well.
namePrefix string Indicates a prefix to add to each generated storage artifact name. A random prefix is used if left unspecified.
encoding string Indicates how to encode the text: UTF8NoBOM (default) or UTF8BOM.
compressionType string Indicates the type of compression to use. Possible values are gzip or snappy. Default is gzip. snappy can (optionally) be used for parquet format.
distribution string Distribution hint (single, per_node, per_shard). If the value is single, a single thread writes to storage. Otherwise, the export writes from all nodes executing the query in parallel. See the evaluate plugin operator. Defaults to per_shard.
distributed bool Disables or enables distributed export. Setting it to false is equivalent to the single distribution hint. Default is true.
persistDetails bool Indicates that the command should persist its results (see the async flag). Defaults to true in async runs, but can be turned off if the caller does not require the results. Defaults to false in synchronous executions, but can be turned on in those as well.
parquetRowGroupSize int Relevant only when the data format is parquet. Controls the row group size in the exported files. Default row group size is 100000 records.
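
To make the syntax and arguments above concrete, the following is a minimal sketch of a helper that renders the text of an .export command. The function names (render_value, build_export_command) are hypothetical and not part of any Kusto SDK; the helper only builds a string and does not contact a cluster. Rendering string property values as quoted literals is an assumption of this sketch.

```python
# Hypothetical helper: renders the text of an .export command from its parts.
# It only builds a string; it does not send anything to a cluster.

ALLOWED_FORMATS = {"csv", "tsv", "json", "parquet"}


def render_value(value):
    """Render a property value: bools as true/false, ints bare, others quoted.

    Quoting string values is an assumption of this sketch.
    """
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    return '"' + str(value) + '"'


def build_export_command(data_format, connection_strings, query,
                         run_async=False, compressed=False, properties=None):
    """Assemble the command in the order given by the syntax line."""
    if data_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {data_format}")
    if not connection_strings:
        raise ValueError("at least one storage connection string is required")

    parts = [".export"]
    if run_async:
        parts.append("async")
    if compressed:
        parts.append("compressed")
    # Each connection string is written as an obfuscated string literal (h@"...").
    targets = ", ".join(f'h@"{cs}"' for cs in connection_strings)
    parts.append(f"to {data_format} ({targets})")
    if properties:
        props = ", ".join(f"{k}={render_value(v)}" for k, v in properties.items())
        parts.append(f"with ({props})")
    parts.append(f"<| {query}")
    return " ".join(parts)


command = build_export_command(
    "csv",
    ["https://storage1.blob.core.chinacloudapi.cn/containerName;secretKey"],
    'myLogs | where id == "moshe" | limit 10000',
    run_async=True,
    compressed=True,
    properties={"includeHeaders": "all", "distributed": False},
)
print(command)
```

The helper rejects formats outside the four supported values and requires at least one connection string, mirroring the argument descriptions; it does not validate individual property names or ranges.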

Results

The command returns a table that describes the generated storage artifacts. Each record describes a single artifact and includes the storage path to the artifact and the number of data records it holds.

Path NumRecords
http://storage1.blob.core.chinacloudapi.cn/containerName/export_1_d08afcae2f044c1092b279412dcb571b.csv 10
http://storage1.blob.core.chinacloudapi.cn/containerName/export_2_454c0f1359e24795b6529da8a0101330.csv 15
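
As an illustration of how a caller might consume this result table, the sketch below totals the artifacts and records from the two sample rows. The summarize function is hypothetical, and the rows are assumed to have already been read into (Path, NumRecords) tuples by whatever client is in use.

```python
# Rows copied from the sample result table: (Path, NumRecords).
results = [
    ("http://storage1.blob.core.chinacloudapi.cn/containerName/"
     "export_1_d08afcae2f044c1092b279412dcb571b.csv", 10),
    ("http://storage1.blob.core.chinacloudapi.cn/containerName/"
     "export_2_454c0f1359e24795b6529da8a0101330.csv", 15),
]


def summarize(rows):
    """Return (artifact_count, total_records) for (path, num_records) rows."""
    return len(rows), sum(num_records for _, num_records in rows)


artifact_count, total_records = summarize(results)
print(artifact_count, total_records)  # prints: 2 25
```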

Asynchronous mode

If the async flag is specified, the command executes in asynchronous mode. In this mode, the command returns immediately with an operation ID, and the data export continues in the background until completion. The operation ID returned by the command can be used to track its progress, and ultimately its results, through the .show operations and .show operation details commands.

For example, after a successful completion, you can retrieve the results using:

.show operation f008dc1e-2710-47d8-8d34-0d562f5f8615 details
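
The tracking flow can be sketched from the client side as follows. This is a sketch under stated assumptions, not a definitive implementation: tracking_commands and wait_for_export are hypothetical names, get_state stands in for caller-supplied code that runs the status command and returns the operation's state as a string, and the state names "Scheduled" and "InProgress" are assumed.

```python
import time
import uuid


def tracking_commands(operation_id):
    """Return the status command and the details command for an operation ID."""
    op = str(uuid.UUID(operation_id))  # raises ValueError if not a valid GUID
    return f".show operations {op}", f".show operation {op} details"


def wait_for_export(operation_id, get_state, poll_seconds=5.0, max_polls=120):
    """Poll until the operation is no longer pending, then return its state.

    get_state is a caller-supplied (hypothetical) function that runs the
    status command for operation_id and returns the state as a string.
    The pending state names below are assumptions of this sketch.
    """
    for _ in range(max_polls):
        state = get_state(operation_id)
        if state not in ("Scheduled", "InProgress"):
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"operation {operation_id} is still running")


show_status, show_details = tracking_commands("f008dc1e-2710-47d8-8d34-0d562f5f8615")
print(show_status)
print(show_details)
```

Only when the returned state indicates success should the details command be used to read the list of exported artifacts.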

Examples

In this example, Kusto runs the query and then exports the first result set produced by the query to one or more compressed CSV blobs. Column name labels are added as the first row of each blob.

.export
  async compressed
  to csv (
    h@"https://storage1.blob.core.chinacloudapi.cn/containerName;secretKey",
    h@"https://storage1.blob.core.chinacloudapi.cn/containerName2;secretKey"
  ) with (
    sizeLimit=100000,
    namePrefix=export,
    includeHeaders=all,
    encoding=UTF8NoBOM
  )
  <| myLogs | where id == "moshe" | limit 10000

Known issues

Failures during export command

  • The export command can transiently fail during execution. When the export command fails, artifacts that were already written to storage are not deleted; they remain in storage. If the command fails, assume the export is incomplete, even if some artifacts were written. The best way to track both the completion of the command and the artifacts exported upon successful completion is to use the .show operations and .show operation details commands.

  • By default, the export command is distributed, so that all extents containing data to export write to storage concurrently. On large exports, when the number of such extents is high, this may put a high load on storage and result in storage throttling or transient storage errors. In such cases, it is recommended to increase the number of storage accounts provided to the export command (the load is distributed between the accounts), to reduce the concurrency by setting the distribution hint to per_node (see the export properties above), or both. Entirely disabling distribution is also possible, but this may significantly impact the command performance.
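
The two concurrency mitigations described above can be expressed as adjustments to the export properties. The sketch below uses a hypothetical reduce_concurrency helper operating on a plain dictionary of property names and values; it only shows which properties change and does not run a command.

```python
def reduce_concurrency(properties, disable_distribution=False):
    """Return a copy of the export properties with a lower-concurrency setting.

    By default, sets the distribution hint to per_node. With
    disable_distribution=True, turns distribution off entirely, which is
    equivalent to the single hint and may slow the export significantly.
    """
    updated = dict(properties)
    if disable_distribution:
        # Drop any distribution hint so it cannot conflict with distributed=false.
        updated.pop("distribution", None)
        updated["distributed"] = False
    else:
        updated["distribution"] = "per_node"
    return updated


base = {"includeHeaders": "all"}
print(reduce_concurrency(base))
print(reduce_concurrency(base, disable_distribution=True))
```

Trying per_node first and disabling distribution only as a last resort follows the order of the recommendations in the text.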