IngestionBatching policy


During ingestion, Kusto tries to optimize throughput by batching small ingress data chunks together while they wait to be ingested. Batching reduces the resources that the ingestion process consumes, and it avoids spending post-ingestion resources on optimizing the small data shards that non-batched ingestion produces.

Batching before ingestion has a downside, however: it introduces a forced delay, so the end-to-end time from requesting ingestion of the data until the data is ready for query is longer.

The IngestionBatching policy controls this trade-off. The policy applies to queued ingestion only, and it specifies the maximum forced delay to allow when batching small blobs together.


There is an optimal size for data that is ingested in bulk. Currently that size is about 1 GB of uncompressed data. Ingesting blobs that hold much less data than the optimal size is non-optimal, so in queued ingestion Kusto batches such small blobs together. Batching continues until the first of the following conditions becomes true:

  1. The total size of the batched data reaches the optimal size, or
  2. The maximum delay time, total size, or number of blobs allowed by the IngestionBatching policy is reached.
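
The two conditions above can be sketched as a small decision function. This is only an illustrative model, not Kusto's implementation: the names `BatchingLimits`, `Batch`, and `seal_reason` are hypothetical, the optimal size is taken as 1024 MB, and the order in which the limits are checked is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# "About 1 GB" of uncompressed data, taken here as 1024 MB (assumption).
OPTIMAL_SIZE_MB = 1024.0


@dataclass
class BatchingLimits:
    """Hypothetical container for the limits an IngestionBatching policy allows."""
    max_delay_s: float = 300.0       # maximum delay time
    max_items: int = 1000            # maximum number of blobs
    max_raw_size_mb: float = 1024.0  # maximum total size


@dataclass
class Batch:
    """A batch that is still open and collecting small blobs."""
    opened_at_s: float
    blob_sizes_mb: List[float] = field(default_factory=list)


def seal_reason(batch: Batch, now_s: float, limits: BatchingLimits) -> Optional[str]:
    """Return why the batch should stop growing, or None if it can keep waiting."""
    total_mb = sum(batch.blob_sizes_mb)
    # Condition 1: the batched data has reached the optimal size.
    if total_mb >= OPTIMAL_SIZE_MB:
        return "optimal_size"
    # Condition 2: one of the limits allowed by the policy has been reached.
    if total_mb >= limits.max_raw_size_mb:
        return "policy_size"
    if len(batch.blob_sizes_mb) >= limits.max_items:
        return "policy_items"
    if now_s - batch.opened_at_s >= limits.max_delay_s:
        return "policy_delay"
    return None


# Usage: two small blobs, checked one minute after the batch was opened.
open_batch = Batch(opened_at_s=0.0, blob_sizes_mb=[10.0, 20.0])
reason = seal_reason(open_batch, now_s=60.0, limits=BatchingLimits())
```

In this model a batch of small blobs is usually closed by the delay limit, which is why the maximum delay dominates the end-to-end latency for low-volume tables.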

The IngestionBatching policy can be set on databases or tables. By default, if no policy is defined, Kusto batches with a maximum delay time of 5 minutes, a maximum of 1000 items, and a maximum total size of 1 GB.
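
As a rough illustration, the default values can be written out as a policy object. The property names (`MaximumBatchingTimeSpan`, `MaximumNumberOfItems`, `MaximumRawDataSizeMB`) and the command text are not given in this article; they are assumptions here and should be checked against the policy command reference before use. The helper names and the table name `MyTable` are hypothetical.

```python
import json


def build_policy(max_delay="00:05:00", max_items=1000, max_raw_size_mb=1024):
    """Build a policy object holding the three batching limits.

    The defaults mirror the values stated above: 5 minutes, 1000 items,
    and 1 GB (taken as 1024 MB). Property names are assumptions.
    """
    return {
        "MaximumBatchingTimeSpan": max_delay,
        "MaximumNumberOfItems": max_items,
        "MaximumRawDataSizeMB": max_raw_size_mb,
    }


def alter_table_command(table, policy):
    """Format a control command that would set the policy on a table.

    The command shape is an assumption, not taken from this article.
    """
    return f".alter table {table} policy ingestionbatching @'{json.dumps(policy)}'"


# Usage: shorten the maximum delay to 2 minutes for a hypothetical table.
command = alter_table_command("MyTable", build_policy(max_delay="00:02:00"))
```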


Setting this policy to a very small value increases the COGS (cost of goods sold) of the cluster and reduces performance. In addition, in the limit, reducing the value can actually increase the effective end-to-end ingestion latency, because of the overhead of managing multiple ingestion processes in parallel.
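
A back-of-the-envelope calculation shows why very small values are costly. The sketch below assumes a steady arrival rate of small blobs and ignores the item-count limit; the function name and the example numbers are hypothetical, not measurements.

```python
def batches_per_hour(arrival_mb_per_s, max_delay_s, optimal_size_mb=1024.0):
    """Estimate how many batches per hour are produced, and how large each is.

    Simplified model: a batch closes either when the maximum delay elapses
    or when the arriving data fills it to the optimal size, whichever
    happens first.
    """
    time_to_fill_s = optimal_size_mb / arrival_mb_per_s
    seal_after_s = min(max_delay_s, time_to_fill_s)
    batch_count = 3600.0 / seal_after_s
    batch_size_mb = arrival_mb_per_s * seal_after_s
    return batch_count, batch_size_mb


# Usage: data arriving at 1 MB/s, with a 5-minute and a 10-second maximum delay.
default_delay = batches_per_hour(1.0, 300.0)
short_delay = batches_per_hour(1.0, 10.0)
```

Under these assumptions, cutting the delay from 5 minutes to 10 seconds produces 30 times as many batches, each 30 times smaller, which is the extra per-ingestion overhead described above.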

Additional resources