Query limits

Kusto is an ad-hoc query engine that hosts large data sets and attempts to satisfy queries by holding all relevant data in memory. There's an inherent risk that queries will monopolize the service resources without bounds. Kusto provides a number of built-in protections in the form of default query limits. If you're considering removing these limits, first determine whether you actually gain any value by doing so.

Limit on query concurrency

Query concurrency is a limit that a cluster imposes on the number of queries running at the same time.

  • The default value of the query concurrency limit depends on the SKU the cluster is running on, and is calculated as: Cores-Per-Node x 10.
    • For example, for a cluster that's set up on the D14v2 SKU, where each machine has 16 vCores, the default query concurrency limit is 16 cores x 10 = 160.
  • The default value can be changed by configuring the query throttling policy.
    • The actual number of queries that can run concurrently on a cluster depends on various factors. The most dominant factors are the cluster SKU, the cluster's available resources, and query patterns. The query throttling policy can be configured based on load tests performed on production-like query patterns.
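As a rough illustration, the default formula above can be sketched as follows. The SKU-to-vCore mapping here is an illustrative assumption for two common SKUs, not an exhaustive list:

```python
# Sketch: compute the default query concurrency limit (Cores-Per-Node x 10).
# The SKU-to-vCore mapping below is an assumption for illustration only.
SKU_VCORES = {
    "D14v2": 16,
    "D13v2": 8,
}

def default_query_concurrency(sku: str) -> int:
    """Default limit = vCores per node x 10."""
    return SKU_VCORES[sku] * 10

print(default_query_concurrency("D14v2"))  # 160
```

Remember that the actual sustainable concurrency depends on resources and query patterns, which is why the throttling policy exists as an override.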

Limit on result set size (result truncation)

Result truncation is a limit set by default on the result set returned by the query. Kusto limits the number of records returned to the client to 500,000, and the overall data size for those records to 64 MB. When either of these limits is exceeded, the query fails with a "partial query failure". Exceeding the overall data size generates an exception with the message:

The Kusto DataEngine has failed to execute a query: 'Query result set has exceeded the internal data size limit 67108864 (E_QUERY_RESULT_SET_TOO_LARGE).'

Exceeding the number of records fails with an exception that says:

The Kusto DataEngine has failed to execute a query: 'Query result set has exceeded the internal record count limit 500000 (E_QUERY_RESULT_SET_TOO_LARGE).'

There are a number of strategies for dealing with this error.

  • Reduce the result set size by modifying the query to only return interesting data. This strategy is useful when the initial failing query is too "wide". For example, the query doesn't project away data columns that aren't needed.
  • Reduce the result set size by shifting post-query processing, such as aggregations, into the query itself. This strategy is useful in scenarios where the output of the query is fed to another processing system, which then does additional aggregations.
  • Switch from queries to using data export when you want to export large sets of data from the service.
  • Instruct the service to suppress this query limit by using the set statements listed below or flags in client request properties.

Methods for reducing the result set size produced by the query include:

You can disable result truncation by using the notruncation request option. We recommend that some form of limitation is still put in place.

For example:

set notruncation;
MyTable | take 1000000

It's also possible to have more refined control over result truncation by setting the value of truncationmaxsize (maximum data size in bytes, defaults to 64 MB) and truncationmaxrecords (maximum number of records, defaults to 500,000). For example, the following query sets result truncation to happen at either 1,105 records or 1 MB, whichever is exceeded first.

set truncationmaxsize=1048576;
set truncationmaxrecords=1105;
MyTable | where User=="UserId1"

Removing the result truncation limit means that you intend to move bulk data out of Kusto.

You can remove the result truncation limit either for export purposes, by using the .export command, or for later aggregation. If you choose later aggregation, consider aggregating by using Kusto.

Kusto provides a number of client libraries that can handle "infinitely large" results by streaming them to the caller. Use one of these libraries, and configure it to streaming mode. For example, use the .NET Framework client (Microsoft.Azure.Kusto.Data) and either set the streaming property of the connection string to true, or use the ExecuteQueryV2Async() call that always streams results.

Result truncation is applied by default, not just to the result stream returned to the client. It's also applied by default to any subquery that one cluster issues to another cluster in a cross-cluster query, with similar effects.

Limit on memory per iterator

Max memory per result set iterator is another limit used by Kusto to protect against "runaway" queries. This limit, represented by the request option maxmemoryconsumptionperiterator, sets an upper bound on the amount of memory that a single query plan result set iterator can hold. This limit applies to the specific iterators that aren't streaming by nature, such as join. Here are a few error messages that return when this situation happens:

The ClusterBy operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete E_RUNAWAY_QUERY.

The DemultiplexedResultSetCache operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

The ExecuteAndCache operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

The HashJoin operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

The Sort operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

The Summarize operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

The TopNestedAggregator operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

The TopNested operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

By default, this value is set to 5 GB. You can increase this value up to half the physical memory of the machine:

set maxmemoryconsumptionperiterator=68719476736;
MyTable | ...

In many cases, exceeding this limit can be avoided by sampling the data set. The two queries below show how to do the sampling. The first is statistical sampling that uses a random number generator. The second is deterministic sampling, done by hashing some column from the data set, usually some ID.

T | where rand() < 0.1 | ...

T | where hash(UserId, 10) == 1 | ...
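To see why the two approaches behave differently, here is a small Python sketch of the same two ideas. zlib.crc32 stands in for Kusto's hash() function (an assumption for illustration); the point is that the hash-modulo scheme is deterministic, so repeated runs select exactly the same subset, while the random scheme does not.

```python
# Sketch: the two sampling approaches shown above, in Python terms.
import random
import zlib

def statistical_sample(rows, fraction=0.1):
    """Like `where rand() < 0.1`: keeps a random ~10% on each run."""
    return [r for r in rows if random.random() < fraction]

def deterministic_sample(rows, buckets=10, keep_bucket=1):
    """Like `where hash(UserId, 10) == 1`: keeps a stable ~10%.
    zlib.crc32 stands in for Kusto's hash() here (an assumption)."""
    return [r for r in rows if zlib.crc32(r.encode()) % buckets == keep_bucket]

users = [f"user{i}" for i in range(1000)]
stable = deterministic_sample(users)
# Re-running yields exactly the same subset:
assert stable == deterministic_sample(users)
```

Deterministic sampling is usually preferable when the sampled query will be re-run or compared across runs, because the same IDs are always selected.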

Limit on memory per node

Max memory per query per node is another limit used to protect against "runaway" queries. This limit, represented by the request option max_memory_consumption_per_query_per_node, sets an upper bound on the amount of memory that can be used on a single node for a specific query.

set max_memory_consumption_per_query_per_node=68719476736;
MyTable | ...
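The memory limits in these set statements are expressed in bytes. The 68719476736 used in the examples above is 64 GiB; a quick sketch of the conversion:

```python
# Sketch: converting GiB to the byte values used in the set statements above.
# 68719476736 bytes = 64 * 1024**3, i.e. 64 GiB.
def gib_to_bytes(gib: int) -> int:
    return gib * 1024**3

assert gib_to_bytes(64) == 68719476736
# The 5 GB default iterator limit, for comparison (assuming binary GB):
print(gib_to_bytes(5))  # 5368709120
```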

Limit on accumulated string sets

In various query operations, Kusto needs to "gather" string values and buffer them internally before it starts to produce results. These accumulated string sets are limited in size and in how many items they can hold. Additionally, each individual string can't exceed a certain limit. Exceeding one of these limits results in one of the following errors:

Runaway query (E_RUNAWAY_QUERY). (message: 'Accumulated string array getting too large and exceeds the limit of ...GB (see https://aka.ms/kustoquerylimits)')

Runaway query (E_RUNAWAY_QUERY). (message: 'Accumulated string array getting too large and exceeds the maximum count of ..GB items (see http://aka.ms/kustoquerylimits)')

There's currently no switch to increase the maximum string set size. As a workaround, rephrase the query to reduce the amount of data that has to be buffered. You can project away unneeded columns before they're used by operators such as join and summarize. Or, you can use the shuffle query strategy.

Limit execution timeout

Server timeout is a service-side timeout that is applied to all requests. The timeout on running requests (queries and control commands) is enforced at multiple points in Kusto:

  • the client library (if used)
  • the service endpoint that accepts the request
  • the service engine that processes the request

By default, the timeout is set to four minutes for queries, and 10 minutes for control commands. This value can be increased if needed (capped at one hour).

  • If you query by using Kusto.Explorer, use Tools > Options > Connections > Query Server Timeout.
  • Programmatically, set the servertimeout client request property, a value of type System.TimeSpan, up to an hour.
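The default-plus-cap behavior described above can be sketched as follows. This is an illustrative model, not the client library's actual implementation; the function and constant names are assumptions:

```python
# Sketch: a requested server timeout falls back to the query default
# (four minutes) and is capped at one hour.
from datetime import timedelta
from typing import Optional

MAX_SERVER_TIMEOUT = timedelta(hours=1)
DEFAULT_QUERY_TIMEOUT = timedelta(minutes=4)

def effective_timeout(requested: Optional[timedelta]) -> timedelta:
    """Fall back to the default; never exceed the one-hour cap."""
    if requested is None:
        return DEFAULT_QUERY_TIMEOUT
    return min(requested, MAX_SERVER_TIMEOUT)

print(effective_timeout(timedelta(minutes=90)))  # 1:00:00 (capped)
```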

Notes about timeouts

  • On the client side, the timeout is applied from the time the request is created until the time that the response starts arriving at the client. The time it takes to read the payload back at the client isn't treated as part of the timeout. It depends on how quickly the caller pulls the data from the stream.
  • Also on the client side, the actual timeout value used is slightly higher than the server timeout value requested by the user. This difference allows for network latencies.
  • To automatically use the maximum allowed request timeout, set the client request property norequesttimeout to true.

Limit on query CPU resource usage

Kusto lets you run queries and use as much CPU resources as the cluster has. It attempts to do a fair round-robin between queries if more than one is running. This method yields the best performance for ad-hoc queries. At other times, you may want to limit the CPU resources used for a particular query. If you run a "background job", for example, the system might tolerate higher latencies to give concurrent ad-hoc queries high priority.

Kusto supports specifying two client request properties when running a query. The properties are query_fanout_threads_percent and query_fanout_nodes_percent. Both properties are integers that default to the maximum value (100), but may be reduced for a specific query to some other value.

The first, query_fanout_threads_percent, controls the fanout factor for thread use. When it's 100%, the cluster assigns all CPUs on each node (for example, 16 CPUs on a cluster deployed on Azure D14 nodes). When it's 50%, half of the CPUs are used, and so on. The numbers are rounded up to a whole CPU, so it's safe to set it to 0. The second, query_fanout_nodes_percent, controls how many of the query nodes in the cluster to use per subquery distribution operation. It functions in a similar manner.
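The round-up behavior can be sketched as follows. The minimum-of-one floor is an assumption inferred from the statement that setting the value to 0 is safe:

```python
# Sketch: how a fanout percentage maps to a whole number of CPUs.
# Rounding is upward, so even 0% still yields one CPU (assumed floor).
import math

def fanout_cpus(total_cpus: int, percent: int) -> int:
    """Round up to a whole CPU; never fewer than one."""
    return max(1, math.ceil(total_cpus * percent / 100))

print(fanout_cpus(16, 100))  # 16
print(fanout_cpus(16, 50))   # 8
print(fanout_cpus(16, 0))    # 1 -- rounding up keeps the query runnable
```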

Limit on query complexity

During query execution, the query text is transformed into a tree of relational operators representing the query. If the tree depth exceeds an internal threshold of several thousand levels, the query is considered too complex for processing, and fails with an error code. The failure indicates that the relational operators tree exceeds its limits. Limits are typically exceeded by queries with long lists of binary operators chained together. For example:

| where Column == "value1" or 
        Column == "value2" or 
        .... or
        Column == "valueN"

For this specific case, rewrite the query by using the in() operator.

| where Column in ("value1", "value2".... "valueN")
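When such queries are generated programmatically, it's straightforward to emit the in() form directly instead of a chain of `or` comparisons. A minimal sketch (the helper name is hypothetical):

```python
# Sketch: build a flat in() filter instead of chaining `or` comparisons,
# which can exceed the relational-operator tree depth limit.
def build_in_filter(column: str, values: list) -> str:
    quoted = ", ".join(f'"{v}"' for v in values)
    return f'| where {column} in ({quoted})'

print(build_in_filter("Column", ["value1", "value2", "valueN"]))
# | where Column in ("value1", "value2", "valueN")
```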