dcount() (aggregation function)

Returns an estimate of the number of distinct values taken by a scalar expression in the summary group.

Note

The dcount() aggregation function is primarily useful for estimating the cardinality of huge sets. It trades accuracy for performance, and may return a result that varies between executions. The order of inputs may have an effect on its output.

Syntax

... | summarize dcount(Expr [, Accuracy]) ...

Arguments

  • Expr: A scalar expression whose distinct values are to be counted.
  • Accuracy: An optional int literal that defines the requested estimation accuracy. See the Estimation accuracy section for supported values. If unspecified, the default value 1 is used.

Returns

Returns an estimate of the number of distinct values of Expr in the group.

Example

PageViewLog | summarize countries=dcount(country) by continent

[Image: D count]

Get an exact count of distinct values of V grouped by G.

T | summarize by V, G | summarize count() by G

This calculation requires a large amount of internal memory, since the distinct values of V are multiplied by the number of distinct values of G. It may result in memory errors or long execution times. dcount() provides a fast and reliable alternative:

T | summarize dcount(V) by G
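To make the memory cost of the exact approach concrete, the two-step exact query shown earlier (summarize by V, G, then count() by G) can be mirrored in a few lines of Python. This is only an illustrative sketch of the logic, not how the query engine implements it; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict

def exact_dcount_by_group(rows):
    """Exact distinct count of V per G, for rows given as (G, V) pairs."""
    # Step 1 (like `summarize by V, G`): keep every distinct (G, V) pair once.
    # This set is what grows with (distinct G) x (distinct V per G).
    distinct_pairs = set(rows)

    # Step 2 (like `summarize count() by G`): count the surviving pairs per group.
    counts = defaultdict(int)
    for group, _value in distinct_pairs:
        counts[group] += 1
    return dict(counts)

# Hypothetical sample data: (continent, country) pairs with one duplicate.
sample = [("Europe", "FR"), ("Europe", "DE"), ("Europe", "FR"), ("Asia", "JP")]
print(exact_dcount_by_group(sample))
```

The intermediate set holds one entry per distinct pair, which is why the exact form becomes expensive on large inputs, while an estimator keeps only a fixed-size sketch per group.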

Estimation accuracy

The dcount() aggregation function uses a variant of the HyperLogLog (HLL) algorithm, which does a stochastic estimation of set cardinality. The algorithm provides a "knob" that can be used to balance accuracy against execution time and memory size:

Accuracy    Error (%)    Entry count
0           1.6          2^12
1           0.8          2^14
2           0.4          2^16
3           0.28         2^17
4           0.2          2^18

Note

The "Entry count" column is the number of 1-byte counters in the HLL implementation.
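The relationship between entry count, memory, and error in the table can be checked with a short calculation. The sketch below assumes the textbook HyperLogLog standard error of roughly 1.04 / sqrt(m) for m counters; the variant used by dcount() may differ in detail, so treat the computed figures as an approximation that happens to land close to the documented ones. The function names are hypothetical.

```python
import math

# Accuracy level -> exponent p, where the entry count is 2**p (taken from the table).
ACCURACY_TO_EXPONENT = {0: 12, 1: 14, 2: 16, 3: 17, 4: 18}

def entry_count(accuracy: int) -> int:
    """Number of 1-byte counters for a given accuracy level."""
    return 2 ** ACCURACY_TO_EXPONENT[accuracy]

def textbook_hll_error_percent(accuracy: int) -> float:
    """Assumption: textbook HLL standard error, 1.04 / sqrt(m), as a percentage."""
    return 100 * 1.04 / math.sqrt(entry_count(accuracy))

for level in sorted(ACCURACY_TO_EXPONENT):
    m = entry_count(level)
    # Each counter is 1 byte, so the sketch size in bytes equals the entry count.
    print(f"accuracy={level}  entries={m}  memory={m // 1024} KiB  "
          f"error~{textbook_hll_error_percent(level):.2f}%")
```

Each step up in accuracy increases the counter array, so the lower error is paid for in memory per group.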

The algorithm includes some provisions for doing a perfect count (zero error), if the set cardinality is small enough:

  • When the accuracy level is 1, up to 1000 distinct values are counted exactly
  • When the accuracy level is 2, up to 8000 distinct values are counted exactly

The error bound is probabilistic, not a theoretical bound. The value is the standard deviation of the error distribution (the sigma), and 99.7% of the estimations will have a relative error of under 3 x sigma.
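As a worked example of the 3 x sigma rule, the sketch below turns a dcount() estimate into an approximate range that the true value is likely to fall in. It uses the sigma values from the accuracy table and, as a simplifying assumption, applies the relative error to the estimate rather than to the unknown true value. The function name and the sample estimate are hypothetical.

```python
# Standard deviation of the relative error, in percent, per accuracy level (from the table).
SIGMA_PERCENT = {0: 1.6, 1: 0.8, 2: 0.4, 3: 0.28, 4: 0.2}

def three_sigma_range(estimate: float, accuracy: int = 1) -> tuple[float, float]:
    """Approximate range covering about 99.7% of cases for a given estimate."""
    # Simplification: the margin is computed relative to the estimate itself.
    margin = estimate * 3 * SIGMA_PERCENT[accuracy] / 100
    return estimate - margin, estimate + margin

low, high = three_sigma_range(1_000_000, accuracy=1)
print(f"true value is likely between {low:,.0f} and {high:,.0f}")
```

With the default accuracy of 1, the margin is 3 x 0.8% = 2.4% of the estimate. This is a probabilistic statement, so an individual result can still fall outside the range.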

The following image shows the probability distribution function of the relative estimation error, in percentages, for all supported accuracy settings:

[Image: HLL error distribution]