summarize 运算符summarize operator

生成可聚合输入表内容的表。Produces a table that aggregates the content of the input table.

Sales | summarize NumTransactions=count(), Total=sum(UnitPrice * NumUnits) by Fruit, StartOfMonth=startofmonth(SellDateTime)

返回一个表,其中包含销售交易数目和每种水果每个销售月份的总金额。Returns a table with how many sell transactions and the total amount per fruit and sell month. 输出列显示交易数目、交易价值、水果、以及记录了交易的月份开始时的日期/时间。The output columns show the count of transactions, transaction worth, fruit, and the datetime of the beginning of the month in which the transaction was recorded.

T | summarize count() by price_range=bin(price, 10.0)

此表显示每个价格区间([0,10.0]、[10.0,20.0] 等)内的项数。A table that shows how many items have prices in each interval [0,10.0], [10.0,20.0], and so on. 此示例中有一列表示计数,还有一列表示价格范围。This example has a column for the count and one for the price range. 忽略所有其他输入列。All other input columns are ignored.

语法Syntax

T | summarize [[ Column =] Aggregation [, ...]] [by [ Column =] GroupExpression [, ...]]T | summarize [[ Column =] Aggregation [, ...]] [by [ Column =] GroupExpression [, ...]]

参数Arguments

  • Column :结果列的可选名称。Column: Optional name for a result column. 默认为派生自表达式的名称。Defaults to a name derived from the expression.
  • 聚合:聚合函数(例如 count()avg())的调用,以列名作为参数。Aggregation: A call to an aggregation function such as count() or avg(), with column names as arguments. 请参阅聚合函数的列表See the list of aggregation functions.
  • GroupExpression:一个可以引用输入数据的标量表达式。GroupExpression: A scalar expression that can reference the input data. 所有组表达式有多少个不同的值,输出就会包含多少个记录。The output will have as many records as there are distinct values of all the group expressions.

备注

当输入表为空时,输出取决于是否使用了 GroupExpression:When the input table is empty, the output depends on whether GroupExpression is used:

  • 如果未提供 GroupExpression,则输出将为单个(空)行。If GroupExpression is not provided, the output will be a single (empty) row.
  • 如果提供了 GroupExpression,则输出将不包含任何行。If GroupExpression is provided, the output will have no rows.

返回Returns

输入行将排列成与 by 表达式具有相同值的组。The input rows are arranged into groups having the same values of the by expressions. 然后,对每个组计算指定的聚合函数,从而为每组生成行。Then the specified aggregation functions are computed over each group, producing a row for each group. 结果包含 by 列,还至少包含用于每个计算聚合的一列。The result contains the by columns and also at least one column for each computed aggregate. (某些聚合函数返回多个列。)(Some aggregation functions return multiple columns.)

结果有许多行,因为 by 值(可能为零)存在不同的组合。The result has as many rows as there are distinct combinations of by values (which may be zero). 如果未提供任何组键,则结果将包含单个记录。If there are no group keys provided, the result has a single record.

若要基于数值范围进行汇总,请使用 bin() 将范围减小为离散值。To summarize over ranges of numeric values, use bin() to reduce ranges to discrete values.

备注

  • 尽管可为聚合和分组表达式提供任意表达式,但使用简单列名称或将 bin() 应用于数值列会更加高效。Although you can provide arbitrary expressions for both the aggregation and grouping expressions, it's more efficient to use simple column names, or apply bin() to a numeric column.
  • 不再支持自动地每小时对日期/时间列进行分箱。The automatic hourly bins for datetime columns is no longer supported. 请改用显式分箱。Use explicit binning instead. 例如 summarize by bin(timestamp, 1h)For example, summarize by bin(timestamp, 1h).

聚合函数的列表List of aggregation functions

函数Function 描述Description
any()any() 返回组的随机非空值Returns a random non-empty value for the group
anyif()anyif() 返回组的随机非空值(带谓词)Returns a random non-empty value for the group (with predicate)
arg_max()arg_max() 当参数最大化时返回一个或多个表达式Returns one or more expressions when the argument is maximized
arg_min()arg_min() 当参数最小化时返回一个或多个表达式Returns one or more expressions when the argument is minimized
avg()avg() 返回整个组的平均值Returns an average value across the group
avgif()avgif() 返回整个组的平均值(带谓词)Returns an average value across the group (with predicate)
binary_all_andbinary_all_and 返回使用组的二元 AND 进行聚合的值Returns aggregated value using the binary AND of the group
binary_all_orbinary_all_or 返回使用组的二元 OR 进行聚合的值Returns aggregated value using the binary OR of the group
binary_all_xorbinary_all_xor 返回使用组的二元 XOR 进行聚合的值Returns aggregated value using the binary XOR of the group
buildschema()buildschema() 返回允许 dynamic 输入的所有值的最小架构Returns the minimal schema that admits all values of the dynamic input
count()count() 返回组的计数Returns a count of the group
countif()countif() 返回具有组谓词的计数Returns a count with the predicate of the group
dcount()dcount() 返回组元素的近似非重复计数Returns an approximate distinct count of the group elements
dcountif()dcountif() 返回组元素的近似非重复计数(带谓词)Returns an approximate distinct count of the group elements (with predicate)
make_bag()make_bag() 返回一个在组内包含动态值的属性包Returns a property bag of dynamic values within the group
make_bag_if()make_bag_if() 返回一个在组内包含动态值的属性包(带谓词)Returns a property bag of dynamic values within the group (with predicate)
make_list()make_list() 返回组中所有值的列表Returns a list of all the values within the group
make_list_if()make_list_if() 返回组中所有值的列表(带谓词)Returns a list of all the values within the group (with predicate)
make_list_with_nulls()make_list_with_nulls() 返回组中所有值(包括 null 值)的列表Returns a list of all the values within the group, including null values
make_set()make_set() 返回组中非重复值的集合Returns a set of distinct values within the group
make_set_if()make_set_if() 返回组中非重复值的集合(带谓词)Returns a set of distinct values within the group (with predicate)
max()max() 返回组内的最大值Returns the maximum value across the group
maxif()maxif() 返回组中的最大值(带谓词)Returns the maximum value across the group (with predicate)
min()min() 返回组内的最小值Returns the minimum value across the group
minif()minif() 返回组中的最小值(带谓词)Returns the minimum value across the group (with predicate)
percentiles()percentiles() 返回组的百分位近似值Returns the percentile approximate of the group
percentiles_array()percentiles_array() 返回组的百分位近似值Returns the percentiles approximates of the group
percentilesw()percentilesw() 返回组的加权百分位近似值Returns the weighted percentile approximate of the group
percentilesw_array()percentilesw_array() 返回组的加权百分位近似值Returns the weighted percentiles approximates of the group
stdev()stdev() 返回整个组的标准偏差Returns the standard deviation across the group
stdevif()stdevif() 返回整个组的标准偏差(带谓词)Returns the standard deviation across the group (with predicate)
sum()sum() 返回组中元素的总和Returns the sum of the elements within the group
sumif()sumif() 返回组中元素的总和(带谓词)Returns the sum of the elements within the group (with predicate)
variance()variance() 返回整个组的方差Returns the variance across the group
varianceif()varianceif() 返回整个组的方差(带谓词)Returns the variance across the group (with predicate)

对默认值进行聚合Aggregates default values

下表汇总了聚合的默认值:The following table summarizes the default values of aggregations:

运算符Operator 默认值Default value
count(), countif(), dcount(), dcountif()count(), countif(), dcount(), dcountif() 00
make_bag(), make_bag_if(), make_list(), make_list_if(), make_set(), make_set_if()make_bag(), make_bag_if(), make_list(), make_list_if(), make_set(), make_set_if() 空的动态数组 ([])empty dynamic array ([])
所有其他All others nullnull

对包含 null 值的实体使用这些聚合时,null 值会被忽略,并且不会参与计算(请参阅下面的示例)。When using these aggregates over entities which includes null values, the null values will be ignored and won't participate in the calculation (see examples below).

示例Examples

按水果和供应商汇总价格

示例Example

确定表中有 ActivityTypeCompletionStatus 的哪些唯一组合。Determine what unique combinations of ActivityType and CompletionStatus there are in a table. 没有聚合函数,只是有分组依据键。There are no aggregation functions, just group-by keys. 输出将只显示这些结果的列:The output will just show the columns for those results:

Activities | summarize by ActivityType, completionStatus
ActivityType completionStatus
dancing started
singing started
dancing abandoned
singing completed

示例Example

查找 Activities 表中所有记录的最小和最大时间戳。Finds the minimum and maximum timestamp of all records in the Activities table. 没有 group by 子句,因此输出中只有一行:There is no group-by clause, so there is just one row in the output:

Activities | summarize Min = min(Timestamp), Max = max(Timestamp)
Min Max
1975-06-09 09:21:45 2015-12-24 23:45:00

示例Example

为每个大陆创建一行,并显示发生活动的城市的计数。Create a row for each continent, showing a count of the cities in which activities occur. 由于“continent”的值很少,因此“by”子句中不需要使用任何分组函数:Because there are few values for "continent", no grouping function is needed in the 'by' clause:

Activities | summarize cities=dcount(city) by continent
cities continent
4290 Asia
3267 Europe
2673 North America

示例Example

下面的示例将计算每个活动类型的直方图。The following example calculates a histogram for each activity type. 由于 Duration 有许多值,因此请使用 bin 将其值按 10 分钟的间隔分组:Because Duration has many values, use bin to group its values into 10-minute intervals:

Activities | summarize count() by ActivityType, length=bin(Duration, 10m)
count_ ActivityType length
354 dancing 0:00:00.000
23 singing 0:00:00.000
2717 dancing 0:10:00.000
341 singing 0:10:00.000
725 dancing 0:20:00.000
2876 singing 0:20:00.000
......

对默认值进行聚合的示例Example for the aggregates default values

summarize 运算符的输入至少有一个空的分组依据键时,其结果也将为空。When the input of summarize operator has at least one empty group-by key, it's result is empty, too.

如果 summarize 运算符的输入没有空的分组依据键,则结果将是在 summarize 中使用的聚合的默认值:When the input of summarize operator doesn't have an empty group-by key, the result is the default values of the aggregates used in the summarize:

datatable(x:long)[]
| summarize any(x), arg_max(x, x), arg_min(x, x), avg(x), buildschema(todynamic(tostring(x))), max(x), min(x), percentile(x, 55), hll(x) ,stdev(x), sum(x), sumif(x, x > 0), tdigest(x), variance(x)
any_xany_x max_xmax_x max_x_xmax_x_x min_xmin_x min_x_xmin_x_x avg_xavg_x schema_xschema_x max_x1max_x1 min_x1min_x1 percentile_x_55percentile_x_55 hll_xhll_x stdev_xstdev_x sum_xsum_x sumif_xsumif_x tdigest_xtdigest_x variance_xvariance_x
datatable(x:long)[]
| summarize  count(x), countif(x > 0) , dcount(x), dcountif(x, x > 0)
count_xcount_x countif_countif_ dcount_xdcount_x dcountif_xdcountif_x
00 00 00 00
datatable(x:long)[]
| summarize  make_set(x), make_list(x)
set_xset_x list_xlist_x
[][] [][]

聚合平均值运算会对所有非 null 值求和,只计算参与计算的那些值(不会将 null 值考虑在内)。The aggregate avg sums all the non-nulls and counts only those which participated in the calculation (will not take nulls into account).

range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize sum(y), avg(y)
sum_ysum_y avg_yavg_y
55 55

常规计数会将 null 计在内:The regular count will count nulls:

range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize count(y)
count_ycount_y
22
range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize make_set(y), make_set(y)
set_yset_y set_y1set_y1
[5.0][5.0] [5.0][5.0]