reduce operator

Groups a set of strings together based on value similarity.

T | reduce by LogMessage with threshold=0.1

For each such group, it outputs a pattern that best describes the group (possibly using the asterisk (*) character to represent wildcards), a count of the number of values in the group, and a representative of the group (one of the original values in the group).

Syntax

T | reduce [kind = ReduceKind] by Expr [with [threshold = Threshold] [, characters = Characters] ]

Arguments

  • Expr: An expression that evaluates to a string value.
  • Threshold: A real literal in the range (0..1). Default is 0.1. For large inputs, the threshold should be small.
  • Characters: A string literal containing a list of characters to add to the list of characters that don't break a term. (For example, if you want aaa=bbbb and aaa:bbb to each be a whole term, rather than break on = and :, use ":=" as the string literal.)
  • ReduceKind: Specifies the reduce flavor. The only valid value for the time being is source.
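
As a minimal sketch of how the threshold and characters arguments fit together, the query below runs reduce over a small inline table. The sample rows and the threshold value of 0.5 are illustrative assumptions, not taken from the original documentation, and the exact patterns produced depend on the data.

```kusto
// Hypothetical sample data, used only to illustrate the syntax.
datatable(LogMessage: string)
[
    "user=alice action=login",
    "user=bob action=login",
    "user=carol action=logout"
]
// Treat '=' as a non-breaking character so that "user=alice" stays one term,
// and pass an explicit threshold instead of relying on the default of 0.1.
| reduce by LogMessage with threshold = 0.5, characters = "="
```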

Returns

This operator returns a table with three columns (Pattern, Count, and Representative), and as many rows as there are groups. Pattern is the pattern value for the group, with * being used as a wildcard (representing arbitrary insertion strings). Count is the number of rows in the input to the operator that are represented by this pattern. Representative is one value from the input that falls into this group.
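
Because the result is an ordinary table, the three output columns can be filtered and sorted like any others. The sketch below assumes a Trace table with a Text column, as in the GUID example on this page. The cutoff of 100 is an arbitrary illustrative value.

```kusto
Trace
| take 10000
| reduce by Text
// Keep only patterns that cover many input rows (100 is an arbitrary cutoff).
| where Count > 100
// Show the most common patterns first.
| sort by Count desc
| project Pattern, Count, Representative
```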

If [kind=source] is specified, the operator will append the Pattern column to the existing table structure. Note that the syntax and schema of this flavor might be subject to future changes.
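
A minimal sketch of the source flavor, following the syntax line above. It again assumes a Trace table with a Text column. The summarize step relies on the Pattern column being appended as described, so treat it as an assumption that may not hold if the schema of this flavor changes.

```kusto
Trace
| take 10000
// Keep the original columns and append a Pattern column to each row.
| reduce kind = source by Text
// Count how many original rows were assigned to each pattern.
| summarize Rows = count() by Pattern
```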

For example, the result of reduce by city might include:

Pattern     Count   Representative
San *       5182    San Bernard
Saint *     2846    Saint Lucy
Moscow      3726    Moscow
* -on- *    2730    One -on- One
Paris       2716    Paris

Another example with customized tokenization:

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText with threshold=0.001, characters="X"

Pattern            Count   Representative
MachineLearning*   1000    MachineLearningX4

Examples

The following example shows how one might apply the reduce operator to a "sanitized" input, in which GUIDs in the column being reduced are replaced before reducing:

// Start with a few records from the Trace table.
Trace | take 10000
// We will reduce the Text column which includes random GUIDs.
// As random GUIDs interfere with the reduce operation, replace them all
// by the string "GUID".
| extend Text=replace(@"[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}", @"GUID", Text)
// Now perform the reduce. In case there are other "quasi-random" identifiers with embedded '-'
// or '_' characters in them, treat these as non-term-breakers.
| reduce by Text with characters="-_"

See also

autocluster

Notes

The implementation of the reduce operator is largely based on the paper A Data Clustering Algorithm for Mining Patterns From Event Logs, by Risto Vaarandi.