basket plugin

T | evaluate basket()

Basket finds all frequent patterns of discrete attributes (dimensions) in the data. It then returns the frequent patterns that passed the frequency threshold in the original query. Basket is guaranteed to find every frequent pattern in the data, but it isn't guaranteed to have polynomial runtime. The runtime of the query is linear in the number of rows, but it might be exponential in the number of columns (dimensions). Basket is based on the Apriori algorithm, originally developed for basket analysis data mining.
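To make the runtime remark concrete, here is a minimal Python sketch of a level-wise, Apriori-style search. It is an illustration only, not the plugin's implementation: the function name `frequent_patterns`, the representation of rows as dictionaries, and the choice to return every frequent pattern are assumptions made for this example. Each level needs one pass over the rows (linear in rows), while the number of candidate column combinations can grow exponentially with the number of columns.

```python
from itertools import combinations


def frequent_patterns(rows, threshold=0.05):
    """Return {pattern: count} for every column=value pattern whose
    share of the rows is at least `threshold`.

    `rows` is a list of dicts (column name -> hashable value).
    A pattern is a frozenset of (column, value) pairs.
    Hypothetical helper for illustration; not the plugin's code.
    """
    n = len(rows)

    # Level 1: count every single (column, value) pair.
    counts = {}
    for row in rows:
        for item in row.items():
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {p: c for p, c in counts.items() if c / n >= threshold}
    result = dict(current)

    size = 1
    while current:
        size += 1
        # Build candidates of the next size by joining frequent patterns.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) != size:
                continue
            # A pattern may restrict each column to one value only.
            if len({col for col, _ in union}) != size:
                continue
            # Apriori pruning: every sub-pattern must itself be frequent.
            if all(frozenset(s) in current
                   for s in combinations(union, size - 1)):
                candidates.add(union)

        # One pass over the rows counts all candidates of this level.
        counts = {c: 0 for c in candidates}
        for row in rows:
            items = set(row.items())
            for c in candidates:
                if c <= items:
                    counts[c] += 1
        current = {p: c for p, c in counts.items() if c / n >= threshold}
        result.update(current)

    return result
```

The pruning step is what keeps the search tractable in practice: a combination of values is only counted if all of its smaller sub-combinations were already found to be frequent.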

Syntax

T | evaluate basket( arguments )

Returns

Basket returns all frequent patterns whose share of the rows is above the ratio threshold. The default threshold is 0.05. Each pattern is represented by a row in the results.

The first column is the segment ID. The next two columns are the count and percentage of rows, from the original query, that are captured by the pattern. The remaining columns are from the original query. Each of their values is either a specific value from the column or a wildcard value (null by default), meaning that the pattern allows any value in that column.
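As a reading aid for the result format described above, the following Python sketch shows how one result row can be interpreted: columns holding a wildcard place no restriction, and the count and percentage are taken over the rows that agree with all remaining columns. The helper names `matches` and `coverage`, and the rounding to one decimal place, are assumptions made for this example.

```python
def matches(pattern, row, wildcards=(None, "")):
    """True if `row` agrees with every non-wildcard value in `pattern`.

    `pattern` and `row` are dicts (column name -> value). The default
    wildcards mirror the documented defaults: null, or an empty string
    for string columns.
    """
    return all(value in wildcards or row.get(column) == value
               for column, value in pattern.items())


def coverage(pattern, rows, wildcards=(None, "")):
    """Return (count, percent) of `rows` captured by `pattern`."""
    count = sum(1 for row in rows if matches(pattern, row, wildcards))
    percent = round(100.0 * count / len(rows), 1)
    return count, percent
```

For example, a pattern with a wildcard in State, `Hail` in EventType, and `NO` in Damage captures every hail event without damage, whatever the state.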

Arguments (all optional)

T | evaluate basket([ Threshold , WeightColumn , MaxDimensions , CustomWildcard , CustomWildcard , ...])

All arguments are optional, but they must be given in the order shown above. To indicate that the default value should be used for an argument, pass the tilde string '~' in its place. See the examples below.

Available arguments:

  • Threshold - 0.015 < double < 1 [default: 0.05]

    Sets the minimal ratio of the rows for a pattern to be considered frequent. Patterns with a smaller ratio won't be returned.

    Example: T | evaluate basket(0.02)

  • WeightColumn - column_name

    Considers each row in the input according to the specified weight. By default, each row has a weight of '1'. The argument must be the name of a numeric column, such as int, long, or real. A common use of a weight column is to take into account sampling or bucketing/aggregation of the data that is already embedded into each row.

    Example: T | evaluate basket('~', sample_Count)

  • MaxDimensions - 1 < int [default: 5]

    Sets the maximal number of uncorrelated dimensions per basket. The number is limited by default to reduce the query runtime.

    Example: T | evaluate basket('~', '~', 3)

  • CustomWildcard - "any_value_per_type"

    Sets the wildcard value for a specific type in the result table. The wildcard indicates that the current pattern doesn't have a restriction on this column. The default is null. The default for a string is an empty string. If the default is a valid value in the data, a different wildcard value should be used, such as *.

    For example:

    T | evaluate basket('~', '~', '~', '*', int(-1), double(-1), long(0), datetime(1900-1-1))
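The effect of WeightColumn on the Threshold test can be sketched in Python as follows. This assumes that "considering each row according to its weight" means summing the weights of the matching rows and dividing by the total weight; the source does not spell out the exact formula, and the helper name `weighted_ratio` is hypothetical.

```python
def weighted_ratio(pattern, rows, weight_column=None):
    """Share of total weight carried by rows that match `pattern`.

    `pattern` holds only the restricted columns (column name -> value).
    With no weight column, every row counts with weight 1.
    Assumed interpretation for illustration; not the plugin's code.
    """
    total = 0
    covered = 0
    for row in rows:
        weight = row[weight_column] if weight_column is not None else 1
        total += weight
        if all(row.get(col) == val for col, val in pattern.items()):
            covered += weight
    return covered / total if total else 0.0
```

Under this reading, a pattern that covers half of the pre-aggregated rows can still fall below the threshold if those rows stand for only a small part of the underlying data.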

Example

StormEvents 
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)

| SegmentId | Count | Percent | State | EventType | Damage | DamageCrops |
|---|---|---|---|---|---|---|
| 0 | 4574 | 77.7 | | | NO | 0 |
| 1 | 2278 | 38.7 | | Hail | NO | 0 |
| 2 | 5675 | 96.4 | | | | 0 |
| 3 | 2371 | 40.3 | | Hail | | 0 |
| 4 | 1279 | 21.7 | | Thunderstorm Wind | | 0 |
| 5 | 2468 | 41.9 | | Hail | | |
| 6 | 1310 | 22.3 | | | YES | |
| 7 | 1291 | 21.9 | | Thunderstorm Wind | | |

Example with custom wildcards

StormEvents 
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2, '~', '~', '*', int(-1))

| SegmentId | Count | Percent | State | EventType | Damage | DamageCrops |
|---|---|---|---|---|---|---|
| 0 | 4574 | 77.7 | * | * | NO | 0 |
| 1 | 2278 | 38.7 | * | Hail | NO | 0 |
| 2 | 5675 | 96.4 | * | * | * | 0 |
| 3 | 2371 | 40.3 | * | Hail | * | 0 |
| 4 | 1279 | 21.7 | * | Thunderstorm Wind | * | 0 |
| 5 | 2468 | 41.9 | * | Hail | * | -1 |
| 6 | 1310 | 22.3 | * | * | YES | -1 |
| 7 | 1291 | 21.9 | * | Thunderstorm Wind | * | -1 |