sample operator

Returns up to the specified number of random rows from the input table.

T | sample 5

Remarks

  • sample is geared for speed rather than for an even distribution of values. Specifically, it will not produce "fair" results if it is used after operators that combine two data sets of different sizes (such as the union or join operators). It's recommended to use sample right after the table reference and filters.
  • sample is a non-deterministic operator: it returns a different result set each time it is evaluated during the query. For example, the first query under Examples below yields two different rows, even though one might expect it to return the same row twice.
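As a sketch of the placement recommended in the remarks, the queries below use the StormEvents sample table that appears later on this page. The State column and the filter values "TEXAS" and "VERMONT" are illustrative assumptions, chosen only because they select inputs of very different sizes. The first query samples right after the table reference and filter. The second samples after a union of two unequal inputs, which is the pattern the remarks warn may not give "fair" results.

```kusto
// Recommended: table reference, then filters, then sample.
StormEvents
| where State == "TEXAS"
| sample 10

// Pattern the remarks warn about: sample after a union of two differently sized inputs.
// The 10 returned rows are not guaranteed to be spread proportionally across the two inputs.
union (StormEvents | where State == "TEXAS"), (StormEvents | where State == "VERMONT")
| sample 10
```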

Syntax

T | sample NumberOfRows

Arguments

  • NumberOfRows: The number of rows of T to return. You can specify any numeric expression.
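Because NumberOfRows can be any numeric expression, it does not have to be a literal. The sketch below passes a scalar bound with let. The name n is hypothetical, and the query assumes, as the Arguments text implies, that a let-bound numeric scalar is accepted wherever a literal count is.

```kusto
// Hypothetical scalar `n` used as the NumberOfRows argument.
let n = 5;
StormEvents
| sample n
```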

Examples

let _data = range x from 1 to 100 step 1;
let _sample = _data | sample 1;
union (_sample), (_sample)
x
83
3

To ensure that _sample in the example above is calculated only once, you can use the materialize() function:

let _data = range x from 1 to 100 step 1;
let _sample = materialize(_data | sample 1);
union (_sample), (_sample)
x
34
34
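One way to check whether _sample was evaluated once is to count the distinct values in the union instead of inspecting the rows by eye. The sketch below reuses the names from the example above and adds only standard distinct and count steps. With materialize() the count should be 1. If you drop materialize(), it will usually be 2, although the two independent evaluations can pick the same row by chance.

```kusto
let _data = range x from 1 to 100 step 1;
// Evaluated once and cached, so both union legs see the same sampled row.
let _sample = materialize(_data | sample 1);
union (_sample), (_sample)
| distinct x
| count
```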

To sample a certain percentage of your data (rather than a specified number of rows), you can use:

StormEvents | where rand() < 0.1
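Because rand() is evaluated independently for each row, the filter above keeps each row with probability 0.1. The number of returned rows therefore varies from run to run and is only approximately 10% of the table. The sketch below, using the same StormEvents table, measures the fraction actually kept. The column names Sampled and Fraction are illustrative.

```kusto
// Total row count as a scalar, used as the denominator.
let total = toscalar(StormEvents | count);
StormEvents
| where rand() < 0.1
| summarize Sampled = count()
// Expected to be close to 0.1, but different on each run.
| extend Fraction = todouble(Sampled) / total
```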

To sample keys rather than rows (for example, to sample 10 IDs and get all rows for those IDs), you can use sample-distinct in combination with the in operator.

let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents
| where EpisodeId in (sampleEpisodes)
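To see that the query above returns every row for each sampled key, you can summarize by the key. The sketch below repeats the same query and adds a summarize step. It should return at most 10 rows, one per sampled EpisodeId, together with the number of StormEvents rows for that episode. The column name Events is illustrative.

```kusto
let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents
| where EpisodeId in (sampleEpisodes)
// One output row per sampled EpisodeId, with its full row count.
| summarize Events = count() by EpisodeId
```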