sample operator
Returns up to the specified number of random rows from the input table.
T | sample 5
Notes
sample is geared for speed rather than even distribution of values. Specifically, it will not produce "fair" results if used after operators that combine two datasets of different sizes (such as the union or join operators). It's recommended to use sample right after the table reference and filters.
sample is a non-deterministic operator and returns a different result set each time it is evaluated during the query. For example, the first query under Examples yields two different rows (even though one might expect it to return the same row twice).
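The placement recommendation can be sketched as follows. This is only an illustration: it assumes the StormEvents sample table used later on this page, and the State column and the "TEXAS" value are assumptions chosen for the filter, not something the notes above prescribe.

```kusto
// Filter first, then sample directly on the filtered table,
// before any union or join that would mix datasets of different sizes.
StormEvents
| where State == "TEXAS"   // assumed column and value, for illustration only
| sample 10
```

Because sample is non-deterministic, the rows returned differ between runs.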
Syntax
T | sample NumberOfRows
Arguments
- NumberOfRows: The number of rows of T to return. You can specify any numeric expression.
Examples
let _data = range x from 1 to 100 step 1;
let _sample = _data | sample 1;
union (_sample), (_sample)
x |
---|
83 |
3 |
To ensure that _sample in the example above is calculated only once, use the materialize() function:
let _data = range x from 1 to 100 step 1;
let _sample = materialize(_data | sample 1);
union (_sample), (_sample)
x |
---|
34 |
34 |
To sample a certain percentage of your data (rather than a specified number of rows), you can use:
StormEvents | where rand() < 0.1
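As a rough check of the percentage approach, the sketch below counts how many rows survive the filter. It assumes rand() returns a uniformly distributed value between 0 and 1, so each row is kept independently with a probability of about 0.1. The resulting count is only approximately 10% of the table and changes from run to run.

```kusto
// Keep each row independently with probability ~0.1, then count the survivors.
StormEvents
| where rand() < 0.1
| count
```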
To sample keys rather than rows (for example, sample 10 Ids and get all rows for those Ids), you can use sample-distinct in combination with the in operator.
let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents
| where EpisodeId in (sampleEpisodes)