sample-distinct 运算符sample-distinct operator

返回单个列,该列包含所请求列的非重复值,且数目不超过指定数目。Returns a single column that contains up to the specified number of distinct values of the requested column.

运算符的默认的、仅限目前的风格是尝试尽快返回答案(而不是尝试生成一个公平的示例)the default (and currently only) flavor of the operator tries to return an answer as quickly as possible (rather than trying to make a fair sample)

T | sample-distinct 5 of DeviceId

语法Syntax

T | sample-distinct NumberOfValues of ColumnNameT | sample-distinct NumberOfValues of ColumnName

参数Arguments

  • NumberOfValues:要返回的 T 的非重复值的数目。NumberOfValues : The number distinct values of T to return. 可以指定任何数值表达式。You can specify any numeric expression.

提示Tips

通过在 let 语句中使用 sample-distinct,然后使用 in 运算符进行筛选,可以轻松地进行总体抽样(参见示例)Can be handy to sample a population by putting sample-distinct in a let statement and later filter using the in operator (see example)

如果要获得最靠前的值,而非仅仅一个示例,可以使用 top-hitters 运算符If you want the top values rather than just a sample, you can use the top-hitters operator

如果要对数据行(而不是特定列的值)进行抽样,请参阅 sample 运算符if you want to sample data rows (rather than values of a specific column), refer to the sample operator

示例Examples

从总体获取 10 个非重复值Get 10 distinct values from a population

StormEvents | sample-distinct 10 of EpisodeId

进行总体抽样,并执行进一步的计算,了解汇总不会超过查询限制。Sample a population and do further computation knowing the summarize won't exceed query limits.

let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents 
| where EpisodeId in (sampleEpisodes) 
| summarize totalInjuries=sum(InjuriesDirect) by EpisodeId