Azure Monitor 日志查询中的高级聚合Advanced aggregations in Azure Monitor log queries

备注

在完成本课程之前,应先完成 Azure Monitor 查询中的聚合You should complete Aggregations in Azure Monitor queries before completing this lesson.

备注

可以在自己的 Log Analytics 环境中完成此练习,也可以使用我们的演示环境,其中包含大量样本数据。You can work through this exercise in your own Log Analytics environment, or you can use our Demo environment, which includes plenty of sample data.

本文介绍一些适用于 Azure Monitor 查询的较高级聚合选项。This article describes some of the more advanced aggregation options available to Azure Monitor queries.

生成列表和集Generating lists and sets

可以使用 makelist 根据特定列中的值顺序创建数据透视图。You can use makelist to pivot data by the order of values in a particular column. 例如,你可能想要浏览计算机上发生的最常见有序事件。For example, you may want to explore the most common order events take place on your machines. 实际上,可以根据每台计算机上 EventID 的顺序创建数据透视图。You can essentially pivot the data by the order of EventIDs on each machine.

Event
| where TimeGenerated > ago(12h)
| order by TimeGenerated desc
| summarize makelist(EventID) by Computer
ComputerComputer list_EventIDlist_EventID
computer1computer1 [704,701,1501,1500,1085,704,704,701][704,701,1501,1500,1085,704,704,701]
computer2computer2 [326,105,302,301,300,102][326,105,302,301,300,102]
...... ......

makelist 根据数据的传入顺序生成列表。makelist generates a list in the order that data was passed into it. 若要以最旧到最新的顺序排序事件,请在 order 语句中使用 asc,而不要使用 descTo sort events from oldest to newest, use asc in the order statement instead of desc.

创建只包含非重复值的列表也很有用。It is also useful to create a list of just distinct values. 此列表称为“集”,它是使用 makeset 生成的: This is called a Set and can be generated with makeset:

Event
| where TimeGenerated > ago(12h)
| order by TimeGenerated desc
| summarize makeset(EventID) by Computer
ComputerComputer list_EventIDlist_EventID
computer1computer1 [704,701,1501,1500,1085][704,701,1501,1500,1085]
computer2computer2 [326,105,302,301,300,102][326,105,302,301,300,102]
...... ......

makelist 一样,makeset 也能处理有序数据,并可基于行的传入顺序生成数组。Like makelist, makeset also works with ordered data and will generate the arrays based on the order of the rows that are passed into it.

展开列表Expanding lists

makelistmakeset 的反向操作是 mvexpand,该操作将值列表展开为单独的行。The inverse operation of makelist or makeset is mvexpand, which expands a list of values to separate rows. 它可以展开任意数目的动态列(包括 JSON 和数组)。It can expand across any number of dynamic columns, both JSON and array. 例如,可以在“检测信号”表中检查在过去一小时发送了检测信号的计算机中发送数据的解决方案: For example, you could check the Heartbeat table for solutions sending data from computers that sent a heartbeat in the last hour:

Heartbeat
| where TimeGenerated > ago(1h)
| project Computer, Solutions
ComputerComputer 解决方案Solutions
computer1computer1 "security", "updates", "changeTracking""security", "updates", "changeTracking"
computer2computer2 "security", "updates""security", "updates"
computer3computer3 "antiMalware", "changeTracking""antiMalware", "changeTracking"
...... ......

使用 mvexpand 可以显示单独行(而不是逗号分隔列表)中的每个值:Use mvexpand to show each value in a separate row instead of a comma-separated list:

Heartbeat
| where TimeGenerated > ago(1h)
| project Computer, split(Solutions, ",")
| mvexpand Solutions
ComputerComputer 解决方案Solutions
computer1computer1 "security""security"
computer1computer1 "updates""updates"
computer1computer1 "changeTracking""changeTracking"
computer2computer2 "security""security"
computer2computer2 "updates""updates"
computer3computer3 "antiMalware""antiMalware"
computer3computer3 "changeTracking""changeTracking"
...... ......

然后,可以再次使用 makelist 将项分组到一起,这次会看到每个解决方案的计算机列表:You could then use makelist again to group items together, and this time see the list of computers per solution:

Heartbeat
| where TimeGenerated > ago(1h)
| project Computer, split(Solutions, ",")
| mvexpand Solutions
| summarize makelist(Computer) by tostring(Solutions) 
解决方案Solutions list_Computerlist_Computer
"security""security" ["computer1", "computer2"]["computer1", "computer2"]
"updates""updates" ["computer1", "computer2"]["computer1", "computer2"]
"changeTracking""changeTracking" ["computer1", "computer3"]["computer1", "computer3"]
"antiMalware""antiMalware" ["computer3"]["computer3"]
...... ......

处理缺失的 binHandling missing bins

需要为缺失的 bin 填写默认值时,非常适合应用 mvexpand。例如,假设你要通过浏览特定计算机的检测信号来查看该计算机的正常运行时间。A useful application of mvexpand is the need to fill default values in for missing bins. For example, suppose you're looking for the uptime of a particular machine by exploring its heartbeat. 此外,你想要查看 category 列中检测信号的源。You also want to see the source of the heartbeat which is in the category column. 通常,我们会使用一个简单的 summarize 语句,如下所示:Normally, we would use a simple summarize statement as follows:

Heartbeat
| where TimeGenerated > ago(12h)
| summarize count() by Category, bin(TimeGenerated, 1h)
CategoryCategory TimeGeneratedTimeGenerated count_count_
直接代理Direct Agent 2017-06-06T17:00:00Z2017-06-06T17:00:00Z 1515
直接代理Direct Agent 2017-06-06T18:00:00Z2017-06-06T18:00:00Z 6060
直接代理Direct Agent 2017-06-06T20:00:00Z2017-06-06T20:00:00Z 5555
直接代理Direct Agent 2017-06-06T21:00:00Z2017-06-06T21:00:00Z 5757
直接代理Direct Agent 2017-06-06T22:00:00Z2017-06-06T22:00:00Z 6060
...... ...... ......

不过,在这些结果中,与“2017-06-06T19:00:00Z”关联的存储桶缺失,因为在该小时没有任何检测信号数据。In these results though the bucket associated with "2017-06-06T19:00:00Z" is missing because there isn't any heartbeat data for that hour. 使用 make-series 函数将默认值赋给空存储桶。Use the make-series function to assign a default value to empty buckets. 这会针对每个类别生成包含两个额外数组列的行,其中一个列包含值,另一个列包含匹配时间存储桶:This will generate a row for each category with two extra array columns, one for values, and one for matching time buckets:

Heartbeat
| make-series count() default=0 on TimeGenerated in range(ago(1d), now(), 1h) by Category 
CategoryCategory count_count_ TimeGeneratedTimeGenerated
直接代理Direct Agent [15,60,0,55,60,57,60,...][15,60,0,55,60,57,60,...] ["2017-06-06T17:00:00.0000000Z","2017-06-06T18:00:00.0000000Z","2017-06-06T19:00:00.0000000Z","2017-06-06T20:00:00.0000000Z","2017-06-06T21:00:00.0000000Z",...]["2017-06-06T17:00:00.0000000Z","2017-06-06T18:00:00.0000000Z","2017-06-06T19:00:00.0000000Z","2017-06-06T20:00:00.0000000Z","2017-06-06T21:00:00.0000000Z",...]
...... ...... ......

count_ 数组的第三个元素预期为 0,TimeGenerated 数组中包含“2017-06-06T19:00:00.0000000Z”的匹配时间戳。The third element of the count_ array is a 0 as expected, and there is a matching timestamp of "2017-06-06T19:00:00.0000000Z" in the TimeGenerated array. 不过,此数组的格式难以阅读。This array format is difficult to read though. 使用 mvexpand 展开数组,并生成 summarize 所生成的相同格式输出:Use mvexpand to expand the arrays and produce the same format output as generated by summarize:

Heartbeat
| make-series count() default=0 on TimeGenerated in range(ago(1d), now(), 1h) by Category 
| mvexpand TimeGenerated, count_
| project Category, TimeGenerated, count_
CategoryCategory TimeGeneratedTimeGenerated count_count_
直接代理Direct Agent 2017-06-06T17:00:00Z2017-06-06T17:00:00Z 1515
直接代理Direct Agent 2017-06-06T18:00:00Z2017-06-06T18:00:00Z 6060
直接代理Direct Agent 2017-06-06T19:00:00Z2017-06-06T19:00:00Z 00
直接代理Direct Agent 2017-06-06T20:00:00Z2017-06-06T20:00:00Z 5555
直接代理Direct Agent 2017-06-06T21:00:00Z2017-06-06T21:00:00Z 5757
直接代理Direct Agent 2017-06-06T22:00:00Z2017-06-06T22:00:00Z 6060
...... ...... ......

将结果缩小为一组元素:let, makeset, toscalar, inNarrowing results to a set of elements: let, makeset, toscalar, in

一种常见的方案是基于一组条件选择某些特定实体的名称,然后将不同的数据集筛选为该实体集。A common scenario is to select the names of some specific entities based on a set of criteria and then filter a different data set down to that set of entities. 例如,可以查找已知缺少更新的计算机,并识别这些计算机调用的 IP:For example you might find computers that are known to have missing updates and identify IPs that these computers called out to:

let ComputersNeedingUpdate = toscalar(
    Update
    | summarize makeset(Computer)
    | project set_Computer
);
WindowsFirewall
| where Computer in (ComputersNeedingUpdate)

后续步骤Next steps

请参阅有关将 Kusto 查询语言与 Azure Monitor 日志数据配合使用的其他课程:See other lessons for using the Kusto query language with Azure Monitor log data: