Samples

Below are a few common query needs and how the Kusto query language can be used to meet them.

Display a column chart

Project two or more columns and use them as the x-axis and y-axis of a chart.

StormEvents
| where isnotempty(EndLocation) 
| summarize event_count=count() by EndLocation
| top 10 by event_count
| render columnchart
  • The first column forms the x-axis. It can be numeric, datetime, or string.
  • Use where, summarize, and top to limit the volume of data that you display.
  • Sort the results to define the order of the x-axis.

Screenshot of a column chart. The y-axis ranges from 0 to about 50. Ten colored columns represent the values for 10 locations.
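To make the last bullet concrete, here's a variation of the query above (a sketch) that orders the x-axis alphabetically by location instead of by count:

```kusto
StormEvents
| where isnotempty(EndLocation)
| summarize event_count = count() by EndLocation
| top 10 by event_count
| sort by EndLocation asc  // alphabetical x-axis order
| render columnchart
```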

Get sessions from start and stop events

Suppose you have a log of events. Some events mark the start or end of an extended activity or session.

Name    City        SessionId  Timestamp
Start   London      2817330    2015-12-09T10:12:02.32
Game    London      2817330    2015-12-09T10:12:52.45
Start   Manchester  4267667    2015-12-09T10:14:02.23
Stop    London      2817330    2015-12-09T10:23:43.18
Cancel  Manchester  4267667    2015-12-09T10:27:26.29
Stop    Manchester  4267667    2015-12-09T10:28:31.72

Every event has a SessionId. The problem is to match up the start and stop events with the same ID.

let Events = MyLogTable | where ... ;

Events
| where Name == "Start"
| project Name, City, SessionId, StartTime=timestamp
| join (Events 
        | where Name == "Stop"
        | project StopTime=timestamp, SessionId) 
    on SessionId
| project City, SessionId, StartTime, StopTime, Duration = StopTime - StartTime
  1. Use let to name a projection of the table that is pared down as far as possible before going into the join.
  2. Use project to change the names of the timestamps so that both the start and stop times can appear in the result. It also selects the other columns to show in the result.
  3. Use join to match up the start and stop entries for the same activity, creating a row for each activity.
  4. Finally, project again adds a column to show the duration of the activity.
City        SessionId  StartTime               StopTime                Duration
London      2817330    2015-12-09T10:12:02.32  2015-12-09T10:23:43.18  00:11:40.46
Manchester  4267667    2015-12-09T10:14:02.23  2015-12-09T10:28:31.72  00:14:29.49

Get sessions, without session ID

Suppose that the start and stop events don't conveniently have a session ID that we can match on. But we do have an IP address of the client where the session took place. Assuming each client address conducts only one session at a time, we can match each start event to the next stop event from the same IP address.

Events 
| where Name == "Start" 
| project City, ClientIp, StartTime = timestamp
| join  kind=inner
    (Events
    | where Name == "Stop" 
    | project StopTime = timestamp, ClientIp)
    on ClientIp
| extend duration = StopTime - StartTime 
    // Remove matches with earlier stops:
| where  duration > 0  
    // Pick out the earliest stop for each start and client:
| summarize arg_min(duration, *) by bin(StartTime,1s), ClientIp

The join matches every start time with all the stop times from the same client IP address.

  1. Remove matches with earlier stop times.
  2. Group by start time and IP to get a group for each session.
  3. Supply a bin function for the StartTime parameter. If you don't, Kusto automatically uses 1-hour bins that will match some start times with the wrong stop times.

arg_min picks out the row with the smallest duration in each group, and the * argument passes through all the other columns. arg_min prefixes "min_" to each column name.

A table listing the results, with columns for the start time, client IP, duration, city, and earliest stop time for each client/start-time combination.

Add code to count the durations in conveniently sized bins. In this example, because a bar chart is preferred, divide by 1s to convert the timespans to numbers.

    // Count the frequency of each duration:
    | summarize count() by duration=bin(min_duration/1s, 10) 
      // Cut off the long tail:
    | where duration < 300
      // Display in a bar chart:
    | sort by duration asc | render barchart 

A column chart depicting the number of sessions with durations in the specified ranges. More than 400 sessions lasted 10 seconds; fewer than 100 lasted 290 seconds.

Real example

Logs  
| filter ActivityId == "ActivityId with Blablabla" 
| summarize max(Timestamp), min(Timestamp)  
| extend Duration = max_Timestamp - min_Timestamp 
 
wabitrace  
| filter Timestamp >= datetime(2015-01-12 11:00:00Z)  
| filter Timestamp < datetime(2015-01-12 13:00:00Z)  
| filter EventText like "NotifyHadoopApplicationJobPerformanceCounters"      
| extend Tenant = extract("tenantName=([^,]+),", 1, EventText) 
| extend Environment = extract("environmentName=([^,]+),", 1, EventText)  
| extend UnitOfWorkId = extract("unitOfWorkId=([^,]+),", 1, EventText)  
| extend TotalLaunchedMaps = extract("totalLaunchedMaps=([^,]+),", 1, EventText, typeof(real))  
| extend MapsSeconds = extract("mapsMilliseconds=([^,]+),", 1, EventText, typeof(real)) / 1000 
| extend TotalMapsSeconds = MapsSeconds  / TotalLaunchedMaps 
| filter Tenant == 'DevDiv' and Environment == 'RollupDev2'  
| filter TotalLaunchedMaps > 0 
| summarize sum(TotalMapsSeconds) by UnitOfWorkId  
| extend JobMapsSeconds = sum_TotalMapsSeconds * 1 
| project UnitOfWorkId, JobMapsSeconds 
| join ( 
wabitrace  
| filter Timestamp >= datetime(2015-01-12 11:00:00Z)  
| filter Timestamp < datetime(2015-01-12 13:00:00Z)  
| filter EventText like "NotifyHadoopApplicationJobPerformanceCounters"  
| extend Tenant = extract("tenantName=([^,]+),", 1, EventText) 
| extend Environment = extract("environmentName=([^,]+),", 1, EventText)  
| extend UnitOfWorkId = extract("unitOfWorkId=([^,]+),", 1, EventText)   
| extend TotalLaunchedReducers = extract("totalLaunchedReducers=([^,]+),", 1, EventText, typeof(real)) 
| extend ReducesSeconds = extract("reducesMilliseconds=([^,]+)", 1, EventText, typeof(real)) / 1000 
| extend TotalReducesSeconds = ReducesSeconds / TotalLaunchedReducers 
| filter Tenant == 'DevDiv' and Environment == 'RollupDev2'  
| filter TotalLaunchedReducers > 0 
| summarize sum(TotalReducesSeconds) by UnitOfWorkId  
| extend JobReducesSeconds = sum_TotalReducesSeconds * 1 
| project UnitOfWorkId, JobReducesSeconds ) 
on UnitOfWorkId 
| join ( 
wabitrace  
| filter Timestamp >= datetime(2015-01-12 11:00:00Z)  
| filter Timestamp < datetime(2015-01-12 13:00:00Z)  
| filter EventText like "NotifyHadoopApplicationJobPerformanceCounters"  
| extend Tenant = extract("tenantName=([^,]+),", 1, EventText) 
| extend Environment = extract("environmentName=([^,]+),", 1, EventText)  
| extend JobName = extract("jobName=([^,]+),", 1, EventText)  
| extend StepName = extract("stepName=([^,]+),", 1, EventText)  
| extend UnitOfWorkId = extract("unitOfWorkId=([^,]+),", 1, EventText)  
| extend LaunchTime = extract("launchTime=([^,]+),", 1, EventText, typeof(datetime))  
| extend FinishTime = extract("finishTime=([^,]+),", 1, EventText, typeof(datetime)) 
| extend TotalLaunchedMaps = extract("totalLaunchedMaps=([^,]+),", 1, EventText, typeof(real))  
| extend TotalLaunchedReducers = extract("totalLaunchedReducers=([^,]+),", 1, EventText, typeof(real)) 
| extend MapsSeconds = extract("mapsMilliseconds=([^,]+),", 1, EventText, typeof(real)) / 1000 
| extend ReducesSeconds = extract("reducesMilliseconds=([^,]+)", 1, EventText, typeof(real)) / 1000 
| extend TotalMapsSeconds = MapsSeconds  / TotalLaunchedMaps  
| extend TotalReducesSeconds = (ReducesSeconds / TotalLaunchedReducers / ReducesSeconds) * ReducesSeconds  
| extend CalculatedDuration = (TotalMapsSeconds + TotalReducesSeconds) * time(1s) 
| filter Tenant == 'DevDiv' and Environment == 'RollupDev2') 
on UnitOfWorkId 
| extend MapsFactor = TotalMapsSeconds / JobMapsSeconds 
| extend ReducesFactor = TotalReducesSeconds / JobReducesSeconds 
| extend CurrentLoad = 1536 + (768 * TotalLaunchedMaps) + (1536 * TotalLaunchedMaps) 
| extend NormalizedLoad = 1536 + (768 * TotalLaunchedMaps * MapsFactor) + (1536 * TotalLaunchedMaps * ReducesFactor) 
| summarize sum(CurrentLoad), sum(NormalizedLoad) by  JobName  
| extend SaveFactor = sum_NormalizedLoad / sum_CurrentLoad 

Chart concurrent sessions over time

Suppose you have a table of activities with their start and end times. Show a chart over time that displays how many activities are running concurrently at any time.

Here's a sample input, called X.

SessionId  StartTime  StopTime
a          10:01:03   10:10:08
b          10:01:29   10:03:10
c          10:03:02   10:05:20

For a chart in 1-minute bins, create something that can be counted at each 1-minute interval during which each activity was running.

Here's an intermediate result.

X | extend samples = range(bin(StartTime, 1m), StopTime, 1m)

range generates an array of values at the specified intervals.

SessionId  StartTime  StopTime  samples
a          10:01:33   10:06:31  [10:01:00, 10:02:00, ... 10:06:00]
b          10:02:29   10:03:45  [10:02:00, 10:03:00]
c          10:03:12   10:04:30  [10:03:00, 10:04:00]

Instead of keeping those arrays, expand them by using mv-expand.

X | mv-expand samples = range(bin(StartTime, 1m), StopTime , 1m)
SessionId  StartTime  StopTime  samples
a          10:01:33   10:06:31  10:01:00
a          10:01:33   10:06:31  10:02:00
a          10:01:33   10:06:31  10:03:00
a          10:01:33   10:06:31  10:04:00
a          10:01:33   10:06:31  10:05:00
a          10:01:33   10:06:31  10:06:00
b          10:02:29   10:03:45  10:02:00
b          10:02:29   10:03:45  10:03:00
c          10:03:12   10:04:30  10:03:00
c          10:03:12   10:04:30  10:04:00

Now group these by sample time, counting the occurrences of each activity.

X
| mv-expand samples = range(bin(StartTime, 1m), StopTime , 1m)
| summarize count(SessionId) by bin(todatetime(samples),1m)
  • Use todatetime() because mv-expand yields a column of dynamic type.
  • Use bin() because, for numeric values and dates, summarize always applies a bin function with a default interval if you don't supply one.
count_SessionId  samples
1                10:01:00
2                10:02:00
3                10:03:00
2                10:04:00
1                10:05:00
1                10:06:00

The results can be rendered as a bar chart or time chart.
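Rendering is just one more step appended to the same pipeline; for example (a sketch of the complete query):

```kusto
X
| mv-expand samples = range(bin(StartTime, 1m), StopTime, 1m)
| summarize count(SessionId) by bin(todatetime(samples), 1m)
| render timechart
```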

Introduce null bins into summarize

When the summarize operator is applied over a group key that consists of a datetime column, "bin" those values to fixed-width bins.

let StartTime=ago(12h);
let StopTime=now();
T
| where Timestamp > StartTime and Timestamp <= StopTime 
| where ...
| summarize Count=count() by bin(Timestamp, 5m)

The above example produces a table with a single row per group of rows in T that fall into each five-minute bin. What it doesn't do is add "null bins": rows for time-bin values between StartTime and StopTime for which there's no corresponding row in T.

It's desirable to "pad" the table with those bins. Here's one way to do it.

let StartTime=ago(12h);
let StopTime=now();
T
| where Timestamp > StartTime and Timestamp <= StopTime 
| summarize Count=count() by bin(Timestamp, 5m)
| where ...
| union ( // 1
  range x from 1 to 1 step 1 // 2
  | mv-expand Timestamp=range(StartTime, StopTime, 5m) to typeof(datetime) // 3
  | extend Count=0 // 4
  )
| summarize Count=sum(Count) by bin(Timestamp, 5m) // 5 

Here's a step-by-step explanation of the above query:

  1. The union operator lets you add more rows to a table. Those rows are produced by the union expression.
  2. The range operator produces a table that has a single row and column. The table isn't used for anything other than for mv-expand to work on.
  3. The mv-expand operator over the range function creates as many rows as there are 5-minute bins between StartTime and StopTime.
  4. Use a Count of 0.
  5. The summarize operator groups together bins from the original (left, or outer) argument to union. The operator also bins from the inner argument to it (the null-bin rows). This process ensures that the output has one row per bin, whose value is either zero or the original count.
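As an aside, the make-series operator produces fixed-width bins with a default value for empty ones in a single step, so it can achieve similar padding more directly. A sketch (exact syntax may vary by service version, and make-series yields arrays that you'd mv-expand if one row per bin is needed):

```kusto
let StartTime=ago(12h);
let StopTime=now();
T
| where Timestamp > StartTime and Timestamp <= StopTime
| make-series Count=count() default=0 on Timestamp from bin(StartTime, 5m) to StopTime step 5m
```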

Get more out of your data in Kusto with machine learning

There are many interesting use cases that leverage machine-learning algorithms to derive interesting insights from telemetry data. Often, these algorithms require a very structured dataset as their input. The raw log data usually doesn't match the required structure and size.

Start by looking for anomalies in the error rate of a specific Bing inferences service. The logs table has 65 billion records. The simple query below filters 250,000 errors and creates a time series of the error count, which uses the anomaly-detection function series_decompose_anomalies. The anomalies are detected by the Kusto service and are highlighted as red dots on the time-series chart.

Logs
| where Timestamp >= datetime(2015-08-22) and Timestamp < datetime(2015-08-23) 
| where Level == "e" and Service == "Inferences.UnusualEvents_Main" 
| summarize count() by bin(Timestamp, 5min)
| render anomalychart 

The service identified a few time buckets that had a suspicious error rate. Use Kusto to zoom into this time frame, and run a query that aggregates on the Message column. Try to find the top errors.

The relevant parts of the message's entire stack trace are trimmed out so they fit better on the page.

You can see the successful identification of the top eight errors. However, next comes a long series of errors, because the error message was created by a format string that contained changing data.

Logs
| where Timestamp >= datetime(2015-08-22 05:00) and Timestamp < datetime(2015-08-22 06:00)
| where Level == "e" and Service == "Inferences.UnusualEvents_Main"
| summarize count() by Message 
| top 10 by count_ 
| project count_, Message 
count_  Message
7125    ExecuteAlgorithmMethod for method 'RunCycleFromInterimData' has failed...
7125    InferenceHostService call failed..System.NullReferenceException: Object reference not set to an instance of an object...
7124    Unexpected Inference System error..System.NullReferenceException: Object reference not set to an instance of an object...
5112    Unexpected Inference System error..System.NullReferenceException: Object reference not set to an instance of an object..
174     InferenceHostService call failed..System.ServiceModel.CommunicationException: There was an error writing to the pipe:...
10      ExecuteAlgorithmMethod for method 'RunCycleFromInterimData' has failed...
10      Inference System error..Microsoft.Bing.Platform.Inferences.Service.Managers.UserInterimDataManagerException:...
3       InferenceHostService call failed..System.ServiceModel.CommunicationObjectFaultedException:...
1       Inference System error... SocialGraph.BOSS.OperationResponse...AIS TraceId:8292FC561AC64BED8FA243808FE74EFD...
1       Inference System error... SocialGraph.BOSS.OperationResponse...AIS TraceId: 5F79F7587FF943EC9B641E02E701AFBF...

This is where the reduce operator helps. The operator identified 63 different errors that originated at the same trace instrumentation point in the code, which helps you focus on additional meaningful error traces in that time window.

Logs
| where Timestamp >= datetime(2015-08-22 05:00) and Timestamp < datetime(2015-08-22 06:00)
| where Level == "e" and Service == "Inferences.UnusualEvents_Main"
| reduce by Message with threshold=0.35
| project Count, Pattern
Count  Pattern
7125   ExecuteAlgorithmMethod for method 'RunCycleFromInterimData' has failed...
7125   InferenceHostService call failed..System.NullReferenceException: Object reference not set to an instance of an object...
7124   Unexpected Inference System error..System.NullReferenceException: Object reference not set to an instance of an object...
5112   Unexpected Inference System error..System.NullReferenceException: Object reference not set to an instance of an object..
174    InferenceHostService call failed..System.ServiceModel.CommunicationException: There was an error writing to the pipe:...
63     Inference System error..Microsoft.Bing.Platform.Inferences.*: Write * to write to the Object BOSS.*: SocialGraph.BOSS.Reques...
10     ExecuteAlgorithmMethod for method 'RunCycleFromInterimData' has failed...
10     Inference System error..Microsoft.Bing.Platform.Inferences.Service.Managers.UserInterimDataManagerException:...
3      InferenceHostService call failed..System.ServiceModel.*: The object, System.ServiceModel.Channels.*+*, for ** is the *... at Syst...

Now you have a good view into the top errors that contributed to the detected anomalies.

To understand the impact of these errors across the sample system:

  • The Logs table contains additional dimensional data, such as Component and Cluster.
  • The new autocluster plugin can help derive that insight with a simple query.
  • In the example below, you can clearly see that each of the top four errors is specific to a component. Also, although the top three errors are specific to the DB4 cluster, the fourth happens across all clusters.
Logs
| where Timestamp >= datetime(2015-08-22 05:00) and Timestamp < datetime(2015-08-22 06:00)
| where Level == "e" and Service == "Inferences.UnusualEvents_Main"
| evaluate autocluster()
Count  Percent (%)  Component                   Cluster  Message
7125   26.64        InferenceHostService        DB4      ExecuteAlgorithmMethod for method ....
7125   26.64        Unknown Component           DB4      InferenceHostService call failed....
7124   26.64        InferenceAlgorithmExecutor  DB4      Unexpected Inference System error...
5112   19.11        InferenceAlgorithmExecutor  *        Unexpected Inference System error...

Map values from one set to another

A common use case is static mapping of values, which can help make results more presentable. For example, consider the following table. DeviceModel specifies the model of a device, which isn't a very convenient form of referencing the device name.

DeviceModel  Count
iPhone5,1    32
iPhone3,2    432
iPhone7,2    55
iPhone5,2    66

The following is a better representation.

FriendlyName  Count
iPhone 5      32
iPhone 4      432
iPhone 6      55
iPhone5       66

The two approaches below demonstrate how this representation can be achieved.

Map using a dynamic dictionary

This approach shows mapping with a dynamic dictionary and dynamic accessors.

// Data set definition
let Source = datatable(DeviceModel:string, Count:long)
[
  'iPhone5,1', 32,
  'iPhone3,2', 432,
  'iPhone7,2', 55,
  'iPhone5,2', 66,
];
// The query starts here
let phone_mapping = dynamic(
  {
    "iPhone5,1" : "iPhone 5",
    "iPhone3,2" : "iPhone 4",
    "iPhone7,2" : "iPhone 6",
    "iPhone5,2" : "iPhone5"
  });
Source 
| project FriendlyName = phone_mapping[DeviceModel], Count
FriendlyName  Count
iPhone 5      32
iPhone 4      432
iPhone 6      55
iPhone5       66

Map using a static table

This approach shows mapping with a persistent table and the join operator.

Create the mapping table, just once.

.create table Devices (DeviceModel: string, FriendlyName: string) 

.ingest inline into table Devices 
    ["iPhone5,1","iPhone 5"]["iPhone3,2","iPhone 4"]["iPhone7,2","iPhone 6"]["iPhone5,2","iPhone5"]

Here's the content of Devices now.

DeviceModel  FriendlyName
iPhone5,1    iPhone 5
iPhone3,2    iPhone 4
iPhone7,2    iPhone 6
iPhone5,2    iPhone5

Use the same trick to create a test table, Source.

.create table Source (DeviceModel: string, Count: int)

.ingest inline into table Source ["iPhone5,1",32]["iPhone3,2",432]["iPhone7,2",55]["iPhone5,2",66]

Join and project.

Devices  
| join (Source) on DeviceModel  
| project FriendlyName, Count

Result:

FriendlyName  Count
iPhone 5      32
iPhone 4      432
iPhone 6      55
iPhone5       66
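A third option for small mappings, sketched below, is an inline case() expression, which needs neither a dictionary nor an extra table:

```kusto
Source
| extend FriendlyName = case(
    DeviceModel == "iPhone5,1", "iPhone 5",
    DeviceModel == "iPhone3,2", "iPhone 4",
    DeviceModel == "iPhone7,2", "iPhone 6",
    DeviceModel == "iPhone5,2", "iPhone5",
    "Unknown")  // default when no predicate matches
| project FriendlyName, Count
```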

Create and use query-time dimension tables

You'll often want to join the results of a query with some ad-hoc dimension table that isn't stored in the database. It's possible to define an expression whose result is a table scoped to a single query. For example:

// Create a query-time dimension table using datatable
let DimTable = datatable(EventType:string, Code:string)
  [
    "Heavy Rain", "HR",
    "Tornado",    "T"
  ]
;
DimTable
| join StormEvents on EventType
| summarize count() by Code

Here's a slightly more complex example.

// Create a query-time dimension table using datatable
let TeamFoundationJobResult = datatable(Result:int, ResultString:string)
  [
    -1, 'None', 0, 'Succeeded', 1, 'PartiallySucceeded', 2, 'Failed',
    3, 'Stopped', 4, 'Killed', 5, 'Blocked', 6, 'ExtensionNotFound',
    7, 'Inactive', 8, 'Disabled', 9, 'JobInitializationError'
  ]
;
JobHistory
  | where PreciseTimeStamp > ago(1h)
  | where Service  != "AX"
  | where Plugin has "Analytics"
  | sort by PreciseTimeStamp desc
  | join kind=leftouter TeamFoundationJobResult on Result
  | extend ExecutionTimeSpan = totimespan(ExecutionTime)
  | project JobName, StartTime, ExecutionTimeSpan, ResultString, ResultMessage

Retrieve the latest records (by timestamp) per identity

Suppose you have a table that includes:

  • an ID column that identifies the entity with which each row is associated, such as a user ID or a node ID
  • a timestamp column that provides the time reference for the row
  • other columns

A query that returns the latest two records for each value of the ID column, where "latest" is defined as "having the highest value of timestamp", can be made with the top-nested operator.

For example:

datatable(id:string, timestamp:datetime, bla:string)           // #1
  [
  "Barak",  datetime(2015-01-01), "1",
  "Barak",  datetime(2016-01-01), "2",
  "Barak",  datetime(2017-01-20), "3",
  "Donald", datetime(2017-01-20), "4",
  "Donald", datetime(2017-01-18), "5",
  "Donald", datetime(2017-01-19), "6"
  ]
| top-nested   of id        by dummy0=max(1),                  // #2
  top-nested 2 of timestamp by dummy1=max(timestamp),          // #3
  top-nested   of bla       by dummy2=max(1)                   // #4
| project-away dummy0, dummy1, dummy2                          // #5

Note: The numbering below refers to the numbers in the code sample, at the far right.

  1. The datatable is a way to produce some test data for demonstration purposes. Normally, you'd use real data here.
  2. This line essentially means "return all distinct values of id".
  3. This line then returns, for the top two records that maximize the timestamp column:
    • the columns of the previous level (here, just id)
    • the column specified at this level (here, timestamp)
  4. This line adds the values of the bla column for each of the records returned by the previous level. If the table has other columns of interest, you can repeat this line for each such column.
  5. This final line uses the project-away operator to remove the "extra" columns introduced by top-nested.
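When only the single latest record per identity is needed, arg_max is simpler than top-nested. A sketch using the same test data:

```kusto
datatable(id:string, timestamp:datetime, bla:string)
  [
  "Barak",  datetime(2015-01-01), "1",
  "Barak",  datetime(2016-01-01), "2",
  "Barak",  datetime(2017-01-20), "3",
  "Donald", datetime(2017-01-20), "4",
  "Donald", datetime(2017-01-18), "5",
  "Donald", datetime(2017-01-19), "6"
  ]
| summarize arg_max(timestamp, *) by id  // one row per id: its latest record
```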

Extend a table with some percent-of-total calculation

A tabular expression that includes a numeric column is more useful to the user when it's accompanied by its value as a percentage of the total. For example, assume that a query produces the following table:

SomeSeries  SomeInt
Foo         100
Bar         200

If you want to display this table as:

SomeSeries  SomeInt  Pct
Foo         100      33.3
Bar         200      66.6

Then you need to calculate the total (sum) of the SomeInt column, and then divide each value of this column by the total. For arbitrary results, use the as operator.

// The following table literal represents a long calculation
// that ends up with an anonymous tabular value:
datatable (SomeInt:int, SomeSeries:string) [
  100, "Foo",
  200, "Bar",
]
// We now give this calculation a name ("X"):
| as X
// Having this name we can refer to it in the sub-expression
// "X | summarize sum(SomeInt)":
| extend Pct = 100 * bin(todouble(SomeInt) / toscalar(X | summarize sum(SomeInt)), 0.001)
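When the tabular value already has a name, the same percentage can be computed with let and toscalar instead of the as operator (a sketch):

```kusto
let X = datatable (SomeInt:int, SomeSeries:string) [
  100, "Foo",
  200, "Bar",
];
let Total = toscalar(X | summarize sum(SomeInt));
X
| extend Pct = round(100.0 * SomeInt / Total, 1)
```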

Perform aggregations over a sliding window

The following example shows how to summarize columns by using a sliding window. Use the table below, which contains prices of fruits by timestamps. Calculate the min, max, and sum costs of each fruit per day, using a sliding window of seven days. Each record in the result set aggregates the previous seven days, and the result contains a record per day in the analysis period.

The Fruits table:

Timestamp                    Fruit    Price
2018-09-24 21:00:00.0000000  Bananas  3
2018-09-25 20:00:00.0000000  Apples   9
2018-09-26 03:00:00.0000000  Bananas  4
2018-09-27 10:00:00.0000000  Plums    8
2018-09-28 07:00:00.0000000  Bananas  6
2018-09-29 21:00:00.0000000  Bananas  8
2018-09-30 01:00:00.0000000  Plums    2
2018-10-01 05:00:00.0000000  Bananas  0
2018-10-02 02:00:00.0000000  Bananas  0
2018-10-03 13:00:00.0000000  Plums    4
2018-10-04 14:00:00.0000000  Apples   8
2018-10-05 05:00:00.0000000  Bananas  2
2018-10-06 08:00:00.0000000  Plums    8
2018-10-07 12:00:00.0000000  Bananas  0

The sliding-window aggregation query. An explanation follows the query results.

let _start = datetime(2018-09-24);
let _end = _start + 13d; 
Fruits 
| extend _bin = bin_at(Timestamp, 1d, _start) // #1 
| extend _endRange = iif(_bin + 7d > _end, _end, 
                            iif(_bin + 7d - 1d < _start, _start,
                                iif(_bin + 7d - 1d < _bin, _bin, _bin + 7d - 1d)))  // #2
| extend _range = range(_bin, _endRange, 1d) // #3
| mv-expand _range to typeof(datetime) limit 1000000 // #4
| summarize min(Price), max(Price), sum(Price) by Timestamp=bin_at(_range, 1d, _start) ,  Fruit // #5
| where Timestamp >= _start + 7d; // #6

Timestamp                    Fruit    min_Price  max_Price  sum_Price
2018-10-01 00:00:00.0000000  Apples   9          9          9
2018-10-01 00:00:00.0000000  Bananas  0          8          18
2018-10-01 00:00:00.0000000  Plums    2          8          10
2018-10-02 00:00:00.0000000  Bananas  0          8          18
2018-10-02 00:00:00.0000000  Plums    2          8          10
2018-10-03 00:00:00.0000000  Plums    2          8          14
2018-10-03 00:00:00.0000000  Bananas  0          8          14
2018-10-04 00:00:00.0000000  Bananas  0          8          14
2018-10-04 00:00:00.0000000  Plums    2          4          6
2018-10-04 00:00:00.0000000  Apples   8          8          8
2018-10-05 00:00:00.0000000  Bananas  0          8          10
2018-10-05 00:00:00.0000000  Plums    2          4          6
2018-10-05 00:00:00.0000000  Apples   8          8          8
2018-10-06 00:00:00.0000000  Plums    2          8          14
2018-10-06 00:00:00.0000000  Bananas  0          2          2
2018-10-06 00:00:00.0000000  Apples   8          8          8
2018-10-07 00:00:00.0000000  Bananas  0          2          2
2018-10-07 00:00:00.0000000  Plums    4          8          12
2018-10-07 00:00:00.0000000  Apples   8          8          8

Query details:

The query "stretches" (duplicates) each record in the input table over the seven days after its actual appearance, so each record actually appears seven times. As a result, the daily aggregation includes all records of the previous seven days.

Step-by-step explanation (the numbering refers to the numbered comments in the code sample):

  1. Bin each record to one day (relative to _start).
  2. Determine the end of the range per record: _bin + 7d, unless this falls outside the (_start, _end) range, in which case it is adjusted.
  3. For each record, create an array of seven days (timestamps), starting at the current record's day.
  4. mv-expand the array, duplicating each record into seven records, one day apart from each other.
  5. Perform the aggregation function for each day. Because of step 4, this actually summarizes the past seven days.
  6. The data for the first seven days is incomplete, because there's no seven-day lookback period for them, so they are excluded from the final result. In the example, they participate only in the aggregation for 2018-10-01.
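The steps above can also be sketched outside Kusto. The following Python sketch (illustrative only, not KQL) replays steps 1–6 on the sample Fruits data and reproduces the aggregated table:

```python
from datetime import date, timedelta
from collections import defaultdict

# Sample Fruits rows from the table above: (day, fruit, price).
fruits = [
    (date(2018, 9, 24), "Bananas", 3), (date(2018, 9, 25), "Apples", 9),
    (date(2018, 9, 26), "Bananas", 4), (date(2018, 9, 27), "Plums", 8),
    (date(2018, 9, 28), "Bananas", 6), (date(2018, 9, 29), "Bananas", 8),
    (date(2018, 9, 30), "Plums", 2),   (date(2018, 10, 1), "Bananas", 0),
    (date(2018, 10, 2), "Bananas", 0), (date(2018, 10, 3), "Plums", 4),
    (date(2018, 10, 4), "Apples", 8),  (date(2018, 10, 5), "Bananas", 2),
    (date(2018, 10, 6), "Plums", 8),   (date(2018, 10, 7), "Bananas", 0),
]
start = date(2018, 9, 24)
end = start + timedelta(days=13)

# Steps 1-4: "stretch" each record over the seven days starting at its own day.
stretched = defaultdict(list)                      # (day, fruit) -> prices
for day, fruit, price in fruits:
    for offset in range(7):
        target = day + timedelta(days=offset)
        if target <= end:
            stretched[(target, fruit)].append(price)

# Steps 5-6: aggregate per day and fruit; drop days without a full lookback.
result = {
    key: (min(prices), max(prices), sum(prices))
    for key, prices in stretched.items()
    if key[0] >= start + timedelta(days=7)
}

print(result[(date(2018, 10, 1), "Bananas")])  # (0, 8, 18), matching the table
```

The duplication in the sketch is the in-memory analog of the mv-expand step; in Kusto the same blow-up happens inside the engine, which is why the `limit 1000000` guard is present in the query.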

Find preceding event

The next example demonstrates how to find a preceding event between two data sets.

Purpose: There are two data sets, A and B. For each record in B, find its preceding event in A; that is, the arg_max record in A that is still "older" than the record in B. Here's the expected output for the following sample data sets.

let A = datatable(Timestamp:datetime, ID:string, EventA:string)
[
    datetime(2019-01-01 00:00:00), "x", "Ax1",
    datetime(2019-01-01 00:00:01), "x", "Ax2",
    datetime(2019-01-01 00:00:02), "y", "Ay1",
    datetime(2019-01-01 00:00:05), "y", "Ay2",
    datetime(2019-01-01 00:00:00), "z", "Az1"
];
let B = datatable(Timestamp:datetime, ID:string, EventB:string)
[
    datetime(2019-01-01 00:00:03), "x", "B",
    datetime(2019-01-01 00:00:04), "x", "B",
    datetime(2019-01-01 00:00:04), "y", "B",
    datetime(2019-01-01 00:02:00), "z", "B"
];
A; B
| Timestamp | ID | EventA |
|---|---|---|
| 2019-01-01 00:00:00.0000000 | x | Ax1 |
| 2019-01-01 00:00:00.0000000 | z | Az1 |
| 2019-01-01 00:00:01.0000000 | x | Ax2 |
| 2019-01-01 00:00:02.0000000 | y | Ay1 |
| 2019-01-01 00:00:05.0000000 | y | Ay2 |

| Timestamp | ID | EventB |
|---|---|---|
| 2019-01-01 00:00:03.0000000 | x | B |
| 2019-01-01 00:00:04.0000000 | x | B |
| 2019-01-01 00:00:04.0000000 | y | B |
| 2019-01-01 00:02:00.0000000 | z | B |

Expected output:

| ID | Timestamp | EventB | A_Timestamp | EventA |
|---|---|---|---|---|
| x | 2019-01-01 00:00:03.0000000 | B | 2019-01-01 00:00:01.0000000 | Ax2 |
| x | 2019-01-01 00:00:04.0000000 | B | 2019-01-01 00:00:01.0000000 | Ax2 |
| y | 2019-01-01 00:00:04.0000000 | B | 2019-01-01 00:00:02.0000000 | Ay1 |
| z | 2019-01-01 00:02:00.0000000 | B | 2019-01-01 00:00:00.0000000 | Az1 |
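Before the Kusto approaches, the intended matching semantics can be stated directly. A brute-force Python sketch (illustrative only; names are arbitrary) that reproduces the expected output on the sample data:

```python
from datetime import datetime

# Sample data sets: (Timestamp, ID, Event).
A = [
    (datetime(2019, 1, 1, 0, 0, 0), "x", "Ax1"),
    (datetime(2019, 1, 1, 0, 0, 1), "x", "Ax2"),
    (datetime(2019, 1, 1, 0, 0, 2), "y", "Ay1"),
    (datetime(2019, 1, 1, 0, 0, 5), "y", "Ay2"),
    (datetime(2019, 1, 1, 0, 0, 0), "z", "Az1"),
]
B = [
    (datetime(2019, 1, 1, 0, 0, 3), "x", "B"),
    (datetime(2019, 1, 1, 0, 0, 4), "x", "B"),
    (datetime(2019, 1, 1, 0, 0, 4), "y", "B"),
    (datetime(2019, 1, 1, 0, 2, 0), "z", "B"),
]

def preceding(b_ts, b_id):
    """arg_max over A records with the same ID that are not newer than b_ts."""
    candidates = [(ts, ev) for ts, id_, ev in A if id_ == b_id and ts <= b_ts]
    return max(candidates) if candidates else None

matches = [(id_, ts, preceding(ts, id_)) for ts, id_, _ in B]
```

This scans all of A for every B record, which is fine for a sketch but is exactly the cross-product cost the two Kusto suggestions below are designed to avoid.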

There are two suggested approaches to this problem. Test both on your specific data set to find the one most suitable for you.

Note

Each approach may run differently on different data sets.

Suggestion #1

This approach serializes both data sets by ID and timestamp, groups each B event with all of its preceding A events, and then picks the arg_max out of all the As in the group.

A
| extend A_Timestamp = Timestamp, Kind="A"
| union (B | extend B_Timestamp = Timestamp, Kind="B")
| order by ID, Timestamp asc 
| extend t = iff(Kind == "A" and (prev(Kind) != "A" or prev(ID) != ID), 1, 0)
| extend t = row_cumsum(t)
| summarize Timestamp=make_list(Timestamp), EventB=make_list(EventB), arg_max(A_Timestamp, EventA) by t, ID
| mv-expand Timestamp to typeof(datetime), EventB to typeof(string)
| where isnotempty(EventB)
| project-away t
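The serialize-and-cumsum grouping can be mimicked in Python (illustrative only, not KQL) to see why it yields the expected matches. The flag-plus-running-sum below plays the role of the `iff` / `row_cumsum` pair:

```python
from datetime import datetime
from collections import defaultdict

# Sample data sets: (Timestamp, ID, Event).
A = [(datetime(2019, 1, 1, 0, 0, 0), "x", "Ax1"),
     (datetime(2019, 1, 1, 0, 0, 1), "x", "Ax2"),
     (datetime(2019, 1, 1, 0, 0, 2), "y", "Ay1"),
     (datetime(2019, 1, 1, 0, 0, 5), "y", "Ay2"),
     (datetime(2019, 1, 1, 0, 0, 0), "z", "Az1")]
B = [(datetime(2019, 1, 1, 0, 0, 3), "x", "B"),
     (datetime(2019, 1, 1, 0, 0, 4), "x", "B"),
     (datetime(2019, 1, 1, 0, 0, 4), "y", "B"),
     (datetime(2019, 1, 1, 0, 2, 0), "z", "B")]

# Union, then serialize by ID and Timestamp (A sorts before B on ties).
rows = ([("A", id_, ts, ev) for ts, id_, ev in A] +
        [("B", id_, ts, ev) for ts, id_, ev in B])
rows.sort(key=lambda r: (r[1], r[2], r[0]))

# Start a new group at the first A of each run (the row_cumsum trick);
# per group, keep arg_max(A_Timestamp, EventA) and collect the group's Bs.
t = 0
groups = defaultdict(lambda: {"a": None, "bs": []})
prev = None
for kind, id_, ts, ev in rows:
    if kind == "A" and (prev is None or prev[0] != "A" or prev[1] != id_):
        t += 1
    g = groups[(t, id_)]
    if kind == "A":
        if g["a"] is None or ts >= g["a"][0]:
            g["a"] = (ts, ev)                  # running arg_max
    else:
        g["bs"].append((ts, ev))
    prev = (kind, id_)

matches = sorted((id_, b_ts, g["a"])
                 for (_, id_), g in groups.items()
                 for b_ts, _ in g["bs"])
```

Note how Ay2 (which comes after y's B event) lands in a fresh group with no B rows, so it never pollutes the match, mirroring the `isnotempty(EventB)` filter in the KQL.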

Suggestion #2

This approach requires a maximum lookback period: how much "older" the record in A may be compared to the record in B. The method then joins the two data sets on ID and this lookback period. The join produces all possible candidates (all A records that are older than B and within the lookback period), and the one closest to B is then filtered by arg_min(TimestampB - TimestampA). The shorter the lookback period, the better the query performs.

In the following example, the lookback period is set to 1m. The record with ID 'z' doesn't have a corresponding 'A' event, because its 'A' is older by 2m.

let _maxLookbackPeriod = 1m;  
let _internalWindowBin = _maxLookbackPeriod / 2;
let B_events = B 
    | extend _rowId = new_guid() // uniquely identifies each B record so each keeps its own match
    | extend _time = bin(Timestamp, _internalWindowBin)
    | extend _range = range(_time - _internalWindowBin, _time + _maxLookbackPeriod, _internalWindowBin) 
    | mv-expand _range to typeof(datetime) 
    | extend B_Timestamp = Timestamp, _range;
let A_events = A 
    | extend _time = bin(Timestamp, _internalWindowBin)
    | extend _range = range(_time - _internalWindowBin, _time + _maxLookbackPeriod, _internalWindowBin) 
    | mv-expand _range to typeof(datetime) 
    | extend A_Timestamp = Timestamp, _range;
B_events
    | join kind=leftouter (
        A_events
) on ID, _range
| where isnull(A_Timestamp) or (A_Timestamp <= B_Timestamp and B_Timestamp <= A_Timestamp + _maxLookbackPeriod)
| extend diff = coalesce(B_Timestamp - A_Timestamp, _maxLookbackPeriod*2)
| summarize arg_min(diff, *) by _rowId
| project ID, B_Timestamp, A_Timestamp, EventB, EventA
| ID | B_Timestamp | A_Timestamp | EventB | EventA |
|---|---|---|---|---|
| x | 2019-01-01 00:00:03.0000000 | 2019-01-01 00:00:01.0000000 | B | Ax2 |
| x | 2019-01-01 00:00:04.0000000 | 2019-01-01 00:00:01.0000000 | B | Ax2 |
| y | 2019-01-01 00:00:04.0000000 | 2019-01-01 00:00:02.0000000 | B | Ay1 |
| z | 2019-01-01 00:02:00.0000000 | | B | |
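The windowed-join idea can be sketched in Python (illustrative only; the `epoch` origin and helper names are arbitrary). Records that can't share any half-window bin never even become join candidates, which is what keeps the join small:

```python
from datetime import datetime, timedelta

# Sample data sets: (Timestamp, ID, Event).
A = [(datetime(2019, 1, 1, 0, 0, 0), "x", "Ax1"),
     (datetime(2019, 1, 1, 0, 0, 1), "x", "Ax2"),
     (datetime(2019, 1, 1, 0, 0, 2), "y", "Ay1"),
     (datetime(2019, 1, 1, 0, 0, 5), "y", "Ay2"),
     (datetime(2019, 1, 1, 0, 0, 0), "z", "Az1")]
B = [(datetime(2019, 1, 1, 0, 0, 3), "x", "B"),
     (datetime(2019, 1, 1, 0, 0, 4), "x", "B"),
     (datetime(2019, 1, 1, 0, 0, 4), "y", "B"),
     (datetime(2019, 1, 1, 0, 2, 0), "z", "B")]

lookback = timedelta(minutes=1)      # _maxLookbackPeriod
win = lookback / 2                   # _internalWindowBin
epoch = datetime(2019, 1, 1)         # arbitrary fixed origin for binning

def bins(ts):
    """Half-window bins a record is expanded into (KQL: bin + range + mv-expand)."""
    b = epoch + ((ts - epoch) // win) * win
    return {b + i * win for i in range(-1, 3)}   # _time - win .. _time + lookback

# Join candidates: same ID and at least one shared bin, then the exact filter.
candidates = [
    (b_ts, b_id, a_ts, a_ev)
    for b_ts, b_id, _ in B
    for a_ts, a_id, a_ev in A
    if a_id == b_id and bins(a_ts) & bins(b_ts)
       and a_ts <= b_ts <= a_ts + lookback
]

# Keep the closest A per B record (KQL: arg_min(diff, *)).
best = {}
for b_ts, b_id, a_ts, a_ev in candidates:
    key = (b_id, b_ts)
    if key not in best or b_ts - a_ts < b_ts - best[key][0]:
        best[key] = (a_ts, a_ev)
```

This sketch keys matches by (ID, Timestamp); the KQL version adds a unique per-row id (new_guid) so that duplicate B rows each keep their own match, and uses a left outer join so unmatched B records (like 'z' here) still appear with empty A columns.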