Samples for queries for Azure Data Explorer and Azure Monitor

This article identifies common query needs in Azure Data Explorer and how you can use the Kusto Query Language to meet them.

Display a column chart

To project two or more columns, and then use the columns as the x-axis and y-axis of a chart:

StormEvents
| where isnotempty(EndLocation) 
| summarize event_count=count() by EndLocation
| top 10 by event_count
| render columnchart
  • The first column forms the x-axis. It can be numeric, date-time, or string.
  • Use where, summarize, and top to limit the volume of data you display.
  • Sort the results to define the order of the x-axis.

Screenshot of a column chart with ten colored columns that show the values for 10 locations.

Get sessions from start and stop events

In a log of events, some events mark the start or end of an extended activity or session.

Name    City        SessionId  Timestamp
Start   London      2817330    2015-12-09T10:12:02.32
Game    London      2817330    2015-12-09T10:12:52.45
Start   Manchester  4267667    2015-12-09T10:14:02.23
Stop    London      2817330    2015-12-09T10:23:43.18
Cancel  Manchester  4267667    2015-12-09T10:27:26.29
Stop    Manchester  4267667    2015-12-09T10:28:31.72

Every event has a session ID (SessionId). The challenge is to match start and stop events with a session ID.

Example:

let Events = MyLogTable | where ... ;

Events
| where Name == "Start"
| project Name, City, SessionId, StartTime=timestamp
| join (Events 
        | where Name == "Stop"
        | project StopTime=timestamp, SessionId) 
    on SessionId
| project City, SessionId, StartTime, StopTime, Duration = StopTime - StartTime

To match start and stop events with a session ID:

  1. Use let to name a projection of the table that's pared down as far as possible before starting the join.
  2. Use project to change the names of the timestamps so that both the start time and the stop time appear in the results. project also selects the other columns to view in the results.
  3. Use join to match the start and stop entries for the same activity. A row is created for each activity.
  4. Use project again to add a column to show the duration of the activity.

Here's the output:

City        SessionId  StartTime               StopTime                Duration
London      2817330    2015-12-09T10:12:02.32  2015-12-09T10:23:43.18  00:11:40.46
Manchester  4267667    2015-12-09T10:14:02.23  2015-12-09T10:28:31.72  00:14:29.49

Get sessions without using a session ID

Suppose that the start and stop events don't conveniently have a session ID that we can match with. But we do have the IP address of the client in which the session took place. Assuming each client address conducts only one session at a time, we can match each start event to the next stop event from the same IP address:

Example:

Events 
| where Name == "Start" 
| project City, ClientIp, StartTime = timestamp
| join  kind=inner
    (Events
    | where Name == "Stop" 
    | project StopTime = timestamp, ClientIp)
    on ClientIp
| extend duration = StopTime - StartTime 
    // Remove matches with earlier stops:
| where  duration > 0  
    // Pick out the earliest stop for each start and client:
| summarize arg_min(duration, *) by bin(StartTime,1s), ClientIp

The join matches every start time with all the stop times from the same client IP address. The sample code:

  • Removes matches with earlier stop times.
  • Groups by start time and IP address to get a group for each session.
  • Supplies a bin function for the StartTime parameter. If you don't do this step, Kusto automatically uses one-hour bins that match some start times with the wrong stop times.

arg_min finds the row with the smallest duration in each group, and the * parameter passes through all the other columns. arg_min prefixes min_ to each column name.

Screenshot of a table of results with columns for the start time, client IP, duration, city, and the earliest stop for each client/start-time combination.

Add code to count the durations in conveniently sized bins. In this example, because of a preference for a bar chart, divide by 1s to convert the timespans to numbers:

    // Count the frequency of each duration:
    | summarize count() by duration=bin(min_duration/1s, 10) 
      // Cut off the long tail:
    | where duration < 300
      // Display in a bar chart:
    | sort by duration asc | render barchart 

Screenshot of a column chart that depicts the number of sessions whose durations fall in the specified ranges.

Full example

Logs  
| filter ActivityId == "ActivityId with Blablabla" 
| summarize max(Timestamp), min(Timestamp)  
| extend Duration = max_Timestamp - min_Timestamp 

wabitrace  
| filter Timestamp >= datetime(2015-01-12 11:00:00Z)  
| filter Timestamp < datetime(2015-01-12 13:00:00Z)  
| filter EventText like "NotifyHadoopApplicationJobPerformanceCounters"      
| extend Tenant = extract("tenantName=([^,]+),", 1, EventText) 
| extend Environment = extract("environmentName=([^,]+),", 1, EventText)  
| extend UnitOfWorkId = extract("unitOfWorkId=([^,]+),", 1, EventText)  
| extend TotalLaunchedMaps = extract("totalLaunchedMaps=([^,]+),", 1, EventText, typeof(real))  
| extend MapsSeconds = extract("mapsMilliseconds=([^,]+),", 1, EventText, typeof(real)) / 1000 
| extend TotalMapsSeconds = MapsSeconds  / TotalLaunchedMaps 
| filter Tenant == 'DevDiv' and Environment == 'RollupDev2'  
| filter TotalLaunchedMaps > 0 
| summarize sum(TotalMapsSeconds) by UnitOfWorkId  
| extend JobMapsSeconds = sum_TotalMapsSeconds * 1 
| project UnitOfWorkId, JobMapsSeconds 
| join ( 
wabitrace  
| filter Timestamp >= datetime(2015-01-12 11:00:00Z)  
| filter Timestamp < datetime(2015-01-12 13:00:00Z)  
| filter EventText like "NotifyHadoopApplicationJobPerformanceCounters"  
| extend Tenant = extract("tenantName=([^,]+),", 1, EventText) 
| extend Environment = extract("environmentName=([^,]+),", 1, EventText)  
| extend UnitOfWorkId = extract("unitOfWorkId=([^,]+),", 1, EventText)   
| extend TotalLaunchedReducers = extract("totalLaunchedReducers=([^,]+),", 1, EventText, typeof(real)) 
| extend ReducesSeconds = extract("reducesMilliseconds=([^,]+)", 1, EventText, typeof(real)) / 1000 
| extend TotalReducesSeconds = ReducesSeconds / TotalLaunchedReducers 
| filter Tenant == 'DevDiv' and Environment == 'RollupDev2'  
| filter TotalLaunchedReducers > 0 
| summarize sum(TotalReducesSeconds) by UnitOfWorkId  
| extend JobReducesSeconds = sum_TotalReducesSeconds * 1 
| project UnitOfWorkId, JobReducesSeconds ) 
on UnitOfWorkId 
| join ( 
wabitrace  
| filter Timestamp >= datetime(2015-01-12 11:00:00Z)  
| filter Timestamp < datetime(2015-01-12 13:00:00Z)  
| filter EventText like "NotifyHadoopApplicationJobPerformanceCounters"  
| extend Tenant = extract("tenantName=([^,]+),", 1, EventText) 
| extend Environment = extract("environmentName=([^,]+),", 1, EventText)  
| extend JobName = extract("jobName=([^,]+),", 1, EventText)  
| extend StepName = extract("stepName=([^,]+),", 1, EventText)  
| extend UnitOfWorkId = extract("unitOfWorkId=([^,]+),", 1, EventText)  
| extend LaunchTime = extract("launchTime=([^,]+),", 1, EventText, typeof(datetime))  
| extend FinishTime = extract("finishTime=([^,]+),", 1, EventText, typeof(datetime)) 
| extend TotalLaunchedMaps = extract("totalLaunchedMaps=([^,]+),", 1, EventText, typeof(real))  
| extend TotalLaunchedReducers = extract("totalLaunchedReducers=([^,]+),", 1, EventText, typeof(real)) 
| extend MapsSeconds = extract("mapsMilliseconds=([^,]+),", 1, EventText, typeof(real)) / 1000 
| extend ReducesSeconds = extract("reducesMilliseconds=([^,]+)", 1, EventText, typeof(real)) / 1000 
| extend TotalMapsSeconds = MapsSeconds  / TotalLaunchedMaps  
| extend TotalReducesSeconds = (ReducesSeconds / TotalLaunchedReducers / ReducesSeconds) * ReducesSeconds  
| extend CalculatedDuration = (TotalMapsSeconds + TotalReducesSeconds) * time(1s) 
| filter Tenant == 'DevDiv' and Environment == 'RollupDev2') 
on UnitOfWorkId 
| extend MapsFactor = TotalMapsSeconds / JobMapsSeconds 
| extend ReducesFactor = TotalReducesSeconds / JobReducesSeconds 
| extend CurrentLoad = 1536 + (768 * TotalLaunchedMaps) + (1536 * TotalLaunchedMaps) 
| extend NormalizedLoad = 1536 + (768 * TotalLaunchedMaps * MapsFactor) + (1536 * TotalLaunchedMaps * ReducesFactor) 
| summarize sum(CurrentLoad), sum(NormalizedLoad) by  JobName  
| extend SaveFactor = sum_NormalizedLoad / sum_CurrentLoad 

Chart concurrent sessions over time

Suppose you have a table of activities and their start and end times. You can show a chart that displays how many activities run concurrently over time.

Here's a sample input, called X:

SessionId  StartTime  StopTime
a          10:01:03   10:10:08
b          10:01:29   10:03:10
c          10:03:02   10:05:20

For a chart in one-minute bins, you want to count each running activity at each one-minute interval.

Here's an intermediate result:

X | extend samples = range(bin(StartTime, 1m), StopTime, 1m)

range generates an array of values at the specified intervals:

SessionId  StartTime  StopTime  samples
a          10:01:33   10:06:31  [10:01:00,10:02:00,...10:06:00]
b          10:02:29   10:03:45  [10:02:00,10:03:00]
c          10:03:12   10:04:30  [10:03:00,10:04:00]

Instead of keeping those arrays, expand them by using mv-expand:

X | mv-expand samples = range(bin(StartTime, 1m), StopTime , 1m)
SessionId  StartTime  StopTime  samples
a          10:01:33   10:06:31  10:01:00
a          10:01:33   10:06:31  10:02:00
a          10:01:33   10:06:31  10:03:00
a          10:01:33   10:06:31  10:04:00
a          10:01:33   10:06:31  10:05:00
a          10:01:33   10:06:31  10:06:00
b          10:02:29   10:03:45  10:02:00
b          10:02:29   10:03:45  10:03:00
c          10:03:12   10:04:30  10:03:00
c          10:03:12   10:04:30  10:04:00

Now, group the results by sample time and count the occurrences of each activity:

X
| mv-expand samples = range(bin(StartTime, 1m), StopTime , 1m)
| summarize count_SessionId = count() by bin(todatetime(samples),1m)
  • Use todatetime() because mv-expand results in a column of dynamic type.
  • Use bin() because, for numeric values and dates, if you don't supply an interval, summarize always applies a bin() function by using a default interval.

Here's the output:

count_SessionId  samples
1                10:01:00
2                10:02:00
3                10:03:00
2                10:04:00
1                10:05:00
1                10:06:00

You can use a bar chart or timechart to render the results.
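
For example, appending a render operator turns the per-minute concurrency counts into a chart (a minimal sketch built on the preceding query):

X
| mv-expand samples = range(bin(StartTime, 1m), StopTime, 1m)
| summarize count_SessionId = count() by bin(todatetime(samples), 1m)
| render timechart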

Introduce null bins into summarize

When the summarize operator is applied over a group key that consists of a date-time column, bin those values to fixed-width bins:

let StartTime=ago(12h);
let StopTime=now();
T
| where Timestamp > StartTime and Timestamp <= StopTime 
| where ...
| summarize Count=count() by bin(Timestamp, 5m)

This example produces a table that has a single row per group of rows in T that fall into each bin of five minutes.

What the code doesn't do is add "null bins": rows for time-bin values between StartTime and StopTime for which there's no corresponding row in T. It's a good idea to "pad" the table with those bins. Here's one way to do it:

let StartTime=ago(12h);
let StopTime=now();
T
| where Timestamp > StartTime and Timestamp <= StopTime 
| summarize Count=count() by bin(Timestamp, 5m)
| where ...
| union ( // 1
  range x from 1 to 1 step 1 // 2
  | mv-expand Timestamp=range(StartTime, StopTime, 5m) to typeof(datetime) // 3
  | extend Count=0 // 4
  )
| summarize Count=sum(Count) by bin(Timestamp, 5m) // 5 

Here's a step-by-step explanation of the preceding query:

  1. Use the union operator to add more rows to a table. Those rows are produced by the union expression.
  2. The range operator produces a table that has a single row and column. The table isn't used for anything other than for mv-expand to work on.
  3. The mv-expand operator over the range function creates as many rows as there are five-minute bins between StartTime and EndTime.
  4. Use a Count of 0.
  5. The summarize operator groups together bins from the original (left, or outer) argument to union. The operator also bins from the inner argument to it (the null bin rows). This process ensures that the output has one row per bin whose value is either zero or the original count.
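
As an alternative sketch, the make-series operator can produce the zero-filled bins directly (assuming the same T, StartTime, and StopTime as above):

let StartTime=ago(12h);
let StopTime=now();
T
| where Timestamp > StartTime and Timestamp <= StopTime
// make-series emits a zero for every empty 5-minute bin between StartTime and StopTime
| make-series Count=count() default=0 on Timestamp from StartTime to StopTime step 5m
| mv-expand Timestamp to typeof(datetime), Count to typeof(long)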

Get more from your data by using Kusto with machine learning

Many interesting use cases use machine learning algorithms and derive interesting insights from telemetry data. Often, these algorithms require a strictly structured dataset as their input. The raw log data usually doesn't match the required structure and size.

Start by looking for anomalies in the error rate of a specific Bing inferences service. The logs table has 65 billion records. The following basic query filters 250,000 errors, and then creates a time series of the error count, using the anomaly detection function series_decompose_anomalies. The anomalies are detected by the Kusto service and are highlighted as red dots on the time series chart.

Logs
| where Timestamp >= datetime(2015-08-22) and Timestamp < datetime(2015-08-23) 
| where Level == "e" and Service == "Inferences.UnusualEvents_Main" 
| summarize count() by bin(Timestamp, 5m)
| render anomalychart 

The service identified a few time buckets that had suspicious error rates. Use Kusto to zoom into this timeframe. Then, run a query that aggregates on the Message column. Try to find the top errors.

The relevant parts of the entire stack trace of the message are trimmed out, so the results fit better on the page.

You can see successful identification of the top eight errors. However, next is a long series of errors, because the error message was created by using a format string that contained changing data:

Logs
| where Timestamp >= datetime(2015-08-22 05:00) and Timestamp < datetime(2015-08-22 06:00)
| where Level == "e" and Service == "Inferences.UnusualEvents_Main"
| summarize count() by Message 
| top 10 by count_ 
| project count_, Message 
count_  Message
7125    ExecuteAlgorithmMethod for method 'RunCycleFromInterimData' has failed...
7125    InferenceHostService call failed..System.NullReferenceException: Object reference not set to an instance of an object...
7124    Unexpected Inference System error..System.NullReferenceException: Object reference not set to an instance of an object...
5112    Unexpected Inference System error..System.NullReferenceException: Object reference not set to an instance of an object...
174     InferenceHostService call failed..System.ServiceModel.CommunicationException: There was an error writing to the pipe:...
10      ExecuteAlgorithmMethod for method 'RunCycleFromInterimData' has failed...
10      Inference System error..Microsoft.Bing.Platform.Inferences.Service.Managers.UserInterimDataManagerException:...
3       InferenceHostService call failed..System.ServiceModel.CommunicationObjectFaultedException:...
1       Inference System error... SocialGraph.BOSS.OperationResponse...AIS TraceId: 8292FC561AC64BED8FA243808FE74EFD...
1       Inference System error... SocialGraph.BOSS.OperationResponse...AIS TraceId: 5F79F7587FF943EC9B641E02E701AFBF...

At this point, using the reduce operator helps. The operator identified 63 different errors that originated at the same trace instrumentation point in the code. reduce helps focus on additional meaningful error traces in that time window.

Logs
| where Timestamp >= datetime(2015-08-22 05:00) and Timestamp < datetime(2015-08-22 06:00)
| where Level == "e" and Service == "Inferences.UnusualEvents_Main"
| reduce by Message with threshold=0.35
| project Count, Pattern
Count  Pattern
7125   ExecuteAlgorithmMethod for method 'RunCycleFromInterimData' has failed...
7125   InferenceHostService call failed..System.NullReferenceException: Object reference not set to an instance of an object...
7124   Unexpected Inference System error..System.NullReferenceException: Object reference not set to an instance of an object...
5112   Unexpected Inference System error..System.NullReferenceException: Object reference not set to an instance of an object...
174    InferenceHostService call failed..System.ServiceModel.CommunicationException: There was an error writing to the pipe:...
63     Inference System error..Microsoft.Bing.Platform.Inferences.*: Write * to write to the Object BOSS.*: SocialGraph.BOSS.Reques...
10     ExecuteAlgorithmMethod for method 'RunCycleFromInterimData' has failed...
10     Inference System error..Microsoft.Bing.Platform.Inferences.Service.Managers.UserInterimDataManagerException:...
3      InferenceHostService call failed..System.ServiceModel.*: The object, System.ServiceModel.Channels.*+*, for ** is the *... at Syst...

Now, you have a good view into the top errors that contributed to the detected anomalies.

To understand the effect of these errors across the sample system, consider that:

  • The Logs table contains additional dimensional data, like Component and Cluster.
  • The new autocluster plugin can help derive component and cluster insight with a simple query.

In the following example, you can clearly see that each of the top four errors is specific to a component. Also, although the top three errors are specific to the DB4 cluster, the fourth error happens across all clusters.

Logs
| where Timestamp >= datetime(2015-08-22 05:00) and Timestamp < datetime(2015-08-22 06:00)
| where Level == "e" and Service == "Inferences.UnusualEvents_Main"
| evaluate autocluster()
Count  Percentage (%)  Component                   Cluster  Message
7125   26.64           InferenceHostService        DB4      ExecuteAlgorithmMethod for method...
7125   26.64           Unknown Component           DB4      InferenceHostService call failed...
7124   26.64           InferenceAlgorithmExecutor  DB4      Unexpected Inference System error...
5112   19.11           InferenceAlgorithmExecutor  *        Unexpected Inference System error...

Map values from one set to another

A common query use case is static mapping of values. Static mapping can help make results more presentable.

For example, in the next table, DeviceModel specifies a device model. Using the device model isn't a convenient form of referencing the device name.

DeviceModel  Count
iPhone5,1    32
iPhone3,2    432
iPhone7,2    55
iPhone5,2    66

Using a friendly name is more convenient:

FriendlyName  Count
iPhone 5      32
iPhone 4      432
iPhone 6      55
iPhone5       66

The next two examples demonstrate how to change from using a device model to a friendly name to identify a device.

Map by using a dynamic dictionary

You can achieve mapping by using a dynamic dictionary and dynamic accessors. For example:

// Dataset definition
let Source = datatable(DeviceModel:string, Count:long)
[
  'iPhone5,1', 32,
  'iPhone3,2', 432,
  'iPhone7,2', 55,
  'iPhone5,2', 66,
];
// Query start here
let phone_mapping = dynamic(
  {
    "iPhone5,1" : "iPhone 5",
    "iPhone3,2" : "iPhone 4",
    "iPhone7,2" : "iPhone 6",
    "iPhone5,2" : "iPhone5"
  });
Source 
| project FriendlyName = phone_mapping[DeviceModel], Count
FriendlyName  Count
iPhone 5      32
iPhone 4      432
iPhone 6      55
iPhone5       66

Map by using a static table

You also can achieve mapping by using a persistent table and a join operator.

  1. Create the mapping table only once:

    .create table Devices (DeviceModel: string, FriendlyName: string) 
    
    .ingest inline into table Devices 
        ["iPhone5,1","iPhone 5"]["iPhone3,2","iPhone 4"]["iPhone7,2","iPhone 6"]["iPhone5,2","iPhone5"]
    
  2. Create a table of the device contents:

    DeviceModel  FriendlyName
    iPhone5,1    iPhone 5
    iPhone3,2    iPhone 4
    iPhone7,2    iPhone 6
    iPhone5,2    iPhone5

  3. Create a test table source:

    .create table Source (DeviceModel: string, Count: int)
    
    .ingest inline into table Source ["iPhone5,1",32]["iPhone3,2",432]["iPhone7,2",55]["iPhone5,2",66]
    
  4. Join the tables and run the project:

    Devices  
    | join (Source) on DeviceModel  
    | project FriendlyName, Count
    

Here's the output:

FriendlyName  Count
iPhone 5      32
iPhone 4      432
iPhone 6      55
iPhone5       66

Create and use query-time dimension tables

Often, you'll want to join the results of a query with an ad-hoc dimension table that isn't stored in the database. You can define an expression whose result is a table scoped to a single query.

For example:

// Create a query-time dimension table using datatable
let DimTable = datatable(EventType:string, Code:string)
  [
    "Heavy Rain", "HR",
    "Tornado",    "T"
  ]
;
DimTable
| join StormEvents on EventType
| summarize count() by Code

Here's a slightly more complex example:

// Create a query-time dimension table using datatable
let TeamFoundationJobResult = datatable(Result:int, ResultString:string)
  [
    -1, 'None', 0, 'Succeeded', 1, 'PartiallySucceeded', 2, 'Failed',
    3, 'Stopped', 4, 'Killed', 5, 'Blocked', 6, 'ExtensionNotFound',
    7, 'Inactive', 8, 'Disabled', 9, 'JobInitializationError'
  ]
;
JobHistory
  | where PreciseTimeStamp > ago(1h)
  | where Service  != "AX"
  | where Plugin has "Analytics"
  | sort by PreciseTimeStamp desc
  | join kind=leftouter TeamFoundationJobResult on Result
  | extend ExecutionTimeSpan = totimespan(ExecutionTime)
  | project JobName, StartTime, ExecutionTimeSpan, ResultString, ResultMessage

Retrieve the latest records (by timestamp) per identity

Suppose you have a table that includes:

  • An ID column that identifies the entity with which each row is associated, such as a user ID or a node ID
  • A timestamp column that provides the time reference for the row
  • Other columns

You can use the top-nested operator to make a query that returns the latest two records for each value of the ID column, where latest is defined as having the highest value of timestamp:

datatable(id:string, timestamp:datetime, bla:string)           // #1
  [
  "Barak",  datetime(2015-01-01), "1",
  "Barak",  datetime(2016-01-01), "2",
  "Barak",  datetime(2017-01-20), "3",
  "Donald", datetime(2017-01-20), "4",
  "Donald", datetime(2017-01-18), "5",
  "Donald", datetime(2017-01-19), "6"
  ]
| top-nested   of id        by dummy0=max(1),                  // #2
  top-nested 2 of timestamp by dummy1=max(timestamp),          // #3
  top-nested   of bla       by dummy2=max(1)                   // #4
| project-away dummy0, dummy1, dummy2                          // #5

Here's a step-by-step explanation of the preceding query (the numbering refers to the numbers in the code comments):

  1. The datatable is a way to produce some test data for demonstration purposes. Normally, you'd use real data here.
  2. This line essentially means return all distinct values of id.
  3. This line then returns, for the top two records that maximize the timestamp column:
    • The columns of the preceding level (here, just id)
    • The column specified at this level (here, timestamp)
  4. This line adds the values of the bla column for each of the records returned by the preceding level. If the table has other columns you're interested in, you can repeat this line for each of those columns.
  5. The final line uses the project-away operator to remove the "extra" columns that are introduced by top-nested.

Extend a table by a percentage of the total calculation

A tabular expression that includes a numeric column is more useful to the user when it's accompanied by its value as a percentage of the total.

For example, assume that a query produces the following table:

SomeSeries  SomeInt
Apple       100
Banana      200

You want to show the table like this:

SomeSeries  SomeInt  Pct
Apple       100      33.3
Banana      200      66.6

To change the way the table appears, calculate the total (sum) of the SomeInt column, and then divide each value of this column by the total. For arbitrary results, use the as operator.

For example:

// The following table literally represents a long calculation
// that ends up with an anonymous tabular value:
datatable (SomeInt:int, SomeSeries:string) [
  100, "Apple",
  200, "Banana",
]
// We now give this calculation a name ("X"):
| as X
// Having this name we can refer to it in the sub-expression
// "X | summarize sum(SomeInt)":
| extend Pct = 100 * bin(todouble(SomeInt) / toscalar(X | summarize sum(SomeInt)), 0.001)

Perform aggregations over a sliding window

The following example shows how to summarize columns by using a sliding window. For the query, use the following table, which contains prices of fruits by timestamps.

Calculate the minimum, maximum, and sum costs of each fruit per day by using a sliding window of seven days. Each record in the result set aggregates the preceding seven days, and the results contain a record per day in the analysis period.

Fruit table:

Timestamp                    Fruit    Price
2018-09-24 21:00:00.0000000  Bananas  3
2018-09-25 20:00:00.0000000  Apples   9
2018-09-26 03:00:00.0000000  Bananas  4
2018-09-27 10:00:00.0000000  Plums    8
2018-09-28 07:00:00.0000000  Bananas  6
2018-09-29 21:00:00.0000000  Bananas  8
2018-09-30 01:00:00.0000000  Plums    2
2018-10-01 05:00:00.0000000  Bananas  0
2018-10-02 02:00:00.0000000  Bananas  0
2018-10-03 13:00:00.0000000  Plums    4
2018-10-04 14:00:00.0000000  Apples   8
2018-10-05 05:00:00.0000000  Bananas  2
2018-10-06 08:00:00.0000000  Plums    8
2018-10-07 12:00:00.0000000  Bananas  0

Here's the sliding window aggregation query. See the explanation after the query result.

let _start = datetime(2018-09-24);
let _end = _start + 13d; 
Fruits 
| extend _bin = bin_at(Timestamp, 1d, _start) // #1 
| extend _endRange = iif(_bin + 7d > _end, _end, 
                            iff( _bin + 7d - 1d < _start, _start,
                                iff( _bin + 7d - 1d < _bin, _bin,  _bin + 7d - 1d)))  // #2
| extend _range = range(_bin, _endRange, 1d) // #3
| mv-expand _range to typeof(datetime) limit 1000000 // #4
| summarize min(Price), max(Price), sum(Price) by Timestamp=bin_at(_range, 1d, _start) ,  Fruit // #5
| where Timestamp >= _start + 7d; // #6

Here's the output:

Timestamp                    Fruit    min_Price  max_Price  sum_Price
2018-10-01 00:00:00.0000000  Apples   9          9          9
2018-10-01 00:00:00.0000000  Bananas  0          8          18
2018-10-01 00:00:00.0000000  Plums    2          8          10
2018-10-02 00:00:00.0000000  Bananas  0          8          18
2018-10-02 00:00:00.0000000  Plums    2          8          10
2018-10-03 00:00:00.0000000  Plums    2          8          14
2018-10-03 00:00:00.0000000  Bananas  0          8          14
2018-10-04 00:00:00.0000000  Bananas  0          8          14
2018-10-04 00:00:00.0000000  Plums    2          4          6
2018-10-04 00:00:00.0000000  Apples   8          8          8
2018-10-05 00:00:00.0000000  Bananas  0          8          10
2018-10-05 00:00:00.0000000  Plums    2          4          6
2018-10-05 00:00:00.0000000  Apples   8          8          8
2018-10-06 00:00:00.0000000  Plums    2          8          14
2018-10-06 00:00:00.0000000  Bananas  0          2          2
2018-10-06 00:00:00.0000000  Apples   8          8          8
2018-10-07 00:00:00.0000000  Bananas  0          2          2
2018-10-07 00:00:00.0000000  Plums    4          8          12
2018-10-07 00:00:00.0000000  Apples   8          8          8

The query "stretches" (duplicates) each record in the input table throughout the seven days after its actual appearance. Each record actually appears seven times. As a result, the daily aggregation includes all records of the preceding seven days.

Here's a step-by-step explanation of the preceding query:

  1. Bin each record to one day (relative to _start).
  2. Determine the end of the range per record: _bin + 7d, unless the value is out of the range of _start and _end, in which case it's adjusted.
  3. For each record, create an array of seven days (timestamps), starting at the current record's day.
  4. mv-expand the array, thus duplicating each record to seven records, one day apart from each other.
  5. Perform the aggregation function for each day. Due to #4, this step actually summarizes the past seven days.
  6. The data for the first seven days is incomplete because there's no seven-day lookback period for the first seven days. The first seven days are excluded from the final result. In the example, they participate only in the aggregation for 2018-10-01.

Find the preceding event

The next example demonstrates how to find a preceding event between two datasets.

You have two datasets, A and B. For each record in dataset B, find its preceding event in dataset A (that is, the arg_max record in A that is still older than B).

Here are the sample datasets:

let A = datatable(Timestamp:datetime, ID:string, EventA:string)
[
    datetime(2019-01-01 00:00:00), "x", "Ax1",
    datetime(2019-01-01 00:00:01), "x", "Ax2",
    datetime(2019-01-01 00:00:02), "y", "Ay1",
    datetime(2019-01-01 00:00:05), "y", "Ay2",
    datetime(2019-01-01 00:00:00), "z", "Az1"
];
let B = datatable(Timestamp:datetime, ID:string, EventB:string)
[
    datetime(2019-01-01 00:00:03), "x", "B",
    datetime(2019-01-01 00:00:04), "x", "B",
    datetime(2019-01-01 00:00:04), "y", "B",
    datetime(2019-01-01 00:02:00), "z", "B"
];
A; B
Timestamp                    ID  EventA
2019-01-01 00:00:00.0000000  x   Ax1
2019-01-01 00:00:00.0000000  z   Az1
2019-01-01 00:00:01.0000000  x   Ax2
2019-01-01 00:00:02.0000000  y   Ay1
2019-01-01 00:00:05.0000000  y   Ay2

Timestamp                    ID  EventB
2019-01-01 00:00:03.0000000  x   B
2019-01-01 00:00:04.0000000  x   B
2019-01-01 00:00:04.0000000  y   B
2019-01-01 00:02:00.0000000  z   B

Expected output:

ID  Timestamp                    EventB  A_Timestamp                  EventA
x   2019-01-01 00:00:03.0000000  B       2019-01-01 00:00:01.0000000  Ax2
x   2019-01-01 00:00:04.0000000  B       2019-01-01 00:00:01.0000000  Ax2
y   2019-01-01 00:00:04.0000000  B       2019-01-01 00:00:02.0000000  Ay1
z   2019-01-01 00:02:00.0000000  B       2019-01-01 00:00:00.0000000  Az1

We recommend two different approaches for this problem. You can test both on your specific dataset to find the one that is most suitable for your scenario.

Note

Each approach might run differently on different datasets.

Approach 1

This approach serializes both datasets by ID and timestamp. Then, it groups all events in dataset B with all their preceding events in dataset A. Finally, it picks the arg_max out of all the events in dataset A in the group.

A
| extend A_Timestamp = Timestamp, Kind="A"
| union (B | extend B_Timestamp = Timestamp, Kind="B")
| order by ID, Timestamp asc 
| extend t = iff(Kind == "A" and (prev(Kind) != "A" or prev(ID) != ID), 1, 0)
| extend t = row_cumsum(t)
| summarize Timestamp=make_list(Timestamp), EventB=make_list(EventB), arg_max(A_Timestamp, EventA) by t, ID
| mv-expand Timestamp to typeof(datetime), EventB to typeof(string)
| where isnotempty(EventB)
| project-away t

Approach 2

This approach to solving the problem requires a maximum lookback period. The approach looks at how much older the record in dataset A might be compared to dataset B. The method then joins the two datasets based on ID and this lookback period.

The join produces all possible candidates: all dataset A records that are older than records in dataset B and within the lookback period. Then, the closest one to dataset B is filtered by arg_min(TimestampB - TimestampA). The shorter the lookback period, the better the query results will be.

In the following example, the lookback period is set to 1m. The record with ID z doesn't have a corresponding A event because its A event is older by two minutes.

let _maxLookbackPeriod = 1m;  
let _internalWindowBin = _maxLookbackPeriod / 2;
let B_events = B 
    | extend ID = new_guid()
    | extend _time = bin(Timestamp, _internalWindowBin)
    | extend _range = range(_time - _internalWindowBin, _time + _maxLookbackPeriod, _internalWindowBin) 
    | mv-expand _range to typeof(datetime) 
    | extend B_Timestamp = Timestamp, _range;
let A_events = A 
    | extend _time = bin(Timestamp, _internalWindowBin)
    | extend _range = range(_time - _internalWindowBin, _time + _maxLookbackPeriod, _internalWindowBin) 
    | mv-expand _range to typeof(datetime) 
    | extend A_Timestamp = Timestamp, _range;
B_events
    | join kind=leftouter (
        A_events
) on ID, _range
| where isnull(A_Timestamp) or (A_Timestamp <= B_Timestamp and B_Timestamp <= A_Timestamp + _maxLookbackPeriod)
| extend diff = coalesce(B_Timestamp - A_Timestamp, _maxLookbackPeriod*2)
| summarize arg_min(diff, *) by ID
| project ID, B_Timestamp, A_Timestamp, EventB, EventA
ID  B_Timestamp                  A_Timestamp                  EventB  EventA
x   2019-01-01 00:00:03.0000000  2019-01-01 00:00:01.0000000  B       Ax2
x   2019-01-01 00:00:04.0000000  2019-01-01 00:00:01.0000000  B       Ax2
y   2019-01-01 00:00:04.0000000  2019-01-01 00:00:02.0000000  B       Ay1
z   2019-01-01 00:02:00.0000000                               B

Next steps

This article identifies common query needs in Azure Monitor and how you can use the Kusto Query Language to meet them.

String operations

The following sections give examples of how to work with strings when using the Kusto Query Language.

Strings and how to escape them

String values are wrapped with either single or double quotes. Add a backslash (\) to the left of a character to escape it: \t for tab, \n for newline, \' for a single quote, and \" for a double quote.

print "this is a 'string' literal in double \" quotes"
print 'this is a "string" literal in single \' quotes'

To prevent "\" from acting as an escape character, add "@" as a prefix to the string:

print @"C:\backslash\not\escaped\with @ prefix"

String comparisons

Operator        Description                                                        Case-sensitive  Example (yields true)
==              Equals                                                             Yes             "aBc" == "aBc"
!=              Not equals                                                         Yes             "abc" != "ABC"
=~              Equals                                                             No              "abc" =~ "ABC"
!~              Not equals                                                         No              "aBc" !~ "xyz"
has             Right-side value is a whole term in left-side value                No              "North America" has "america"
!has            Right-side value isn't a full term in left-side value              No              "North America" !has "amer"
has_cs          Right-side value is a whole term in left-side value                Yes             "North America" has_cs "America"
!has_cs         Right-side value isn't a full term in left-side value              Yes             "North America" !has_cs "amer"
hasprefix       Right-side value is a term prefix in left-side value               No              "North America" hasprefix "ame"
!hasprefix      Right-side value isn't a term prefix in left-side value            No              "North America" !hasprefix "mer"
hasprefix_cs    Right-side value is a term prefix in left-side value               Yes             "North America" hasprefix_cs "Ame"
!hasprefix_cs   Right-side value isn't a term prefix in left-side value            Yes             "North America" !hasprefix_cs "CA"
hassuffix       Right-side value is a term suffix in left-side value               No              "North America" hassuffix "ica"
!hassuffix      Right-side value isn't a term suffix in left-side value            No              "North America" !hassuffix "americ"
hassuffix_cs    Right-side value is a term suffix in left-side value               Yes             "North America" hassuffix_cs "ica"
!hassuffix_cs   Right-side value isn't a term suffix in left-side value            Yes             "North America" !hassuffix_cs "icA"
contains        Right-side value occurs as a subsequence of left-side value        No              "FabriKam" contains "BRik"
!contains       Right-side value doesn't occur in left-side value                  No              "Fabrikam" !contains "xyz"
contains_cs     Right-side value occurs as a subsequence of left-side value        Yes             "FabriKam" contains_cs "Kam"
!contains_cs    Right-side value doesn't occur in left-side value                  Yes             "Fabrikam" !contains_cs "Kam"
startswith      Right-side value is an initial subsequence of left-side value      No              "Fabrikam" startswith "fab"
!startswith     Right-side value isn't an initial subsequence of left-side value   No              "Fabrikam" !startswith "kam"
startswith_cs   Right-side value is an initial subsequence of left-side value      Yes             "Fabrikam" startswith_cs "Fab"
!startswith_cs  Right-side value isn't an initial subsequence of left-side value   Yes             "Fabrikam" !startswith_cs "fab"
endswith        Right-side value is a closing subsequence of left-side value       No              "Fabrikam" endswith "Kam"
!endswith       Right-side value isn't a closing subsequence of left-side value    No              "Fabrikam" !endswith "brik"
endswith_cs     Right-side value is a closing subsequence of left-side value       Yes             "Fabrikam" endswith_cs "Kam"
!endswith_cs    Right-side value isn't a closing subsequence of left-side value    Yes             "Fabrikam" !endswith_cs "brik"
matches regex   Left-side value contains a match for right-side value              Yes             "Fabrikam" matches regex "b.*k"
in              Equals one of the elements                                         Yes             "abc" in ("123", "345", "abc")
!in             Not equal to any of the elements                                   Yes             "bca" !in ("123", "345", "abc")
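
For example, the difference between whole-term matching and substring matching shows up in a quick print statement (a minimal sketch):

print term_match = "North America" has "america",          // true: "america" is a whole term
      substring_match = "North America" contains "ameri",  // true: a substring is enough
      no_term_match = "North America" has "amer"           // false: "amer" isn't a whole term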

countof

Counts occurrences of a substring within a string. Can match plain strings or use a regular expression (regex). Plain string matches might overlap, but regex matches don't overlap.

countof(text, search [, kind])
  • text: The input string
  • search: Plain string or regex to match inside text
  • kind: normal | regex (default: normal)

Returns the number of times that the search string can be matched in the container. Plain string matches might overlap, but regex matches don't overlap.

Plain string matches

print countof("The cat sat on the mat", "at");  //result: 3
print countof("aaa", "a");  //result: 3
print countof("aaaa", "aa");  //result: 3 (not 2!)
print countof("ababa", "ab", "normal");  //result: 2
print countof("ababa", "aba");  //result: 2

Regex matches

print countof("The cat sat on the mat", @"\b.at\b", "regex");  //result: 3
print countof("ababa", "aba", "regex");  //result: 1
print countof("abcabc", "a.c", "regex");  // result: 2

extract

Gets a match for a regular expression from a specific string. Optionally, can convert the extracted substring to the specified type.

extract(regex, captureGroup, text [, typeLiteral])
  • regex: A regular expression.
  • captureGroup: A positive integer constant that indicates the capture group to extract. Use 0 for the entire match, 1 for the value matched by the first parenthesis () in the regular expression, and 2 or more for subsequent parentheses.
  • text: The string to search.
  • typeLiteral: An optional type literal (for example, typeof(long)). If provided, the extracted substring is converted to this type.

Returns the substring matched against the indicated capture group captureGroup, optionally converted to typeLiteral. If there's no match or the type conversion fails, returns null.

The following example extracts the last octet of ComputerIP from a heartbeat record:

Heartbeat
| where ComputerIP != "" 
| take 1
| project ComputerIP, last_octet=extract("([0-9]*$)", 1, ComputerIP) 

The following example extracts the last octet, casts it to a real type (number), and then calculates the next IP value:

Heartbeat
| where ComputerIP != "" 
| take 1
| extend last_octet=extract("([0-9]*$)", 1, ComputerIP, typeof(real)) 
| extend next_ip=(last_octet+1)%255
| project ComputerIP, last_octet, next_ip

In the next example, the string Trace is searched for a definition of Duration. The match is cast to real and multiplied by a time constant (1 s), which casts Duration to type timespan.

let Trace="A=12, B=34, Duration=567, ...";
print Duration = extract("Duration=([0-9.]+)", 1, Trace, typeof(real));  //result: 567
print Duration_seconds =  extract("Duration=([0-9.]+)", 1, Trace, typeof(real)) * time(1s);  //result: 00:09:27

isempty, isnotempty, notempty

  • isempty returns true if the argument is an empty string or null (see isnull).
  • isnotempty returns true if the argument isn't an empty string or null (see isnotnull). Alias: notempty.

isempty(value)
isnotempty(value)

Example

print isempty("");  // result: true

print isempty("0");  // result: false

print isempty(0);  // result: false

print isempty(5);  // result: false

Heartbeat | where isnotempty(ComputerIP) | take 1  // return 1 Heartbeat record in which ComputerIP isn't empty

parseurl

Splits a URL into its parts, like protocol, host, and port, and then returns a dictionary object that contains the parts as strings.

parseurl(urlstring)

Example

print parseurl("http://user:pass@contoso.com/icecream/buy.aspx?a=1&b=2#tag")

Here's the output:

{
    "Scheme" : "http",
    "Host" : "contoso.com",
    "Port" : "80",
    "Path" : "/icecream/buy.aspx",
    "Username" : "user",
    "Password" : "pass",
    "Query Parameters" : {"a":"1","b":"2"},
    "Fragment" : "tag"
}

replace

Replaces all regex matches with another string.

replace(regex, rewrite, input_text)
  • regex: The regular expression to match by. It can contain capture groups in parentheses ().
  • rewrite: The replacement regex for any match made by the matching regex. Use \0 to refer to the whole match, \1 for the first capture group, and \2 and so on for subsequent capture groups.
  • input_text: The input string to search in.

Returns the text after replacing all matches of regex with evaluations of rewrite. Matches don't overlap.

Example

SecurityEvent
| take 1
| project Activity 
| extend replaced = replace(@"(\d+) -", @"Activity ID \1: ", Activity) 

Here's the output:

Activity                                        Replaced
4663 - An attempt was made to access an object  Activity ID 4663: An attempt was made to access an object.

split

Splits a specific string according to a specified delimiter, and then returns an array of the resulting substrings.

split(source, delimiter [, requestedIndex])
  • source: The string to be split according to the specified delimiter.
  • delimiter: The delimiter that will be used to split the source string.
  • requestedIndex: An optional zero-based index. If provided, the returned string array holds only that item (if it exists).

Example

print split("aaa_bbb_ccc", "_");    // result: ["aaa","bbb","ccc"]
print split("aa_bb", "_");          // result: ["aa","bb"]
print split("aaa_bbb_ccc", "_", 1); // result: ["bbb"]
print split("", "_");               // result: [""]
print split("a__b", "_");           // result: ["a","","b"]
print split("aabbcc", "bb");        // result: ["aa","cc"]

strcat

Concatenates string arguments (supports 1-16 arguments).

strcat("string1", "string2", "string3")

Example

print strcat("hello", " ", "world") // result: "hello world"

strlen

Returns the length of a string.

strlen("text_to_evaluate")

Example

print strlen("hello")   // result: 5

substring

Extracts a substring from a specific source string, starting at the specified index. Optionally, you can specify the length of the requested substring.

substring(source, startingIndex [, length])
  • source: The source string that the substring is taken from.
  • startingIndex: The zero-based starting character position of the requested substring.
  • length: An optional parameter that you can use to specify the requested length of the returned substring.

Example

print substring("abcdefg", 1, 2);   // result: "bc"
print substring("123456", 1);       // result: "23456"
print substring("123456", 2, 2);    // result: "34"
print substring("ABCD", 0, 2);  // result: "AB"

tolower, toupper

Converts a specific string to all lowercase or all uppercase.

tolower("value")
toupper("value")

Example

print tolower("HELLO"); // result: "hello"
print toupper("hello"); // result: "HELLO"

Date and time operations

The following sections give examples of how to work with date and time values when using the Kusto Query Language.

Date-time basics

The Kusto Query Language has two main data types associated with dates and times: datetime and timespan. All dates are expressed in UTC. Although multiple date-time formats are supported, the ISO-8601 format is preferred.

Timespans are expressed as a decimal followed by a time unit:

Shorthand    Time unit
d            day
h            hour
m            minute
s            second
ms           millisecond
microsecond  microsecond
tick         nanosecond
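
For example, a few timespan literals printed side by side (a minimal sketch):

print two_days = 2d, ninety_minutes = 1.5h, ten_seconds = 10s, tenth_of_second = 100ms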

You can create date-time values by casting a string using the todatetime operator. For example, to review the VM heartbeats sent in a specific timeframe, use the between operator to specify a time range:

Heartbeat
| where TimeGenerated between(datetime("2018-06-30 22:46:42") .. datetime("2018-07-01 00:57:27"))

Another common scenario is comparing a date-time value to the present. For example, to see all heartbeats over the last two minutes, you can use the now operator together with a timespan that represents two minutes:

Heartbeat
| where TimeGenerated > now() - 2m

A shortcut is also available for this function:

Heartbeat
| where TimeGenerated > now(-2m)

The shortest and most readable method is using the ago operator:

Heartbeat
| where TimeGenerated > ago(2m)

Suppose that instead of knowing the start and end times, you know the start time and the duration. You can rewrite the query:

let startDatetime = todatetime("2018-06-30 20:12:42.9");
let duration = totimespan(25m);
Heartbeat
| where TimeGenerated between(startDatetime .. (startDatetime+duration) )
| extend timeFromStart = TimeGenerated - startDatetime

Convert time units

You might want to express a date-time or timespan value in a time unit other than the default. For example, if you're reviewing error events from the past 30 minutes and need a calculated column that shows how long ago the event happened, you can use this query:

Event
| where TimeGenerated > ago(30m)
| where EventLevelName == "Error"
| extend timeAgo = now() - TimeGenerated 

The timeAgo column holds values like 00:09:31.5118992, which are formatted as hh:mm:ss.fffffff. If you want to format these values to the number of minutes since the start time, divide that value by 1m:

Event
| where TimeGenerated > ago(30m)
| where EventLevelName == "Error"
| extend timeAgo = now() - TimeGenerated
| extend timeAgoMinutes = timeAgo/1m 

Aggregations and bucketing by time intervals

Another common scenario is the need to obtain statistics for a specific time period in a specific time unit. For this scenario, you can use a bin operator as part of a summarize clause.

Use the following query to get the number of events that occurred every five minutes during the past half-hour:

Event
| where TimeGenerated > ago(30m)
| summarize events_count=count() by bin(TimeGenerated, 5m) 

This query produces the following table:

| TimeGenerated (UTC) | events_count |
| --- | --- |
| 2018-08-01T09:30:00.000 | 54 |
| 2018-08-01T09:35:00.000 | 41 |
| 2018-08-01T09:40:00.000 | 42 |
| 2018-08-01T09:45:00.000 | 41 |
| 2018-08-01T09:50:00.000 | 41 |
| 2018-08-01T09:55:00.000 | 16 |

Another way to create buckets of results is to use functions like startofday:

Event
| where TimeGenerated > ago(4d)
| summarize events_count=count() by startofday(TimeGenerated) 

Here's the output:

| timestamp | count_ |
| --- | --- |
| 2018-07-28T00:00:00.000 | 7,136 |
| 2018-07-29T00:00:00.000 | 12,315 |
| 2018-07-30T00:00:00.000 | 16,847 |
| 2018-07-31T00:00:00.000 | 12,616 |
| 2018-08-01T00:00:00.000 | 5,416 |

Time zones

Because all date-time values are expressed in UTC, it's often useful to convert them into the local time zone. For example, use this calculation to convert UTC to PST times:

Event
| extend localTimestamp = TimeGenerated - 8h
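
Note that a fixed -8h offset ignores daylight saving time. If your environment supports the datetime_utc_to_local function, a named time zone handles that automatically; a sketch, assuming the 'US/Pacific' zone name:

Event
| extend localTimestamp = datetime_utc_to_local(TimeGenerated, 'US/Pacific')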

Aggregations

The following sections give examples of how to aggregate the results of a query when using the Kusto Query Language.

count

Count the number of rows in the result set after any filters are applied. The following example returns the total number of rows in the Perf table from the last 30 minutes. The results are returned in a column named count_ unless you assign a specific name to the column:

Perf
| where TimeGenerated > ago(30m) 
| summarize count()

Perf
| where TimeGenerated > ago(30m) 
| summarize num_of_records=count()

A timechart visualization can be useful for seeing a trend over time:

Perf 
| where TimeGenerated > ago(30m) 
| summarize count() by bin(TimeGenerated, 5m)
| render timechart

The output from this example shows the Perf record count trend line in five-minute intervals:

Screenshot of a line chart that shows the Perf record count trend line in five-minute intervals.

dcount, dcountif

使用 dcountdcountif 计算特定列中非重复值的数量。Use dcount and dcountif to count distinct values in a specific column. 以下查询计算过去一小时内发送检测信号的非重复计算机的数量:The following query evaluates how many distinct computers sent heartbeats in the last hour:

Heartbeat 
| where TimeGenerated > ago(1h) 
| summarize dcount(Computer)

To count only the Linux computers that sent heartbeats, use dcountif:

Heartbeat 
| where TimeGenerated > ago(1h) 
| summarize dcountif(Computer, OSType=="Linux")

Evaluate subgroups

To perform a count or other aggregations on subgroups in your data, use the by keyword. For example, to count the number of distinct Linux computers that sent heartbeats in each country or region, use this query:

Heartbeat 
| where TimeGenerated > ago(1h) 
| summarize distinct_computers=dcountif(Computer, OSType=="Linux") by RemoteIPCountry

Here's the output:

| RemoteIPCountry | distinct_computers |
| --- | --- |
| United States | 19 |
| Canada | 3 |
| Ireland | 0 |
| United Kingdom | 0 |
| Netherlands | 2 |

To analyze even smaller subgroups of your data, add more column names to the by section. For example, you might want to count the distinct computers from each country or region per type of operating system (OSType):

Heartbeat 
| where TimeGenerated > ago(1h) 
| summarize distinct_computers=dcountif(Computer, OSType=="Linux") by RemoteIPCountry, OSType

Percentile

To find the median, use the percentile function with a value to specify the percentile:

Perf
| where TimeGenerated > ago(30m) 
| where CounterName == "% Processor Time" and InstanceName == "_Total" 
| summarize percentiles(CounterValue, 50) by Computer

You can also specify different percentiles to get an aggregated result for each:

Perf
| where TimeGenerated > ago(30m) 
| where CounterName == "% Processor Time" and InstanceName == "_Total" 
| summarize percentiles(CounterValue, 25, 50, 75, 90) by Computer

The results might show that some computer CPUs have similar median values. However, although some computers are steady around the median, others report much lower and higher CPU values, meaning they experienced spikes.
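
To make such spikes visible directly, a small extension of the same query (a sketch, not from the original article) adds the per-computer extremes next to the median:

Perf
| where TimeGenerated > ago(30m) 
| where CounterName == "% Processor Time" and InstanceName == "_Total" 
| summarize percentiles(CounterValue, 50), min(CounterValue), max(CounterValue) by Computer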

Variance

To directly evaluate the variance of a value, use the standard deviation and variance methods:

Perf
| where TimeGenerated > ago(30m) 
| where CounterName == "% Processor Time" and InstanceName == "_Total" 
| summarize stdev(CounterValue), variance(CounterValue) by Computer

A good way to analyze the stability of CPU usage is to combine stdev with the median calculation:

Perf
| where TimeGenerated > ago(130m) 
| where CounterName == "% Processor Time" and InstanceName == "_Total" 
| summarize stdev(CounterValue), percentiles(CounterValue, 50) by Computer

Generate lists and sets

You can use makelist to pivot data by the order of values in a specific column. For example, you might want to explore the most common order in which events occur on your computers. You can essentially pivot the data by the order of EventID values on each computer:

Event
| where TimeGenerated > ago(12h)
| order by TimeGenerated desc
| summarize makelist(EventID) by Computer

Here's the output:

| Computer | list_EventID |
| --- | --- |
| computer1 | [704,701,1501,1500,1085,704,704,701] |
| computer2 | [326,105,302,301,300,102] |
| ... | ... |

makelist generates the list in the order that data is passed into it. To sort events from oldest to newest, use asc in the order statement instead of desc.
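
For example, this variant of the preceding query lists each computer's events oldest-first:

Event
| where TimeGenerated > ago(12h)
| order by TimeGenerated asc
| summarize makelist(EventID) by Computer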

You might find it useful to create a list of only the distinct values. This list is called a set, and you can generate it by using the makeset command:

Event
| where TimeGenerated > ago(12h)
| order by TimeGenerated desc
| summarize makeset(EventID) by Computer

Here's the output:

| Computer | list_EventID |
| --- | --- |
| computer1 | [704,701,1501,1500,1085] |
| computer2 | [326,105,302,301,300,102] |
| ... | ... |

makelist 一样,makeset 也使用已排序的数据。Like makelist, makeset also works with ordered data. makeset 命令根据传入的行的顺序生成数组。The makeset command generates arrays based on the order of the rows that are passed into it.

Expand lists

makelistmakeset 的反运算是 mv-expandThe inverse operation of makelist or makeset is mv-expand. mv-expand 命令将值列表展开为多个单独行。The mv-expand command expands a list of values to separate rows. 它可以展开任意数目的动态列(包括 JSON 和数组列)。It can expand across any number of dynamic columns, including JSON and array columns. 例如,可以在 Heartbeat 表中检查在过去一小时发送了检测信号的计算机中发送数据的解决方案:For example, you can check the Heartbeat table for solutions that sent data from computers that sent a heartbeat in the past hour:

Heartbeat
| where TimeGenerated > ago(1h)
| project Computer, Solutions

Here's the output:

| Computer | Solutions |
| --- | --- |
| computer1 | "security", "updates", "changeTracking" |
| computer2 | "security", "updates" |
| computer3 | "antiMalware", "changeTracking" |
| ... | ... |

Use mv-expand to show each value in a separate row instead of in a comma-separated list:

Heartbeat
| where TimeGenerated > ago(1h)
| project Computer, split(Solutions, ",")
| mv-expand Solutions

Here's the output:

| Computer | Solutions |
| --- | --- |
| computer1 | "security" |
| computer1 | "updates" |
| computer1 | "changeTracking" |
| computer2 | "security" |
| computer2 | "updates" |
| computer3 | "antiMalware" |
| computer3 | "changeTracking" |
| ... | ... |

You can use makelist to group the items back together. In the output, you can see the list of computers per solution:

Heartbeat
| where TimeGenerated > ago(1h)
| project Computer, split(Solutions, ",")
| mv-expand Solutions
| summarize makelist(Computer) by tostring(Solutions) 

Here's the output:

| Solutions | list_Computer |
| --- | --- |
| "security" | ["computer1", "computer2"] |
| "updates" | ["computer1", "computer2"] |
| "changeTracking" | ["computer1", "computer3"] |
| "antiMalware" | ["computer3"] |
| ... | ... |

Missing bins

A useful application of mv-expand is filling in default values for missing bins. For example, suppose you're looking at the uptime of a specific computer by exploring its heartbeats. You also want to see the source of each heartbeat, which is in the Category column. Normally, you would use a basic summarize statement:

Heartbeat
| where TimeGenerated > ago(12h)
| summarize count() by Category, bin(TimeGenerated, 1h)

Here's the output:

| Category | TimeGenerated | count_ |
| --- | --- | --- |
| Direct Agent | 2017-06-06T17:00:00Z | 15 |
| Direct Agent | 2017-06-06T18:00:00Z | 60 |
| Direct Agent | 2017-06-06T20:00:00Z | 55 |
| Direct Agent | 2017-06-06T21:00:00Z | 57 |
| Direct Agent | 2017-06-06T22:00:00Z | 60 |
| ... | ... | ... |

In the output, the bucket associated with 2017-06-06T19:00:00Z is missing because there isn't any heartbeat data for that hour. Use the make-series function to assign a default value to empty buckets. A row is generated for each category, and the output includes two extra array columns: one for values and one for the matching time buckets:

Heartbeat
| make-series count() default=0 on TimeGenerated in range(ago(1d), now(), 1h) by Category 

Here's the output:

| Category | count_ | TimeGenerated |
| --- | --- | --- |
| Direct Agent | [15,60,0,55,60,57,60,...] | ["2017-06-06T17:00:00.0000000Z","2017-06-06T18:00:00.0000000Z","2017-06-06T19:00:00.0000000Z","2017-06-06T20:00:00.0000000Z","2017-06-06T21:00:00.0000000Z",...] |
| ... | ... | ... |

The third element of the count_ array is 0, as expected, and the TimeGenerated array has a matching time stamp of 2017-06-06T19:00:00.0000000Z. But this array format is difficult to read. Use mv-expand to expand the arrays and produce the same output format as summarize:

Heartbeat
| make-series count() default=0 on TimeGenerated in range(ago(1d), now(), 1h) by Category 
| mv-expand TimeGenerated, count_
| project Category, TimeGenerated, count_

Here's the output:

| Category | TimeGenerated | count_ |
| --- | --- | --- |
| Direct Agent | 2017-06-06T17:00:00Z | 15 |
| Direct Agent | 2017-06-06T18:00:00Z | 60 |
| Direct Agent | 2017-06-06T19:00:00Z | 0 |
| Direct Agent | 2017-06-06T20:00:00Z | 55 |
| Direct Agent | 2017-06-06T21:00:00Z | 57 |
| Direct Agent | 2017-06-06T22:00:00Z | 60 |
| ... | ... | ... |

Narrow results to a set of elements: let, makeset, toscalar, in

A common scenario is to select the names of specific entities based on a set of criteria, and then filter a different dataset down to that set of entities. For example, you might find computers that are known to have missing updates and identify the IP addresses that these computers called out to.

Here's an example:

let ComputersNeedingUpdate = toscalar(
    Update
    | summarize makeset(Computer)
    | project set_Computer
);
WindowsFirewall
| where Computer in (ComputersNeedingUpdate)
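
The sample above puts every computer from the Update table into the set. In practice you'd usually filter first; as a hedged sketch, assuming the Update table exposes an UpdateState column (any such column and value must be adjusted to your own schema):

let ComputersNeedingUpdate = toscalar(
    Update
    | where UpdateState == "Needed"   // assumed column and value; adjust to your schema
    | summarize makeset(Computer)
    | project set_Computer
);
WindowsFirewall
| where Computer in (ComputersNeedingUpdate)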

Joins

You can use joins to analyze data from multiple tables in the same query. A join merges the rows of two datasets by matching the values of specified columns.

Here's an example:

SecurityEvent 
| where EventID == 4624     // sign-in events
| project Computer, Account, TargetLogonId, LogonTime=TimeGenerated
| join kind=inner (
    SecurityEvent 
    | where EventID == 4634     // sign-out events
    | project TargetLogonId, LogoffTime=TimeGenerated
) on TargetLogonId
| extend Duration = LogoffTime-LogonTime
| project-away TargetLogonId1 
| top 10 by Duration desc

In the example, the first dataset filters for all sign-in events. It's joined with a second dataset that filters for all sign-out events. The projected columns are Computer, Account, TargetLogonId, and TimeGenerated. The datasets are correlated by a shared column, TargetLogonId. The output is a single record per correlation that has both the sign-in and sign-out times.

If both datasets have columns with the same name, the columns of the right-side dataset are given an index number. In this example, the results would show TargetLogonId with values from the left-side table and TargetLogonId1 with values from the right-side table. In this case, the second TargetLogonId1 column was removed by using the project-away operator.

Note

To improve performance, keep only the relevant columns of the joined datasets by using the project operator.

Use the following syntax to join two datasets in which the join key has a different name in each table:

Table1
| join ( Table2 ) 
on $left.key1 == $right.key2
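
As a concrete sketch of this syntax (the HostName alias is introduced here purely for illustration, not from the original article), you can rename a column in one side's projection and join across the differing names:

Perf
| where TimeGenerated > ago(1h)
| project Computer, CounterName, CounterValue
| join kind=inner (
    Heartbeat
    | where TimeGenerated > ago(1h)
    | project HostName=Computer, OSType
    | distinct HostName, OSType
) on $left.Computer == $right.HostName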

Lookup tables

A common use of joins is to use datatable for static value mapping. Using datatable can help make the results more presentable. For example, you can enrich security event data with the event name for each event ID:

let DimTable = datatable(EventID:int, eventName:string)
  [
    4625, "Account activity",
    4688, "Process action",
    4634, "Account activity",
    4658, "The handle to an object was closed",
    4656, "A handle to an object was requested",
    4690, "An attempt was made to duplicate a handle to an object",
    4663, "An attempt was made to access an object",
    5061, "Cryptographic operation",
    5058, "Key file operation"
  ];
SecurityEvent
| join kind=inner DimTable on EventID
| summarize count() by eventName

Here's the output:

| eventName | count_ |
| --- | --- |
| The handle to an object was closed | 290,995 |
| A handle to an object was requested | 154,157 |
| An attempt was made to duplicate a handle to an object | 144,305 |
| An attempt was made to access an object | 123,669 |
| Cryptographic operation | 153,495 |
| Key file operation | 153,496 |

JSON and data structures

Nested objects are objects that contain other objects in an array or in a map of key-value pairs. These objects are represented as JSON strings. This section describes how you can use JSON to retrieve data and analyze nested objects.

Work with JSON strings

Use extractjson to access a specific JSON element at a known path. This function requires a path expression that uses the following conventions:

  • Use $ to refer to the root folder.
  • Use bracket or dot notation to refer to indexes and elements, as illustrated in the following examples.

Use brackets for indexes and dots to separate elements:

let hosts_report='{"hosts": [{"location":"North_DC", "status":"running", "rate":5},{"location":"South_DC", "status":"stopped", "rate":3}]}';
print hosts_report
| extend status = extractjson("$.hosts[0].status", hosts_report)

This example is similar, but it uses only bracket notation:

let hosts_report='{"hosts": [{"location":"North_DC", "status":"running", "rate":5},{"location":"South_DC", "status":"stopped", "rate":3}]}';
print hosts_report 
| extend status = extractjson("$['hosts'][0]['status']", hosts_report)

For a single element, you can use only dot notation:

let hosts_report=dynamic({"location":"North_DC", "status":"running", "rate":5});
print hosts_report 
| extend status = hosts_report.status

parsejson

It's easiest to access multiple elements in your JSON structure as a dynamic object. Use parsejson to cast text data to a dynamic object. After you convert the JSON to a dynamic type, you can use additional functions to analyze the data.

let hosts_object = parsejson('{"hosts": [{"location":"North_DC", "status":"running", "rate":5},{"location":"South_DC", "status":"stopped", "rate":3}]}');
print hosts_object 
| extend status0=hosts_object.hosts[0].status, rate1=hosts_object.hosts[1].rate

arraylength

Use arraylength to count the number of elements in an array:

let hosts_object = parsejson('{"hosts": [{"location":"North_DC", "status":"running", "rate":5},{"location":"South_DC", "status":"stopped", "rate":3}]}');
print hosts_object 
| extend hosts_num=arraylength(hosts_object.hosts)

mv-expand

Use mv-expand to break the properties of an object into separate rows:

let hosts_object = parsejson('{"hosts": [{"location":"North_DC", "status":"running", "rate":5},{"location":"South_DC", "status":"stopped", "rate":3}]}');
print hosts_object 
| mv-expand hosts_object.hosts[0]

Screenshot that shows host_0 with location, status, and rate values.

buildschema

Use buildschema to get the schema that admits all values of an object:

let hosts_object = parsejson('{"hosts": [{"location":"North_DC", "status":"running", "rate":5},{"location":"South_DC", "status":"stopped", "rate":3}]}');
print hosts_object 
| summarize buildschema(hosts_object)

The result is a schema in JSON format:

{
    "hosts":
    {
        "indexer":
        {
            "location": "string",
            "rate": "int",
            "status": "string"
        }
    }
}

The schema describes the names of the object fields and their matching data types.

Nested objects might have different schemas, as in the following example:

let hosts_object = parsejson('{"hosts": [{"location":"North_DC", "status":"running", "rate":5},{"status":"stopped", "rate":"3", "range":100}]}');
print hosts_object 
| summarize buildschema(hosts_object)
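
The article doesn't show the result here, but reasoning from the two host objects, the merged schema would record a union of types for the rate field, which appears both as an int and as a string; roughly:

{
    "hosts":
    {
        "indexer":
        {
            "location": "string",
            "range": "int",
            "rate": ["int", "string"],
            "status": "string"
        }
    }
}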

Charts

The following sections give examples of how to work with charts when using the Kusto Query Language.

Chart the results

Begin by reviewing the number of computers per operating system during the past hour:

Heartbeat
| where TimeGenerated > ago(1h)
| summarize count(Computer) by OSType  

By default, the results display as a table:

Screenshot that shows query results displayed as a table.

For a more useful view, select Chart, and then select the Pie option to visualize the results:

Screenshot that shows query results displayed as a pie chart.

Timecharts

Show the average and the 50th and 95th percentiles of processor time in bins of one hour.

The following query generates multiple series. In the results, you can choose which series to show in the timechart.

Perf
| where TimeGenerated > ago(1d) 
| where CounterName == "% Processor Time" 
| summarize avg(CounterValue), percentiles(CounterValue, 50, 95)  by bin(TimeGenerated, 1h)

Select the Line chart display option:

Screenshot that shows a multiseries line chart.

Reference line

A reference line can help you easily identify whether the metric exceeded a specific threshold. To add a line to a chart, extend the dataset with a constant column:

Perf
| where TimeGenerated > ago(1d) 
| where CounterName == "% Processor Time" 
| summarize avg(CounterValue), percentiles(CounterValue, 50, 95)  by bin(TimeGenerated, 1h)
| extend Threshold = 20

Screenshot that shows a multiseries line chart with a threshold reference line.

Multiple dimensions

summarizeby 字句中的多个表达式在结果中创建多个行。Multiple expressions in the by clause of summarize create multiple rows in the results. 会为每个值组合创建一行。One row is created for each combination of values.

SecurityEvent
| where TimeGenerated > ago(1d)
| summarize count() by tostring(EventID), AccountType, bin(TimeGenerated, 1h)

When you view the results as a chart, the chart uses the first column from the by clause. The following example shows a stacked column chart created by using the EventID value. Dimensions must be of type string, so in this example the EventID value is cast to string:

Screenshot that shows a column chart based on the EventID column.

You can switch between columns by selecting the drop-down arrow for the column name:

Screenshot that shows a column chart based on the AccountType column, with the column selector visible.

Smart analytics

This section includes examples that use smart analytics functions in Azure Log Analytics to analyze user activity. You can use these examples to analyze your own applications that are monitored by Azure Application Insights, or use the concepts in these queries for similar analysis on other data.

Cohort analytics

Cohort analysis tracks the activity of specific groups of users, known as cohorts. It attempts to measure how appealing a service is by measuring the rate of returning users. Users are grouped by the time they first used the service. When analyzing cohorts, you'd expect to find a decrease in activity over the first tracked periods. Each cohort is titled by the week in which its members were first observed.

The following example analyzes the number of activities users completed during the five weeks after their first use of the service:

let startDate = startofweek(bin(datetime(2017-01-20T00:00:00Z), 1d));
let week = range Cohort from startDate to datetime(2017-03-01T00:00:00Z) step 7d;
// For each user, we find the first and last timestamp of activity
let FirstAndLastUserActivity = (end:datetime) 
{
    customEvents
    | where customDimensions["sourceapp"]=="ai-loganalyticsui-prod"
    // Check 30 days back to see first time activity.
    | where timestamp > startDate - 30d
    | where timestamp < end      
    | summarize min=min(timestamp), max=max(timestamp) by user_AuthenticatedId
};
let DistinctUsers = (cohortPeriod:datetime, evaluatePeriod:datetime) {
    toscalar (
    FirstAndLastUserActivity(evaluatePeriod)
    // Find members of the cohort: only users that were observed in this period for the first time.
    | where min >= cohortPeriod and min < cohortPeriod + 7d  
    // Pick only the members that were active during the evaluated period or after.
    | where max > evaluatePeriod - 7d
    | summarize dcount(user_AuthenticatedId)) 
};
week 
| where Cohort == startDate
// Finally, calculate the desired metric for each cohort. In this sample, we calculate distinct users but you can change
// this to any other metric that would measure the engagement of the cohort members.
| extend 
    r0 = DistinctUsers(startDate, startDate+7d),
    r1 = DistinctUsers(startDate, startDate+14d),
    r2 = DistinctUsers(startDate, startDate+21d),
    r3 = DistinctUsers(startDate, startDate+28d),
    r4 = DistinctUsers(startDate, startDate+35d)
| union (week | where Cohort == startDate + 7d 
| extend 
    r0 = DistinctUsers(startDate+7d, startDate+14d),
    r1 = DistinctUsers(startDate+7d, startDate+21d),
    r2 = DistinctUsers(startDate+7d, startDate+28d),
    r3 = DistinctUsers(startDate+7d, startDate+35d) )
| union (week | where Cohort == startDate + 14d 
| extend 
    r0 = DistinctUsers(startDate+14d, startDate+21d),
    r1 = DistinctUsers(startDate+14d, startDate+28d),
    r2 = DistinctUsers(startDate+14d, startDate+35d) )
| union (week | where Cohort == startDate + 21d 
| extend 
    r0 = DistinctUsers(startDate+21d, startDate+28d),
    r1 = DistinctUsers(startDate+21d, startDate+35d) ) 
| union (week | where Cohort == startDate + 28d 
| extend 
    r0 = DistinctUsers (startDate+28d, startDate+35d) )
// Calculate the retention percentage for each cohort by weeks
| project Cohort, r0, r1, r2, r3, r4,
          p0 = r0/r0*100,
          p1 = todouble(r1)/todouble(r0)*100,
          p2 = todouble(r2)/todouble(r0)*100,
          p3 = todouble(r3)/todouble(r0)*100,
          p4 = todouble(r4)/todouble(r0)*100 
| sort by Cohort asc

Here's the output:

Screenshot that shows a cohort table based on activity.

Rolling monthly active users and user stickiness

The following example uses time-series analysis with the series_fir function, which you can use for sliding-window computations. The sample application being monitored is an online store that tracks users' activity through custom events. The query tracks two types of user activity, AddToCart and Checkout, and defines an active user as one who completed a checkout at least once on a specific day.

let endtime = endofday(datetime(2017-03-01T00:00:00Z));
let window = 60d;
let starttime = endtime-window;
let interval = 1d;
let user_bins_to_analyze = 28;
// Create an array of filters coefficients for series_fir(). A list of '1' in our case will produce a simple sum.
let moving_sum_filter = toscalar(range x from 1 to user_bins_to_analyze step 1 | extend v=1 | summarize makelist(v)); 
// Level of engagement. Users will be counted as engaged if they completed at least this number of activities.
let min_activity = 1;
customEvents
| where timestamp > starttime  
| where customDimensions["sourceapp"] == "ai-loganalyticsui-prod"
// We want to analyze users who actually checked out in our website.
| where (name == "Checkout") and user_AuthenticatedId <> ""
// Create a series of activities per user.
| make-series UserClicks=count() default=0 on timestamp 
    in range(starttime, endtime-1s, interval) by user_AuthenticatedId
// Create a new column that contains a sliding sum. 
// Passing 'false' as the last parameter to series_fir() prevents normalization of the calculation by the size of the window.
// For each time bin in the *RollingUserClicks* column, the value is the aggregation of the user activities in the 
// 28 days that preceded the bin. For example, if a user was active once on 2016-12-31 and then inactive throughout 
// January, then the value will be 1 between 2016-12-31 -> 2017-01-28 and then 0s. 
| extend RollingUserClicks=series_fir(UserClicks, moving_sum_filter, false)
// Use the zip() operator to pack the timestamp with the user activities per time bin.
| project User_AuthenticatedId=user_AuthenticatedId , RollingUserClicksByDay=zip(timestamp, RollingUserClicks)
// Transpose the table and create a separate row for each combination of user and time bin (1 day).
| mv-expand RollingUserClicksByDay
| extend Timestamp=todatetime(RollingUserClicksByDay[0])
// Mark the users that qualify according to min_activity.
| extend RollingActiveUsersByDay=iff(toint(RollingUserClicksByDay[1]) >= min_activity, 1, 0)
// And finally, count the number of users per time bin.
| summarize sum(RollingActiveUsersByDay) by Timestamp
// First 28 days contain partial data, so we filter them out.
| where Timestamp > starttime + 28d
// Render as timechart.
| render timechart

Here's the output:

Screenshot of a chart that shows rolling active users by day over a month.

The following example turns the preceding query into a reusable function and then uses it to calculate rolling user stickiness. An active user in this query is defined as a user who completed a checkout at least once on a specific day.

let rollingDcount = (sliding_window_size: int, event_name:string)
{
    let endtime = endofday(datetime(2017-03-01T00:00:00Z));
    let window = 90d;
    let starttime = endtime-window;
    let interval = 1d;
    let moving_sum_filter = toscalar(range x from 1 to sliding_window_size step 1 | extend v=1| summarize makelist(v));    
    let min_activity = 1;
    customEvents
    | where timestamp > starttime
    | where customDimensions["sourceapp"]=="ai-loganalyticsui-prod"
    | where (name == event_name)
    | where user_AuthenticatedId <> ""
    | make-series UserClicks=count() default=0 on timestamp 
        in range(starttime, endtime-1s, interval) by user_AuthenticatedId
    | extend RollingUserClicks=series_fir(UserClicks, moving_sum_filter, false)
    | project User_AuthenticatedId=user_AuthenticatedId , RollingUserClicksByDay=zip(timestamp, RollingUserClicks)
    | mv-expand RollingUserClicksByDay
    | extend Timestamp=todatetime(RollingUserClicksByDay[0])
    | extend RollingActiveUsersByDay=iff(toint(RollingUserClicksByDay[1]) >= min_activity, 1, 0)
    | summarize sum(RollingActiveUsersByDay) by Timestamp
    | where Timestamp > starttime + 28d
};
// Use the moving_sum_filter with bin size of 28 to count MAU.
rollingDcount(28, "Checkout")
| join
(
    // Use the moving_sum_filter with bin size of 1 to count DAU.
    rollingDcount(1, "Checkout")
)
on Timestamp
| project sum_RollingActiveUsersByDay1 *1.0 / sum_RollingActiveUsersByDay, Timestamp
| render timechart

Here's the output:

Screenshot of a chart that shows user stickiness over time.

Regression analysis

This example demonstrates how to create an automated detector for service disruptions based exclusively on an application's trace logs. The detector looks for abnormal, sudden increases in the relative amount of error and warning traces in the application.

Two techniques are used to evaluate the service status based on trace log data:

  • Use make-series to convert semi-structured textual trace logs into a metric that represents the ratio between positive and negative trace lines.
  • Use series_fit_2lines and series_fit_line for advanced step-jump detection, applying time-series analysis with a two-line linear regression.

let startDate = startofday(datetime("2017-02-01"));
let endDate = startofday(datetime("2017-02-07"));
let minRsquare = 0.8;  // Tune the sensitivity of the detection sensor. Values close to 1 indicate very low sensitivity.
// Count all Good (Verbose + Info) and Bad (Error + Fatal + Warning) traces, per day.
traces
| where timestamp > startDate and timestamp < endDate
| summarize 
    Verbose = countif(severityLevel == 0),
    Info = countif(severityLevel == 1), 
    Warning = countif(severityLevel == 2),
    Error = countif(severityLevel == 3),
    Fatal = countif(severityLevel == 4) by bin(timestamp, 1d)
| extend Bad = (Error + Fatal + Warning), Good = (Verbose + Info)
// Determine the ratio of bad traces, from the total.
| extend Ratio = (todouble(Bad) / todouble(Good + Bad))*10000
| project timestamp , Ratio
// Create a time series.
| make-series RatioSeries=any(Ratio) default=0 on timestamp in range(startDate , endDate -1d, 1d) by 'TraceSeverity' 
// Apply a 2-line regression to the time series.
| extend (RSquare2, SplitIdx, Variance2,RVariance2,LineFit2)=series_fit_2lines(RatioSeries)
// Find out if our 2-line is trending up or down.
| extend (Slope,Interception,RSquare,Variance,RVariance,LineFit)=series_fit_line(LineFit2)
// Check whether the line fit reaches the threshold, and if the spike represents an increase (rather than a decrease).
| project PatternMatch = iff(RSquare2 > minRsquare and Slope>0, "Spike detected", "No Match")

Next steps