教程Tutorial

了解 Kusto 查询语言的最好方法是查看一些简单的查询,以便使用包含一些示例数据的数据库来“感受”该语言。The best way to learn about the Kusto query language is to look at some simple queries to get the "feel" for the language using a database with some sample data. 本文中演示的查询应在该数据库上运行。The queries demonstrated in this article should run on that database. 这个示例数据库中的 StormEvents 表提供了在美国发生的风暴的一些相关信息。The StormEvents table in this sample database provides some information about storms that happened in the U.S.

统计行数Count rows

我们的示例数据库有一个名为 StormEvents 的表。Our example database has a table called StormEvents. 为了了解其大小,可通过管道符号将其内容注入只计算行数的运算符中:To find out how big it is, we'll pipe its content into an operator that simply counts the rows:

  • 语法: 查询是一个数据源(通常是表名称),可以选择后跟一对或多对管道字符和一些表格运算符。Syntax: A query is a data source (usually a table name), optionally followed by one or more pairs of the pipe character and some tabular operator.
StormEvents | count

结果如下:Here's the result:

计数Count
5906659066

count 运算符count operator.

project:选择列的子集project: select a subset of columns

使用 project 挑选出所需列。Use project to pick out just the columns you want. 请参阅下面的示例,它同时使用 projecttake 运算符。See example below that uses both project and take operator.

where:按布尔表达式筛选where: filtering by a Boolean expression

让我们只查看 2007 年 2 月期间 California 中的 floodLet's see only the floods in California during Feb-2007:

StormEvents
| where StartTime > datetime(2007-02-01) and StartTime < datetime(2007-03-01)
| where EventType == 'Flood' and State == 'CALIFORNIA'
| project StartTime, EndTime , State , EventType , EpisodeNarrative
StartTimeStartTime EndTimeEndTime 状态State EventTypeEventType EpisodeNarrativeEpisodeNarrative
2007-02-19 00:00:00.00000002007-02-19 00:00:00.0000000 2007-02-19 08:00:00.00000002007-02-19 08:00:00.0000000 CALIFORNIACALIFORNIA 洪水Flood 19 日凌晨,一股锋面系统穿过南部的 San Joaquin 山谷,给西部的 Kern 县带来了短时强降雨。A frontal system moving across the Southern San Joaquin Valley brought brief periods of heavy rain to western Kern County in the early morning hours of the 19th. 据报道,Taft 附近的 166 号州高速公路发生不严重的洪水。Minor flooding was reported across State Highway 166 near Taft.

take:显示 n 行take: show me n rows

让我们看看一些数据 - 示例 5 行中有什么?Let's see some data - what's in a sample 5 rows?

StormEvents
| take 5
| project  StartTime, EndTime, EventType, State, EventNarrative  
StartTimeStartTime EndTimeEndTime EventTypeEventType 状态State EventNarrativeEventNarrative
2007-09-18 20:00:00.00000002007-09-18 20:00:00.0000000 2007-09-19 18:00:00.00000002007-09-19 18:00:00.0000000 暴雨Heavy Rain 佛罗里达州FLORIDA 在 24 小时的时间里,Volusia 县沿海部分地区降雨量高达 9 英寸。As much as 9 inches of rain fell in a 24-hour period across parts of coastal Volusia County.
2007-09-20 21:57:00.00000002007-09-20 21:57:00.0000000 2007-09-20 22:05:00.00000002007-09-20 22:05:00.0000000 龙卷风Tornado 佛罗里达州FLORIDA 龙卷风袭击了位于 West Crooked 湖北端的 Eustis 镇。A tornado touched down in the Town of Eustis at the northern end of West Crooked Lake. 该龙卷风迅速增强为 EF1 级别,并向西北偏北移动,穿过 Eustis。The tornado quickly intensified to EF1 strength as it moved north northwest through Eustis. 其移动轨迹将近 2 英里长,最大宽度为 300 码。The track was just under two miles long and had a maximum width of 300 yards. 龙卷风摧毁了 7 座房屋。The tornado destroyed 7 homes. 27 所房屋受到严重破坏,81 所房屋报告有轻微损坏。Twenty seven homes received major damage and 81 homes reported minor damage. 没有发生严重的人员伤亡,财产损失估计为 620 万美元。There were no serious injuries and property damage was set at $6.2 million.
2007-09-29 08:11:00.00000002007-09-29 08:11:00.0000000 2007-09-29 08:11:00.00000002007-09-29 08:11:00.0000000 海龙卷Waterspout 大西洋南部ATLANTIC SOUTH Melbourne 海滩东南的大西洋上形成了一个海龙卷,并短暂地向海岸移动。A waterspout formed in the Atlantic southeast of Melbourne Beach and briefly moved toward shore.
2007-12-20 07:50:00.00000002007-12-20 07:50:00.0000000 2007-12-20 07:53:00.00000002007-12-20 07:53:00.0000000 雷雨大风Thunderstorm Wind 密西西比州MISSISSIPPI 许多大树被吹倒,有些倒在电线上。Numerous large trees were blown down with some down on power lines. 东部 Adams 县遭到破坏。Damage occurred in eastern Adams county.
2007-12-30 16:00:00.00000002007-12-30 16:00:00.0000000 2007-12-30 16:05:00.00000002007-12-30 16:05:00.0000000 雷雨大风Thunderstorm Wind 佐治亚州GEORGIA 县政府报告说,在 206 国道附近的 Quincey Batten 环线上,有几棵树被吹倒。The county dispatch reported several trees were blown down along Quincey Batten Loop near State Road 206. 预估了树木移除费用。The cost of tree removal was estimated.

但是 take 不按特定顺序显示表中的行,所以现在对它们进行排序。But take shows rows from the table in no particular order, so let's sort them.

  • limittake 的别名,具有相同的效果。limit is an alias for take and will have the same effect.

sort 和 topsort and top

  • 语法: 有些运算符具有由关键字(如 by)引入的参数。Syntax: Some operators have parameters introduced by keywords such as by.
  • desc = 降序,asc = 升序。desc = descending order, asc = ascending.

显示前 n 行,按特定列排序:Show me the first n rows, ordered by a particular column:

StormEvents
| top 5 by StartTime desc
| project  StartTime, EndTime, EventType, State, EventNarrative  
StartTimeStartTime EndTimeEndTime EventTypeEventType 状态State EventNarrativeEventNarrative
2007-12-31 22:30:00.00000002007-12-31 22:30:00.0000000 2007-12-31 23:59:00.00000002007-12-31 23:59:00.0000000 冬季风暴Winter Storm 密歇根州MICHIGAN 这场大雪持续到元旦凌晨。This heavy snow event continued into the early morning hours on New Year's Day.
2007-12-31 22:30:00.00000002007-12-31 22:30:00.0000000 2007-12-31 23:59:00.00000002007-12-31 23:59:00.0000000 冬季风暴Winter Storm 密歇根州MICHIGAN 这场大雪持续到元旦凌晨。This heavy snow event continued into the early morning hours on New Year's Day.
2007-12-31 22:30:00.00000002007-12-31 22:30:00.0000000 2007-12-31 23:59:00.00000002007-12-31 23:59:00.0000000 冬季风暴Winter Storm 密歇根州MICHIGAN 这场大雪持续到元旦凌晨。This heavy snow event continued into the early morning hours on New Year's Day.
2007-12-31 23:53:00.00000002007-12-31 23:53:00.0000000 2007-12-31 23:53:00.00000002007-12-31 23:53:00.0000000 疾风High Wind CALIFORNIACALIFORNIA 据报道,Ventura 县的山区有北到东北阵风,风速约为每小时 58 英里。North to northeast winds gusting to around 58 mph were reported in the mountains of Ventura county.
2007-12-31 23:53:00.00000002007-12-31 23:53:00.0000000 2007-12-31 23:53:00.00000002007-12-31 23:53:00.0000000 疾风High Wind CALIFORNIACALIFORNIA 温泉 RAWS 传感器报告说,北风达到每小时 58 英里。The Warm Springs RAWS sensor reported northerly winds gusting to 58 mph.

同样可以通过依次使用 sorttake 运算符来实现Same can be achieved by using sort and then take operator

StormEvents
| sort by StartTime desc
| take 5
| project  StartTime, EndLat, EventType, EventNarrative

extend:计算派生的列extend: compute derived columns

通过计算每行中的值来创建新列:Create a new column by computing a value in every row:

StormEvents
| limit 5
| extend Duration = EndTime - StartTime 
| project StartTime, EndTime, Duration, EventType, State
StartTimeStartTime EndTimeEndTime 持续时间Duration EventTypeEventType 状态State
2007-09-18 20:00:00.00000002007-09-18 20:00:00.0000000 2007-09-19 18:00:00.00000002007-09-19 18:00:00.0000000 22:00:0022:00:00 暴雨Heavy Rain 佛罗里达州FLORIDA
2007-09-20 21:57:00.00000002007-09-20 21:57:00.0000000 2007-09-20 22:05:00.00000002007-09-20 22:05:00.0000000 00:08:0000:08:00 龙卷风Tornado 佛罗里达州FLORIDA
2007-09-29 08:11:00.00000002007-09-29 08:11:00.0000000 2007-09-29 08:11:00.00000002007-09-29 08:11:00.0000000 00:00:0000:00:00 海龙卷Waterspout 大西洋南部ATLANTIC SOUTH
2007-12-20 07:50:00.00000002007-12-20 07:50:00.0000000 2007-12-20 07:53:00.00000002007-12-20 07:53:00.0000000 00:03:0000:03:00 雷雨大风Thunderstorm Wind 密西西比州MISSISSIPPI
2007-12-30 16:00:00.00000002007-12-30 16:00:00.0000000 2007-12-30 16:05:00.00000002007-12-30 16:05:00.0000000 00:05:0000:05:00 雷雨大风Thunderstorm Wind 佐治亚州GEORGIA

可以重用列名并将计算结果分配给同一列。It is possible to reuse column name and assign calculation result to the same column. 例如:For example:

print x=1
| extend x = x + 1, y = x
| extend x = x + 1
xx yy
33 11

Scalar 表达式可包含所有常用运算符(+-*/%),并含有一系列有用的函数。Scalar expressions can include all the usual operators (+, -, *, /, %), and there's a range of useful functions.

summarize:聚合行组summarize: aggregate groups of rows

统计每个国家/地区的事件数:Count how many events come from each country:

StormEvents
| summarize event_count = count() by State

summarize 将在 by 子句中拥有相同值的行组合在一起,然后使用聚合函数(如 count)将每个组合并到单个行。summarize groups together rows that have the same values in the by clause, and then uses the aggregation function (such as count) to combine each group into a single row. 因此,在此情况下,每个州都有相应的行,还有一个列,用来表示该州的行计数。So in this case, there's a row for each state, and a column for the count of rows in that state.

有一系列聚合函数,并且可以在一个 summarize 运算符中使用其中多个函数,以生成多个计算列。There's a range of aggregation functions, and you can use several of them in one summarize operator to produce several computed columns. 例如,我们可以获取每个州的风暴计数,以及每个州每个独特类型的风暴总计数,For example, we could get the count of storms in each state and also a sum of the unique type of storms per state,
然后我们可以用 top 来获取受风暴影响最严重的州:then we could use top to get the most storm-affected states:

StormEvents 
| summarize StormCount = count(), TypeOfStorms = dcount(EventType) by State
| top 5 by StormCount desc
状态State StormCountStormCount TypeOfStormsTypeOfStorms
德克萨斯TEXAS 47014701 2727
KANSASKANSAS 31663166 2121
衣阿华州IOWA 23372337 1919
ILLINOISILLINOIS 20222022 2323
MISSOURIMISSOURI 20162016 2020

summarize 的结果是:The result of a summarize has:

  • by 命名的各列;each column named in by;
  • 每个计算表达式相应的列;a column for each computed expression;
  • 每个 by 值组合相应的行。a row for each combination of by values.

按标量值汇总Summarize by scalar values

可以在 by 子句中使用标量(数值、时间或间隔)值,但需要将这些值放入箱中。You can use scalar (numeric, time, or interval) values in the by clause, but you'll want to put the values into bins.
bin() 函数可用于此目的:The bin() function is useful for this:

StormEvents
| where StartTime > datetime(2007-02-14) and StartTime < datetime(2007-02-21)
| summarize event_count = count() by bin(StartTime, 1d)

这会将所有时间戳缩短为 1 天的间隔:This reduces all the timestamps to intervals of 1 day:

StartTimeStartTime event_countevent_count
2007-02-14 00:00:00.00000002007-02-14 00:00:00.0000000 180180
2007-02-15 00:00:00.00000002007-02-15 00:00:00.0000000 6666
2007-02-16 00:00:00.00000002007-02-16 00:00:00.0000000 164164
2007-02-17 00:00:00.00000002007-02-17 00:00:00.0000000 103103
2007-02-18 00:00:00.00000002007-02-18 00:00:00.0000000 2222
2007-02-19 00:00:00.00000002007-02-19 00:00:00.0000000 5252
2007-02-20 00:00:00.00000002007-02-20 00:00:00.0000000 6060

bin() 在许多语言中与 floor() 函数等效。The bin() is the same as the floor() function in many languages. 它简单地将每个值缩减为最接近提供的系数的倍数,这样 summarize 便可以将行分配给组。It simply reduces every value to the nearest multiple of the modulus that you supply, so that summarize can assign the rows to groups.

Render:显示图表或表Render: display a chart or table

投影两列,将其用作图表的 x 轴和 y 轴:Project two columns and use them as the x and y axis of a chart:

StormEvents 
| summarize event_count=count(), mid = avg(BeginLat) by State 
| sort by mid
| where event_count > 1800
| project State, event_count
| render columnchart

按州划分的风暴事件计数柱状图

虽然我们在项目操作中删除了 mid,但是如果我们希望图表中按顺序显示国家/地区,则仍然需要它。Although we removed mid in the project operation, we still need it if we want the chart to display the countries in that order.

严格地说,“render”(呈现)是客户端的一项功能,而不是查询语言的一部分。Strictly speaking, 'render' is a feature of the client rather than part of the query language. 尽管如此,它已经集成到语言中,在预想结果时会非常有用。Still, it's integrated into the language and is very useful for envisioning your results.

时间图表Timecharts

回到数字箱,显示一个时序:Going back to numeric bins, let's display a time series:

StormEvents
| summarize event_count=count() by bin(StartTime, 1d)
| render timechart

按时间划分的折线图事件

多个序列Multiple series

summarize by 子句中使用多个值可为值的每个组合创建单独的行:Use multiple values in a summarize by clause to create a separate row for each combination of values:

StormEvents 
| where StartTime > datetime(2007-06-04) and StartTime < datetime(2007-06-10) 
| where Source in ("Source","Public","Emergency Manager","Trained Spotter","Law Enforcement")
| summarize count() by bin(StartTime, 10h), Source

按源划分的表计数

只需将呈现术语添加到上面:| render timechartJust add the render term to the above: | render timechart.

按源划分的折线图计数

注意,render timechart 使用第一列作为 x 轴,然后将其他列显示为单独的行。Notice that render timechart uses the first column as the x-axis, and then displays the other columns as separate lines.

每日平均周期Daily average cycle

平均一天的活动如何变化?How does activity vary over the average day?

按一天的时间取模对事件计数,将其量化到小时。Count events by the time modulo one day, binned into hours. 请注意,我们使用 floor 而不是 bin:Note that we use floor instead of bin:

StormEvents
| extend hour = floor(StartTime % 1d , 1h)
| summarize event_count=count() by hour
| sort by hour asc
| render timechart

按小时划分的时间图计数

目前,render 无法正确标记持续时间,但我们可以使用 | render columnchart 代替:Currently, render doesn't label durations properly, but we could use | render columnchart instead:

按小时计算的柱形图计数

比较多个每日系列Compare multiple daily series

不同州一天中不同时间的活动如何变化?How does activity vary over the time of day in different states?

StormEvents
| extend hour= floor( StartTime % 1d , 1h)
| where State in ("GULF OF MEXICO","MAINE","VIRGINIA","WISCONSIN","NORTH DAKOTA","NEW JERSEY","OREGON")
| summarize event_count=count() by hour, State
| render timechart

按小时和州划分的的时间图表

除以 1h 可将 x 轴转换为小时数而不是持续时间:Divide by 1h to turn the x-axis into hour number instead of a duration:

StormEvents
| extend hour= floor( StartTime % 1d , 1h)/ 1h
| where State in ("GULF OF MEXICO","MAINE","VIRGINIA","WISCONSIN","NORTH DAKOTA","NEW JERSEY","OREGON")
| summarize event_count=count() by hour, State
| render columnchart

按小时和州划分的柱形图

联接Join

如何基于两个给定的 EventType 查找哪个州同时发生了这两种类型的事件?How to find for two given EventTypes in what state both of them happened?

可以先拉取类型为第一个 EventType 的风暴事件,再拉取第二个 EventType 的风暴事件,然后在“州”上联接这两个集合。You can pull storm events with the first EventType and with the second EventType and then join the two sets on State.

StormEvents
| where EventType == "Lightning"
| join (
    StormEvents 
    | where EventType == "Avalanche"
) on State  
| distinct State

联接闪电和雪崩事件

关于联接的用户会话示例User session example of join

本部分不使用 StormEvents 表。This section does not use the StormEvents table.

假设你的数据包含标记每个用户会话的开始和结束的事件,并且每个会话都具有唯一的 ID。Assume you have data that includes events marking the start and end of each user session, with a unique ID for each session.

每个用户会话的持续时间是多少?How long does each user session last?

通过使用 extend 为这两个时间戳提供别名,就可以计算会话持续时间。By using extend to provide an alias for the two timestamps, you can then compute the session duration.

Events
| where eventName == "session_started"
| project start_time = timestamp, stop_time, country, session_id
| join ( Events
    | where eventName == "session_ended"
    | project stop_time = timestamp, session_id
    ) on session_id
| extend duration = stop_time - start_time
| project start_time, stop_time, country, duration
| take 10

用户会话扩展

执行联接前,最好使用 project 选中所需列。It's good practice to use project to select just the columns we need before performing the join. 在同一子句中,重命名时间戳列。In the same clauses, we rename the timestamp column.

绘制分布图Plot a distribution

不同长度的风暴分别有多少个?How many storms are there of different lengths?

StormEvents
| extend  duration = EndTime - StartTime
| where duration > 0s
| where duration < 3h
| summarize event_count = count()
    by bin(duration, 5m)
| sort by duration asc
| render timechart

按持续时间划分的事件计数时间表

或使用 | render columnchartOr use | render columnchart:

按持续时间划分的柱形图事件计数时间表

百分位数Percentiles

哪些持续时间范围涵盖不同风暴百分比?What ranges of durations cover different percentages of storms?

使用上述查询,但将 render 替换为:Use the above query, but replace render with:

| summarize percentiles(duration, 5, 20, 50, 80, 95)

在本例中,我们没有提供 by 子句,因此结果为单行:In this case, we provided no by clause, so the result is a single row:

按持续时间划分的表汇总百分位数

从中我们可以发现:From which we can see that:

  • 5% 的风暴持续时间小于 5 分钟;5% of storms have a duration of less than 5m;
  • 50%的风暴持续时间小于 1 小时 25 分钟;50% of storms last less than 1h 25m;
  • 5% 的风暴持续时间至少为 2 小时 50 分钟。5% of storms last at least 2h 50m.

若要获取各个州的单独细分情况,只需通过两个 summarize 运算符单独运算州列:To get a separate breakdown for each state, we just have to bring the state column separately through both summarize operators:

StormEvents
| extend  duration = EndTime - StartTime
| where duration > 0s
| where duration < 3h
| summarize event_count = count()
    by bin(duration, 5m), State
| sort by duration asc
| summarize percentiles(duration, 5, 20, 50, 80, 95) by State

按州划分的表汇总百分位数持续时间

Let:将结果分配给变量Let: Assign a result to a variable

在上面的“联接”示例中,使用 let 分隔查询表达式的各个部分。Use let to separate out the parts of the query expression in the 'join' example above. 结果不变:The results are unchanged:

let LightningStorms = 
    StormEvents
    | where EventType == "Lightning";
let AvalancheStorms = 
    StormEvents
    | where EventType == "Avalanche";
LightningStorms 
| join (AvalancheStorms) on State
| distinct State

提示:在 Kusto 客户端中,请勿在此各部分间放入空白行。Tip: In the Kusto client, don't put blank lines between the parts of this. 请务必执行所有运算。Make sure to execute all of it.

在查询中组合来自多个数据库的数据Combining data from several databases in a query

有关详细讨论,请参阅跨数据库查询See cross-database queries for detailed discussion

编写样式查询时:When you write a query of the style:

Logs | where ...

名为 Logs 的表必须位于默认数据库中。The table named Logs has to be in your default database. 如果要从其他数据库访问表,请使用以下语法:If you want to access tables from another database use the following syntax:

database("db").Table

因此,如果你有名为 Diagnostics 和 Telemetry 的数据库,并想要将它们的一些数据关联起来,则可以编写(假设 Diagnostics 是默认数据库) So if you have databases named Diagnostics and Telemetry and want to correlate some of their data, you might write (assuming Diagnostics is your default database)

Logs | join database("Telemetry").Metrics on Request MachineId | ...

或者如果你的默认数据库是 Telemetryor if your default database is Telemetry

union Requests, database("Diagnostics").Logs | ...

上述所有操作都假设两个数据库都位于当前连接的群集中。All of the above assumed that both databases reside in the cluster you are currently connected to. 假设 Telemetry 数据库属于另一个名为 TelemetryCluster.kusto.chinacloudapi.cn 的群集,为了访问它,你需要 Suppose that Telemetry database belonged to another cluster named TelemetryCluster.kusto.chinacloudapi.cn then to access it you'll need

Logs | join cluster("TelemetryCluster").database("Telemetry").Metrics on Request MachineId | ...

注意:指定群集后,数据库是必需的Note: when the cluster is specified the database is mandatory

Azure Monitor 不支持此功能This capability isn't supported in Azure Monitor