Time series analysis in Azure Data Explorer

Azure Data Explorer (ADX) performs ongoing collection of telemetry data from cloud services or IoT devices. This data can be analyzed for various insights, such as monitoring service health, physical production processes, and usage trends. Analysis is done on time series of selected metrics to find deviations in the pattern compared to its typical baseline. ADX contains native support for the creation, manipulation, and analysis of multiple time series. In this topic, learn how ADX is used to create and analyze thousands of time series in seconds, enabling near real-time monitoring solutions and workflows.

Time series creation

In this section, we'll create a large set of regular time series simply and intuitively using the make-series operator, and fill in missing values as needed. The first step in time series analysis is to partition and transform the original telemetry table into a set of time series. The table usually contains a timestamp column, contextual dimensions, and optional metrics. The dimensions are used to partition the data. The goal is to create thousands of time series per partition at regular time intervals.

The input table demo_make_series1 contains 600K records of arbitrary web service traffic. Use the following command to sample 10 records:

[Click to run query]

demo_make_series1 | take 10 

The resulting table contains a timestamp column, three contextual dimensions columns, and no metrics:

TimeStamp                    BrowserVer   OsVer        Country/Region
2016-08-25 09:12:35.4020000  Chrome 51.0  Windows 7    United Kingdom
2016-08-25 09:12:41.1120000  Chrome 52.0  Windows 10
2016-08-25 09:12:46.2300000  Chrome 52.0  Windows 7    United Kingdom
2016-08-25 09:12:46.5100000  Chrome 52.0  Windows 10   United Kingdom
2016-08-25 09:12:46.5570000  Chrome 52.0  Windows 10   Republic of Lithuania
2016-08-25 09:12:47.0470000  Chrome 52.0  Windows 8.1  India
2016-08-25 09:12:51.3600000  Chrome 52.0  Windows 10   United Kingdom
2016-08-25 09:12:51.6930000  Chrome 52.0  Windows 7    Netherlands
2016-08-25 09:12:56.4240000  Chrome 52.0  Windows 10   United Kingdom
2016-08-25 09:13:08.7230000  Chrome 52.0  Windows 10   India

Since there are no metrics, we can only build a set of time series representing the traffic count itself, partitioned by OS, using the following query:

[Click to run query]

let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp in range(min_t, max_t, 1h) by OsVer
| render timechart 
  • Use the make-series operator to create a set of three time series, where:
    • num=count(): the time series of traffic
    • range(min_t, max_t, 1h): the time range, binned into 1-hour intervals
    • default=0: fill missing bins with zeros to create regular time series
    • by OsVer: partition by OS version
  • The actual time series data structure is a numeric array of the aggregated value per each time bin. We use render timechart for visualization.

In the table above, there are three partitions. We can create a separate time series for each OS version, Windows 10 (red), 7 (blue), and 8.1 (green), as seen in the graph:

Time series partition
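
Conceptually, make-series buckets timestamped records into fixed-width bins per partition key, counting records and filling empty bins with the default value. A minimal Python sketch of that binning step (the events and keys below are illustrative, not the demo data):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def make_series(events, start, end, step, default=0):
    """Bucket (timestamp, key) events into regular bins per key,
    counting events and filling empty bins with `default`
    (mirrors `make-series num=count() default=0 ... by OsVer`)."""
    nbins = int((end - start) / step) + 1
    series = defaultdict(lambda: [default] * nbins)
    for ts, key in events:
        if start <= ts <= end:
            series[key][int((ts - start) / step)] += 1
    return dict(series)

events = [
    (datetime(2016, 8, 25, 9, 12), "Windows 7"),
    (datetime(2016, 8, 25, 9, 30), "Windows 10"),
    (datetime(2016, 8, 25, 11, 5), "Windows 10"),
]
start, end = datetime(2016, 8, 25, 9), datetime(2016, 8, 25, 12)
print(make_series(events, start, end, timedelta(hours=1)))
# {'Windows 7': [1, 0, 0, 0], 'Windows 10': [1, 0, 1, 0]}
```

Each partition key maps to one numeric array with one entry per time bin, which is exactly the data structure the ADX series functions below operate on.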

Time series analysis functions

In this section, we'll perform typical series processing functions. Once a set of time series is created, ADX supports a growing list of functions to process and analyze them, which can be found in the time series documentation. We will describe a few representative functions for processing and analyzing time series.

Filtering

Filtering is a common practice in signal processing and useful for time series processing tasks (for example, smoothing a noisy signal, or change detection).

  • There are two generic filtering functions:
    • series_fir(): applies a FIR filter. Used for simple calculation of moving average and differentiation of the time series for change detection.
    • series_iir(): applies an IIR filter. Used for exponential smoothing and cumulative sum.
  • Extend the time series set by adding a new moving average series of size 5 bins (named ma_num) to the query:

[Click to run query]

let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp in range(min_t, max_t, 1h) by OsVer
| extend ma_num=series_fir(num, repeat(1, 5), true, true)
| render timechart

Time series filtering
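
In the query above, series_fir(num, repeat(1, 5), true, true) applies a length-5 filter with equal coefficients, normalized so they sum to 1 and centered on each point, i.e. a centered 5-bin moving average. A Python sketch of that computation (edge handling here simply treats out-of-range samples as zero, which may differ from ADX's exact behavior):

```python
def series_fir(x, filt, normalize=True, center=True):
    """Apply a FIR filter to series x. With filt=[1]*5 and
    normalize=True this is a 5-bin moving average; center=True
    aligns the window symmetrically around each point."""
    if normalize:
        s = sum(filt)
        filt = [c / s for c in filt]
    n, w = len(x), len(filt)
    offset = (w - 1) // 2 if center else 0
    out = []
    for i in range(n):
        acc = 0.0
        for j, c in enumerate(filt):
            k = i + offset - j          # sample index under tap j
            if 0 <= k < n:              # zero-pad outside the series
                acc += c * x[k]
        out.append(acc)
    return out

# An impulse gets smeared into a flat window by the moving average:
print(series_fir([0, 0, 5, 0, 0], [1] * 5))
```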

Regression analysis

ADX supports segmented linear regression analysis to estimate the trend of a time series.

  • Use series_fit_line() to fit the best line to a time series for general trend detection.
  • Use series_fit_2lines() to detect trend changes relative to the baseline, which are useful in monitoring scenarios.

Example of the series_fit_line() and series_fit_2lines() functions in a time series query:

[Click to run query]

demo_series2
| extend series_fit_2lines(y), series_fit_line(y)
| render linechart with(xcolumn=x)

Time series regression

  • Blue: original time series
  • Green: fitted line
  • Red: two fitted lines

Note

The function accurately detected the jump (level change) point.
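
series_fit_line() performs an ordinary least-squares fit over the bin indices. A Python sketch of the (rsquare, slope) pair that the scale-workflow query later in this topic extracts (the real function also returns additional outputs, such as variance and the fitted series):

```python
def series_fit_line(y):
    """Ordinary least-squares line fit over bin indices 0..n-1.
    Returns (r_square, slope): a sketch of the first two outputs
    of ADX series_fit_line."""
    n = len(y)
    xs = range(n)
    mx, my = (n - 1) / 2, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, y))
    slope = sxy / sxx
    ss_tot = sum((v - my) ** 2 for v in y)
    ss_res = sum((v - (my + slope * (x - mx))) ** 2
                 for x, v in zip(xs, y))
    rsq = 1 - ss_res / ss_tot if ss_tot else 1.0
    return rsq, slope

rsq, slope = series_fit_line([1, 3, 5, 7, 9])  # perfectly linear series
print(rsq, slope)
```

For a perfectly linear series the fit returns r_square = 1 and the exact slope; noisier series score lower, which is why r_square is a useful goodness-of-fit gauge in monitoring.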

Seasonality detection

Many metrics follow seasonal (periodic) patterns. User traffic of cloud services usually contains daily and weekly patterns, highest around the middle of the business day and lowest at night and over the weekend. IoT sensors measure at periodic intervals. Physical measurements such as temperature, pressure, or humidity may also show seasonal behavior.

The following example applies seasonality detection to one month of traffic of a web service (2-hour bins):

[Click to run query]

demo_series3
| render timechart 

Time series seasonality

Note

It's an anomaly if specific distinct periods don't exist.

[Click to run query]

demo_series3
| project (periods, scores) = series_periods_detect(num, 0., 14d/2h, 2) //to detect the periods in the time series
| mv-expand periods, scores
| extend days=2h*todouble(periods)/1d
periods  scores             days
84       0.820622786055595  7
12       0.764601405803502  1

The function detects daily and weekly seasonality. The daily score is less than the weekly score because weekend days differ from weekdays.
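
series_periods_detect scores candidate periods and returns the strongest ones. As a rough stand-in, the sketch below scores candidate periods by normalized autocorrelation (the ADX implementation is spectral-analysis based, so its scores will differ); with 2-hour bins, a period of 12 bins corresponds to one day:

```python
def detect_periods(y, max_period, top_k=2):
    """Score candidate periods 2..max_period by normalized
    autocorrelation and return the top_k (period, score) pairs.
    A simplified stand-in for ADX series_periods_detect."""
    n = len(y)
    m = sum(y) / n
    d = [v - m for v in y]              # mean-centered series
    var = sum(v * v for v in d) or 1.0
    scores = []
    for p in range(2, max_period + 1):
        ac = sum(d[i] * d[i - p] for i in range(p, n)) / var
        scores.append((p, ac))
    scores.sort(key=lambda t: t[1], reverse=True)
    return scores[:top_k]

# Sawtooth with period 12 (e.g. 12 two-hour bins = one day), 8 cycles:
signal = [i % 12 for i in range(8 * 12)]
print(detect_periods(signal, 40))   # strongest period is 12 bins
```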

Element-wise functions

Arithmetic and logical operations can be done on a time series. Using series_subtract(), we can calculate a residual time series, that is, the difference between the original raw metric and a smoothed one, and look for anomalies in the residual signal:

[Click to run query]

let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp in range(min_t, max_t, 1h) by OsVer
| extend ma_num=series_fir(num, repeat(1, 5), true, true)
| extend residual_num=series_subtract(num, ma_num) //to calculate residual time series
| where OsVer == "Windows 10"   // filter on Win 10 to visualize a cleaner chart 
| render timechart

Time series operations

  • Blue: original time series
  • Red: smoothed time series
  • Green: residual time series
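
series_subtract() is a plain element-wise difference of two equal-length series. The sketch below pairs it with a simple k-sigma rule on the residual to flag outliers; the flagging rule is an illustrative assumption, not an ADX function:

```python
def series_subtract(a, b):
    """Element-wise difference of two equal-length series
    (what ADX series_subtract computes)."""
    return [x - y for x, y in zip(a, b)]

def flag_anomalies(residual, k=3.0):
    """Flag indices where the residual deviates more than k standard
    deviations from its mean (an assumed, simple anomaly rule)."""
    n = len(residual)
    m = sum(residual) / n
    sd = (sum((v - m) ** 2 for v in residual) / n) ** 0.5 or 1.0
    return [i for i, v in enumerate(residual) if abs(v - m) > k * sd]

raw    = [10, 11, 9, 10, 50, 10, 11, 10]   # spike at index 4
smooth = [10, 10, 10, 10, 10, 10, 10, 10]  # stand-in for the moving average
res = series_subtract(raw, smooth)
print(flag_anomalies(res, k=2))  # [4]
```

Subtracting the smoothed series removes the slow-varying baseline, so a fixed threshold on the residual catches spikes that would be hidden in the raw signal.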

Time series workflow at scale

The example below shows how these functions can run at scale on thousands of time series in seconds for anomaly detection. To see a few sample telemetry records of a DB service's read count metric over four days, run the following query:

[Click to run query]

demo_many_series1
| take 4 
TIMESTAMP                    Loc    anonOp                DB   DataRead
2016-09-11 21:00:00.0000000  Loc 9  5117853934049630089   262  0
2016-09-11 21:00:00.0000000  Loc 9  5117853934049630089   241  0
2016-09-11 21:00:00.0000000  Loc 9  -865998331941149874   262  279862
2016-09-11 21:00:00.0000000  Loc 9  371921734563783410    255  0

And simple statistics:

[Click to run query]

demo_many_series1
| summarize num=count(), min_t=min(TIMESTAMP), max_t=max(TIMESTAMP) 
num      min_t                        max_t
2177472  2016-09-08 00:00:00.0000000  2016-09-11 23:00:00.0000000

Building a time series in 1-hour bins of the read metric (total four days * 24 hours = 96 points) results in normal pattern fluctuation:

[Click to run query]

let min_t = toscalar(demo_many_series1 | summarize min(TIMESTAMP));  
let max_t = toscalar(demo_many_series1 | summarize max(TIMESTAMP));  
demo_many_series1
| make-series reads=avg(DataRead) on TIMESTAMP in range(min_t, max_t, 1h)
| render timechart with(ymin=0) 

Time series at scale

The above behavior is misleading, since the single normal time series is aggregated from thousands of different instances that may have abnormal patterns. Therefore, we create a time series per instance. An instance is defined by Loc (location), anonOp (operation), and DB (specific machine).

How many time series can we create?

[Click to run query]

demo_many_series1
| summarize by Loc, Op, DB
| count
Count
18339

Now, we're going to create a set of 18339 time series of the read count metric. We add the by clause to the make-series statement, apply linear regression, and select the top two time series with the most significant decreasing trend:

[Click to run query]

let min_t = toscalar(demo_many_series1 | summarize min(TIMESTAMP));  
let max_t = toscalar(demo_many_series1 | summarize max(TIMESTAMP));  
demo_many_series1
| make-series reads=avg(DataRead) on TIMESTAMP in range(min_t, max_t, 1h) by Loc, Op, DB
| extend (rsquare, slope) = series_fit_line(reads)
| top 2 by slope asc 
| render timechart with(title='Service Traffic Outage for 2 instances (out of 18339)')

Top two time series

Display the instances:

[Click to run query]

let min_t = toscalar(demo_many_series1 | summarize min(TIMESTAMP));  
let max_t = toscalar(demo_many_series1 | summarize max(TIMESTAMP));  
demo_many_series1
| make-series reads=avg(DataRead) on TIMESTAMP in range(min_t, max_t, 1h) by Loc, Op, DB
| extend (rsquare, slope) = series_fit_line(reads)
| top 2 by slope asc
| project Loc, Op, DB, slope 
Loc     Op  DB    slope
Loc 15  37  1151  -102743.910227889
Loc 13  37  1249  -86303.2334644601

In less than two minutes, ADX analyzed close to 20,000 time series and detected two abnormal time series in which the read count suddenly dropped.
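
The scale workflow above boils down to: fit a line per instance, rank by slope ascending, and keep the steepest drops. A Python sketch of that ranking step (the instance names and values are illustrative, loosely echoing the result table; they are not the demo data):

```python
def slope(y):
    """Least-squares slope of a series over bin indices, the trend
    measure extracted from series_fit_line in the query above."""
    n = len(y)
    mx, my = (n - 1) / 2, sum(y) / n
    sxx = sum((i - mx) ** 2 for i in range(n))
    return sum((i - mx) * (v - my) for i, v in enumerate(y)) / sxx

def top_decreasing(series_by_instance, k=2):
    """Rank instances by slope ascending and keep the k steepest
    drops (mirrors `top 2 by slope asc`)."""
    ranked = sorted(series_by_instance.items(),
                    key=lambda kv: slope(kv[1]))
    return [name for name, _ in ranked[:k]]

instances = {
    "Loc 15/37/1151": [100, 80, 60, 40],   # steep drop
    "Loc 13/37/1249": [100, 90, 80, 70],   # milder drop
    "healthy":        [100, 101, 99, 100], # flat
}
print(top_decreasing(instances))
# ['Loc 15/37/1151', 'Loc 13/37/1249']
```

Because the slope is computed independently per series, ADX can parallelize this across all 18339 instances, which is what makes the two-minute scan feasible.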

These advanced capabilities, combined with ADX's fast performance, supply a unique and powerful solution for time series analysis.

Next steps