Anomaly detection and forecasting in Azure Data Explorer

Azure Data Explorer continuously collects telemetry data from cloud services or IoT devices. This data is analyzed for insights such as service health monitoring, physical production processes, usage trends, and load forecasts. The analysis runs on time series of selected metrics to locate deviations of a metric from its typical baseline pattern. Azure Data Explorer has native support for creating, manipulating, and analyzing multiple time series. It can create and analyze thousands of time series in seconds, enabling near real-time monitoring solutions and workflows.

This article details the time series anomaly detection and forecasting capabilities of Azure Data Explorer. The applicable time series functions are based on a robust, well-known decomposition model, in which each original time series is decomposed into seasonal, trend, and residual components. Anomalies are detected as outliers in the residual component, while forecasting is done by extrapolating the seasonal and trend components. The Azure Data Explorer implementation significantly enhances the basic decomposition model with automatic seasonality detection, robust outlier analysis, and a vectorized implementation that processes thousands of time series in seconds.

Prerequisites

Read Time series analysis in Azure Data Explorer for an overview of time series capabilities.

Time series decomposition model

The Azure Data Explorer native implementation of time series prediction and anomaly detection uses a well-known decomposition model. This model is applied to time series of metrics that are expected to show periodic and trend behavior, such as service traffic, component heartbeats, and periodic IoT measurements, to forecast future metric values and detect anomalous ones. The regression process assumes that, apart from the seasonal and trend behavior, the time series is randomly distributed. You can then forecast future metric values from the seasonal and trend components, collectively called the baseline, and ignore the residual part. You can also detect anomalous values by running outlier analysis on the residual part alone. To create a decomposition model, use the function series_decompose(). The series_decompose() function takes a set of time series and automatically decomposes each one into its seasonal, trend, residual, and baseline components.

For example, you can decompose the traffic of an internal web service by using the following query:

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid 
| where sid == 'TS1'   //  select a single time series for a cleaner visualization
| extend (baseline, seasonal, trend, residual) = series_decompose(num, -1, 'linefit')  //  decomposition of a set of time series to seasonal, trend, residual, and baseline (seasonal+trend)
| render timechart with(title='Web app. traffic of a month, decomposition', ysplit=panels)

[Figure: Time series decomposition]

  • The original time series is labeled num (in red).
  • The process starts by automatically detecting the seasonality with the function series_periods_detect() and extracting the seasonal pattern (in purple).
  • The seasonal pattern is subtracted from the original time series, and a linear regression is run with the function series_fit_line() to find the trend component (in light blue).
  • The function subtracts the trend, and the remainder is the residual component (in green).
  • Finally, the function adds the seasonal and trend components to generate the baseline (in blue).
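
The steps in the list above can be sketched outside the service. The following Python is a minimal illustration only, not the Azure Data Explorer implementation: the names decompose and fit_line are hypothetical, the period is passed in rather than detected automatically, and the seasonal pattern is estimated naively as the mean of each phase of the cycle (so a strong trend would leak into it).

```python
from statistics import mean

def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns the fitted values."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return [y_mean + slope * (x - x_mean) for x in range(n)]

def decompose(series, period):
    """Additive decomposition: series = seasonal + trend + residual.

    Assumption: the period is known. The service detects it automatically.
    """
    # Seasonal pattern: mean of each phase of the cycle, centred around zero.
    phase_means = [mean(series[p::period]) for p in range(period)]
    centre = mean(phase_means)
    seasonal = [phase_means[i % period] - centre for i in range(len(series))]
    # Trend: straight line fitted to the series with the seasonal pattern removed.
    deseasonalized = [y - s for y, s in zip(series, seasonal)]
    trend = fit_line(deseasonalized)
    # Baseline is seasonal + trend; whatever is left over is the residual.
    baseline = [s + t for s, t in zip(seasonal, trend)]
    residual = [y - b for y, b in zip(series, baseline)]
    return baseline, seasonal, trend, residual

# Illustrative data: a repeating 4-point pattern around 10, with one injected spike.
pattern = [0.0, 2.0, 4.0, 2.0]
series = [10.0 + pattern[i % 4] for i in range(12)]
series[6] += 5.0
baseline, seasonal, trend, residual = decompose(series, period=4)
```

In this sketch the injected spike ends up as the largest value in residual, which is the component that anomaly detection examines.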

Time series anomaly detection

The function series_decompose_anomalies() finds anomalous points in a set of time series. This function calls series_decompose() to build the decomposition model and then runs series_outliers() on the residual component. series_outliers() calculates an anomaly score for each point of the residual component using Tukey's fence test. Anomaly scores above 1.5 or below -1.5 indicate a mild anomalous rise or decline, respectively. Anomaly scores above 3.0 or below -3.0 indicate a strong anomaly.
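
To make the thresholds above concrete, here is a minimal Python sketch of a classic Tukey fence score. It is an assumption-laden illustration, not the series_outliers() implementation: the hypothetical tukey_scores function uses the first and third quartiles and measures distance in interquartile-range (IQR) units, whereas the percentiles and normalization used by the service may differ.

```python
from statistics import quantiles

def tukey_scores(values):
    """Distance of each point outside the interquartile range, in IQR units."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; scores are undefined")
    scores = []
    for v in values:
        if v > q3:
            scores.append((v - q3) / iqr)   # positive score: above the upper quartile
        elif v < q1:
            scores.append((v - q1) / iqr)   # negative score: below the lower quartile
        else:
            scores.append(0.0)              # inside the interquartile range
    return scores

def flag(scores, threshold=1.5):
    """Map scores to +1 (anomalous rise), -1 (anomalous decline), or 0 (normal)."""
    return [1 if s > threshold else -1 if s < -threshold else 0 for s in scores]

# Illustrative residuals with one large positive and one large negative outlier.
residuals = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 5.0, 0.1, -4.0]
scores = tukey_scores(residuals)
flags = flag(scores)
```

With a threshold of 1.5, only the values 5.0 and -4.0 are flagged here, as a rise and a decline respectively.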

The following query detects anomalies in internal web service traffic:

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid 
| where sid == 'TS1'   //  select a single time series for a cleaner visualization
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| render anomalychart with(anomalycolumns=anomalies, title='Web app. traffic of a month, anomalies') //  anomalycolumns=anomalies renders the anomalies as bold points on the series chart

[Figure: Time series anomaly detection]

  • The original time series (in red).
  • The baseline (seasonal + trend) component (in blue).
  • The anomalous points (in purple) on top of the original time series. The anomalous points deviate significantly from the expected baseline values.

Time series forecasting

The function series_decompose_forecast() predicts future values of a set of time series. This function calls series_decompose() to build the decomposition model and then, for each time series, extrapolates the baseline component into the future.
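
Extrapolating the baseline amounts to repeating the seasonal pattern and extending the trend line past the last observed point. The Python below is a rough sketch of that idea under stated assumptions, not the service's algorithm: forecast_baseline is a hypothetical name, the period is supplied by the caller, and the seasonal pattern is a naive per-phase mean.

```python
def forecast_baseline(history, period, points):
    """Extend seasonal pattern + linear trend `points` steps past the end of `history`."""
    n = len(history)
    # Seasonal pattern: mean of each phase of the cycle, centred around zero.
    phase_means = [sum(history[p::period]) / len(history[p::period]) for p in range(period)]
    centre = sum(phase_means) / period
    pattern = [m - centre for m in phase_means]
    # Trend: least-squares line through the deseasonalized history.
    deseasonalized = [y - pattern[i % period] for i, y in enumerate(history)]
    x_mean = (n - 1) / 2
    y_mean = sum(deseasonalized) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(deseasonalized))
    slope = sxy / sxx
    # Baseline for future indices n .. n+points-1: repeated pattern + extended line.
    return [pattern[i % period] + y_mean + slope * (i - x_mean)
            for i in range(n, n + points)]

# Illustrative data: three cycles of a 4-point pattern with no trend.
history = [10.0 + [0.0, 2.0, 4.0, 2.0][i % 4] for i in range(12)]
forecast = forecast_baseline(history, period=4, points=4)
print(forecast)  # [10.0, 12.0, 14.0, 12.0]
```

Because the residual component is ignored, the forecast for this trend-free example is simply the next cycle of the pattern.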

The following query predicts next week's web service traffic:

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let horizon=7d;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t+horizon step dt by sid 
| where sid == 'TS1'   //  select a single time series for a cleaner visualization
| extend forecast = series_decompose_forecast(num, toint(horizon/dt))
| render timechart with(title='Web app. traffic of a month, forecasting the next week by Time Series Decomposition')

[Figure: Time series forecasting]

  • The original metric (in red). Future values are missing and are set to 0 by default.
  • The baseline component (in blue), extrapolated to predict next week's values.
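
In the query above, the second argument to series_decompose_forecast(), toint(horizon/dt), is the number of points at the end of the series to forecast. As a quick check of that arithmetic, here is the same calculation in Python (the variable names mirror the query, and forecast_points is introduced only for illustration):

```python
from datetime import timedelta

horizon = timedelta(days=7)   # forecast window, as in `let horizon=7d;`
dt = timedelta(hours=2)       # bin size, as in `let dt = 2h;`

# Integer number of 2-hour bins that fit in the 7-day horizon.
forecast_points = horizon // dt
print(forecast_points)  # 84
```

This is why make-series extends the range to max_t+horizon: the series needs 84 trailing placeholder bins for the forecast to fill.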

Scalability

The Azure Data Explorer query language syntax lets a single call process multiple time series. Its optimized implementation delivers fast performance, which is critical for effective anomaly detection and forecasting when monitoring thousands of counters in near real-time scenarios.

The following query processes three time series simultaneously:

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let horizon=7d;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t+horizon step dt by sid
| extend offset=case(sid=='TS3', 4000000, sid=='TS2', 2000000, 0)   //  add artificial offset for easy visualization of multiple time series
| extend num=series_add(num, offset)
| extend forecast = series_decompose_forecast(num, toint(horizon/dt))
| render timechart with(title='Web app. traffic of a month, forecasting the next week for 3 time series')

[Figure: Time series scalability]

Summary

This document details the native Azure Data Explorer functions for time series anomaly detection and forecasting. Each original time series is decomposed into seasonal, trend, and residual components to detect anomalies, to forecast, or both. These capabilities can be used in near real-time monitoring scenarios such as fault detection, predictive maintenance, and demand and load forecasting.

Next steps

Learn about Machine learning capabilities in Azure Data Explorer.