series_decompose_anomalies()

Anomaly detection is based on series decomposition. For more information, see series_decompose().

The function takes an expression containing a series (dynamic numerical array) as input, and extracts anomalous points with scores.

Syntax

series_decompose_anomalies (Series [, Threshold, Seasonality, Trend, Test_points, AD_method, Seasonality_threshold ])

Arguments

  • Series: Dynamic array cell that is an array of numeric values, typically the resulting output of the make-series or make_list operators.
  • Threshold: Anomaly threshold (k value). The default is 1.5, which detects mild or stronger anomalies.
  • Seasonality: An integer controlling the seasonal analysis, containing one of:
    • -1: Autodetect seasonality (using series_periods_detect) [default]
    • 0: No seasonality (that is, skip extracting this component)
    • period: Positive integer specifying the expected period in number of bins. For example, if the series is in one-hour bins, a weekly period is 168 bins.
  • Trend: A string controlling the trend analysis, containing one of:
    • "avg": Define the trend component as the average of the series [default]
    • "none": No trend, skip extracting this component
    • "linefit": Extract the trend component using linear regression
  • Test_points: 0 [default] or a positive integer that specifies the number of points at the end of the series to exclude from the learning (regression) process. Set this parameter for forecasting purposes.
  • AD_method: A string controlling the anomaly detection method on the residual time series, containing one of:
    • "ctukey": Tukey's fence test with a custom 10th-90th percentile range [default]
    • "tukey": Tukey's fence test with the standard 25th-75th percentile range. For more information on residual time series, see series_outliers.
  • Seasonality_threshold: The threshold for the seasonality score when Seasonality is set to autodetect. The default score threshold is 0.6. For more information, see series_periods_detect.
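
The Seasonality period is given in bins rather than in time units. As a quick check of that arithmetic, the following Python sketch converts a period duration into a bin count. The helper name period_in_bins is hypothetical and isn't part of Kusto.

```python
from datetime import timedelta

def period_in_bins(period: timedelta, bin_size: timedelta) -> int:
    """Return how many bins make up one seasonal period (hypothetical helper)."""
    bins, remainder = divmod(period, bin_size)
    if remainder:
        # A period that isn't a whole number of bins can't be passed as an integer.
        raise ValueError("period must be a whole number of bins")
    return bins

weekly_in_hourly_bins = period_in_bins(timedelta(weeks=1), timedelta(hours=1))  # 168
daily_in_hourly_bins = period_in_bins(timedelta(days=1), timedelta(hours=1))    # 24
```

The resulting integer is what you would pass as the Seasonality argument when you already know the period and don't want to rely on autodetection.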

Returns

The function returns the following respective series:

  • ad_flag: A ternary series containing +1, -1, or 0, marking an up anomaly, a down anomaly, or no anomaly respectively
  • ad_score: Anomaly score
  • baseline: The predicted value of the series, according to the decomposition

The algorithm

This function follows these steps:

  1. Calls series_decompose() with the respective parameters, to create the baseline and residuals series.
  2. Calculates the ad_score series by applying series_outliers() with the chosen anomaly detection method on the residuals series.
  3. Calculates the ad_flag series by applying the threshold on the ad_score, to mark up/down/no anomaly respectively.
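
The three steps above can be sketched in Python to make the data flow concrete. This is a simplified illustration under stated assumptions, not Kusto's implementation: the baseline is only the series average (no seasonality or line fit), the score is assumed to be the distance outside the 10th-90th percentile band divided by the band width, and a point is assumed to be flagged only when its score is strictly beyond the threshold. All function names are hypothetical.

```python
from statistics import mean

def percentile(values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a non-empty list."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def fence_scores(residuals, low_pct=10.0, high_pct=90.0):
    """Signed distance outside the percentile band, in units of the band width."""
    q_lo = percentile(residuals, low_pct)
    q_hi = percentile(residuals, high_pct)
    width = q_hi - q_lo
    if width == 0:
        return [0.0 for _ in residuals]
    scores = []
    for r in residuals:
        if r > q_hi:
            scores.append((r - q_hi) / width)   # positive: above the band
        elif r < q_lo:
            scores.append((r - q_lo) / width)   # negative: below the band
        else:
            scores.append(0.0)                  # inside the band
    return scores

def decompose_anomalies_sketch(series, threshold=1.5):
    # Step 1 (simplified assumption): the baseline is the series average.
    baseline = [mean(series)] * len(series)
    residuals = [y - b for y, b in zip(series, baseline)]
    # Step 2: score the residuals with the percentile-fence test.
    ad_score = fence_scores(residuals)
    # Step 3: apply the threshold to mark up (+1), down (-1), or no anomaly (0).
    ad_flag = [1 if s > threshold else -1 if s < -threshold else 0 for s in ad_score]
    return ad_flag, ad_score, baseline

# Alternating 10/11 values with one spike (index 10) and one dip (index 19).
series = [10.0, 11.0] * 5 + [30.0] + [10.0, 11.0] * 4 + [-10.0]
ad_flag, ad_score, baseline = decompose_anomalies_sketch(series)
```

In this sketch only the spike and the dip receive a nonzero flag, because every other point lies inside the percentile band and scores 0.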

Examples

Detect anomalies in weekly seasonality

In the following example, generate a series with weekly seasonality, and then add some outliers to it. series_decompose_anomalies autodetects the seasonality and generates a baseline that captures the repetitive pattern. The outliers you added can be clearly spotted in the ad_score component.

let ts=range t from 1 to 24*7*5 step 1 
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t 
| extend y = 2*rand() + iff((t/24)%7>=5, 10.0, 15.0) - (((t%24)/10)*((t%24)/10)) // generate a series with weekly seasonality
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts 
| extend series_decompose_anomalies(y)
| render timechart  

Weekly seasonality showing baseline and outliers

Detect anomalies in weekly seasonality with trend

In this example, add a trend to the series from the previous example. First, run series_decompose_anomalies with the default parameters. The default trend value, avg, only takes the average of the series and doesn't compute a trend. The generated baseline doesn't contain the trend and is less exact, compared to the previous example. Consequently, some of the outliers you inserted in the data aren't detected because of the higher variance.

let ts=range t from 1 to 24*7*5 step 1 
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t 
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts 
| extend series_decompose_anomalies(y)
| extend series_decompose_anomalies_y_ad_flag = 
series_multiply(10, series_decompose_anomalies_y_ad_flag) // multiply by 10 for visualization purposes
| render timechart

Weekly seasonality outliers with trend

Next, run the same example, but since you're expecting a trend in the series, specify linefit in the trend parameter. You can see that the baseline is much closer to the input series. All the inserted outliers are detected, along with some false positives. See the next example on tweaking the threshold.

let ts=range t from 1 to 24*7*5 step 1 
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t 
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts 
| extend series_decompose_anomalies(y, 1.5, -1, 'linefit')
| extend series_decompose_anomalies_y_ad_flag = 
series_multiply(10, series_decompose_anomalies_y_ad_flag) // multiply by 10 for visualization purposes
| render timechart  

Weekly seasonality anomalies with linefit trend
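
To see why linefit gives a tighter baseline than avg on trending data, the following Python sketch compares the residual spread of the two choices on a pure trend with the same slope as the example (t/72.0). It is an illustration only: the function names are hypothetical and the least-squares fit is written out by hand rather than taken from Kusto.

```python
def avg_baseline(y):
    """Constant baseline: the average of the series."""
    m = sum(y) / len(y)
    return [m] * len(y)

def linefit_baseline(y):
    """Baseline from an ordinary least-squares line over the index 0..n-1."""
    n = len(y)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in zip(xs, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_mean - slope * x_mean
    return [intercept + slope * x for x in xs]

def residual_spread(y, baseline):
    """Range (max minus min) of the residuals left after removing the baseline."""
    res = [v - b for v, b in zip(y, baseline)]
    return max(res) - min(res)

# Five weeks of hourly points containing only the upward trend.
y = [5.0 + t / 72.0 for t in range(24 * 7 * 5)]
spread_avg = residual_spread(y, avg_baseline(y))
spread_linefit = residual_spread(y, linefit_baseline(y))
```

With avg, the whole trend stays in the residuals, so their spread is large and a fixed-size outlier stands out less. With linefit, the trend is absorbed by the baseline and the residual spread is close to zero.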

Tweak the anomaly detection threshold

A few noisy points were detected as anomalies in the previous example. Now increase the anomaly detection threshold from the default of 1.5 to 2.5. With this higher threshold, only stronger anomalies are detected. Now, only the outliers you inserted in the data are detected.

let ts=range t from 1 to 24*7*5 step 1 
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t 
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts 
| extend series_decompose_anomalies(y, 2.5, -1, 'linefit')
| extend series_decompose_anomalies_y_ad_flag = 
series_multiply(10, series_decompose_anomalies_y_ad_flag) // multiply by 10 for visualization purposes
| render timechart  

Weekly series anomalies with higher anomaly threshold
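
The effect of raising the threshold can be illustrated with a few lines of Python. The scores below are made-up values, not output from the query above, and the strict comparison against the threshold is an assumption.

```python
def flags(ad_score, threshold):
    """Mark +1 / -1 / 0 for scores above, below, or within the threshold."""
    return [1 if s > threshold else -1 if s < -threshold else 0 for s in ad_score]

scores = [0.0, 1.8, -2.1, 3.4, 0.0, -4.0]  # hypothetical anomaly scores

mild = flags(scores, 1.5)    # [0, 1, -1, 1, 0, -1]
strict = flags(scores, 2.5)  # [0, 0, 0, 1, 0, -1]
```

The two mild scores (1.8 and -2.1) are flagged at the default threshold but dropped at 2.5, while the stronger ones remain.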