series_periods_detect()

Finds the most significant periods that exist in a time series.

Often, a metric measuring an application's traffic is characterized by two significant periods: a weekly one and a daily one. The function series_periods_detect() detects these two dominant periods in a time series.
The function takes as input:

  • A column containing a dynamic array of time series. Typically, the column is the output of the make-series operator.
  • Two real numbers defining the minimal and maximal period size to search for, expressed as a number of bins. For example, for a 1h bin, the size of a daily period would be 24.
  • A long number defining the total number of periods for the function to search for.
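
Because the period bounds are expressed in bins, a period given in time units has to be divided by the bin size. A minimal sketch of that conversion using timespan arithmetic (the variable and column names are illustrative, not part of the function):

```kusto
// Convert period lengths to bin counts by dividing by the bin size.
// bin_size and the output column names are illustrative.
let bin_size = 1h;
print daily_period_bins = 1d / bin_size,   // 24 bins for a 1h bin
      weekly_period_bins = 7d / bin_size   // 168 bins for a 1h bin
```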

The function outputs two columns:

  • periods: A dynamic array containing the periods that have been found, in units of the bin size, ordered by their scores.
  • scores: A dynamic array containing values between 0 and 1. Each value measures the significance of the period at the same position in the periods array.

Syntax

series_periods_detect(x, min_period, max_period, num_periods)

Arguments

  • x: Dynamic array scalar expression that is an array of numeric values, typically the output of the make-series or make_list operators.
  • min_period: A real number specifying the minimal period to search for.
  • max_period: A real number specifying the maximal period to search for.
  • num_periods: A long number specifying the maximum required number of periods. This number will be the length of the output dynamic arrays.

Important

  • The algorithm can detect periods containing at least 4 points and at most half of the series length.

  • Set min_period a little below, and max_period a little above, the periods you expect to find in the time series. For example, if you have an hourly aggregated signal and you are looking for both daily and weekly periods (24 and 168 hours respectively), you can set min_period=0.8*24 and max_period=1.2*168, leaving 20% margins around these periods.

  • The input time series must be regular, that is, aggregated in constant bins. This is always the case if it was created using make-series. Otherwise, the output is meaningless.
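
The margin guidance above can be sketched as a query. This is only an illustration under stated assumptions: the signal is synthetic (four weeks of hourly points with made-up daily and weekly sine components), and the names i, y, and bin_size are not part of the function. The bounds come out as 19.2 and 201.6 bins, which stay within the 4-point minimum and half of the 672-point series.

```kusto
// Synthetic hourly signal over 4 weeks (672 bins) with a daily and a weekly component.
// The signal is made up for illustration only.
let bin_size = 1h;
range i from 0 to 671 step 1
| extend y = 100.0 + 30.0 * sin(2 * pi() * i / 24) + 50.0 * sin(2 * pi() * i / 168)
| order by i asc                      // keep the points in time order before building the array
| summarize y = make_list(y)
// 20% margins around the expected daily (24-bin) and weekly (168-bin) periods
| project series_periods_detect(y, 0.8 * (1d / bin_size), 1.2 * (7d / bin_size), 2)
```

With real data, the array would typically come from make-series rather than make_list, which also guarantees the constant bins the function requires.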

Example

The following query embeds a snapshot of a month of an application's traffic, aggregated twice a day. The bin size is 12 hours.

print y=dynamic([80,139,87,110,68,54,50,51,53,133,86,141,97,156,94,149,95,140,77,61,50,54,47,133,72,152,94,148,105,162,101,160,87,63,53,55,54,151,103,189,108,183,113,175,113,178,90,71,62,62,65,165,109,181,115,182,121,178,114,170])
| project x=range(1, array_length(y), 1), y  
| render linechart 

Series periods

Running series_periods_detect() on this series results in the weekly period, 14 points long.

print y=dynamic([80,139,87,110,68,54,50,51,53,133,86,141,97,156,94,149,95,140,77,61,50,54,47,133,72,152,94,148,105,162,101,160,87,63,53,55,54,151,103,189,108,183,113,175,113,178,90,71,62,62,65,165,109,181,115,182,121,178,114,170])
| project x=range(1, array_length(y), 1), y  
| project series_periods_detect(y, 0.0, 50.0, 2)
series_periods_detect_y_periods    series_periods_detect_y_periods_scores
[14.0, 0.0]                        [0.84, 0.0]
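
The two output arrays are often easier to read as one row per period. A possible sketch follows: the column names periods, scores, and period_days are chosen here for illustration, and the scores > 0 filter assumes that unused slots show up as period 0.0 with score 0.0, as they do in the output above. Multiplying by the 12-hour bin size converts bins to days, so a 14-bin period corresponds to 7 days.

```kusto
print y=dynamic([80,139,87,110,68,54,50,51,53,133,86,141,97,156,94,149,95,140,77,61,50,54,47,133,72,152,94,148,105,162,101,160,87,63,53,55,54,151,103,189,108,183,113,175,113,178,90,71,62,62,65,165,109,181,115,182,121,178,114,170])
// Name the two output columns explicitly instead of using the generated names
| project (periods, scores) = series_periods_detect(y, 0.0, 50.0, 2)
// Expand both arrays in parallel, one row per detected period
| mv-expand periods to typeof(real), scores to typeof(real)
| where scores > 0                          // assumption: empty slots have score 0.0
| extend period_days = 12h * periods / 1d   // bin size is 12h in this example
```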

Note

The daily period, which can also be seen in the chart, wasn't found because the sampling is too coarse (12h bin size): a daily period of 2 bins is below the minimum period size of 4 points required by the algorithm.