make-series operator

Creates a series of specified aggregated values along a specified axis.

T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by fruit, supplier

Syntax

T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn [from start] [to end] step step [by [Column =] GroupExpression [, ...]]

Arguments

  • Column: Optional name for a result column. Defaults to a name derived from the expression.

  • DefaultValue: The value used in place of absent values. If no row has a specific combination of AxisColumn and GroupExpression values, the corresponding array element in the results is assigned DefaultValue. If DefaultValue is omitted, 0 is assumed.

  • Aggregation: A call to an aggregation function such as count() or avg(), with column names as arguments. See the list of aggregation functions. Only aggregation functions that return numeric results can be used with the make-series operator.

  • AxisColumn: The column by which the series is ordered. It can be thought of as a timeline, but any numeric type is accepted in addition to datetime.

  • start: (optional) The lower bound of AxisColumn for each series to be built. start, end, and step are used to build an array of AxisColumn values within the given range, using the specified step. All Aggregation values are ordered according to this array. This AxisColumn array is also the last column in the output and has the same name as AxisColumn. If start is not specified, it is the first bin (step) that has data in each series.

  • end: (optional) The upper bound (non-inclusive) of AxisColumn. The last index of the time series is smaller than this value: it is the largest value of start plus an integer multiple of step that is still smaller than end. If end is not specified, it is the upper bound of the last bin (step) that has data in each series.

  • step: The difference between two consecutive elements of the AxisColumn array (that is, the bin size).

  • GroupExpression: An expression over the columns that provides a set of distinct values. Typically it's a column name that already provides a restricted set of values.

  • MakeSeriesParameters: Zero or more (space-separated) parameters in the form of Name = Value that control the behavior. The following parameters are supported:

    Name  Values    Description
    kind  nonempty  Produces a default result when the input of the make-series operator is empty
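
As a rough illustration of how start, end, step, and DefaultValue interact, the sketch below uses a numeric AxisColumn rather than a datetime. The table, its column names (position, reading, sensor), and the values are hypothetical and exist only for this example.

```kusto
// Hypothetical sample data: readings indexed by a numeric position.
datatable(position: long, reading: real, sensor: string)
[
    0, 1.0, "a",
    1, 3.0, "a",
    5, 2.0, "a",
    2, 4.0, "b",
    3, 6.0, "b",
]
// Bins start at 0 and advance by 2; 6 is the exclusive upper bound,
// so by the rules above the axis array should hold the bins 0, 2 and 4.
// Bins with no rows for a given sensor take the default value -1.0.
| make-series total = sum(reading) default = -1.0 on position from 0 to 6 step 2 by sensor
```

Under the rules described above, sensor "b" has no rows in bins 0 and 4, so those array elements would be filled with -1.0, while sensor "a" has no rows in bin 2.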

Returns

The input rows are arranged into groups having the same values of the by expressions and the bin_at(AxisColumn, step, start) expression. Then the specified aggregation functions are computed over each group, producing a row for each group. The result contains the by columns, the AxisColumn column, and at least one column for each computed aggregate. (Aggregations over multiple columns or with non-numeric results are not supported.)

This intermediate result has as many rows as there are distinct combinations of by and bin_at(AxisColumn, step, start) values.

Finally, the rows from the intermediate result are arranged into groups having the same values of the by expressions, and all aggregated values are arranged into arrays (values of dynamic type). For each aggregation, there is one column containing its array, with the same name. The last column in the output holds the result of the range function, containing all AxisColumn values. Its value is repeated for all rows.

Because missing bins are filled with the default value, the resulting pivot table has the same number of bins (that is, aggregated values) for all series.
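
The two stages described above can be approximated with a pair of queries. This is only a sketch: the table T and its columns (timestamp, fruit, amount) are hypothetical, and the summarize statement merely imitates the intermediate result; it is not a claim about how the engine evaluates make-series internally.

```kusto
// Hypothetical sample data.
let T = datatable(timestamp: datetime, fruit: string, amount: long)
[
    datetime(2016-01-01T08:00), "apple", 3,
    datetime(2016-01-01T17:00), "apple", 2,
    datetime(2016-01-03T09:00), "apple", 4,
    datetime(2016-01-02T10:00), "pear", 7,
];
// Stage 1 (imitated): one row per distinct (fruit, bin) combination.
T
| summarize sum(amount) by fruit, bin_at(timestamp, 1d, datetime(2016-01-01));
// Stage 2: one row per fruit, with aggregates arranged into arrays.
// Bins without rows are filled with the default, so both arrays have equal length.
T
| make-series sum(amount) default = 0 on timestamp from datetime(2016-01-01) to datetime(2016-01-04) step 1d by fruit
```

The first statement returns only the bins that contain data, whereas the second returns arrays that cover every bin in the requested range for each fruit.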

Note

Although you can provide arbitrary expressions for both the aggregation and grouping expressions, it's more efficient to use simple column names.

Alternate Syntax

T | make-series [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn in range(start, stop, step) [by [Column =] GroupExpression [, ...]]

The series generated by the alternate syntax differs from the series generated by the main syntax in two respects:

  • The stop value is inclusive.
  • The index axis is binned with bin() rather than bin_at(), which means that start may not be included in the generated series.

Use the main syntax of make-series rather than the alternate syntax.
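
To compare the two forms side by side, the sketch below runs the same hypothetical data through both syntaxes, with a start value that is deliberately not aligned to a day boundary. The table T and its values are assumptions made for illustration; the comments restate the two differences listed above rather than a verified output.

```kusto
// Hypothetical sample data.
let T = datatable(timestamp: datetime, metric: real)
[
    datetime(2017-01-01T07:00), 4,
    datetime(2017-01-02T09:00), 3,
    datetime(2017-01-04T06:00), 5,
];
// Main syntax: the upper bound is exclusive and bins are aligned to start via bin_at().
T
| make-series avg(metric) default = 0 on timestamp from datetime(2017-01-01T06:00) to datetime(2017-01-04T06:00) step 1d;
// Alternate syntax: stop is inclusive and bins are produced by bin(),
// so the start value itself may not appear in the axis array.
T
| make-series avg(metric) default = 0 on timestamp in range(datetime(2017-01-01T06:00), datetime(2017-01-04T06:00), 1d)
```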

Distribution and Shuffle

make-series supports summarize shufflekey hints using the syntax hint.shufflekey.
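
A minimal sketch of the hint, reusing the table and columns from the introductory example. It assumes that the hint is written directly after the operator name, as it is for summarize, and that fruit is a reasonable key to shuffle on; both are assumptions for this illustration.

```kusto
T
// hint.shufflekey names the column used to partition the data across nodes.
// Choosing fruit here is an assumption made only for this sketch.
| make-series hint.shufflekey = fruit sum(amount) default = 0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by fruit, supplier
```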

List of aggregation functions

Function    Description
any()       Returns a random non-empty value for the group
avg()       Returns an average value across the group
count()     Returns a count of the group
countif()   Returns a count with the predicate of the group
dcount()    Returns an approximate distinct count of the group elements
max()       Returns the maximum value across the group
min()       Returns the minimum value across the group
stdev()     Returns the standard deviation across the group
sum()       Returns the sum of the elements within the group
variance()  Returns the variance across the group

List of series analysis functions

Function                     Description
series_fir()                 Applies a Finite Impulse Response filter
series_iir()                 Applies an Infinite Impulse Response filter
series_fit_line()            Finds a straight line that is the best approximation of the input
series_fit_line_dynamic()    Finds a line that is the best approximation of the input, returning a dynamic object
series_fit_2lines()          Finds two lines that are the best approximation of the input
series_fit_2lines_dynamic()  Finds two lines that are the best approximation of the input, returning a dynamic object
series_outliers()            Scores anomaly points in a series
series_periods_detect()      Finds the most significant periods that exist in a time series
series_periods_validate()    Checks whether a time series contains periodic patterns of given lengths
series_stats_dynamic()       Generates a dynamic value with the common statistics (min/max/variance/stdev/average)
series_stats()               Returns multiple columns with the common statistics (min/max/variance/stdev/average)
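
The series analysis functions listed above take the arrays produced by make-series as input. The sketch below applies two of them to a small hypothetical dataset; the table, its values, and the output column names (rsquare, slope, and so on) are illustrative choices.

```kusto
// Hypothetical sample data.
let data = datatable(timestamp: datetime, metric: real)
[
    datetime(2017-01-01), 4,
    datetime(2017-01-02), 3,
    datetime(2017-01-03), 5,
    datetime(2017-01-04), 20,
    datetime(2017-01-05), 6,
];
data
| make-series avg(metric) default = 0 on timestamp from datetime(2017-01-01) to datetime(2017-01-06) step 1d
// Fit a straight line to the series; the function returns several values at once.
| extend (rsquare, slope, variance, rvariance, interception, line_fit) = series_fit_line(avg_metric)
// Score each point of the series for anomalies.
| extend outlier_scores = series_outliers(avg_metric)
```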

List of series interpolation functions

Function                Description
series_fill_backward()  Performs backward fill interpolation of missing values in a series
series_fill_const()     Replaces missing values in a series with a specified constant value
series_fill_forward()   Performs forward fill interpolation of missing values in a series
series_fill_linear()    Performs linear interpolation of missing values in a series
  • Note: By default, interpolation functions treat null as the missing value. Therefore, specify default=double(null) in make-series if you intend to use interpolation functions on the series.
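
The note above can be sketched as follows. The dataset is hypothetical and deliberately has no rows for 2017-01-03, so that bin comes out of make-series as null and is then interpolated from its neighbours.

```kusto
// Hypothetical sample data with a gap on 2017-01-03.
let data = datatable(timestamp: datetime, metric: real)
[
    datetime(2017-01-01), 4,
    datetime(2017-01-02), 6,
    datetime(2017-01-04), 10,
    datetime(2017-01-05), 8,
];
data
// default = double(null) marks empty bins as missing instead of filling them with 0.
| make-series avg(metric) default = double(null) on timestamp from datetime(2017-01-01) to datetime(2017-01-06) step 1d
// Replace the null elements by linear interpolation between neighbouring values.
| extend avg_metric_filled = series_fill_linear(avg_metric)
```

With the default of 0 instead, the empty bin would hold a real value and the interpolation functions would have nothing to fill.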

Example

A table that shows arrays of the average prices of each fruit from each supplier, ordered by the timestamp within the specified range. There's a row in the output for each distinct combination of fruit and supplier. The output columns show the supplier, the fruit, and arrays of the average price and the whole timeline (from 2016-09-10 up to, but not including, 2016-09-13). All arrays are sorted by the respective timestamp and all gaps are filled with default values (0 in this example). All other input columns are ignored.

T | make-series PriceAvg=avg(Price) default=0
on Purchase from datetime(2016-09-10) to datetime(2016-09-13) step 1d by Supplier, Fruit


let data=datatable(timestamp:datetime, metric: real)
[
  datetime(2016-12-31T06:00), 50,
  datetime(2017-01-01), 4,
  datetime(2017-01-02), 3,
  datetime(2017-01-03), 4,
  datetime(2017-01-03T03:00), 6,
  datetime(2017-01-05), 8,
  datetime(2017-01-05T13:40), 13,
  datetime(2017-01-06), 4,
  datetime(2017-01-07), 3,
  datetime(2017-01-08), 8,
  datetime(2017-01-08T21:00), 8,
  datetime(2017-01-09), 2,
  datetime(2017-01-09T12:00), 11,
  datetime(2017-01-10T05:00), 5,
];
let interval = 1d;
let stime = datetime(2017-01-01);
let etime = datetime(2017-01-10);
data
| make-series avg(metric) on timestamp from stime to etime step interval 
avg_metric  timestamp
[ 4.0, 3.0, 5.0, 0.0, 10.5, 4.0, 3.0, 8.0, 6.5 ]  [ "2017-01-01T00:00:00.0000000Z", "2017-01-02T00:00:00.0000000Z", "2017-01-03T00:00:00.0000000Z", "2017-01-04T00:00:00.0000000Z", "2017-01-05T00:00:00.0000000Z", "2017-01-06T00:00:00.0000000Z", "2017-01-07T00:00:00.0000000Z", "2017-01-08T00:00:00.0000000Z", "2017-01-09T00:00:00.0000000Z" ]

When the input to make-series is empty, make-series produces an empty result by default.

let data=datatable(timestamp:datetime, metric: real)
[
  datetime(2016-12-31T06:00), 50,
  datetime(2017-01-01), 4,
  datetime(2017-01-02), 3,
  datetime(2017-01-03), 4,
  datetime(2017-01-03T03:00), 6,
  datetime(2017-01-05), 8,
  datetime(2017-01-05T13:40), 13,
  datetime(2017-01-06), 4,
  datetime(2017-01-07), 3,
  datetime(2017-01-08), 8,
  datetime(2017-01-08T21:00), 8,
  datetime(2017-01-09), 2,
  datetime(2017-01-09T12:00), 11,
  datetime(2017-01-10T05:00), 5,
];
let interval = 1d;
let stime = datetime(2017-01-01);
let etime = datetime(2017-01-10);
data
| limit 0
| make-series avg(metric) default=1.0 on timestamp from stime to etime step interval 
| count 
Count
0

Using kind=nonempty in make-series produces a non-empty result filled with the default values:

let data=datatable(timestamp:datetime, metric: real)
[
  datetime(2016-12-31T06:00), 50,
  datetime(2017-01-01), 4,
  datetime(2017-01-02), 3,
  datetime(2017-01-03), 4,
  datetime(2017-01-03T03:00), 6,
  datetime(2017-01-05), 8,
  datetime(2017-01-05T13:40), 13,
  datetime(2017-01-06), 4,
  datetime(2017-01-07), 3,
  datetime(2017-01-08), 8,
  datetime(2017-01-08T21:00), 8,
  datetime(2017-01-09), 2,
  datetime(2017-01-09T12:00), 11,
  datetime(2017-01-10T05:00), 5,
];
let interval = 1d;
let stime = datetime(2017-01-01);
let etime = datetime(2017-01-10);
data
| limit 0
| make-series kind=nonempty avg(metric) default=1.0 on timestamp from stime to etime step interval 
avg_metric  timestamp
[ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ]  [ "2017-01-01T00:00:00.0000000Z", "2017-01-02T00:00:00.0000000Z", "2017-01-03T00:00:00.0000000Z", "2017-01-04T00:00:00.0000000Z", "2017-01-05T00:00:00.0000000Z", "2017-01-06T00:00:00.0000000Z", "2017-01-07T00:00:00.0000000Z", "2017-01-08T00:00:00.0000000Z", "2017-01-09T00:00:00.0000000Z" ]