series_fit_lowess_fl()

The function series_fit_lowess_fl() applies a LOWESS (locally weighted scatterplot smoothing) regression to a series. It takes a table containing multiple series (dynamic numerical arrays) and generates a LOWESS curve, which is a smoothed version of the original series.
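LOWESS fits a separate weighted regression around each point. The plain-Python sketch below illustrates the idea. It is not the code the function runs: the function calls statsmodels' `lowess` with `frac = fit_size / len(y)`, and that call also performs robustness reweighting iterations by default (`it=3`). The name `lowess_single_pass` is hypothetical, and the sketch assumes a local linear fit with tricube weights and no robustness iterations.

```python
def lowess_single_pass(x, y, fit_size=5):
    """Single-pass LOWESS sketch (no robustness iterations).

    For each x0 in x, take the fit_size points closest to x0, weight them with
    the tricube kernel, fit a weighted straight line, and evaluate it at x0.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    k = max(2, min(fit_size, n))  # a line needs at least two points
    fitted = []
    for x0 in x:
        # Indices of the k points nearest to x0 (ties broken by index).
        nearest = sorted(range(n), key=lambda i: (abs(x[i] - x0), i))[:k]
        h = max(abs(x[i] - x0) for i in nearest)  # local bandwidth
        # Tricube weights: (1 - (d/h)^3)^3 inside the window, 0 at its edge.
        w = {}
        for i in nearest:
            u = abs(x[i] - x0) / h if h > 0 else 0.0
            w[i] = (1 - u ** 3) ** 3 if u < 1 else 0.0
        sw = sum(w.values())  # > 0 because x0 itself has weight 1
        xm = sum(w[i] * x[i] for i in nearest) / sw
        ym = sum(w[i] * y[i] for i in nearest) / sw
        sxx = sum(w[i] * (x[i] - xm) ** 2 for i in nearest)
        sxy = sum(w[i] * (x[i] - xm) * (y[i] - ym) for i in nearest)
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to the weighted mean
        fitted.append(ym + slope * (x0 - xm))
    return fitted
```

Because a local linear fit reproduces a straight line exactly, `lowess_single_pass(list(range(1, 11)), [2 * v + 1 for v in range(1, 11)], 5)` returns the input line unchanged, up to floating-point rounding.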


Syntax

T | invoke series_fit_lowess_fl(y_series, y_fit_series, [fit_size, x_series, x_istime])

Arguments

  • y_series: The name of the input table column containing the dependent variable. This column is the series to fit.
  • y_fit_series: The name of the column in which to store the fitted series.
  • fit_size: For each point, the local regression is applied to its fit_size closest points. This parameter is optional; the default is 5.
  • x_series: The name of the column containing the independent variable, that is, the x or time axis. This parameter is optional and is needed only for unevenly spaced series. The default value is an empty string, as x is redundant for the regression of an evenly spaced series.
  • x_istime: This boolean parameter is needed only if x_series is specified and contains datetime values. This parameter is optional; the default is False.
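When x_istime is True, the function's embedded Python code rescales the datetime axis linearly onto the range [1, len(y)]. It subtracts the minimum, divides by the new maximum, multiplies by (n - 1), and adds 1. The sketch below reproduces that arithmetic in plain Python. It assumes `datetime` inputs and uses seconds instead of the nanoseconds produced by `pd.to_numeric`, which does not change the result after normalization. The helper name `normalize_time_axis` is hypothetical.

```python
from datetime import datetime, timedelta


def normalize_time_axis(times):
    """Linearly map datetimes onto [1, len(times)], preserving relative spacing.

    Mirrors the x_istime branch: subtract the minimum, divide by the new
    maximum, scale by (n - 1), then add 1.
    """
    t_min = min(times)
    offsets = [(t - t_min).total_seconds() for t in times]
    span = max(offsets)
    if span == 0:
        raise ValueError("times must not all be equal")
    n = len(times)
    return [o / span * (n - 1) + 1 for o in offsets]


# Four samples with a gap between the third and fourth (irregular spacing).
start = datetime(2016, 9, 2)
times = [start + timedelta(minutes=m) for m in (0, 5, 10, 20)]
print(normalize_time_axis(times))  # [1.0, 1.75, 2.5, 4.0]
```

For evenly spaced timestamps, the result is exactly 1, 2, ..., n. That is the same x the function uses when x_series is empty, which is why x is redundant for evenly spaced series. For irregular timestamps, the gaps are preserved instead of being collapsed to consecutive indices.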

Usage

series_fit_lowess_fl() is a user-defined tabular function, to be applied using the invoke operator. You can either embed its code in your query or install it in your database. There are two usage options: ad hoc and persistent.

For ad hoc usage, embed its code using a let statement. No permission is required.

let series_fit_lowess_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_size:int=5, x_series:string='', x_istime:bool=False)
{
    let kwargs = pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_size', fit_size, 'x_series', x_series, 'x_istime', x_istime);
    let code=
        '\n'
        'y_series = kargs["y_series"]\n'
        'y_fit_series = kargs["y_fit_series"]\n'
        'fit_size = kargs["fit_size"]\n'
        'x_series = kargs["x_series"]\n'
        'x_istime = kargs["x_istime"]\n'
        '\n'
        'import statsmodels.api as sm\n'
        'def lowess_fit(ts_row, x_col, y_col, fsize):\n'
        '    y = ts_row[y_col]\n'
        '    fraction = fsize/len(y)\n'
        '    if x_col == "": # If there is no x column, create the sequential range [1, len(y)]\n'
        '       x = np.arange(len(y)) + 1\n'
        '    else: # If an x column exists and x_istime is True, normalize the datetimes to the [1, len(y)] range; otherwise take x as is.\n'
        '       if x_istime: \n'
        '           x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))\n'
        '           x = x - x.min()\n'
        '           x = x / x.max()\n'
        '           x = x * (len(x) - 1) + 1\n'
        '       else:\n'
        '           x = ts_row[x_col]\n'
        '    lowess = sm.nonparametric.lowess\n'
        '    z = lowess(y, x, return_sorted=False, frac=fraction)\n'
        '    return list(z)\n'
        '\n'
        'result = df\n'
        'result[y_fit_series] = df.apply(lowess_fit, axis=1, args=(x_series, y_series, fit_size))\n'
    ;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
//
// Apply a 9-point LOWESS regression to a regular time series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null)
| invoke series_fit_lowess_fl('num', 'fnum', 9)
| render timechart

Graph showing a nine-point LOWESS fit to a regular time series

Examples

The following examples assume the function is already installed:

Test irregular time series

The following example tests an irregular (unevenly spaced) time series:

let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-1d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0   //  delete every 6th hour to create irregular time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null)
| invoke series_fit_lowess_fl('num', 'fnum', 9, 'TimeStamp', True)
| render timechart 

Graph showing a nine-point LOWESS fit to an irregular time series

Compare LOWESS with a polynomial fit

The following example generates a fifth-order polynomial sampled at random x values, with noise added to y. It then compares the LOWESS fit with a fifth-order polynomial fit.
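In formula form, the query draws 200 samples, where $u$ and $v$ are independent values returned by `rand()` (uniform on $[0, 1)$):

```latex
\begin{aligned}
x &= 5u - 2.3 \in [-2.3,\ 2.7) \\
y_{\text{clean}} &= x^5 - 8x^3 + 10x + 6 \\
y &= y_{\text{clean}} + 0.5\,(v - 0.5)\,y_{\text{clean}} = (0.75 + 0.5v)\,y_{\text{clean}}
\end{aligned}
```

The noise is therefore multiplicative, scaling each clean value by a factor in $[0.75, 1.25)$, or up to ±25%.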

range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_lowess = dynamic(null)
| invoke series_fit_lowess_fl('y', 'y_lowess', 15, 'x')
| extend series_fit_poly(y, x, 5)
| project x, y, y_lowess, y_polynomial=series_fit_poly_y_poly_fit
| render linechart

Graph of LOWESS versus a polynomial fit for a noisy fifth-order polynomial