series_fit_poly_fl()

函数 series_fit_poly_fl()用户定义函数 (UDF),它对一个序列应用多项式回归。 此函数获取包含多个序列(动态数值阵列)的表,并使用多项式回归为每个序列生成拟合效果最佳的高阶多项式。 此函数针对序列范围返回多项式系数和内插多项式。

注意

  • 请使用本机函数 series_fit_poly() 而不是本文档中所述的函数。 本机函数提供相同的功能,并且在性能和可伸缩性方面更好。 本文档仅供参考。
  • 对于间距均匀的序列(由 make-series 运算符创建)的线性回归,请使用本机函数 series_fit_line()

先决条件

  • 必须在群集上启用 Python 插件。 这是函数中使用的内联 Python 所必需的。
  • 必须在数据库上启用 Python 插件。 这是函数中使用的内联 Python 所必需的。

语法

T | invoke series_fit_poly_fl(y_series,y_fit_series,fit_coeff,degree, [ x_series ], [ x_istime ])

详细了解语法约定

参数

客户 类型​​ 必需 说明
y_series string 包含依赖变量的输入表列的名称。 即,要拟合的序列。
y_fit_series string 存储最佳拟合序列的列的名称。
fit_coeff string 存储最佳拟合多项式系数的列的名称。
degree int 要拟合的多项式所需的阶。 例如,1 用于线性回归,2 用于二次回归,等等。
x_series string 包含独立变量的列的名称,即 x 轴(或时间轴)。 此参数为可选,只有间距不均匀的序列才需要。 默认值为空字符串,因为对于间距均匀的序列的回归,x 是冗余的。
x_istime bool 仅当指定了 x_series 并且它是 datetime 的向量时,才需要此参数。

函数定义

可以通过将函数的代码嵌入为查询定义的函数,或将其创建为数据库中的存储函数来定义函数,如下所示:

使用以下 let 语句定义函数。 不需要任何权限。

重要

let 语句不能独立运行。 它必须后跟一个表格表达式语句。 若要运行 series_fit_poly_fl() 的工作示例,请参阅示例

let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_coeff = kargs["fit_coeff"]
        degree = kargs["degree"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]

        def fit(ts_row, x_col, y_col, deg):
            y = ts_row[y_col]
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            coeff = np.polyfit(x, y, deg)
            p = np.poly1d(coeff)
            z = p(x)
            return z, coeff

        result = df
        if len(df):
           result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

示例

以下示例使用 invoke 运算符运行函数。

将五阶多项式拟合到常规时序

若要使用查询定义的函数,请在嵌入的函数定义后调用它。

let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_coeff = kargs["fit_coeff"]
        degree = kargs["degree"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]

        def fit(ts_row, x_col, y_col, deg):
            y = ts_row[y_col]
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            coeff = np.polyfit(x, y, deg)
            p = np.poly1d(coeff)
            z = p(x)
            return z, coeff

        result = df
        if len(df):
           result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
//
// Fit fifth order polynomial to a regular (evenly spaced) time series, created with make-series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null), fnum1 = dynamic(null), coeff1=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 5)
| render timechart with(ycolumns=num, fnum)

输出

Graph showing fifth order polynomial fit to a regular time series.

测试不规则时序

若要使用查询定义的函数,请在嵌入的函数定义后调用它。

let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_coeff = kargs["fit_coeff"]
        degree = kargs["degree"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]

        def fit(ts_row, x_col, y_col, deg):
            y = ts_row[y_col]
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            coeff = np.polyfit(x, y, deg)
            p = np.poly1d(coeff)
            z = p(x)
            return z, coeff

        result = df
        if len(df):
           result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-2d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0   //  delete every 6th hour to create unevenly spaced time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 8, 'TimeStamp', True)
| render timechart with(ycolumns=num, fnum)

输出

Graph showing eighth order polynomial fit to an irregular time series.

x 轴和 y 轴上有干扰信息的第 5 阶多项式

若要使用查询定义的函数,请在嵌入的函数定义后调用它。

let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_coeff = kargs["fit_coeff"]
        degree = kargs["degree"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]

        def fit(ts_row, x_col, y_col, deg):
            y = ts_row[y_col]
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            coeff = np.polyfit(x, y, deg)
            p = np.poly1d(coeff)
            z = p(x)
            return z, coeff

        result = df
        if len(df):
           result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_fit = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('y', 'y_fit', 'coeff', 5, 'x')
|fork (project-away coeff) (project coeff | mv-expand coeff)
| render linechart

输出

Graph of fit of fifth order polynomial with noise on x & y axes

Coefficients of fit of fifth order polynomial with noise.

不支持此功能。