series_fit_poly_fl()
The function series_fit_poly_fl() applies a polynomial regression on a series. It takes a table containing multiple series (dynamic numerical arrays) and generates the best-fit high-order polynomial for each series using polynomial regression. It returns both the polynomial coefficients and the interpolated polynomial over the range of the series.
Note
Use the native function series_fit_poly(). The function below is for reference only.
Note
- series_fit_poly_fl() is a UDF (user-defined function).
- This function contains inline Python and requires enabling the python() plugin on the cluster. For more information, see Usage.
- For linear regression of an evenly spaced series, as created by the make-series operator, use the native function series_fit_line().
Syntax
T | invoke series_fit_poly_fl(
y_series,
y_fit_series,
fit_coeff,
degree, [
x_series,
x_istime ])
Arguments
- y_series: The name of the input table column containing the dependent variable, that is, the series to fit.
- y_fit_series: The name of the column to store the best-fit series.
- fit_coeff: The name of the column to store the best-fit polynomial coefficients.
- degree: The required order of the polynomial to fit. For example, 1 for linear regression, 2 for quadratic regression, and so on.
- x_series: The name of the column containing the independent variable, that is, the x or time axis. This parameter is optional and is needed only for unevenly spaced series. The default value is an empty string, because x is redundant for the regression of an evenly spaced series.
- x_istime: This boolean parameter is optional. It is needed only if x_series is specified and is a vector of datetime.
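When x_istime is true, the inline Python in the function definition rescales the timestamps linearly onto the range [1, len(x)] before fitting. The following is a minimal sketch of that rescaling in plain Python rather than pandas. The helper name normalize_times is hypothetical and not part of the function, and the sketch assumes at least two distinct timestamps.

```python
from datetime import datetime


def normalize_times(timestamps):
    """Rescale datetimes linearly onto [1, len(timestamps)].

    Hypothetical helper for illustration; assumes at least two
    distinct timestamps (otherwise the span is zero).
    """
    start = min(timestamps)
    # Offsets from the earliest timestamp, in seconds.
    offsets = [(t - start).total_seconds() for t in timestamps]
    span = max(offsets)
    n = len(timestamps)
    # Map 0..span onto 1..n, preserving the relative spacing.
    return [o / span * (n - 1) + 1 for o in offsets]


# Unevenly spaced sample: gaps of 5, 5, and 20 minutes.
times = [
    datetime(2016, 9, 3, 0, 0),
    datetime(2016, 9, 3, 0, 5),
    datetime(2016, 9, 3, 0, 10),
    datetime(2016, 9, 3, 0, 30),
]
x = normalize_times(times)
print(x)
```

The first and last points land on 1 and len(x), while the interior points keep their relative spacing, so the uneven gaps are preserved in the regression.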
Usage
series_fit_poly_fl() is a user-defined tabular function, to be applied using the invoke operator. You can either embed its code in your query or install it in your database. There are two usage options: ad hoc and persistent usage. The ad hoc option is shown here.
For ad hoc usage, embed the function's code using a let statement. No permission is required.
let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
    let kwargs = pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
    let code=
        '\n'
        'y_series = kargs["y_series"]\n'
        'y_fit_series = kargs["y_fit_series"]\n'
        'fit_coeff = kargs["fit_coeff"]\n'
        'degree = kargs["degree"]\n'
        'x_series = kargs["x_series"]\n'
        'x_istime = kargs["x_istime"]\n'
        '\n'
        'def fit(ts_row, x_col, y_col, deg):\n'
        '    y = ts_row[y_col]\n'
        '    if x_col == "": # If there is no x column creates sequential range [1, len(y)]\n'
        '        x = np.arange(len(y)) + 1\n'
        '    else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.\n'
        '        if x_istime: \n'
        '            x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))\n'
        '            x = x - x.min()\n'
        '            x = x / x.max()\n'
        '            x = x * (len(x) - 1) + 1\n'
        '        else:\n'
        '            x = ts_row[x_col]\n'
        '    coeff = np.polyfit(x, y, deg)\n'
        '    p = np.poly1d(coeff)\n'
        '    z = p(x)\n'
        '    return z, coeff\n'
        '\n'
        'result = df\n'
        'if len(df):\n'
        '    result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")\n'
    ;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
//
// Fit fifth order polynomial to a regular (evenly spaced) time series, created with make-series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null), fnum1 = dynamic(null), coeff1=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 5)
| render timechart with(ycolumns=num, fnum)
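The inline Python in the function calls np.polyfit on each row. To illustrate what such a least-squares polynomial fit computes, here is a minimal sketch in plain Python that solves the normal equations directly. It is an illustrative substitute, not the function's implementation: the helper name poly_fit is hypothetical, and the normal-equations approach is generally less numerically robust than np.polyfit for high degrees.

```python
def poly_fit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Hypothetical helper for illustration. Returns coefficients ordered
    from the highest power down to the constant term, the same ordering
    that np.polyfit uses.
    """
    n = degree + 1
    # Normal equations A c = b in ascending powers:
    # A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs[::-1]  # highest power first


# Evenly spaced series: with no x column, x is the range 1..len(y),
# mirroring the default branch of the reference code.
degree = 2
y = [2.0 * v * v - 3.0 * v + 1.0 for v in range(1, 11)]
x = list(range(1, len(y) + 1))
coeff = poly_fit(x, y, degree)
# Interpolated polynomial over the range of the series.
y_fit = [sum(c * xi ** (degree - k) for k, c in enumerate(coeff)) for xi in x]
print(coeff)
```

Because the sample series is an exact quadratic, the fitted coefficients should match 2, -3, and 1 up to floating-point error, and y_fit should reproduce y.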
Additional examples
The following examples assume the function is already installed:
Test irregular (unevenly spaced) time series
let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-2d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0   // delete every 6th hour to create unevenly spaced time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 8, 'TimeStamp', True)
| render timechart with(ycolumns=num, fnum)
Fifth order polynomial with noise on x & y axes
range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_fit = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('y', 'y_fit', 'coeff', 5, 'x')
| fork (project-away coeff) (project coeff | mv-expand coeff)
| render linechart
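The coeff column holds the output of np.polyfit, which orders coefficients from the highest power down to the constant term. The noise-free polynomial in this example, x^5 - 8x^3 + 10x + 6, therefore corresponds to the list [1, 0, -8, 0, 10, 6]; the fitted values will differ from these because of the added noise. A minimal sketch of evaluating a coefficient list in that order, in plain Python (the helper name poly_eval is hypothetical and not part of the function):

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial with Horner's method.

    coeffs is ordered from the highest power to the constant term,
    matching the ordering returned by np.polyfit.
    """
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


# Coefficients of the underlying (noise-free) polynomial
# x^5 - 8x^3 + 10x + 6 from the example query.
true_coeff = [1, 0, -8, 0, 10, 6]

at_zero = poly_eval(true_coeff, 0)  # constant term only
at_one = poly_eval(true_coeff, 1)   # 1 - 8 + 10 + 6
at_two = poly_eval(true_coeff, 2)   # 32 - 64 + 20 + 6
print(at_zero, at_one, at_two)
```

Comparing the mv-expanded coeff values against this list is a quick way to judge how well the fit recovered the underlying polynomial.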