series_fit_poly()

Applies a polynomial regression from an independent variable (x_series) to a dependent variable (y_series). This function takes a table containing multiple series (dynamic numerical arrays) and generates the best-fit high-order polynomial for each series using polynomial regression.

Tip

  • For linear regression of an evenly spaced series, as created by the make-series operator, use the simpler function series_fit_line(). See Example 2.
  • If x_series is supplied and the regression is done for a high degree, consider normalizing it to the [0-1] range. See Example 3.
  • If x_series is of datetime type, it must be converted to double and normalized. See Example 3.
  • For a reference implementation of polynomial regression using inline Python, see series_fit_poly_fl().
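The normalization advice in the tips can be illustrated outside Kusto. The following is a minimal Python sketch, not the service's implementation: the helper names `normalize_to_unit_range` and `datetimes_to_unit_range` and the sample timestamps are hypothetical, and the datetime values are converted to seconds since the Unix epoch (Python's representation, assumed here for illustration only). It shows why a raw time axis is a poor input for a high-degree fit.

```python
from datetime import datetime, timedelta, timezone


def normalize_to_unit_range(values):
    """Min-max scale numbers to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    span = hi - lo
    return [(v - lo) / span for v in values]


def datetimes_to_unit_range(timestamps):
    """Convert datetimes to float seconds since the Unix epoch, then normalize."""
    return normalize_to_unit_range([t.timestamp() for t in timestamps])


# Hypothetical, unevenly spaced timestamps (illustration only).
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
stamps = [start + timedelta(minutes=m) for m in (0, 7, 15, 40, 100)]

x_raw = [t.timestamp() for t in stamps]
x_norm = datetimes_to_unit_range(stamps)

# Raising raw epoch seconds to the 8th power gives enormous values,
# while the normalized axis stays within [0, 1].
print(max(x_raw) ** 8)
print(max(x_norm) ** 8)
```

With a degree-8 fit, the powers of the raw axis differ by dozens of orders of magnitude, which tends to make the regression numerically unstable. The normalized axis avoids this while preserving the relative spacing of the points.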

Syntax

T | extend series_fit_poly(y_series, x_series, degree)

Arguments

Argument | Description | Required/optional | Notes
y_series | Dynamic numerical array containing the dependent variable. | Required |
x_series | Dynamic numerical array containing the independent variable. | Optional. Required only for unevenly spaced series. | If not given, it's set to a default value of [1, 2, ..., length(y_series)].
degree | The required order of the polynomial to fit. For example, 1 for linear regression, 2 for quadratic regression, and so on. | Optional | Defaults to 1 (linear regression).

Returns

The series_fit_poly() function returns the following columns:

  • rsquare: r-square is a standard measure of the fit quality. The value is a number in the range [0-1], where 1 is the best possible fit, and 0 means the data is unordered and doesn't fit any line.
  • coefficients: Numerical array holding the coefficients of the best fitted polynomial with the given degree, ordered from the highest power coefficient to the lowest.
  • variance: Variance of the dependent variable (y_series).
  • rvariance: Residual variance, that is, the variance between the input data values and the approximated ones.
  • poly_fit: Numerical array holding a series of values of the best fitted polynomial. The series length is equal to the length of the dependent variable (y_series). The value is used for charting.
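To make the returned columns concrete, here is a rough pure-Python sketch of a least-squares polynomial fit that produces values with the same names. It is an illustration under stated assumptions, not the Kusto implementation: the function name `fit_poly` is hypothetical, the fit solves the normal equations with Gaussian elimination, the variances are population variances (divided by n), and rsquare is computed as 1 - rvariance/variance. The actual service may use different formulas or numerics.

```python
def fit_poly(y, x=None, degree=1):
    """Least-squares polynomial fit (illustrative sketch, not Kusto's code)."""
    n = len(y)
    if x is None:
        # Default x axis described in the Arguments table: [1, 2, ..., length(y)].
        x = [float(i) for i in range(1, n + 1)]
    m = degree + 1

    # Normal equations A c = b, with c ordered from lowest power to highest.
    a = [[sum(xi ** (r + c) for xi in x) for c in range(m)] for r in range(m)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(m)]

    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]

    # Back substitution.
    low_first = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(a[r][c] * low_first[c] for c in range(r + 1, m))
        low_first[r] = (b[r] - s) / a[r][r]

    poly_fit = [sum(c * xi ** p for p, c in enumerate(low_first)) for xi in x]
    mean_y = sum(y) / n
    # Assumption: population variance (divide by n).
    variance = sum((yi - mean_y) ** 2 for yi in y) / n
    rvariance = sum((yi - fi) ** 2 for yi, fi in zip(y, poly_fit)) / n
    rsquare = 1.0 - rvariance / variance if variance > 0 else 0.0

    return {
        "rsquare": rsquare,
        "coefficients": low_first[::-1],  # highest power first, as documented
        "variance": variance,
        "rvariance": rvariance,
        "poly_fit": poly_fit,
    }


# Noise-free quadratic y = 2x^2 - 3x + 1 sampled at x = 1..6.
result = fit_poly([0, 3, 10, 21, 36, 55], degree=2)
print(result["coefficients"])
print(result["rsquare"])
```

The sketch does not guard against a singular system (for example, fewer distinct x values than degree + 1), and solving the normal equations directly is one reason a high degree on an unnormalized x axis behaves poorly.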

Examples

Example 1

A fifth order polynomial with noise on the x and y axes:

range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend series_fit_poly(y, x, 5)
| project-rename fy=series_fit_poly_y_poly_fit, coeff=series_fit_poly_y_coefficients
| fork (project x, y, fy) (project-away x, y, fy)
| render linechart

Graph showing the fifth order polynomial fit to a series with noise

Coefficients of the fifth order polynomial fit to a series with noise
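Because the coefficients array is ordered from the highest power to the lowest, it can be evaluated at any x with Horner's method. The short Python sketch below assumes a hypothetical helper named `eval_poly` and uses the noise-free polynomial from the query above (x^5 - 8x^3 + 10x + 6). The coefficients actually returned by the fit would differ somewhat because of the added noise.

```python
def eval_poly(coefficients, x):
    """Evaluate a polynomial with Horner's method.

    coefficients are ordered from the highest power to the lowest.
    """
    result = 0.0
    for c in coefficients:
        result = result * x + c
    return result


# Noise-free polynomial from Example 1: x^5 - 8x^3 + 10x + 6.
coeff = [1, 0, -8, 0, 10, 6]

for x in (0, 1, 2):
    print(x, eval_poly(coeff, x))
```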

Example 2

Verify that series_fit_poly with degree=1 matches series_fit_line:

demo_series1
| extend series_fit_line(y)
| extend series_fit_poly(y)
| project-rename y_line = series_fit_line_y_line_fit, y_poly = series_fit_poly_y_poly_fit
| fork (project x, y, y_line, y_poly) (project-away id, x, y, y_line, y_poly) 
| render linechart with(xcolumn=x, ycolumns=y, y_line, y_poly)

Graph showing linear regression

Coefficients of linear regression

Example 3

Irregular (unevenly spaced) time series:

//
//  The x-axis must be normalized to the range [0-1] if either the degree is relatively big (>= 5) or the original x range is big.
//  So if x is a time axis, it must be normalized, because converting a timestamp to long generates huge numbers (number of 100 nano-sec ticks from 1/1/1970).
//
//  Normalization: x_norm = (x - min(x))/(max(x) - min(x))
//
irregular_ts
| extend series_stats(series_add(TimeStamp, 0))                                                                 //  extract min/max of time axis as doubles
| extend x = series_divide(series_subtract(TimeStamp, series_stats__min), series_stats__max-series_stats__min)  // normalize time axis to [0-1] range
| extend series_fit_poly(num, x, 8)
| project-rename fnum=series_fit_poly_num_poly_fit
| render timechart with(ycolumns=num, fnum)

Graph showing the eighth order polynomial fit to an irregular time series