quantize_fl()

The function quantize_fl() bins metric columns. It uses the K-Means algorithm to quantize the values of each metric column into categorical labels.
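The K-Means binning that quantize_fl() delegates to scikit-learn's KBinsDiscretizer (with strategy="kmeans") can be sketched for a single column in pure Python. This is a minimal sketch under stated assumptions, not scikit-learn's implementation. Initial centers sit at the midpoints of equal-width bins and are refined with Lloyd's algorithm. The bin edges are the midpoints between adjacent sorted centers, bounded by the column's minimum and maximum. The helper name kmeans_bin_edges is hypothetical.

```python
def kmeans_bin_edges(values, num_bins, max_iter=100):
    """Hypothetical helper: 1-D k-means bin edges, in the spirit of
    KBinsDiscretizer(strategy="kmeans"). Not the library's actual code."""
    lo, hi = float(min(values)), float(max(values))
    width = (hi - lo) / num_bins
    # Initial centers: midpoints of equal-width bins between min and max.
    centers = [lo + width * (k + 0.5) for k in range(num_bins)]
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[k]
                       for k, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    centers.sort()
    # Inner edges are the midpoints between adjacent centers.
    inner = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    return [lo] + inner + [hi]

x = list(range(1, 6)) + list(range(10, 16)) + list(range(20, 26))
print(kmeans_bin_edges(x, 3))  # [1.0, 7.75, 17.5, 25.0]
```

When the sketch runs on the same x values as the example query, it returns the bin edges that appear in that query's x_bin output column.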

Syntax

T | invoke quantize_fl(num_bins, in_cols, out_cols, labels)

Arguments

  • num_bins: The number of bins to create.
  • in_cols: A dynamic array containing the names of the columns to quantize.
  • out_cols: A dynamic array containing the names of the output columns for the binned values, one for each column in in_cols.
  • labels: A dynamic array containing the label names, ordered from the lowest bin to the highest. It should have one entry per bin. This parameter is optional. If labels isn't supplied, the bin ranges (for example, 1.0-7.75) are used as labels.
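The following hypothetical sketch shows how the labels argument determines the output values. If labels is supplied, bin i receives labels[i]. Otherwise, each bin is named by its edges, rounded to three decimals and joined with a hyphen. The sketch assumes the bin edges are already known; the edges below come from the example output. The function name bin_labels is illustrative only. Its length check is an addition that the original code does not perform.

```python
def bin_labels(edges, labels=None):
    """Hypothetical helper: one display value per bin.

    edges has num_bins + 1 entries. If labels is given, bin i gets labels[i];
    otherwise each bin is named '<lower>-<upper>' from its rounded edges."""
    num_bins = len(edges) - 1
    if labels is not None:
        if len(labels) < num_bins:
            raise ValueError("labels needs one entry per bin")
        return list(labels[:num_bins])
    rounded = [round(float(e), 3) for e in edges]
    # Mirrors str(ii[j-1]) + '-' + str(ii[j]) in quantize_fl()'s Python code.
    return [f"{rounded[j - 1]}-{rounded[j]}" for j in range(1, num_bins + 1)]

edges = [1.0, 7.75, 17.5, 25.0]
print(bin_labels(edges))                           # ['1.0-7.75', '7.75-17.5', '17.5-25.0']
print(bin_labels(edges, ["Low", "Med", "High"]))   # ['Low', 'Med', 'High']
```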

Usage

quantize_fl() is a user-defined tabular function and is applied with the invoke operator. You can either embed its code in your query or install it in your database. There are two usage options: ad hoc and persistent. The example below shows ad hoc usage.

For ad hoc usage, embed the function's code in your query with a let statement. No permission is required.

let quantize_fl=(tbl:(*), num_bins:int, in_cols:dynamic, out_cols:dynamic, labels:dynamic=dynamic(null))
{
    // The Python code reads these arguments from the kargs dictionary.
    let kwargs = pack('num_bins', num_bins, 'in_cols', in_cols, 'out_cols', out_cols, 'labels', labels);
    let code =
        '\n'
        'from sklearn.preprocessing import KBinsDiscretizer\n'
        '\n'
        'num_bins = kargs["num_bins"]\n'
        'in_cols = kargs["in_cols"]\n'
        'out_cols = kargs["out_cols"]\n'
        'labels = kargs["labels"]\n'
        '\n'
        'result = df\n'
        'binner = KBinsDiscretizer(n_bins=num_bins, encode="ordinal", strategy="kmeans")\n'
        'df_in = df[in_cols]\n'
        'bdata = binner.fit_transform(df_in)\n'
        'if labels is None:\n'
        '    for i in range(len(out_cols)):    # loop on each column and convert it to binned labels\n'
        '        ii = np.round(binner.bin_edges_[i], 3)\n'
        '        labels = [str(ii[j-1]) + \'-\' + str(ii[j]) for j in range(1, num_bins+1)]\n'
        '        result.loc[:,out_cols[i]] = np.take(labels, bdata[:, i].astype(int))\n'
        'else:\n'
        '    result[out_cols] = np.take(labels, bdata.astype(int))\n'
        ;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Example: bin x into 3 clusters, first with custom labels, then with bin ranges as labels
union 
(range x from 1 to 5 step 1),
(range x from 10 to 15 step 1),
(range x from 20 to 25 step 1)
| extend x_label='', x_bin=''
| invoke quantize_fl(3, pack_array('x'), pack_array('x_label'), pack_array('Low', 'Med', 'High'))
| invoke quantize_fl(3, pack_array('x'), pack_array('x_bin'), dynamic(null))

Output

x    x_label    x_bin
1    Low        1.0-7.75
2    Low        1.0-7.75
3    Low        1.0-7.75
4    Low        1.0-7.75
5    Low        1.0-7.75
20   High       17.5-25.0
21   High       17.5-25.0
22   High       17.5-25.0
23   High       17.5-25.0
24   High       17.5-25.0
25   High       17.5-25.0
10   Med        7.75-17.5
11   Med        7.75-17.5
12   Med        7.75-17.5
13   Med        7.75-17.5
14   Med        7.75-17.5
15   Med        7.75-17.5
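In the output, each value of x receives the label of the bin whose range contains it. With encode="ordinal", KBinsDiscretizer returns a 0-based bin index for each value, and the function's Python code passes that index to np.take to select the label. The sketch below shows this lookup using the standard bisect module. It assumes that a value exactly on an inner edge falls into the upper bin. The helper name ordinal_bin is hypothetical, and the edges come from the x_bin column above.

```python
import bisect

def ordinal_bin(value, edges):
    """Hypothetical helper: 0-based index of the bin containing value.

    Only the inner edges are searched, so values below edges[0] land in
    bin 0 and values above edges[-1] land in the last bin. A value equal
    to an inner edge goes to the upper bin (bisect_right)."""
    return bisect.bisect_right(edges[1:-1], value)

# Bin edges read from the x_bin column of the example output.
edges = [1.0, 7.75, 17.5, 25.0]
labels = ["Low", "Med", "High"]
for x in (1, 5, 10, 15, 20, 25):
    # Prints one pair per line: 1 Low, 5 Low, 10 Med, 15 Med, 20 High, 25 High
    print(x, labels[ordinal_bin(x, edges)])
```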