predict_fl()
The function predict_fl() runs predictions with an existing trained machine learning model. The model was built with Scikit-learn, serialized to a string, and saved in a standard Azure Data Explorer table.
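The inline Python in the ad hoc definition under Usage decodes the stored model with binascii.unhexlify followed by pickle.loads, which suggests the string is a hex-encoded pickle. The following is a minimal sketch of that round trip under that assumption. The helper names serialize_model and deserialize_model are hypothetical, and a plain dictionary stands in for a fitted Scikit-learn estimator so the sketch runs with the standard library alone.

```python
import binascii
import pickle


def serialize_model(model) -> str:
    """Pickle a model object and hex-encode the bytes into a plain string."""
    return binascii.hexlify(pickle.dumps(model)).decode("ascii")


def deserialize_model(smodel: str):
    """Reverse of serialize_model: hex string -> bytes -> Python object."""
    return pickle.loads(binascii.unhexlify(smodel))


# Stand-in for a fitted estimator (assumption); in practice this would be
# a trained Scikit-learn model object.
placeholder_model = {"coef": [0.5, -1.25], "intercept": 0.1}

smodel = serialize_model(placeholder_model)   # string suitable for a 'model' column
restored = deserialize_model(smodel)          # object recovered at prediction time
```

Because unpickling executes code embedded in the payload, only load model strings that come from a trusted source.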
Note
- predict_fl() is a UDF (user-defined function).
- This function contains inline Python and requires enabling the python() plugin on the cluster. For more information, see Usage.
Syntax
T | invoke predict_fl(
models_tbl,
model_name,
features_cols,
pred_col)
Arguments
- models_tbl: The name of the table containing all serialized models. This table must contain the following columns:
  - name: the model name
  - timestamp: the time of model training
  - model: the string representation of the serialized model
- model_name: The name of the specific model to use.
- features_cols: A dynamic array containing the names of the feature columns that the model uses for prediction.
- pred_col: The name of the column that stores the predictions.
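The timestamp column matters when several models share a name: the ad hoc definition under Usage filters models_tbl by name and keeps the row with the most recent timestamp. The sketch below mirrors that selection in plain Python. The rows, the placeholder model strings, and the helper latest_model are hypothetical and serve only as illustration.

```python
from datetime import datetime

# Hypothetical rows mirroring the required columns of the models table.
models_tbl = [
    {"name": "Occupancy", "timestamp": datetime(2020, 1, 1), "model": "<hex string v1>"},
    {"name": "Occupancy", "timestamp": datetime(2020, 6, 1), "model": "<hex string v2>"},
    {"name": "Other", "timestamp": datetime(2021, 1, 1), "model": "<hex string>"},
]


def latest_model(rows, model_name):
    """Return the serialized model with the newest timestamp for model_name, or None."""
    candidates = [r for r in rows if r["name"] == model_name]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["timestamp"])["model"]


# Picks the June 2020 row, the newest of the two 'Occupancy' models.
selected = latest_model(models_tbl, "Occupancy")
```

Retraining a model and appending a new row with the same name therefore changes which model is used, without any change to the calling query.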
Usage
predict_fl() is a user-defined tabular function, to be applied using the invoke operator. You can either embed its code in your query or install it in your database. There are two usage options: ad hoc and persistent usage. See the examples below.
For ad hoc usage, embed the code using the let statement. No permission is required.
let predict_fl=(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
let kwargs = pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
let code =
'\n'
'import pickle\n'
'import binascii\n'
'\n'
'smodel = kargs["smodel"]\n'
'features_cols = kargs["features_cols"]\n'
'pred_col = kargs["pred_col"]\n'
'bmodel = binascii.unhexlify(smodel)\n'
'clf1 = pickle.loads(bmodel)\n'
'df1 = df[features_cols]\n'
'predictions = clf1.predict(df1)\n'
'\n'
'result = df\n'
'result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])'
'\n'
;
samples
| evaluate python(typeof(*), code, kwargs)
};
//
// Predicts room occupancy from sensor measurements, and calculates the confusion matrix
//
// Occupancy Detection is an open dataset from UCI Repository at https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
// It contains experimental data for binary classification of room occupancy from Temperature, Humidity, Light and CO2.
// Ground-truth labels were obtained from time-stamped pictures that were taken every minute
//
OccupancyDetection
| where Test == 1
| extend pred_Occupancy=false
| invoke predict_fl(ML_Models, 'Occupancy', pack_array('Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio'), 'pred_Occupancy')
| summarize n=count() by Occupancy, pred_Occupancy
Confusion matrix:
Occupancy pred_Occupancy n
TRUE TRUE 3006
FALSE TRUE 112
TRUE FALSE 15
FALSE FALSE 9284
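The four counts can be condensed into the usual summary metrics. The sketch below copies the counts from the confusion matrix above and derives accuracy, precision, and recall, treating Occupancy == TRUE as the positive class. That choice of positive class and the variable names are assumptions made for illustration.

```python
# Counts copied from the confusion matrix, keyed by (Occupancy, pred_Occupancy).
confusion = {
    (True, True): 3006,     # occupied, predicted occupied
    (False, True): 112,     # empty, predicted occupied
    (True, False): 15,      # occupied, predicted empty
    (False, False): 9284,   # empty, predicted empty
}

tp = confusion[(True, True)]
fp = confusion[(False, True)]
fn = confusion[(True, False)]
tn = confusion[(False, False)]

total = tp + fp + fn + tn
accuracy = (tp + tn) / total    # share of all rows predicted correctly
precision = tp / (tp + fp)      # share of predicted-occupied rows that were occupied
recall = tp / (tp + fn)         # share of occupied rows that were detected

print(f"n={total} accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```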