Evaluate Model module

This article describes a module in Azure Machine Learning designer (preview).

Use this module to measure the accuracy of a trained model. You provide a dataset containing scores generated from a model, and the Evaluate Model module computes a set of industry-standard evaluation metrics.

The metrics returned by Evaluate Model depend on the type of model that you are evaluating:

  • Classification models
  • Regression models
  • Clustering models

Tip

If you are new to model evaluation, we recommend the video series by Dr. Stephen Elston, part of the machine learning course from EdX.

How to use Evaluate Model

  1. Connect the Scored dataset output of Score Model, or the Result dataset output of Assign Data to Clusters, to the left input port of Evaluate Model.

    Note

    If you use a module such as Select Columns in Dataset to select part of the input dataset, make sure the columns needed to calculate the metrics are still present. For binary classification or anomaly detection (metrics such as AUC and Accuracy), the actual label column (used in training), the 'Scored Probabilities' column, and the 'Scored Labels' column must exist. For multi-class classification or regression, the actual label column and the 'Scored Labels' column must exist. For clustering, the 'Assignments' column and the 'DistancesToClusterCenter no.X' columns must exist, where X is the centroid index, ranging from 0 to the number of centroids minus 1.

  2. [Optional] Connect the Scored dataset output of Score Model, or the Result dataset output of Assign Data to Clusters, for the second model to the right input port of Evaluate Model. This lets you compare results from two different models on the same data. The two input algorithms should be of the same algorithm type. Alternatively, you might compare scores from two different runs over the same data with different parameters.

    Note

    Algorithm type refers to 'Two-class Classification', 'Multi-class Classification', 'Regression', or 'Clustering' under 'Machine Learning Algorithms'.

  3. Submit the pipeline to generate the evaluation scores.
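
The column requirements in the note under step 1 can be checked before a pipeline is submitted. The following is a minimal sketch in plain Python, not part of the designer itself: the function name `missing_columns`, the task keys, and the example column names are hypothetical, and only the required column names ('Scored Probabilities', 'Scored Labels', 'Assignments', 'DistancesToClusterCenter no.X') come from the note.

```python
# Required scored columns per task type, taken from the note under step 1.
# The task keys themselves are hypothetical labels chosen for this sketch.
REQUIRED_COLUMNS = {
    "binary_classification": ["Scored Probabilities", "Scored Labels"],
    "multiclass_classification": ["Scored Labels"],
    "regression": ["Scored Labels"],
}


def missing_columns(columns, task, label_column=None, n_centroids=0):
    """Return the required column names that are absent from `columns`."""
    if task == "clustering":
        # Clustering needs the assignment column plus one distance column
        # per centroid, indexed 0 .. n_centroids - 1.
        required = ["Assignments"] + [
            f"DistancesToClusterCenter no.{i}" for i in range(n_centroids)
        ]
    else:
        # Supervised tasks need the actual label column used in training.
        required = [label_column] + REQUIRED_COLUMNS[task]
    present = set(columns)
    return [name for name in required if name not in present]


# Example: a scored dataset from which the probabilities column was dropped.
cols = ["age", "income", "Scored Labels"]
print(missing_columns(cols, "binary_classification", label_column="income"))
```

An empty list means every column needed for that task type is still present after column selection.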

Results

After you run Evaluate Model, select the module to open the Evaluate Model navigation panel on the right. Then choose the Outputs + Logs tab; the Data Outputs section on that tab has several icons. The Visualize icon, which looks like a bar graph, is the first way to see the results.

For binary classification, selecting the Visualize icon displays the binary confusion matrix. For multi-class classification, you can find the confusion matrix plot file under the Outputs + Logs tab, as follows:

[Image: Preview of uploaded image]

If you connect datasets to both inputs of Evaluate Model, the results contain metrics for both sets of data, or both models. The metrics for the model or data attached to the left port are presented first in the report, followed by the metrics for the dataset or model attached to the right port.

For example, the following image represents a comparison of results from two clustering models that were built on the same data, but with different parameters.

[Image: Comparing two models]

Because these are clustering models, the evaluation results are different from those you would see when comparing scores from two regression models or two classification models. However, the overall presentation is the same.

Metrics

This section describes the metrics returned for the specific types of models supported for use with Evaluate Model.

Metrics for classification models

The following metrics are reported when evaluating binary classification models.

  • Accuracy measures the goodness of a classification model as the proportion of true results (true positives and true negatives) to total cases.

  • Precision is the proportion of true positives among all predicted positives. Precision = TP/(TP+FP)

  • Recall is the fraction of all relevant (actual positive) instances that were correctly retrieved. Recall = TP/(TP+FN)

  • F1 score is the harmonic mean of precision and recall. It ranges from 0 to 1, where the ideal F1 score value is 1. F1 = 2 × Precision × Recall / (Precision + Recall)

  • AUC measures the area under the curve plotted with the true positive rate on the y axis and the false positive rate on the x axis. This metric is useful because it provides a single number that lets you compare models of different types.
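
The binary classification metrics above can be reproduced by hand from confusion-matrix counts. The following is a minimal sketch using the standard textbook definitions; it is not the module's own implementation, and the function names `binary_metrics` and `auc` are hypothetical. The AUC here is computed as the fraction of positive/negative pairs that the scores rank correctly (ties count as half), which equals the area under the ROC curve.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking of positives vs negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Each correctly ordered pair scores 1, each tie scores 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(binary_metrics(tp=8, fp=2, tn=85, fn=5))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise AUC is quadratic in the number of rows, which is fine for illustration but slow on large scored datasets.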

Metrics for regression models

The metrics returned for regression models are designed to estimate the amount of error. A model is considered to fit the data well if the difference between observed and predicted values is small. However, looking at the pattern of the residuals (the differences between each predicted point and its corresponding actual value) can tell you a lot about potential bias in the model.

The following metrics are reported when evaluating regression models.

  • Mean absolute error (MAE) measures how close the predictions are to the actual outcomes; thus, a lower score is better.

  • Root mean squared error (RMSE) creates a single value that summarizes the error in the model. By squaring the differences, the metric disregards whether an error is an over-prediction or an under-prediction.

  • Relative absolute error (RAE) is the total absolute difference between predicted and actual values, divided by the total absolute difference between the actual values and their arithmetic mean; this normalization is what makes it relative.

  • Relative squared error (RSE) similarly normalizes the total squared error of the predicted values by dividing by the total squared error of the actual values.

  • Coefficient of determination, often referred to as R², represents the predictive power of the model, usually as a value between 0 and 1. Zero means the model explains none of the variation in the data; 1 means there is a perfect fit. However, interpret R² values with caution: low values can be entirely normal and high values can be suspect.
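
The regression metrics above can be computed directly from paired actual and predicted values. The following is a minimal sketch based on the standard textbook formulas, which is an assumption about how the module computes them; the function name `regression_metrics` is hypothetical. RAE and RSE use the mean of the actual values as the baseline, and R² is taken as 1 minus RSE.

```python
import math


def regression_metrics(actual, predicted):
    """MAE, RMSE, RAE, RSE, and R^2 for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n

    # Errors of the model's predictions.
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))

    # Errors of a baseline that always predicts the mean of the actual values.
    # Both are zero if every actual value is identical, in which case the
    # relative metrics are undefined and the divisions below would fail.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)

    rse = sq_err / sq_dev
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae": abs_err / abs_dev,
        "rse": rse,
        "r2": 1.0 - rse,
    }


print(regression_metrics([1, 2, 3, 4], [1.5, 2, 2.5, 4]))
```

With these definitions, a model that predicts worse than the mean baseline has RSE above 1 and therefore a negative R².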

Metrics for clustering models

Because clustering models differ significantly from classification and regression models in many respects, Evaluate Model also returns a different set of statistics for clustering models.

The statistics returned for a clustering model describe how many data points were assigned to each cluster, the amount of separation between clusters, and how tightly the data points are bunched within each cluster.

The statistics for the clustering model are averaged over the entire dataset, with additional rows containing the statistics per cluster.

The following metrics are reported when evaluating clustering models.

  • The scores in the Average Distance to Other Center column represent how close, on average, each point in the cluster is to the centroids of all other clusters.

  • The scores in the Average Distance to Cluster Center column represent the closeness of all points in a cluster to the centroid of that cluster.

  • The Number of Points column shows how many data points were assigned to each cluster, along with the overall total number of data points across all clusters.

    If the number of data points assigned to clusters is less than the total number of data points available, it means that some data points could not be assigned to a cluster.

  • The scores in the Maximal Distance to Cluster Center column represent the maximum of the distances between each point and the centroid of that point's cluster.

    If this number is high, it can mean that the cluster is widely dispersed. Review this statistic together with the Average Distance to Cluster Center to determine the cluster's spread.

  • The Combined Evaluation score at the bottom of each section of results lists the averaged scores for the clusters created in that particular model.
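
The per-cluster statistics described above can be illustrated with a small calculation. The following is a minimal sketch that assumes Euclidean distance and simple unweighted means; the module's actual distance measure and averaging are not stated here, so treat both as assumptions. The function name `cluster_stats` and the dictionary keys are hypothetical.

```python
import math


def cluster_stats(points, assignments, centroids):
    """Per-cluster point count and distance statistics (Euclidean distance)."""
    stats = []
    for k, center in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == k]
        if not members:
            # An empty cluster has no distances to summarize.
            stats.append({"cluster": k, "points": 0})
            continue
        # Distances from each member to its own centroid.
        own = [math.dist(p, center) for p in members]
        # Distances from each member to every other centroid.
        others = [math.dist(p, c)
                  for p in members
                  for j, c in enumerate(centroids) if j != k]
        stats.append({
            "cluster": k,
            "points": len(members),
            "avg_to_center": sum(own) / len(own),
            "avg_to_other": sum(others) / len(others) if others else 0.0,
            "max_to_center": max(own),
        })
    return stats


points = [(0, 0), (0, 2), (10, 0)]
assignments = [0, 0, 1]
centroids = [(0, 1), (10, 0)]
for row in cluster_stats(points, assignments, centroids):
    print(row)
```

A tight, well-separated cluster shows a small average and maximal distance to its own center together with a large average distance to the other centers.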

Next steps

See the set of modules available to Azure Machine Learning.