Evaluate automated machine learning experiment results

In this article, learn how to view and evaluate the results of your automated machine learning (automated ML) experiments. These experiments consist of multiple runs, where each run creates a model. To help you evaluate each model, automated ML automatically generates performance metrics and charts specific to your experiment type.

For example, automated ML provides different charts for classification and regression models.

Classification
  • Confusion matrix
  • Precision-recall chart
  • Receiver operating characteristic (ROC)
  • Lift curve
  • Gains curve
  • Calibration plot

Regression
  • Predicted vs. True
  • Histogram of residuals

Prerequisites

View run results

After your automated machine learning experiment completes, a history of the runs can be found in your machine learning workspace via the Azure Machine Learning studio.

For SDK experiments, you can see these same results during a run when you use the RunDetails Jupyter widget.

The following steps and animation show how to view the run history and the performance metrics and charts of a specific model in the studio.

Steps for viewing the run history and a model's performance metrics and charts

To view the run history and model performance metrics and charts in the studio:

1. Sign in to the studio and navigate to your workspace.
2. In the left panel of the workspace, select Runs.
3. In the list of experiments, select the one you want to explore.
4. In the bottom table, select the Run.
5. On the Models tab, select the Algorithm name for the model that you want to explore.
6. On the Metrics tab, select the metrics and charts you want to evaluate for that model.

Classification performance metrics

The following table summarizes the model performance metrics that automated ML calculates for each classification model generated for your experiment.

| Metric | Description | Calculation | Extra parameters |
| --- | --- | --- | --- |
| AUC_macro | AUC is the area under the receiver operating characteristic curve. Macro is the arithmetic mean of the AUC for each class. | Calculation | average="macro" |
| AUC_micro | AUC is the area under the receiver operating characteristic curve. Micro is computed globally by combining the true positives and false positives from each class. | Calculation | average="micro" |
| AUC_weighted | AUC is the area under the receiver operating characteristic curve. Weighted is the arithmetic mean of the score for each class, weighted by the number of true instances in each class. | Calculation | average="weighted" |
| accuracy | Accuracy is the percent of predicted labels that exactly match the true labels. | Calculation | None |
| average_precision_score_macro | Average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight. Macro is the arithmetic mean of the average precision score of each class. | Calculation | average="macro" |
| average_precision_score_micro | Average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight. Micro is computed globally by combining the true positives and false positives at each cutoff. | Calculation | average="micro" |
| average_precision_score_weighted | Average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight. Weighted is the arithmetic mean of the average precision score for each class, weighted by the number of true instances in each class. | Calculation | average="weighted" |
| balanced_accuracy | Balanced accuracy is the arithmetic mean of recall for each class. | Calculation | average="macro" |
| f1_score_macro | F1 score is the harmonic mean of precision and recall. Macro is the arithmetic mean of the F1 score for each class. | Calculation | average="macro" |
| f1_score_micro | F1 score is the harmonic mean of precision and recall. Micro is computed globally by counting the total true positives, false negatives, and false positives. | Calculation | average="micro" |
| f1_score_weighted | F1 score is the harmonic mean of precision and recall. Weighted is the mean, weighted by class frequency, of the F1 score for each class. | Calculation | average="weighted" |
| log_loss | This is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks, defined as the negative log-likelihood of the true labels given a probabilistic classifier's predictions. For a single sample with true label yt in {0,1} and estimated probability yp that yt = 1, the log loss is -log P(yt\|yp) = -(yt log(yp) + (1 - yt) log(1 - yp)). | Calculation | None |
| norm_macro_recall | Normalized macro recall is macro recall normalized so that random performance has a score of 0 and perfect performance has a score of 1: norm_macro_recall := (recall_score_macro - R) / (1 - R), where R is the expected value of recall_score_macro for random predictions (R = 0.5 for binary classification; R = 1/C for C-class classification problems). | Calculation | average="macro" |
| precision_score_macro | Precision is the percent of positively predicted elements that are correctly labeled. Macro is the arithmetic mean of precision for each class. | Calculation | average="macro" |
| precision_score_micro | Precision is the percent of positively predicted elements that are correctly labeled. Micro is computed globally by counting the total true positives and false positives. | Calculation | average="micro" |
| precision_score_weighted | Precision is the percent of positively predicted elements that are correctly labeled. Weighted is the arithmetic mean of precision for each class, weighted by the number of true instances in each class. | Calculation | average="weighted" |
| recall_score_macro | Recall is the percent of correctly labeled elements of a certain class. Macro is the arithmetic mean of recall for each class. | Calculation | average="macro" |
| recall_score_micro | Recall is the percent of correctly labeled elements of a certain class. Micro is computed globally by counting the total true positives, false negatives, and false positives. | Calculation | average="micro" |
| recall_score_weighted | Recall is the percent of correctly labeled elements of a certain class. Weighted is the arithmetic mean of recall for each class, weighted by the number of true instances in each class. | Calculation | average="weighted" |
| weighted_accuracy | Weighted accuracy is accuracy where the weight given to each sample is equal to the proportion of true instances in that sample's true class. | Calculation | sample_weight is a vector equal to the proportion of that class for each element in the target |
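As a rough illustration of what these metrics measure, the table's entries map closely onto scikit-learn's metric functions. This is only a sketch assuming the scikit-learn API; automated ML's internal implementation may differ, and the labels and probabilities below are made up:

```python
from sklearn.metrics import (
    roc_auc_score, accuracy_score, f1_score,
    precision_score, recall_score, log_loss,
)

# Illustrative three-class labels and probabilistic predictions
# (each row of y_prob sums to 1, one column per class).
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]
y_prob = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.6, 0.3],
    [0.2, 0.6, 0.2],
    [0.7, 0.2, 0.1],
]

print("accuracy:             ", accuracy_score(y_true, y_pred))
print("f1_score_macro:       ", f1_score(y_true, y_pred, average="macro"))
print("precision_score_micro:", precision_score(y_true, y_pred, average="micro"))
print("recall_score_weighted:", recall_score(y_true, y_pred, average="weighted"))
print("AUC_macro:            ", roc_auc_score(y_true, y_prob,
                                              multi_class="ovr", average="macro"))
print("log_loss:             ", log_loss(y_true, y_prob))
```

The average= parameter in each call corresponds to the "Extra parameters" column above.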

Binary vs. multiclass metrics

Automated ML doesn't differentiate between binary and multiclass metrics. The same validation metrics are reported whether a dataset has two classes or more than two classes. However, some metrics are intended for multiclass classification. When applied to a binary dataset, these metrics don't treat any class as the true class, as you might expect. Metrics that are clearly meant for multiclass are suffixed with micro, macro, or weighted. Examples include average_precision_score, f1_score, precision_score, recall_score, and AUC.

For example, instead of calculating recall as tp / (tp + fn), the multiclass averaged recall (micro, macro, or weighted) averages over both classes of a binary classification dataset. This is equivalent to calculating the recall for the true class and the false class separately, and then taking the average of the two.
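A minimal sketch of this averaging behavior, using scikit-learn's recall_score on a small, made-up binary example:

```python
from sklearn.metrics import recall_score

y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]

# Positive class: tp = 2, fn = 2, so recall = tp / (tp + fn) = 0.5.
positive_recall = 2 / (2 + 2)
# Negative class: both negatives predicted correctly, so recall = 1.0.
negative_recall = 2 / 2

# Default binary recall reports only the positive class...
assert recall_score(y_true, y_pred) == positive_recall
# ...while macro-averaged recall averages over both classes.
assert recall_score(y_true, y_pred, average="macro") == \
    (positive_recall + negative_recall) / 2
print(recall_score(y_true, y_pred, average="macro"))  # 0.75
```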

Confusion matrix

A confusion matrix describes the performance of a classification model. Each row displays the instances of the true, or actual, class in your dataset, and each column represents the instances of the class that the model predicted.

For each confusion matrix, automated ML shows the frequency of each predicted label (column) compared against the true label (row). The darker the color, the higher the count in that particular part of the matrix.
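The frequencies behind the chart can be sketched with scikit-learn's confusion_matrix; the label values here are illustrative:

```python
from sklearn.metrics import confusion_matrix

y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]

# Rows are true labels, columns are predicted labels,
# in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=["bird", "cat", "dog"])
print(cm)
# Each row sums to the number of true instances of that class.
```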

What does a good model look like?

A confusion matrix compares the dataset's actual values against the model's predicted values. A model has higher accuracy when most of its values lie along the diagonal, meaning the model predicted the correct value. If a dataset has class imbalance, the confusion matrix helps to detect a biased model.

Example 1: A classification model with poor accuracy

A classification model with poor accuracy

Example 2: A classification model with high accuracy

A classification model with high accuracy

Example 3: A classification model with high accuracy and high bias in model predictions

A classification model with high accuracy and high bias in model predictions

Precision-recall chart

The precision-recall curve shows the relationship between precision and recall for a model. Precision represents a model's ability to label instances correctly, and recall represents a classifier's ability to find all instances of a particular label.

With this chart, you can compare the precision-recall curves for each model to determine which model has an acceptable relationship between precision and recall for your particular business problem. This chart shows Macro Average Precision-Recall, Micro Average Precision-Recall, and the precision-recall associated with all classes for a model.

Macro-average computes the metric independently for each class and then takes the average, treating all classes equally. Micro-average instead aggregates the contributions of all classes to compute the average. Micro-average is preferable if there is class imbalance present in the dataset.
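A sketch of the quantities behind a single precision-recall curve, using scikit-learn on a tiny, made-up binary example:

```python
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]  # illustrative predicted probabilities

# One (precision, recall) point per decision threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
print("precision:", precision)
print("recall:   ", recall)
# Average precision summarizes the curve as a single number.
print("average precision:", average_precision_score(y_true, y_scores))
```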

What does a good model look like?

Depending on the goal of the business problem, the ideal precision-recall curve could differ.

Example 1: A classification model with low precision and low recall

A classification model with low precision and low recall

Example 2: A classification model with ~100% precision and ~100% recall

A classification model with high precision and recall

ROC chart

The receiver operating characteristic (ROC) is a plot of the correctly classified labels vs. the incorrectly classified labels for a particular model. The ROC curve can be less informative when training models on datasets with high class imbalance, as the majority class can drown out the contribution of minority classes.

You can visualize the area under the ROC chart as the proportion of correctly classified samples. An advanced user of the ROC chart might look beyond the area under the curve and get an intuition for the true positive and false positive rates as a function of the classification threshold or decision boundary.
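The points on a ROC curve can be sketched with scikit-learn's roc_curve; the sample scores below are illustrative:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# One (false positive rate, true positive rate) pair per threshold
# as the decision boundary sweeps over the predicted scores.
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_scores))
```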

What does a good model look like?

The best model has an ROC curve that approaches the top-left corner, with a 100% true positive rate and a 0% false positive rate. A random model displays as a flat line from the bottom-left to the top-right corner. A model worse than random dips below the y=x line.

Example 1: A classification model with low true labels and high false labels

A classification model with low true labels and high false labels

Example 2: A classification model with high true labels and low false labels

A classification model with high true labels and low false labels

Lift chart

Lift charts evaluate the performance of classification models. A lift chart shows how many times better a model performs compared to a random model. This gives you a relative performance that takes into account the fact that classification gets harder as you increase the number of classes. A random model incorrectly predicts a higher fraction of samples from a dataset with ten classes compared to a dataset with two classes.

You can compare the lift of the model built automatically with Azure Machine Learning to the baseline (random model) in order to view the value gain of that particular model.
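Automated ML draws the chart for you, but the quantity it plots can be sketched as follows. The lift function here is an illustrative helper, not an automated ML API, and the data is made up:

```python
import numpy as np

def lift(y_true, y_scores, fraction):
    """Ratio of positives found in the top `fraction` of score-ranked
    samples to what a random model would find (its lift is 1.0)."""
    y_true = np.asarray(y_true)
    order = np.argsort(y_scores)[::-1]  # highest scores first
    k = max(1, int(round(fraction * len(y_true))))
    top = y_true[order][:k]
    return top.mean() / y_true.mean()

y_true = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
y_scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

print(lift(y_true, y_scores, 0.2))  # lift in the top 20% of samples
```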

What does a good model look like?

A better-performing model has a lift curve that is higher on the graph and further from the baseline.

Example 1: A classification model that performs poorly compared to a random selection model

A classification model that performs worse than a random selection model

Example 2: A classification model that performs better than a random selection model

A classification model that performs better than a random selection model

Cumulative gains chart

A cumulative gains chart evaluates the performance of a classification model by each portion of the data. For each percentile of the data set, the chart shows how many more samples have been accurately classified compared to a model that's always incorrect. This information provides another way of looking at the results in the accompanying lift chart.

The cumulative gains chart helps you choose the classification cutoff using a percentage that corresponds to a desired gain from the model. You can compare the cumulative gains chart to the baseline (incorrect model) to see the percent of samples that were correctly classified at each confidence percentile.
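Similarly, the cumulative gains curve can be sketched as the fraction of all positives captured within each top percentile of score-ranked samples. The cumulative_gains function is an illustrative helper, not an automated ML API:

```python
import numpy as np

def cumulative_gains(y_true, y_scores):
    """For each top percentile of score-ranked samples, the fraction
    of all positives captured so far."""
    y_true = np.asarray(y_true)
    order = np.argsort(y_scores)[::-1]  # highest scores first
    captured = np.cumsum(y_true[order]) / y_true.sum()
    percentile = np.arange(1, len(y_true) + 1) / len(y_true)
    return percentile, captured

y_true = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
y_scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

pct, gain = cumulative_gains(y_true, y_scores)
print(list(zip(pct, gain)))
```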

What does a good model look like?

Similar to a lift chart, the higher your cumulative gains curve is above the baseline, the better your model is performing. Additionally, the closer your cumulative gains curve is to the top-left corner of the graph, the greater the gain your model achieves versus the baseline.

Example 1: A classification model with minimal gain

A classification model with minimal gain

Example 2: A classification model with significant gain

A classification model with significant gain

Calibration chart

A calibration plot displays the confidence of a predictive model. It does this by showing the relationship between the predicted probability and the actual probability, where "probability" represents the likelihood that a particular instance belongs under some label.

For all classification problems, you can review the calibration line for micro-average, macro-average, and each class in a given predictive model.

Macro-average computes the metric independently for each class and then takes the average, treating all classes equally. Micro-average instead aggregates the contributions of all classes to compute the average.
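A sketch of what a calibration line plots, using scikit-learn's calibration_curve on simulated data where the model is well calibrated by construction:

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
y_prob = rng.uniform(0, 1, 2000)
# Simulate a well-calibrated model: P(label = 1) equals the
# predicted probability for every sample.
y_true = rng.uniform(0, 1, 2000) < y_prob

# Bin the predicted probabilities and compare each bin's mean
# prediction to the observed fraction of positives in that bin.
frac_positive, mean_predicted = calibration_curve(y_true, y_prob, n_bins=5)
print(np.round(mean_predicted, 2))
print(np.round(frac_positive, 2))  # close to mean_predicted, i.e. near y=x
```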

What does a good model look like?

A well-calibrated model aligns with the y=x line, correctly predicting the probability that samples belong to each class. An over-confident model over-predicts probabilities close to zero and one, and is rarely uncertain about the class of each sample.

Example 1: A well-calibrated model

A well-calibrated model

Example 2: An over-confident model

An over-confident model

Regression performance metrics

The following table summarizes the model performance metrics that automated ML calculates for each regression or forecasting model generated for your experiment.

| Metric | Description | Calculation | Extra parameters |
| --- | --- | --- | --- |
| explained_variance | Explained variance is the proportion to which a mathematical model accounts for the variation of a given data set. It is the percent decrease in variance of the original data to the variance of the errors. When the mean of the errors is 0, it is equal to the coefficient of determination (see r2_score below). | Calculation | None |
| r2_score | R^2 is the coefficient of determination, or the percent reduction in squared errors compared to a baseline model that outputs the mean. | Calculation | None |
| spearman_correlation | Spearman correlation is a nonparametric measure of the monotonicity of the relationship between two datasets. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed. Like other correlation coefficients, it varies between -1 and +1, with 0 implying no correlation. Correlations of -1 or +1 imply an exact monotonic relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases. | Calculation | None |
| mean_absolute_error | Mean absolute error is the expected value of the absolute difference between the target and the prediction. | Calculation | None |
| normalized_mean_absolute_error | Normalized mean absolute error is mean absolute error divided by the range of the data. | Calculation | Divide by range of the data |
| median_absolute_error | Median absolute error is the median of all absolute differences between the target and the prediction. This loss is robust to outliers. | Calculation | None |
| normalized_median_absolute_error | Normalized median absolute error is median absolute error divided by the range of the data. | Calculation | Divide by range of the data |
| root_mean_squared_error | Root mean squared error is the square root of the expected squared difference between the target and the prediction. | Calculation | None |
| normalized_root_mean_squared_error | Normalized root mean squared error is root mean squared error divided by the range of the data. | Calculation | Divide by range of the data |
| root_mean_squared_log_error | Root mean squared log error is the square root of the expected squared logarithmic error. | Calculation | None |
| normalized_root_mean_squared_log_error | Normalized root mean squared log error is root mean squared log error divided by the range of the data. | Calculation | Divide by range of the data |
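As with the classification metrics, these map closely onto scikit-learn, with the normalized variants obtained by dividing by the range of the target. This is a sketch with made-up values, not automated ML's internal implementation:

```python
import numpy as np
from sklearn.metrics import (
    explained_variance_score, r2_score,
    mean_absolute_error, median_absolute_error, mean_squared_error,
)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

data_range = y_true.max() - y_true.min()
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print("explained_variance:                ", explained_variance_score(y_true, y_pred))
print("r2_score:                          ", r2_score(y_true, y_pred))
print("mean_absolute_error:               ", mean_absolute_error(y_true, y_pred))
print("normalized_mean_absolute_error:    ", mean_absolute_error(y_true, y_pred) / data_range)
print("median_absolute_error:             ", median_absolute_error(y_true, y_pred))
print("root_mean_squared_error:           ", rmse)
print("normalized_root_mean_squared_error:", rmse / data_range)
```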

Predicted vs. True chart

Predicted vs. True shows the relationship between a predicted value and its correlating true value for a regression problem.

After each run, you can see a Predicted vs. True graph for each regression model. To protect data privacy, values are binned together, and the size of each bin is shown as a bar graph on the bottom portion of the chart area. You can compare the predictive model, with the lighter shaded area showing error margins, against the ideal value of where the model should be.
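A rough sketch of the binning idea behind the chart. The bin edges and simulated data here are illustrative, not the studio's actual binning logic:

```python
import numpy as np

rng = np.random.default_rng(1)
y_true = rng.uniform(0, 10, 500)
y_pred = y_true + rng.normal(0, 1, 500)  # a reasonably accurate model

# Group true values into bins; report each bin's size and mean prediction.
bins = np.linspace(0, 10, 6)
idx = np.digitize(y_true, bins[1:-1])
for b in range(5):
    mask = idx == b
    print(f"bin {b}: count={mask.sum():3d}  mean_pred={y_pred[mask].mean():.2f}")
```

For a good model, each bin's mean prediction sits close to the bin's center, tracing the ideal y=x line.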

What does a good model look like?

Use this graph to measure model performance: the closer the predicted values are to the y=x line, the better the predictive model performs.

Example 1: A regression model with low performance

A regression model with low prediction accuracy

Example 2: A regression model with high performance

A regression model with high prediction accuracy

Histogram of residuals chart

Automated ML automatically provides a residuals chart to show the distribution of errors in the predictions of a regression model. A residual is the difference between the prediction and the actual value (y_pred - y_true).
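A minimal sketch of the residuals computation for a simulated unbiased model (the data is made up):

```python
import numpy as np

rng = np.random.default_rng(2)
y_true = rng.uniform(0, 10, 1000)
y_pred = y_true + rng.normal(0, 1, 1000)  # unbiased, noisy predictions

# Residuals for an unbiased model form a roughly bell-shaped
# histogram centered on zero.
residuals = y_pred - y_true
counts, edges = np.histogram(residuals, bins=9, range=(-4.5, 4.5))
print("mean residual:   ", round(residuals.mean(), 3))
print("histogram counts:", counts)
```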

What does a good model look like?

To show a margin of error with low bias, the histogram of residuals should be shaped as a bell curve centered around zero.

Example 1: A regression model with bias in its errors

A regression model with bias in its errors

Example 2: A regression model with a more even distribution of errors

A regression model with a more even distribution of errors

Model interpretability and feature importance

Automated ML provides a machine learning interpretability dashboard for your runs.

For more information on enabling interpretability features, see Interpretability: model explanations in automated machine learning.

Note

The ForecastTCN model is not currently supported by the Explanation Client. This model does not return an explanation dashboard if it is returned as the best model, and it does not support on-demand explanation runs.

Next steps