Evaluate model performance in Machine Learning Studio (classic)

03/20/2017

APPLIES TO: Applies to. Machine Learning Studio (classic) Does not apply to. Azure Machine Learning

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
Learn more about Azure Machine Learning

ML Studio (classic) documentation is being retired and may not be updated in the future.

In this article, you can learn about the metrics you can use to monitor model performance in Machine Learning Studio (classic). Evaluating the performance of a model is one of the core stages in the data science process. It indicates how successful the scoring (predictions) of a dataset has been by a trained model. Machine Learning Studio (classic) supports model evaluation through two of its main machine learning modules:

These modules allow you to see how your model performs in terms of a number of metrics that are commonly used in machine learning and statistics.

Evaluating models should be considered along with:

Three common supervised learning scenarios are presented:

regression
binary classification
multiclass classification

Evaluation vs. Cross Validation

Evaluation and cross validation are standard ways to measure the performance of your model. They both generate evaluation metrics that you can inspect or compare against those of other models.

Evaluate Model expects a scored dataset as input (or two in case you would like to compare the performance of two different models). Therefore, you need to train your model using the Train Model module and make predictions on some dataset using the Score Model module before you can evaluate the results. The evaluation is based on the scored labels/probabilities along with the true labels, all of which are output by the Score Model module.

Alternatively, you can use cross validation to perform a number of train-score-evaluate operations (10 folds) automatically on different subsets of the input data. The input data is split into 10 parts, where one is reserved for testing, and the other 9 for training. This process is repeated 10 times and the evaluation metrics are averaged. This helps in determining how well a model would generalize to new datasets. The Cross-Validate Model module takes in an untrained model and some labeled dataset and outputs the evaluation results of each of the 10 folds, in addition to the averaged results.

In the following sections, we will build simple regression and classification models and evaluate their performance, using both the Evaluate Model and the Cross-Validate Model modules.

Evaluating a Regression Model

Assume we want to predict a car's price using features such as dimensions, horsepower, engine specs, and so on. This is a typical regression problem, where the target variable (price) is a continuous numeric value. We can fit a linear regression model that, given the feature values of a certain car, can predict the price of that car. This regression model can be used to score the same dataset we trained on. Once we have the predicted car prices, we can evaluate the model performance by looking at how much the predictions deviate from the actual prices on average. To illustrate this, we use the Automobile price data (Raw) dataset available in the Saved Datasets section in Machine Learning Studio (classic).

Creating the Experiment

Add the following modules to your workspace in Machine Learning Studio (classic):

Connect the ports as shown below in Figure 1 and set the Label column of the Train Model module to price.

Evaluating a Regression Model

Figure 1. Evaluating a Regression Model.

Inspecting the Evaluation Results

After running the experiment, you can click on the output port of the Evaluate Model module and select Visualize to see the evaluation results. The evaluation metrics available for regression models are: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Relative Squared Error, and the Coefficient of Determination.

The term "error" here represents the difference between the predicted value and the true value. The absolute value or the square of this difference is usually computed to capture the total magnitude of error across all instances, as the difference between the predicted and true value could be negative in some cases. The error metrics measure the predictive performance of a regression model in terms of the mean deviation of its predictions from the true values. Lower error values mean the model is more accurate in making predictions. An overall error metric of zero means that the model fits the data perfectly.

The coefficient of determination, which is also known as R squared, is also a standard way of measuring how well the model fits the data. It can be interpreted as the proportion of variation explained by the model. A higher proportion is better in this case, where 1 indicates a perfect fit.

Linear Regression Evaluation Metrics

Figure 2. Linear Regression Evaluation Metrics.

Using Cross Validation

As mentioned earlier, you can perform repeated training, scoring, and evaluations automatically using the Cross-Validate Model module. All you need in this case is a dataset, an untrained model, and a Cross-Validate Model module (see figure below). You need to set the label column to price in the Cross-Validate Model module's properties.

Cross-Validating a Regression Model

Figure 3. Cross-Validating a Regression Model.

After running the experiment, you can inspect the evaluation results by clicking on the right output port of the Cross-Validate Model module. This will provide a detailed view of the metrics for each iteration (fold), and the averaged results of each of the metrics (Figure 4).

Cross-Validation Results of a Regression Model

Figure 4. Cross-Validation Results of a Regression Model.

Evaluating a Binary Classification Model

In a binary classification scenario, the target variable has only two possible outcomes, for example: {0, 1} or {false, true}, {negative, positive}. Assume you are given a dataset of adult employees with some demographic and employment variables, and that you are asked to predict the income level, a binary variable with the values {"<=50 K", ">50 K"}. In other words, the negative class represents the employees who make less than or equal to 50 K per year, and the positive class represents all other employees. As in the regression scenario, we would train a model, score some data, and evaluate the results. The main difference here is the choice of metrics Machine Learning Studio (classic) computes and outputs. To illustrate the income level prediction scenario, we will use the Adult dataset to create a Studio (classic) experiment and evaluate the performance of a two-class logistic regression model, a commonly used binary classifier.