Evaluate Recommender

This article describes how to use the Evaluate Recommender module in Azure Machine Learning designer (preview). The goal is to measure the accuracy of the predictions that a recommendation model has made. By using this module, you can evaluate two kinds of recommendations:

  • Ratings predicted for a user and an item
  • Items recommended for a user

When you create predictions by using a recommendation model, slightly different results are returned for each of these supported prediction types. The Evaluate Recommender module deduces the kind of prediction from the column format of the scored dataset. For example, the scored dataset might contain:

  • User-item-rating triples
  • Users and their recommended items

The module also applies the appropriate performance metrics, based on the type of prediction being made.
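As a rough illustration of how a column format can identify the prediction type, the following Python sketch checks a scored dataset's column names against the naming rules that this article gives for each type (User, Item, Rating for predicted ratings; User, Item 1, Item 2, and so on for item recommendations). The function name infer_prediction_type is hypothetical, and this is not the module's actual implementation.

```python
def infer_prediction_type(columns):
    """Guess the prediction type of a scored dataset from its column names.

    Hypothetical helper: it only mirrors the column-naming rules described
    in this article, not Evaluate Recommender's internal logic.
    """
    cols = list(columns)
    # Predicted ratings: exactly User, Item, Rating, in that order.
    if cols == ["User", "Item", "Rating"]:
        return "rating_prediction"
    # Item recommendations: User followed by Item 1, Item 2, ..., Item k.
    if (
        len(cols) >= 2
        and cols[0] == "User"
        and cols[1:] == [f"Item {n}" for n in range(1, len(cols))]
    ):
        return "item_recommendation"
    raise ValueError(f"Unrecognized scored-dataset columns: {cols}")
```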

How to configure Evaluate Recommender

The Evaluate Recommender module compares the predictions output by a recommendation model with the corresponding "ground truth" data. For example, the Score SVD Recommender module produces scored datasets that you can analyze by using Evaluate Recommender.

Requirements

Evaluate Recommender requires the following datasets as input.

Test dataset

The test dataset contains the "ground truth" data in the form of user-item-rating triples.

Scored dataset

The scored dataset contains the predictions that the recommendation model generated.

The columns in this second dataset depend on the kind of prediction that you performed during the scoring process. For example, the scored dataset might contain either of the following:

  • Users, items, and the ratings that the user would likely give for the item
  • A list of users and the items recommended for them

Metrics

Performance metrics for the model are generated based on the type of input. The following sections give details.

Evaluate predicted ratings

When you're evaluating predicted ratings, the scored dataset (the second input to Evaluate Recommender) must contain user-item-rating triples that meet these requirements:

  • The first column of the dataset contains the user identifiers.
  • The second column contains the item identifiers.
  • The third column contains the corresponding user-item ratings.

Important

For evaluation to succeed, the column names must be User, Item, and Rating, respectively.

Evaluate Recommender compares the ratings in the "ground truth" dataset to the predicted ratings in the scored dataset. It then computes the mean absolute error (MAE) and the root mean squared error (RMSE).
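The following minimal Python sketch shows how MAE and RMSE can be computed from the two inputs, assuming both are lists of (user, item, rating) triples and that each prediction is matched to its ground-truth rating on the (user, item) pair. The function name rating_errors and the choice to skip pairs that appear in only one dataset are assumptions for illustration, not documented module behavior.

```python
import math


def rating_errors(ground_truth, scored):
    """Return (MAE, RMSE) for predicted ratings.

    ground_truth, scored: iterables of (user, item, rating) triples.
    Assumption: user-item pairs missing from the ground truth are skipped.
    """
    # Index the actual ratings by (user, item) for lookup.
    actual = {(user, item): rating for user, item, rating in ground_truth}
    # Prediction error for every scored pair that has an actual rating.
    diffs = [
        predicted - actual[(user, item)]
        for user, item, predicted in scored
        if (user, item) in actual
    ]
    if not diffs:
        raise ValueError("No overlapping user-item pairs to evaluate")
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse
```

For example, predicted ratings of 3.5, 3.0, and 5.0 against actual ratings of 4, 2, and 5 give errors of -0.5, 1.0, and 0.0, so MAE = 0.5 and RMSE = sqrt(1.25 / 3) ≈ 0.65. RMSE penalizes the single large error more heavily than MAE does.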

Evaluate item recommendations

When you're evaluating item recommendations, use a scored dataset that includes the recommended items for each user:

  • The first column of the dataset must contain the user identifier.
  • All subsequent columns should contain the corresponding recommended item identifiers, ordered by how relevant an item is to the user.

Before you connect this dataset, we recommend that you sort the items in each row so that the most relevant items come first.

Important

For Evaluate Recommender to work, the column names must be User, Item 1, Item 2, Item 3, and so on.
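If your recommendations start out as per-user lists of (item, relevance score) pairs, a sketch like the following could reshape them into the expected layout: one row per user, items sorted with the most relevant first, and columns named User, Item 1, Item 2, and so on. The function name to_recommendation_rows, the input shape, and the padding of short lists with None are assumptions for illustration; check how the module treats empty cells before relying on the padding.

```python
def to_recommendation_rows(recommendations, k):
    """Build (header, rows) in the User, Item 1..Item k layout.

    recommendations: dict mapping user -> list of (item, relevance) pairs.
    Hypothetical helper; users with fewer than k items are padded with None.
    """
    header = ["User"] + [f"Item {n}" for n in range(1, k + 1)]
    rows = []
    for user, scored_items in recommendations.items():
        # Most relevant item first, as the module expects.
        ranked = sorted(scored_items, key=lambda pair: pair[1], reverse=True)
        items = [item for item, _ in ranked[:k]]
        items += [None] * (k - len(items))
        rows.append([user] + items)
    return header, rows
```

Sorting inside the helper keeps the relevance order explicit, so the column position alone tells Evaluate Recommender how each item was ranked.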

Evaluate Recommender computes the average normalized discounted cumulative gain (NDCG) and returns it in the output dataset.

Because the true "ground truth" for recommended items can't be known, Evaluate Recommender uses the user-item ratings in the test dataset as the gains in the NDCG computation. For this evaluation to work, the recommender scoring module must produce recommendations only for items that have "ground truth" ratings in the test dataset.
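The following Python sketch illustrates one common NDCG formulation under stated assumptions: each recommended item's gain is the user's rating for it in the test dataset, the item at position p is discounted by log2(p + 1), and the ideal DCG comes from re-sorting the same items by rating. The module's exact gain, discount, and normalization choices aren't documented here, so treat this as an approximation rather than the module's implementation. The function names dcg and average_ndcg are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: position p is divided by log2(p + 1)."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))


def average_ndcg(recommendation_rows, test_triples):
    """Average NDCG over rows of the form [user, item1, item2, ...].

    Assumptions: gains are raw test ratings, and the ideal ordering is the
    same recommended items sorted by rating (highest first).
    """
    # user -> {item: rating} from the test ("ground truth") dataset.
    ratings = {}
    for user, item, rating in test_triples:
        ratings.setdefault(user, {})[item] = rating

    scores = []
    for row in recommendation_rows:
        user = row[0]
        items = [item for item in row[1:] if item is not None]
        user_ratings = ratings.get(user, {})
        # Mirrors the requirement that every recommended item has a rating.
        missing = [item for item in items if item not in user_ratings]
        if missing:
            raise ValueError(f"No ground-truth rating for {user}: {missing}")
        gains = [user_ratings[item] for item in items]
        ideal = dcg(sorted(gains, reverse=True))
        if ideal > 0:
            scores.append(dcg(gains) / ideal)
    return sum(scores) / len(scores) if scores else 0.0
```

A user whose items are already in descending rating order scores 1.0; placing the highest-rated item last lowers the score because its gain receives the largest discount.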

Next steps

See the set of modules available to Azure Machine Learning.