Score SVD Recommender

This article describes how to use the Score SVD Recommender module in Azure Machine Learning designer (preview). Use this module to create predictions by using a trained recommendation model based on the singular value decomposition (SVD) algorithm.

The SVD recommender can generate two different kinds of predictions:

  • Predict ratings for a given user and item.
  • Recommend items to a given user.

When you're creating the second type of predictions, you can operate in one of these modes:

  • Production mode considers all users or items. It's typically used in a web service.

    You can create scores for new users, not just users seen during training. For more information, see the technical notes.

  • Evaluation mode operates on a reduced set of users or items that can be evaluated. It's typically used during pipeline operations.

For more information on the SVD recommender algorithm, see the research paper Matrix factorization techniques for recommender systems.
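As a rough illustration of the idea behind that paper, the following sketch computes a biased matrix-factorization estimate: a global mean, plus a user bias, plus an item bias, plus the dot product of the user and item factor vectors. The function name, the factor values, and the use of bias terms are assumptions made for illustration; this article does not describe how the module computes ratings internally.

```python
def predict_rating(global_mean, user_bias, item_bias, user_factors, item_factors):
    """Hypothetical helper: global_mean + user_bias + item_bias + dot(item, user)."""
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    return global_mean + user_bias + item_bias + interaction


# Toy values, chosen only for illustration.
rating = predict_rating(
    global_mean=3.5,
    user_bias=0.2,
    item_bias=-0.1,
    user_factors=[0.5, -1.0],
    item_factors=[1.0, 0.5],
)
print(round(rating, 2))
```

In a trained model, the biases and factor vectors would be learned from the observed ratings rather than set by hand.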

How to configure Score SVD Recommender

This module supports two types of predictions, each with different requirements.

Prediction of ratings

When you predict ratings, the model calculates how a user will react to a particular item, given the training data. The input data for scoring must provide both a user and the item to rate.

  1. Add a trained recommendation model to your pipeline, and connect it to Trained SVD recommender. You must create the model by using the Train SVD Recommender module.

  2. For Recommender prediction kind, select Rating Prediction. No other parameters are required.

  3. Add the data for which you want to make predictions, and connect it to Dataset to score.

    For the model to predict ratings, the input dataset must contain user-item pairs.

    The dataset can contain an optional third column of ratings for the user-item pair in the first and second columns, but the third column is ignored during prediction.

  4. Run the pipeline.

Results for rating predictions

The output dataset contains three columns: users, items, and the predicted rating for each input user and item.
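The input and output shapes for rating prediction can be sketched as follows. This is a minimal sketch, not the module's implementation: `score_rating_rows` and `toy_predict` are hypothetical names, and the lookup table stands in for a trained model. It shows that only the first two columns of each input row are read, and that each output row holds the user, the item, and the predicted rating.

```python
def score_rating_rows(rows, predict):
    """Return (user, item, predicted_rating) for each input row."""
    scored = []
    for row in rows:
        user, item = row[0], row[1]  # an optional third column (a known rating) is ignored
        scored.append((user, item, predict(user, item)))
    return scored


# Hypothetical stand-in for a trained model: a lookup table with a fallback value.
toy_scores = {("u1", "i1"): 4.5, ("u2", "i3"): 2.0}


def toy_predict(user, item):
    return toy_scores.get((user, item), 3.0)


rows = [("u1", "i1", 5), ("u2", "i3")]  # the first row carries an extra rating column
scored = score_rating_rows(rows, toy_predict)
for user, item, rating in scored:
    print(user, item, rating)
```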

Recommendations for users

To recommend items for users, you provide a list of users and items as input. From this data, the model uses its knowledge about existing items and users to generate a list of items with probable appeal to each user. You can customize the number of recommendations returned, and you can set a threshold for the number of previous recommendations that are required to generate a recommendation.

  1. Add a trained recommendation model to your pipeline, and connect it to Trained SVD recommender. You must create the model by using the Train SVD Recommender module.

  2. To recommend items for a list of users, set Recommender prediction kind to Item Recommendation.

  3. For Recommended item selection, indicate whether you're using the scoring module in production or for model evaluation. Choose one of these values:

    • From All Items: Select this option if you're setting up a pipeline to use in a web service or in production. This option enables production mode. The module makes recommendations from all items seen during training.

    • From Rated Items (for model evaluation): Select this option if you're developing or testing a model. This option enables evaluation mode. The module makes recommendations only from those items in the input dataset that have been rated.

    • From Unrated Items (to suggest new items to users): Select this option if you want the module to make recommendations only from those items in the training dataset that have not been rated.

  4. Add the dataset for which you want to make predictions, and connect it to Dataset to score.

    • For From All Items, the input dataset should consist of one column. It contains the identifiers of users for which to make recommendations.

      The dataset can include an extra two columns of item identifiers and ratings, but these two columns are ignored.

    • For From Rated Items (for model evaluation), the input dataset should consist of user-item pairs. The first column should contain the user identifiers. The second column should contain the corresponding item identifiers.

      The dataset can include a third column of user-item ratings, but this column is ignored.

    • For From Unrated Items (to suggest new items to users), the input dataset should consist of user-item pairs. The first column should contain the user identifiers. The second column should contain the corresponding item identifiers.

      The dataset can include a third column of user-item ratings, but this column is ignored.

  5. Maximum number of items to recommend to a user: Enter the number of items to return for each user. By default, the module recommends five items.

  6. Minimum size of the recommendation pool per user: Enter a value that indicates how many prior recommendations are required. By default, this parameter is set to 2, meaning at least two other users have recommended the item.

    Use this option only if you're scoring in evaluation mode. The option is not available if you select From All Items or From Unrated Items (to suggest new items to users).

  7. For From Unrated Items (to suggest new items to users), use the third input port, named Training Data, to remove items that have already been rated from the prediction results.

    To apply this filter, connect the original training dataset to the input port.

  8. Submit the pipeline.
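The three item-selection options in the steps above can be sketched as a choice of candidate pool followed by a top-N ranking. This is a minimal sketch under stated assumptions: the function names, the mode strings, and the toy affinity table are hypothetical; rated and unrated items are interpreted per user; and the minimum recommendation pool size is not modeled.

```python
def candidate_items(mode, user, all_items, input_pairs, training_pairs):
    """Pick the pool of items a recommendation may be drawn from."""
    if mode == "all":  # From All Items (production mode)
        return set(all_items)
    if mode == "rated":  # From Rated Items (evaluation mode)
        return {item for u, item in input_pairs if u == user}
    if mode == "unrated":  # From Unrated Items: drop items rated in the training data
        already_rated = {item for u, item in training_pairs if u == user}
        return set(all_items) - already_rated
    raise ValueError(f"unknown mode: {mode}")


def recommend(mode, user, all_items, input_pairs, training_pairs, affinity, max_items=5):
    """Rank the candidate pool by affinity (highest first) and keep the top items."""
    pool = candidate_items(mode, user, all_items, input_pairs, training_pairs)
    ranked = sorted(pool, key=lambda item: (-affinity(user, item), item))
    return ranked[:max_items]


# Toy data, chosen only for illustration.
all_items = ["i1", "i2", "i3", "i4"]
training_pairs = [("u1", "i1"), ("u1", "i2"), ("u2", "i3")]
input_pairs = [("u1", "i2"), ("u1", "i4")]
toy_affinity = {"i1": 0.9, "i2": 0.4, "i3": 0.7, "i4": 0.6}


def affinity(user, item):
    return toy_affinity[item]  # hypothetical stand-in for the trained model


from_all = recommend("all", "u1", all_items, input_pairs, training_pairs, affinity, max_items=2)
from_rated = recommend("rated", "u1", all_items, input_pairs, training_pairs, affinity)
from_unrated = recommend("unrated", "u1", all_items, input_pairs, training_pairs, affinity)
print(from_all, from_rated, from_unrated)
```

In this sketch, the `training_pairs` argument plays the role of the Training Data input port: it is used only to filter out items that a user has already rated.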

Results of item recommendation

The scored dataset returned by Score SVD Recommender lists the recommended items for each user:

  • The first column contains the user identifiers.
  • A number of additional columns are generated, depending on the value that you set for Maximum number of items to recommend to a user. Each column contains a recommended item (by identifier). The recommendations are ordered by user-item affinity. The item with the highest affinity is put in column Item 1.
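The layout described above can be sketched as one wide row per user. The column name `User` and the use of `None` to pad users with fewer recommendations are assumptions made for illustration; only the `Item 1` naming comes from this article.

```python
def to_wide_row(user, ranked_items, max_items):
    """Build one output row: the user, then Item 1 .. Item N in affinity order."""
    row = {"User": user}  # "User" is a hypothetical column name
    for position in range(max_items):
        item = ranked_items[position] if position < len(ranked_items) else None
        row[f"Item {position + 1}"] = item
    return row


# ranked_items is assumed to be sorted by affinity already, highest first.
row = to_wide_row("u1", ["i3", "i4"], max_items=3)
print(row)
```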

Warning

You can't evaluate this scored dataset by using the Evaluate Recommender module.

Technical notes

If you have a pipeline with the SVD recommender and you move the model to production, be aware that there are key differences between using the recommender in evaluation mode and using it in production mode.

Evaluation, by definition, requires predictions that can be verified against the ground truth in a test set. When you evaluate the recommender, it must predict only items that have been rated in the test set. This restricts the possible values that are predicted.

When you operationalize the model, you typically change the prediction mode to make recommendations based on all possible items, in order to get the best predictions. For many of these predictions, there's no corresponding ground truth. So the accuracy of the recommendation can't be verified in the same way as during pipeline operations.
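The difference can be illustrated by checking which recommended items have a ground-truth rating in a test set. The helper name and the toy data below are hypothetical. In the evaluation-style call, every recommended item can be verified; in the production-style call, most cannot.

```python
def split_by_ground_truth(user, recommended, test_ratings):
    """Separate recommended items into those with and without a test-set rating."""
    verifiable = [item for item in recommended if (user, item) in test_ratings]
    unverifiable = [item for item in recommended if (user, item) not in test_ratings]
    return verifiable, unverifiable


# Toy test set: ratings that user u1 actually gave.
test_ratings = {("u1", "i2"): 4.0, ("u1", "i4"): 2.0}

# Evaluation mode: recommendations are restricted to items rated in the test set.
eval_ok, eval_missing = split_by_ground_truth("u1", ["i4", "i2"], test_ratings)

# Production mode: recommendations can come from any item.
prod_ok, prod_missing = split_by_ground_truth("u1", ["i1", "i3", "i4"], test_ratings)
print(eval_ok, eval_missing, prod_ok, prod_missing)
```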

Next steps

See the set of modules available to Azure Machine Learning.