Train Wide & Deep Recommender

This article describes how to use the Train Wide & Deep Recommender module in Azure Machine Learning designer (preview) to train a recommendation model. This module is based on Wide & Deep learning, which was proposed by Google.

The Train Wide & Deep Recommender module reads a dataset of user-item-rating triples and, optionally, some user and item features. It returns a trained Wide & Deep recommender. You can then use the trained model to generate rating predictions or recommendations by using the Score Wide and Deep Recommender module.

More about recommendation models and the Wide & Deep recommender

The main aim of a recommendation system is to recommend one or more items to users of the system. An item could be a movie, restaurant, book, or song. A user could be a person, a group of people, or any other entity with item preferences.

There are two principal approaches to recommender systems.

  • The first is the content-based approach, which makes use of features for both users and items. Users may be described by properties such as age and gender, and items may be described by properties such as author and manufacturer. Typical examples of content-based recommendation systems can be found on social matchmaking sites.
  • The second approach is collaborative filtering, which uses only the identifiers of the users and the items and obtains implicit information about these entities from a (sparse) matrix of ratings given by the users to the items. We can learn about a user from the items they have rated and from other users who have rated the same items.

The Wide & Deep recommender combines these approaches, using collaborative filtering together with a content-based approach. It is therefore considered a hybrid recommender.

How this works: When a user is relatively new to the system, predictions are improved by making use of the feature information about the user, which addresses the well-known "cold-start" problem. However, once the system has collected a sufficient number of ratings from a particular user, it can make fully personalized predictions for that user based on their specific ratings rather than on their features alone. Hence, there is a smooth transition from content-based recommendations to recommendations based on collaborative filtering. Even if user or item features are not available, the Wide & Deep recommender still works in its collaborative filtering mode.

More details on the Wide & Deep recommender and its underlying probabilistic algorithm can be found in the relevant research paper: Wide & Deep Learning for Recommender Systems.

How to configure Train Wide & Deep Recommender

Prepare data

Before you use the module, your data must be in the format expected by the recommendation model. A training dataset of user-item-rating triples is required. You can also include user features and item features (if available) in separate datasets.

Required dataset of user-item-ratings

The input data used for training must contain the right type of data in the correct format:

  • The first column must contain user identifiers.
  • The second column must contain item identifiers.
  • The third column contains the rating for the user-item pair. Rating values must be of a numeric type.

For example, a typical set of user-item-ratings might look like this:

UserId    MovieId    Rating
1         68646      10
223       31381      10
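
The column requirements above can be checked before the data is connected to the module. The following is a minimal sketch in plain Python, not part of the module itself; the `load_triples` helper and the inline CSV sample are hypothetical and only mirror the example table.

```python
import csv
import io

# Hypothetical sample in the three-column layout described above.
SAMPLE = """UserId,MovieId,Rating
1,68646,10
223,31381,10
"""


def load_triples(text):
    """Parse user-item-rating rows and check that every rating is numeric."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if len(header) < 3:
        raise ValueError("expected three columns: user, item, rating")
    triples = []
    for row_number, row in enumerate(reader, start=2):
        user_id, item_id, rating = row[0], row[1], row[2]
        try:
            rating_value = float(rating)
        except ValueError:
            raise ValueError(f"row {row_number}: rating {rating!r} is not numeric")
        triples.append((user_id, item_id, rating_value))
    return triples


triples = load_triples(SAMPLE)
print(triples)
```

Identifiers are kept as strings here because only the rating column is required to be numeric.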

User features dataset (optional)

The dataset of user features must contain identifiers for users, and use the same identifiers that were provided in the first column of the user-item-ratings dataset. The remaining columns can contain any number of features that describe the users.

For example, a typical set of user features might look like this:

UserId    Age    Gender    Interest    Location
1         25     male      Drama       Europe
223       40     female    Romance     Asia

Item features dataset (optional)

The dataset of item features must contain item identifiers in its first column. The remaining columns can contain any number of descriptive features for the items.

For example, a typical set of item features might look like this:

MovieId    Title                 Original Language    Genres     Year
68646      The Godfather         English              Drama      1972
31381      Gone with the Wind    English              History    1939
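
Because the feature datasets share identifiers with the ratings dataset, each rating can be matched to the features of its user and item. The sketch below illustrates that relationship in plain Python. How the module performs this matching internally is not described in this article, so the `attach_features` helper and the dictionary layout are assumptions made for illustration.

```python
# Hypothetical in-memory copies of the example tables above.
ratings = [("1", "68646", 10.0), ("223", "31381", 10.0)]

user_features = {
    "1": {"Age": 25, "Gender": "male", "Interest": "Drama", "Location": "Europe"},
    "223": {"Age": 40, "Gender": "female", "Interest": "Romance", "Location": "Asia"},
}

item_features = {
    "68646": {"Title": "The Godfather", "Original Language": "English",
              "Genres": "Drama", "Year": 1972},
    "31381": {"Title": "Gone with the Wind", "Original Language": "English",
              "Genres": "History", "Year": 1939},
}


def attach_features(ratings, user_features, item_features):
    """Join each rating with the features of its user and item, keyed by identifier.

    Identifiers with no feature row simply contribute no extra columns.
    """
    rows = []
    for user_id, item_id, rating in ratings:
        row = {"UserId": user_id, "MovieId": item_id, "Rating": rating}
        row.update(user_features.get(user_id, {}))
        row.update(item_features.get(item_id, {}))
        rows.append(row)
    return rows


rows = attach_features(ratings, user_features, item_features)
for row in rows:
    print(row)
```

A rating whose user or item has no feature row keeps only its identifiers and rating, which corresponds to the collaborative filtering case.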

Train the model

  1. Add the Train Wide and Deep Recommender module to your experiment in the designer (preview), and connect it to the training dataset.

  2. If you have separate datasets of user features, item features, or both, connect them to the Train Wide and Deep Recommender module.

    • User features dataset: Connect the dataset that describes users to the second input.
    • Item features dataset: Connect the dataset that describes items to the third input.
  3. Epochs: specify how many times the algorithm should process the whole training dataset.

    The higher this number, the more thorough the training. However, training takes more time and may cause overfitting.

  4. Batch size: type the number of training examples used in one training step.

    This hyperparameter can influence the training speed. A larger batch size shortens each epoch but may increase the time needed to converge. If a batch is too large to fit in GPU or CPU memory, a memory error may be raised.

  5. Wide part optimizer: select one optimizer to apply gradients to the wide part of the model.

  6. Wide optimizer learning rate: enter a number between 0.0 and 2.0 that defines the learning rate of the wide part optimizer.

    This hyperparameter determines the step size at each training step while moving toward a minimum of the loss function. A learning rate that is too large may cause learning to jump over the minimum, while a learning rate that is too small may cause convergence problems.

  7. Crossed feature dimension: type the dimension of the crossed user ID and item ID feature.

    By default, the Wide & Deep recommender performs a cross-product transformation over the user ID and item ID features. The crossed result is hashed according to this number so that the crossed feature has the specified dimension.

  8. Deep part optimizer: select one optimizer to apply gradients to the deep part of the model.

  9. Deep optimizer learning rate: enter a number between 0.0 and 2.0 that defines the learning rate of the deep part optimizer.

  10. User embedding dimension: type an integer to specify the dimension of the user ID embedding.

    The Wide & Deep recommender creates shared user ID embeddings and item ID embeddings for both the wide part and the deep part.

  11. Item embedding dimension: type an integer to specify the dimension of the item ID embedding.

  12. Categorical features embedding dimension: enter an integer to specify the dimension of the categorical feature embeddings.

    In the deep component of the Wide & Deep recommender, an embedding vector is learned for each categorical feature. These embedding vectors all share the same dimension.

  13. Hidden units: type the number of hidden nodes in the deep component. The node counts for the layers are separated by commas. For example, typing "1000,500,100" specifies that the deep component has three layers, with 1000, 500, and 100 nodes from the first layer to the last.

  14. Activation function: select one activation function to apply to each layer. The default is ReLU.

  15. Dropout: enter a number between 0.0 and 1.0 to set the probability that outputs are dropped in each layer during training.

    Dropout is a regularization method that helps prevent neural networks from overfitting. A common choice is to start with 0.5, which seems to be close to optimal for a wide range of networks and tasks.

  16. Batch Normalization: select this option to use batch normalization after each hidden layer in the deep component.

    Batch normalization is a technique for counteracting the internal covariate shift problem during network training. In general, it can help improve the speed, performance, and stability of the network.

  17. Run the pipeline.
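
Several of the settings above have simple arithmetic meanings, which the following Python sketch illustrates. It is not the module's implementation. The function names are hypothetical, and CRC32 is an assumed stand-in for whatever hash function the module actually uses for crossed features.

```python
import math
import zlib


def steps_per_epoch(num_examples, batch_size):
    """Number of training steps needed to process the whole dataset once."""
    return math.ceil(num_examples / batch_size)


def crossed_bucket(user_id, item_id, crossed_dim):
    """Hash a (user ID, item ID) pair into one of crossed_dim buckets.

    CRC32 is an illustrative choice; the module's hash is not documented here.
    """
    key = f"{user_id}_x_{item_id}".encode("utf-8")
    return zlib.crc32(key) % crossed_dim


def parse_hidden_units(text):
    """Turn a setting such as "1000,500,100" into a list of layer sizes."""
    return [int(part) for part in text.split(",")]


print(steps_per_epoch(1000, 64))           # larger batches mean fewer steps per epoch
print(crossed_bucket("1", "68646", 1000))  # always falls in the range 0..999
print(parse_hidden_units("1000,500,100"))
```

Hashing keeps the crossed feature at a fixed size regardless of how many distinct user-item pairs occur, at the cost of occasional collisions between pairs.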

Technical notes

Wide & Deep jointly trains wide linear models and deep neural networks to combine the strengths of memorization and generalization. The wide component accepts a set of raw features and feature transformations to memorize feature interactions. With less feature engineering, the deep component generalizes to unseen feature combinations through low-dimensional dense feature embeddings.

In its implementation of the Wide & Deep recommender, the module uses a default model structure. The wide component takes user embeddings, item embeddings, and the cross-product transformation of user IDs and item IDs as input. For the deep part of the model, an embedding vector is learned for each categorical feature. Together with the other numeric feature vectors, these vectors are then fed into the deep feed-forward neural network. The wide part and the deep part are combined by summing their final output log odds to form the prediction, which is passed to one common loss function for joint training.
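
The combination step described above can be sketched in a few lines of Python. The sketch assumes a logistic loss, as in the original Wide & Deep paper. The loss this module uses is not stated in this article, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Map log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def combine(wide_logit, deep_logit):
    """Sum the log odds of the wide and deep parts into one prediction."""
    return sigmoid(wide_logit + deep_logit)


def log_loss(label, probability):
    """Logistic loss on the combined prediction (assumed loss, for illustration)."""
    return -(label * math.log(probability)
             + (1 - label) * math.log(1.0 - probability))


prediction = combine(wide_logit=1.2, deep_logit=-0.4)
print(prediction, log_loss(1, prediction))
```

Because a single loss is computed on the summed output, its gradient flows back into both parts at once, which is what distinguishes joint training from training two models separately and averaging them.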

Next steps

See the set of modules available to Azure Machine Learning.