Two-Class Neural Network module

This article describes a module in Azure Machine Learning designer.

Use this module to create a neural network model that predicts a target with only two possible values.

Classification using neural networks is a supervised learning method, and therefore requires a tagged dataset, which includes a label column. For example, you could use this neural network model to predict binary outcomes such as whether a patient has a certain disease, or whether a machine is likely to fail within a specified window of time.

After you define the model, train it by providing a tagged dataset and the model as inputs to Train Model. The trained model can then be used to predict values for new inputs.

More about neural networks

A neural network is a set of interconnected layers. The inputs are the first layer, and are connected to an output layer by an acyclic graph composed of weighted edges and nodes.

Between the input and output layers you can insert multiple hidden layers. Most predictive tasks can be accomplished with only one or a few hidden layers. However, research has shown that deep neural networks (DNNs) with many layers can be effective in complex tasks such as image or speech recognition. The successive layers are used to model increasing levels of semantic depth.

The relationship between inputs and outputs is learned by training the neural network on the input data. The direction of the graph proceeds from the inputs through the hidden layers to the output layer. All nodes in a layer are connected by weighted edges to nodes in the next layer.

To compute the output of the network for a particular input, a value is calculated at each node in the hidden layers and in the output layer. The value is set by calculating the weighted sum of the values of the nodes from the previous layer. An activation function is then applied to that weighted sum.
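
The computation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the module's actual implementation: the sigmoid activation, the function names, and the tiny example weights are all assumptions chosen for readability.

```python
import math

def sigmoid(z):
    """Assumed activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(values, weights):
    """Compute one layer from the previous layer's values.

    weights[j][i] is the weight on the edge from previous-layer node i
    to node j of this layer.
    """
    outputs = []
    for node_weights in weights:
        weighted_sum = sum(w * v for w, v in zip(node_weights, values))
        outputs.append(sigmoid(weighted_sum))  # activation applied to the sum
    return outputs

def forward(inputs, layers):
    """Propagate values from the input layer through each layer in order."""
    values = inputs
    for weights in layers:
        values = layer_forward(values, weights)
    return values

# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
output_weights = [[1.0, -1.0]]
result = forward([1.0, 1.0], [hidden_weights, output_weights])
```

Here `result` holds a single value between 0 and 1, produced by the one output node. Networks in practice usually also add a bias term to each weighted sum; it is left out here to mirror the description above.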

How to configure

  1. Add the Two-Class Neural Network module to your pipeline. You can find this module under Machine Learning, Initialize, in the Classification category.

  2. Specify how you want the model to be trained by setting the Create trainer mode option.

    • Single Parameter: Choose this option if you already know how you want to configure the model.

    • Parameter Range: If you are not sure of the best parameters, you can find the optimal parameters by using the Tune Model Hyperparameters module. You provide a range of values for each setting, and the trainer iterates over multiple combinations of the settings to determine the combination that produces the best result.

  3. For Hidden layer specification, select the type of network architecture to create.

    • Fully connected case: Uses the default neural network architecture, defined for two-class neural networks as follows:

      • Has one hidden layer.

      • The output layer is fully connected to the hidden layer, and the hidden layer is fully connected to the input layer.

      • The number of nodes in the input layer equals the number of features in the training data.

      • The number of nodes in the hidden layer is set by the user. The default value is 100.

      • The number of nodes in the output layer equals the number of classes. For a two-class neural network, this means that all inputs must map to one of two nodes in the output layer.

  4. For Learning rate, define the size of the step taken at each iteration, before correction. A larger learning rate can make the model converge faster, but it can also overshoot local minima.

  5. For Number of learning iterations, specify the maximum number of times the algorithm should process the training cases.

  6. For The initial learning weights diameter, specify the value that determines the node weights at the start of the learning process.

  7. For The momentum, specify a weight to apply during learning to nodes from previous iterations.

  8. Select the Shuffle examples option to shuffle cases between iterations. If you deselect this option, cases are processed in exactly the same order each time you run the pipeline.

  9. For Random number seed, type a value to use as the seed.

    Specifying a seed value is useful when you want to ensure repeatability across runs of the same pipeline. Otherwise, a system clock value is used as the seed, which can cause slightly different results each time you run the pipeline.

  10. Add a labeled dataset to the pipeline, and train the model:

    • If you set Create trainer mode to Single Parameter, connect a tagged dataset and the Train Model module.

    • If you set Create trainer mode to Parameter Range, connect a tagged dataset and train the model by using Tune Model Hyperparameters.

    Note

    If you pass a parameter range to Train Model, it uses only the default value in the single parameter list.

    If you pass a single set of parameter values to the Tune Model Hyperparameters module when it expects a range of settings for each parameter, it ignores the values and uses the default values for the learner.

    If you select the Parameter Range option and enter a single value for any parameter, that single value is used throughout the sweep, even if other parameters change across a range of values.

  11. Submit the pipeline.
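
To make the settings in the steps above more concrete, the following sketch shows how they could interact in a simple training loop and in a parameter sweep. It is not the designer's implementation. For brevity it trains a single sigmoid unit rather than a full network, it treats the initial weights diameter as the width of a uniform range centered on zero, it adds a bias weight, and it sweeps an exhaustive grid. All of these, along with the names `train`, `accuracy`, and `sweep`, are assumptions made for illustration.

```python
import itertools
import math
import random

def train(cases, learning_rate=0.1, iterations=100,
          initial_weights_diameter=0.1, momentum=0.0,
          shuffle=True, seed=None):
    """Train a single sigmoid unit by stochastic gradient descent with momentum."""
    rng = random.Random(seed)  # seed=None: Python picks a seed, so runs can differ
    half = initial_weights_diameter / 2.0
    # Assumption: initial weights are uniform in [-diameter/2, +diameter/2].
    # The last weight acts as a bias term (also an assumption).
    weights = [rng.uniform(-half, half) for _ in range(len(cases[0][0]) + 1)]
    velocity = [0.0] * len(weights)
    order = list(range(len(cases)))
    for _ in range(iterations):  # one iteration = one pass over the cases
        if shuffle:
            rng.shuffle(order)
        for index in order:
            features, label = cases[index]
            x = list(features) + [1.0]
            z = sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - label  # log-loss gradient w.r.t. z
            for i, xi in enumerate(x):
                # Momentum carries over a fraction of the previous update.
                velocity[i] = momentum * velocity[i] - learning_rate * error * xi
                weights[i] += velocity[i]
    return weights

def accuracy(weights, cases):
    """Fraction of cases whose predicted class (z > 0 means class 1) matches the label."""
    hits = 0
    for features, label in cases:
        z = sum(w * xi for w, xi in zip(weights, list(features) + [1.0]))
        hits += int((z > 0) == (label == 1))
    return hits / len(cases)

def sweep(cases, ranges, seed=0):
    """Train once per combination of values; a one-item list keeps that setting fixed."""
    names = sorted(ranges)
    best = None
    for combo in itertools.product(*(ranges[name] for name in names)):
        settings = dict(zip(names, combo))
        # Scored on the training cases for brevity; held-out data is preferable.
        result = accuracy(train(cases, seed=seed, **settings), cases)
        if best is None or result > best[0]:
            best = (result, settings)
    return best

# Hypothetical toy data: the label simply copies the first feature.
cases = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
ranges = {
    "learning_rate": [0.01, 0.1],
    "iterations": [20, 100],
    "momentum": [0.0, 0.5],
    "initial_weights_diameter": [0.1],  # single value: fixed for the whole sweep
}
best_score, best_settings = sweep(cases, ranges)
repeat_a = train(cases, seed=42)
repeat_b = train(cases, seed=42)
```

Because both calls to `train` use the same seed, `repeat_a` and `repeat_b` contain identical weights, which is the repeatability that a fixed random number seed provides. The sweep tries all eight combinations of the three ranged settings while the initial weights diameter stays at its single value.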

Results

After training is complete:

  • To save a snapshot of the trained model, select the Outputs tab in the right panel of the Train Model module. Select the Register dataset icon to save the model as a reusable module.

  • To use the model for scoring, add the Score Model module to a pipeline.
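
Conceptually, scoring with a two-class model means turning a predicted probability into one of two labels. The sketch below illustrates that idea only. The helper name `to_labels` and the 0.5 cutoff are assumptions, not documented behavior of Score Model.

```python
def to_labels(probabilities, threshold=0.5):
    """Map each predicted probability of the positive class to a 0/1 label."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical predicted probabilities for three new inputs.
labels = to_labels([0.12, 0.5, 0.93])
```

Raising the cutoff makes the positive label rarer, which trades fewer false positives for more false negatives.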

Next steps

See the set of modules available to Azure Machine Learning.