Guide to Net# neural network specification language for Machine Learning Studio (classic)

APPLIES TO: Machine Learning Studio (classic). Does not apply to: Azure Machine Learning.

Net# is a language developed by Microsoft that is used to define complex neural network architectures, such as deep neural networks or convolutions of arbitrary dimensions. You can use complex structures to improve learning on data such as image, video, or audio.

You can use a Net# architecture specification in these contexts:

This article describes the basic concepts and syntax needed to develop a custom neural network using Net#:

  • Neural network requirements and how to define the primary components
  • The syntax and keywords of the Net# specification language
  • Examples of custom neural networks created using Net#

Neural network basics

A neural network structure consists of nodes that are organized in layers, and weighted connections (or edges) between the nodes. The connections are directional, and each connection has a source node and a destination node.

Each trainable layer (a hidden or an output layer) has one or more connection bundles. A connection bundle consists of a source layer and a specification of the connections from that source layer. All the connections in a given bundle share source and destination layers. In Net#, a connection bundle is considered as belonging to the bundle's destination layer.

Net# supports various kinds of connection bundles, which let you customize the way inputs are mapped to hidden layers and mapped to the outputs.

The default or standard bundle is a full bundle, in which each node in the source layer is connected to every node in the destination layer.

Additionally, Net# supports the following four kinds of advanced connection bundles:

  • Filtered bundles. You can define a predicate by using the locations of the source layer node and the destination layer node. Nodes are connected whenever the predicate is True.

  • Convolutional bundles. You can define small neighborhoods of nodes in the source layer. Each node in the destination layer is connected to one neighborhood of nodes in the source layer.

  • Pooling bundles and Response normalization bundles. These are similar to convolutional bundles in that the user defines small neighborhoods of nodes in the source layer. The difference is that the weights of the edges in these bundles are not trainable. Instead, a predefined function is applied to the source node values to determine the destination node value.

Supported customizations

The architecture of neural network models that you create in Azure Machine Learning Studio (classic) can be extensively customized by using Net#. You can:

  • Create hidden layers and control the number of nodes in each layer.
  • Specify how layers are to be connected to each other.
  • Define special connectivity structures, such as convolutions and weight sharing bundles.
  • Specify different activation functions.

For details of the specification language syntax, see Structure specifications.

For examples of defining neural networks for some common machine learning tasks, from simple to complex, see Examples of Net# usage.

General requirements

  • There must be exactly one output layer, at least one input layer, and zero or more hidden layers.
  • Each layer has a fixed number of nodes, conceptually arranged in a rectangular array of arbitrary dimensions.
  • Input layers have no associated trained parameters and represent the point where instance data enters the network.
  • Trainable layers (the hidden and output layers) have associated trained parameters, known as weights and biases.
  • The source and destination nodes must be in separate layers.
  • Connections must be acyclic; in other words, there cannot be a chain of connections leading back to the initial source node.
  • The output layer cannot be a source layer of a connection bundle.
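As a minimal illustration of these requirements, the following sketch (the layer names are hypothetical) declares the smallest legal network: one input layer and one output layer, with no hidden layers:

input Features auto;
output Score [1] linear from Features all;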

Structure specifications

A neural network structure specification is composed of three sections: the constant declaration, the layer declaration, and the connection declaration. There is also an optional share declaration section. The sections can be specified in any order.

Constant declaration

A constant declaration is optional. It provides a means to define values used elsewhere in the neural network definition. The declaration statement consists of an identifier followed by an equal sign and a value expression.

For example, the following statement defines a constant X:

Const X = 28;

To define two or more constants simultaneously, enclose the identifier names and values in braces, and separate them by using semicolons. For example:

Const { X = 28; Y = 4; }

The right-hand side of each assignment expression can be an integer, a real number, a Boolean value (True or False), or a mathematical expression. For example:

Const { X = 17 * 2; Y = true; }
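Constants can then be referenced anywhere a dimension is expected. For example, this sketch (adapted from the share-declaration example later in this article) sizes the layers with constants:

Const { InputSize = 37; HiddenSize = 50; }
input Data [InputSize];
hidden H [HiddenSize] from Data all;
output Out [2] from H all;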

Layer declaration

The layer declaration is required. It defines the size and source of the layer, including its connection bundles and attributes. The declaration statement starts with the kind of layer (input, hidden, or output), followed by the name of the layer and its dimensions (a tuple of positive integers). For example:

input Data auto;
hidden Hidden[5,20] from Data all;
output Result[2] from Hidden all;
  • The product of the dimensions is the number of nodes in the layer. In this example, there are two dimensions [5,20], which means there are 100 nodes in the layer.
  • The layers can be declared in any order, with one exception: If more than one input layer is defined, the order in which they are declared must match the order of features in the input data.

To specify that the number of nodes in a layer be determined automatically, use the auto keyword. The auto keyword has different effects, depending on the layer:

  • In an input layer declaration, the number of nodes is the number of features in the input data.
  • In a hidden layer declaration, the number of nodes is the number that is specified by the parameter value for Number of hidden nodes.
  • In an output layer declaration, the number of nodes is 2 for two-class classification, 1 for regression, and equal to the number of classes for multiclass classification.

For example, the following network definition allows the size of all layers to be automatically determined:

input Data auto;
hidden Hidden auto from Data all;
output Result auto from Hidden all;

A layer declaration for a trainable layer (the hidden or output layers) can optionally include the output function (also called an activation function), which defaults to sigmoid for classification models and linear for regression models. Even if you use the default, you can explicitly state the activation function, if desired for clarity.
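For instance, this sketch states the default explicitly for a two-class classification output layer (the layer names are hypothetical):

output Result [2] sigmoid from Hidden all;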

The following output functions are supported:

  • sigmoid
  • linear
  • softmax
  • rlinear
  • square
  • sqrt
  • srlinear
  • abs
  • tanh
  • brlinear

For example, the following declaration uses the softmax function:

output Result [100] softmax from Hidden all;

Connection declaration

Immediately after defining the trainable layer, you must declare connections among the layers you have defined. The connection bundle declaration starts with the keyword from, followed by the name of the bundle's source layer and the kind of connection bundle to create.

Currently, five kinds of connection bundles are supported:

  • Full bundles, indicated by the keyword all
  • Filtered bundles, indicated by the keyword where, followed by a predicate expression
  • Convolutional bundles, indicated by the keyword convolve, followed by the convolution attributes
  • Pooling bundles, indicated by the keywords max pool or mean pool
  • Response normalization bundles, indicated by the keyword response norm

Full bundles

A full connection bundle includes a connection from each node in the source layer to each node in the destination layer. This is the default network connection type.
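For example, this hidden-layer declaration (taken from the "Hello World" example later in this article) uses a full bundle; the keyword all connects all 200 hidden nodes to every input node:

hidden H [200] from Data all;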

Filtered bundles

A filtered connection bundle specification includes a predicate, expressed syntactically much like a C# lambda expression. The following example defines two filtered bundles:

input Pixels [10, 20];
hidden ByRow[10, 12] from Pixels where (s,d) => s[0] == d[0];
hidden ByCol[5, 20] from Pixels where (s,d) => abs(s[1] - d[1]) <= 1;
  • ByRow 的谓词中,s 是输入层 Pixels 的节点的矩形数组中表示索引的参数,d 是隐藏层 ByRow 的节点的数组中表示索引的参数。In the predicate for ByRow, s is a parameter representing an index into the rectangular array of nodes of the input layer, Pixels, and d is a parameter representing an index into the array of nodes of the hidden layer, ByRow. sd 的类型是长度为 2 的整数元组。The type of both s and d is a tuple of integers of length two. 从概念上讲,s 涵盖了 0 <= s[0] < 100 <= s[1] < 20 的所有整数对,d 涵盖了 0 <= d[0] < 100 <= d[1] < 12 的所有整数对。Conceptually, s ranges over all pairs of integers with 0 <= s[0] < 10 and 0 <= s[1] < 20, and d ranges over all pairs of integers, with 0 <= d[0] < 10 and 0 <= d[1] < 12.

  • On the right-hand side of the predicate expression, there is a condition. In this example, for every value of s and d such that the condition is True, there is an edge from the source layer node to the destination layer node. Thus, this filter expression indicates that the bundle includes a connection from the node defined by s to the node defined by d in all cases where s[0] is equal to d[0].

Optionally, you can specify a set of weights for a filtered bundle. The value for the Weights attribute must be a tuple of floating-point values with a length that matches the number of connections defined by the bundle. By default, weights are randomly generated.

Weight values are grouped by the destination node index. That is, if the first destination node is connected to K source nodes, the first K elements of the Weights tuple are the weights for the first destination node, in source index order. The same applies for the remaining destination nodes.

It's possible to specify weights directly as constant values. For example, if you learned the weights previously, you can specify them as constants using this syntax:

const Weights_1 = [0.0188045055, 0.130500451, ...]
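A declared weight constant could then be supplied through the Weights attribute. The sketch below is hypothetical: it assumes that an attribute block attaches to a filtered bundle the same way it does to a convolutional bundle, and the tuple length must match the number of connections the predicate defines:

hidden ByRow [10, 12] from Pixels where (s,d) => s[0] == d[0]
  {
  Weights = Weights_1; // assumed syntax: attribute block as in convolutional bundles
  }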

Convolutional bundles

When the training data has a homogeneous structure, convolutional connections are commonly used to learn high-level features of the data. For example, in image, audio, or video data, spatial or temporal dimensionality can be fairly uniform.

Convolutional bundles employ rectangular kernels that are slid through the dimensions. Essentially, each kernel defines a set of weights applied in local neighborhoods, referred to as kernel applications. Each kernel application corresponds to a node in the source layer, which is referred to as the central node. The weights of a kernel are shared among many connections. In a convolutional bundle, each kernel is rectangular and all kernel applications are the same size.

Convolutional bundles support the following attributes:

InputShape defines the dimensionality of the source layer for the purposes of this convolutional bundle. The value must be a tuple of positive integers. The product of the integers must equal the number of nodes in the source layer, but otherwise, it does not need to match the dimensionality declared for the source layer. The length of this tuple becomes the arity value for the convolutional bundle. Typically, arity refers to the number of arguments or operands that a function can take.

To define the shape and locations of the kernels, use the attributes KernelShape, Stride, Padding, LowerPad, and UpperPad:

  • KernelShape: (required) Defines the dimensionality of each kernel for the convolutional bundle. The value must be a tuple of positive integers with a length that equals the arity of the bundle. Each component of this tuple must be no greater than the corresponding component of InputShape.

  • Stride: (optional) Defines the sliding step sizes of the convolution (one step size for each dimension), that is, the distance between the central nodes. The value must be a tuple of positive integers with a length that is the arity of the bundle. Each component of this tuple must be no greater than the corresponding component of KernelShape. The default value is a tuple with all components equal to one.

  • Sharing: (optional) Defines the weight sharing for each dimension of the convolution. The value can be a single Boolean value or a tuple of Boolean values with a length that is the arity of the bundle. A single Boolean value is extended to be a tuple of the correct length with all components equal to the specified value. The default value is a tuple that consists of all True values.

  • MapCount: (optional) Defines the number of feature maps for the convolutional bundle. The value can be a single positive integer or a tuple of positive integers with a length that is the arity of the bundle. A single integer value is extended to be a tuple of the correct length with the first component equal to the specified value and all the remaining components equal to one. The default value is one. The total number of feature maps is the product of the components of the tuple. The factoring of this total number across the components determines how the feature map values are grouped in the destination nodes.

  • Weights: (optional) Defines the initial weights for the bundle. The value must be a tuple of floating-point values with a length that is the number of kernels times the number of weights per kernel, as defined later in this article. The default weights are randomly generated.
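To see how these attributes combine, consider the following sketch (the layer names are hypothetical). With a 24x24 input, a 5x5 kernel, and a stride of one, each feature map has (24 - 5) / 1 + 1 = 20 nodes per dimension, so MapCount = 4 yields a destination layer of [4, 20, 20]:

input Frame [24, 24];
hidden C [4, 20, 20] from Frame convolve
  {
  InputShape  = [24, 24];
  KernelShape = [ 5,  5];
  Stride      = [ 1,  1];
  MapCount    = 4;
  }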

Two sets of attributes control padding; they are mutually exclusive:

  • Padding: (optional) Determines whether the input should be padded by using a default padding scheme. The value can be a single Boolean value, or it can be a tuple of Boolean values with a length that is the arity of the bundle.

    A single Boolean value is extended to be a tuple of the correct length with all components equal to the specified value.

    If the value for a dimension is True, the source is logically padded in that dimension with zero-valued cells to support additional kernel applications, such that the central nodes of the first and last kernels in that dimension are the first and last nodes in that dimension in the source layer. Thus, the number of "dummy" nodes in each dimension is determined automatically, to fit exactly (InputShape[d] - 1) / Stride[d] + 1 kernels into the padded source layer.

    If the value for a dimension is False, the kernels are defined so that the number of nodes on each side that are left out is the same (up to a difference of 1). The default value of this attribute is a tuple with all components equal to False.

  • UpperPad and LowerPad: (optional) Provide greater control over the amount of padding to use. Important: These attributes can be defined if and only if the Padding attribute above is not defined. The values must be integer-valued tuples with lengths that are the arity of the bundle. When these attributes are specified, "dummy" nodes are added to the lower and upper ends of each dimension of the input layer. The number of nodes added to the lower and upper ends in each dimension is determined by LowerPad[i] and UpperPad[i], respectively.

    To ensure that kernels correspond only to "real" nodes and not to "dummy" nodes, the following conditions must be met:

    • Each component of LowerPad must be strictly less than KernelShape[d]/2.

    • Each component of UpperPad must be no greater than KernelShape[d]/2.

    • The default value of these attributes is a tuple with all components equal to 0.

      The setting Padding = true allows as much padding as is needed to keep the "center" of the kernel inside the "real" input. This changes the math a bit for computing the output size. Generally, the output size D is computed as D = (I - K) / S + 1, where I is the input size, K is the kernel size, S is the stride, and / is integer division (round toward zero). For example, suppose the input size I is 28, the kernel size K is 5, and the stride S is 2. If you set UpperPad = [1, 1], the input size I is effectively 29, and thus D = (29 - 5) / 2 + 1 = 13. However, when Padding = true, essentially I gets bumped up by K - 1; hence D = ((28 + 4) - 5) / 2 + 1 = 27 / 2 + 1 = 13 + 1 = 14. By specifying values for UpperPad and LowerPad, you get much more control over the padding than if you just set Padding = true.
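For instance, applying Padding to the digit-recognition convolution shown later in this article would look like the following sketch (the padded layer name is hypothetical); with Padding = [true, true], each dimension fits (29 - 1) / 2 + 1 = 15 kernel applications, so the destination layer grows to [5, 15, 15]:

hidden Conv1Padded [5, 15, 15] from Image convolve
  {
  InputShape  = [29, 29];
  KernelShape = [ 5,  5];
  Stride      = [ 2,  2];
  Padding     = [true, true];
  MapCount    = 5;
  }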

For more information about convolutional networks and their applications, see these articles:

Pooling bundles

A pooling bundle applies geometry similar to convolutional connectivity, but it applies a predefined function to the source node values to derive the destination node value. Hence, pooling bundles have no trainable state (weights or biases). Pooling bundles support all the convolutional attributes except Sharing, MapCount, and Weights.

Typically, the kernels summarized by adjacent pooling units do not overlap. If Stride[d] is equal to KernelShape[d] in each dimension, the layer obtained is the traditional local pooling layer, which is commonly employed in convolutional neural networks. Each destination node computes the maximum or the mean of the activities of its kernel in the source layer.

The following example illustrates a pooling bundle:

hidden P1 [5, 12, 12]
  from C1 max pool {
  InputShape  = [ 5, 24, 24];
  KernelShape = [ 1,  2,  2];
  Stride      = [ 1,  2,  2];
  }
  • The arity of the bundle is 3: that is, the length of the tuples InputShape, KernelShape, and Stride.
  • The number of nodes in the source layer is 5 * 24 * 24 = 2880.
  • This is a traditional local pooling layer because KernelShape and Stride are equal.
  • The number of nodes in the destination layer is 5 * 12 * 12 = 720.

For more information about pooling layers, see these articles:

Response normalization bundles

Response normalization is a local normalization scheme that was first introduced by Geoffrey Hinton, et al., in the paper ImageNet Classification with Deep Convolutional Neural Networks.

Response normalization is used to aid generalization in neural nets. When one neuron is firing at a very high activation level, a local response normalization layer suppresses the activation level of the surrounding neurons. This is done by using three parameters (α, β, and k) and a convolutional structure (or neighborhood shape). Every neuron in the destination layer y corresponds to a neuron x in the source layer. The activation level of y is given by the following formula, where f is the activation level of a neuron, and Nx is the kernel (or the set that contains the neurons in the neighborhood of x), as defined by the following convolutional structure:

f(y) = f(x) / (k + α * Σ_{z ∈ Nx} f(z)^2)^β

Response normalization bundles support all the convolutional attributes except Sharing, MapCount, and Weights.

  • If the kernel contains neurons in the same map as x, the normalization scheme is referred to as same map normalization. To define same map normalization, the first coordinate in KernelShape must have the value 1.

  • If the kernel contains neurons in the same spatial position as x, but the neurons are in other maps, the normalization scheme is called across maps normalization. This type of response normalization implements a form of lateral inhibition inspired by the type found in real neurons, creating competition for big activation levels amongst neuron outputs computed on different maps. To define across maps normalization, the first coordinate of KernelShape must be an integer greater than one and no greater than the number of maps, and the rest of the coordinates must have the value 1.

Because response normalization bundles apply a predefined function to source node values to determine the destination node value, they have no trainable state (weights or biases).

Note

The nodes in the destination layer correspond to neurons that are the central nodes of the kernels. For example, if KernelShape[d] is odd, then KernelShape[d]/2 corresponds to the central kernel node. If KernelShape[d] is even, the central node is at KernelShape[d]/2 - 1. Therefore, if Padding[d] is False, the first and the last KernelShape[d]/2 nodes do not have corresponding nodes in the destination layer. To avoid this situation, define Padding as [true, true, …, true].

In addition to the four attributes described earlier, response normalization bundles also support the following attributes:

  • Alpha: (required) Specifies a floating-point value that corresponds to α in the previous formula.
  • Beta: (required) Specifies a floating-point value that corresponds to β in the previous formula.
  • Offset: (optional) Specifies a floating-point value that corresponds to k in the previous formula. It defaults to 1.

The following example defines a response normalization bundle using these attributes:

hidden RN1 [5, 10, 10]
  from P1 response norm {
  InputShape  = [ 5, 12, 12];
  KernelShape = [ 1,  3,  3];
  Alpha = 0.001;
  Beta = 0.75;
  }
  • The source layer includes five maps, each with a dimension of 12x12, for a total of 720 nodes.
  • The value of KernelShape indicates that this is a same map normalization layer, where the neighborhood is a 3x3 rectangle.
  • The default value of Padding is False, so the destination layer has only 10 nodes in each spatial dimension. To include one node in the destination layer that corresponds to every node in the source layer, add Padding = [true, true, true]; and change the size of RN1 to [5, 12, 12], as sketched below.
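That padded variant would look like this:

hidden RN1 [5, 12, 12]
  from P1 response norm {
  InputShape  = [ 5, 12, 12];
  KernelShape = [ 1,  3,  3];
  Alpha = 0.001;
  Beta = 0.75;
  Padding = [true, true, true];
  }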

Share declaration

Net# optionally supports defining multiple bundles with shared weights. The weights of any two bundles can be shared if their structures are the same. The following syntax defines bundles with shared weights:

share-declaration:
  share    {    layer-list    }
  share    {    bundle-list    }
  share    {    bias-list    }

  layer-list:
    layer-name    ,    layer-name
    layer-list    ,    layer-name

  bundle-list:
    bundle-spec    ,    bundle-spec
    bundle-list    ,    bundle-spec

  bundle-spec:
    layer-name    =>     layer-name

  bias-list:
    bias-spec    ,    bias-spec
    bias-list    ,    bias-spec

  bias-spec:
    1    =>    layer-name

  layer-name:
    identifier

For example, the following share-declaration specifies the layer names, indicating that both weights and biases should be shared:

Const {
  InputSize = 37;
  HiddenSize = 50;
  }
input {
  Data1 [InputSize];
  Data2 [InputSize];
  }
hidden {
  H1 [HiddenSize] from Data1 all;
  H2 [HiddenSize] from Data2 all;
  }
output Result [2] {
  from H1 all;
  from H2 all;
  }
share { H1, H2 } // share both weights and biases
  • The input features are partitioned into two equal-sized input layers.
  • The hidden layers then compute higher-level features on the two input layers.
  • The share-declaration specifies that H1 and H2 must be computed in the same way from their respective inputs.

Alternatively, this could be specified with two separate share-declarations, as follows:

share { Data1 => H1, Data2 => H2 } // share weights
share { 1 => H1, 1 => H2 } // share biases

You can use the short form only when the layers contain a single bundle. In general, sharing is possible only when the relevant structure is identical, meaning that the bundles have the same size, the same convolutional geometry, and so forth.

Examples of Net# usage

This section provides some examples of how you can use Net# to add hidden layers, define the way that hidden layers interact with other layers, and build convolutional networks.

Define a simple custom neural network: "Hello World" example

This simple example demonstrates how to create a neural network model that has a single hidden layer.

input Data auto;
hidden H [200] from Data all;
output Out [10] sigmoid from H all;

The example illustrates some basic commands as follows:

  • The first line defines the input layer (named Data). When you use the auto keyword, the neural network automatically includes all feature columns in the input examples.
  • The second line creates the hidden layer. The name H is assigned to the hidden layer, which has 200 nodes. This layer is fully connected to the input layer.
  • The third line defines the output layer (named Out), which contains 10 output nodes. If the neural network is used for classification, there is one output node per class. The keyword sigmoid indicates that the sigmoid function is applied as the output function for this layer.

Define multiple hidden layers: computer vision example

The following example demonstrates how to define a slightly more complex neural network, with multiple custom hidden layers.

// Define the input layers
input Pixels [10, 20];
input MetaData [7];

// Define the first two hidden layers, using data only from the Pixels input
hidden ByRow [10, 12] from Pixels where (s,d) => s[0] == d[0];
hidden ByCol [5, 20] from Pixels where (s,d) => abs(s[1] - d[1]) <= 1;

// Define the third hidden layer, which uses as source the hidden layers ByRow and ByCol
hidden Gather [100]
{
from ByRow all;
from ByCol all;
}

// Define the output layer and its sources
output Result [10]
{
from Gather all;
from MetaData all;
}

This example illustrates several features of the neural network specification language:

  • The structure has two input layers, Pixels and MetaData.
  • The Pixels layer is a source layer for two connection bundles, with destination layers ByRow and ByCol.
  • The layers Gather and Result are destination layers in multiple connection bundles.
  • The output layer, Result, is a destination layer in two connection bundles; one with the second-level hidden layer Gather as a source layer, and the other with the input layer MetaData as a source layer.
  • The hidden layers, ByRow and ByCol, specify filtered connectivity by using predicate expressions. More precisely, the node in ByRow at [x, y] is connected to the nodes in Pixels that have the first index coordinate equal to the node's first coordinate, x. Similarly, the node in ByCol at [x, y] is connected to the nodes in Pixels that have the second index coordinate within one of the node's second coordinate, y.

Define a convolutional network for multiclass classification: digit recognition example

The definition of the following network is designed to recognize numbers, and it illustrates some advanced techniques for customizing a neural network.

input Image [29, 29];
hidden Conv1 [5, 13, 13] from Image convolve
  {
  InputShape  = [29, 29];
  KernelShape = [ 5,  5];
  Stride      = [ 2,  2];
  MapCount    = 5;
  }
hidden Conv2 [50, 5, 5]
from Conv1 convolve
  {
  InputShape  = [ 5, 13, 13];
  KernelShape = [ 1,  5,  5];
  Stride      = [ 1,  2,  2];
  Sharing     = [false, true, true];
  MapCount    = 10;
  }
hidden Hid3 [100] from Conv2 all;
output Digit [10] from Hid3 all;
  • The structure has a single input layer, Image.

  • The keyword convolve indicates that the layers named Conv1 and Conv2 are convolutional layers. Each of these layer declarations is followed by a list of the convolution attributes.

  • The net has a third hidden layer, Hid3, which is fully connected to the second hidden layer, Conv2.

  • The output layer, Digit, is connected only to the third hidden layer, Hid3. The keyword all indicates that the output layer is fully connected to Hid3.

  • The arity of the convolution is three: the length of the tuples InputShape, KernelShape, Stride, and Sharing.

  • The number of weights per kernel is 1 + KernelShape[0] * KernelShape[1] * KernelShape[2] = 1 + 1 * 5 * 5 = 26, so with the 50 kernels computed below, the total number of weights in the bundle is 26 * 50 = 1300.

  • You can calculate the nodes in each hidden layer as follows:

    NodeCount[0] = (5 - 1) / 1 + 1 = 5
    NodeCount[1] = (13 - 5) / 2 + 1 = 5
    NodeCount[2] = (13 - 5) / 2 + 1 = 5

  • The total number of nodes can be calculated by using the declared dimensionality of the layer, [50, 5, 5], as follows: MapCount * NodeCount[0] * NodeCount[1] * NodeCount[2] = 10 * 5 * 5 * 5 = 1250.

  • Because Sharing[d] is False only for d == 0, the number of kernels is MapCount * NodeCount[0] = 10 * 5 = 50.

Acknowledgements

The Net# language for customizing the architecture of neural networks was developed at Microsoft by Shon Katzenberger (Architect, Machine Learning) and Alexey Kamenev (Software Engineer, Microsoft Research). It is used internally for machine learning projects and applications ranging from image detection to text analytics. For more information, see Neural Nets in Azure Machine Learning studio - Introduction to Net#.