Define custom R modules for Azure Machine Learning Studio (classic)

This topic describes how to author and deploy a custom R module in Azure Machine Learning Studio (classic). It explains what custom R modules are and what files are used to define them. It illustrates how to construct the files that define a module and how to register the module for deployment in a Machine Learning workspace. The elements and attributes used in the definition of the custom module are then described in more detail. The topic also discusses how to use auxiliary functionality and files, and how to return multiple outputs.

A custom module is a user-defined module that can be uploaded to your workspace and executed as part of an Azure Machine Learning Studio (classic) experiment. A custom R module is a custom module that executes a user-defined R function. R is a programming language for statistical computing and graphics that is widely used by statisticians and data scientists for implementing algorithms. Currently, R is the only language supported in custom modules, but support for additional languages is scheduled for future releases.

Custom modules have first-class status in Azure Machine Learning Studio (classic) in the sense that they can be used just like any other module. They can be executed with other modules and included in published experiments or in visualizations. You have control over the algorithm implemented by the module, the input and output ports to be used, the modeling parameters, and various other runtime behaviors. An experiment that contains custom modules can also be published to the Azure AI Gallery for easy sharing.

Files in a custom R module

A custom R module is defined by a .zip file that contains, at a minimum, two files:

  • A source file that implements the R function exposed by the module
  • An XML definition file that describes the custom module interface

Additional auxiliary files that provide functionality accessible from the custom module can also be included in the .zip file. This option is discussed in the Arguments part of the reference section Elements in the XML definition file, following the quickstart example.

Quickstart example: define, package, and register a custom R module

This example illustrates how to construct the files required by a custom R module, package them into a zip file, and then register the module in your Machine Learning workspace. The example zip package and sample files can be downloaded as the CustomAddRows.zip file.

The source file

Consider the example of a Custom Add Rows module that modifies the standard implementation of the Add Rows module used to concatenate rows (observations) from two datasets (data frames). The standard Add Rows module appends the rows of the second input dataset to the end of the first input dataset using the rbind algorithm. The customized CustomAddRows function similarly accepts two datasets, but also accepts a Boolean swap parameter as an additional input. If the swap parameter is set to FALSE, it returns the same dataset as the standard implementation. But if the swap parameter is TRUE, the function appends the rows of the first input dataset to the end of the second dataset instead. The CustomAddRows.R file, which contains the implementation of the R CustomAddRows function exposed by the Custom Add Rows module, has the following R code.

CustomAddRows <- function(dataset1, dataset2, swap=FALSE) 
{
    if (swap)
    {
        return (rbind(dataset2, dataset1));
    }
    else
    {
        return (rbind(dataset1, dataset2));
    } 
} 
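
Before packaging, it can help to call the function locally and confirm the effect of swap. The following is a minimal sketch, not part of the module itself; the sample data frames and their column names are assumptions made up for illustration.

```r
# Same function as in CustomAddRows.R, repeated so this sketch runs on its own.
CustomAddRows <- function(dataset1, dataset2, swap = FALSE) {
    if (swap) {
        return(rbind(dataset2, dataset1))
    } else {
        return(rbind(dataset1, dataset2))
    }
}

# Hypothetical sample data; the column names are assumptions for illustration.
first  <- data.frame(id = 1:2, value = c("a", "b"), stringsAsFactors = FALSE)
second <- data.frame(id = 3:4, value = c("c", "d"), stringsAsFactors = FALSE)

default_order <- CustomAddRows(first, second)
swapped_order <- CustomAddRows(first, second, swap = TRUE)

print(default_order$id)  # ids from dataset1 come first
print(swapped_order$id)  # ids from dataset2 come first
```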

The XML definition file

To expose this CustomAddRows function as an Azure Machine Learning Studio (classic) module, an XML definition file must be created to specify how the Custom Add Rows module should look and behave.

<!-- Defined a module using an R Script -->
<Module name="Custom Add Rows">
    <Owner>Microsoft Corporation</Owner>
    <Description>Appends one dataset to another. Dataset 2 is concatenated to Dataset 1 when Swap is FALSE, and vice versa when Swap is TRUE.</Description>

<!-- Specify the base language, script file and R function to use for this module. -->        
    <Language name="R" 
     sourceFile="CustomAddRows.R" 
     entryPoint="CustomAddRows" />  

<!-- Define module input and output ports -->
<!-- Note: The values of the id attributes in the Input and Arg elements must match the parameter names in the R Function CustomAddRows defined in CustomAddRows.R. -->
    <Ports>
        <Input id="dataset1" name="Dataset 1" type="DataTable">
            <Description>First input dataset</Description>
        </Input>
        <Input id="dataset2" name="Dataset 2" type="DataTable">
            <Description>Second input dataset</Description>
        </Input>
        <Output id="dataset" name="Dataset" type="DataTable">
            <Description>The combined dataset</Description>
        </Output>
    </Ports>

<!-- Define module parameters -->
    <Arguments>
        <Arg id="swap" name="Swap" type="bool" >
            <Description>Swap input datasets.</Description>
        </Arg>
    </Arguments>
</Module>

Note that the values of the id attributes of the Input and Arg elements in the XML file must exactly match the function parameter names of the R code in the CustomAddRows.R file (dataset1, dataset2, and swap in the example). Similarly, the value of the entryPoint attribute of the Language element must exactly match the name of the function in the R script (CustomAddRows in the example).
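
Because a mismatch between ids and parameter names is easy to introduce, a quick local check before zipping can be useful. The sketch below is only an illustration: check_ids_match is a hypothetical helper, not part of Studio (classic), and it uses a simple regular expression on the XML text rather than a real XML parser. A Zip input port, whose id does not need a matching parameter, would show up as a difference and can be ignored.

```r
# Hypothetical helper: compares the id attributes of Input and Arg elements
# in a module definition with the parameter names of the entry point function.
check_ids_match <- function(xml_text, entry_point) {
    # Find every <Input ...> or <Arg ...> opening tag (not <Arguments>).
    tags <- regmatches(
        xml_text,
        gregexpr("<(Input|Arg)\\b[^>]*>", xml_text, perl = TRUE)
    )[[1]]
    # Pull the value of the id attribute out of each tag.
    ids <- sub('(?s).*\\bid="([^"]*)".*', "\\1", tags, perl = TRUE)
    params <- names(formals(entry_point))
    list(
        missing_in_r   = setdiff(ids, params),   # ids with no R parameter
        missing_in_xml = setdiff(params, ids)    # R parameters with no id
    )
}

# Trimmed-down definition text, assumed for this example only.
xml_text <- '
<Ports>
    <Input id="dataset1" name="Dataset 1" type="DataTable" />
    <Input id="dataset2" name="Dataset 2" type="DataTable" />
</Ports>
<Arguments>
    <Arg id="swap" name="Swap" type="bool" />
</Arguments>'

CustomAddRows <- function(dataset1, dataset2, swap = FALSE) NULL

result <- check_ids_match(xml_text, CustomAddRows)
print(result)  # both entries are empty when everything lines up
```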

In contrast, the id attribute of the Output element does not correspond to any variable in the R script. When more than one output is required, return a list from the R function with the results placed in the same order as the Output elements are declared in the XML file.

Package and register the module

Save these two files as CustomAddRows.R and CustomAddRows.xml and then zip the two files together into a CustomAddRows.zip file.

To register the module in your Machine Learning workspace, go to your workspace in Azure Machine Learning Studio (classic), click the +NEW button at the bottom, and choose MODULE -> FROM ZIP PACKAGE to upload the new Custom Add Rows module.

(Figure: Upload Zip)

The Custom Add Rows module is now ready to be accessed by your Machine Learning experiments.

Elements in the XML definition file

Module elements

The Module element is used to define a custom module in the XML file. Multiple modules can be defined in one XML file using multiple Module elements. Each module in your workspace must have a unique name. If you register a custom module with the same name as an existing custom module, the new module replaces the existing one. Custom modules can, however, be registered with the same name as an existing Azure Machine Learning Studio (classic) module. If so, they appear in the Custom category of the module palette.

<Module name="Custom Add Rows" isDeterministic="false"> 
    <Owner>Microsoft Corporation</Owner>
    <Description>Appends one dataset to another...</Description>

Within the Module element, you can specify two additional optional elements:

  • an Owner element that is embedded into the module
  • a Description element that contains the text displayed in quick help for the module and when you hover over the module in the Machine Learning UI.

Rules for character limits in the Module elements:

  • The value of the name attribute in the Module element must not exceed 64 characters in length.
  • The content of the Description element must not exceed 128 characters in length.
  • The content of the Owner element must not exceed 32 characters in length.

A module's results can be deterministic or nondeterministic. By default, all modules are considered to be deterministic. That is, given an unchanging set of input parameters and data, the module should return the same results each time it is run. Given this behavior, Azure Machine Learning Studio (classic) only reruns modules marked as deterministic if a parameter or the input data has changed. Returning the cached results also provides much faster execution of experiments.

There are functions that are nondeterministic, such as RAND or a function that returns the current date or time. If your module uses a nondeterministic function, you can specify that the module is nondeterministic by setting the optional isDeterministic attribute to FALSE. This ensures that the module is rerun whenever the experiment is run, even if the module input and parameters have not changed.
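
As a rough illustration of the distinction, the sketch below uses a hypothetical entry point, AddNoise, that draws random numbers. With a fixed seed the output is repeatable; without one, repeated runs generally differ, which is the situation where isDeterministic would be set to false. The function name and the seed parameter are assumptions for this example only.

```r
# Hypothetical entry point: adds a column of random noise to the input.
# Without a fixed seed the output changes between runs, so a module built
# on it would be a candidate for isDeterministic="false".
AddNoise <- function(dataset1, seed = NULL) {
    if (!is.null(seed)) {
        set.seed(seed)  # fixing the seed makes the result repeatable
    }
    dataset1$noise <- runif(nrow(dataset1))
    dataset1
}

input <- data.frame(x = 1:5)

# Two seeded runs produce identical output.
repeatable <- identical(AddNoise(input, seed = 42), AddNoise(input, seed = 42))
print(repeatable)
```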

Language Definition

The Language element in your XML definition file is used to specify the custom module language. Currently, R is the only supported language. The value of the sourceFile attribute must be the name of the R file that contains the function to call when the module is run. This file must be part of the zip package. The value of the entryPoint attribute is the name of the function being called and must match a valid function defined within the source file.

<Language name="R" sourceFile="CustomAddRows.R" entryPoint="CustomAddRows" />

Ports

The input and output ports for a custom module are specified in child elements of the Ports section of the XML definition file. The order of these elements determines the layout that users see in the UX. The first child Input or Output listed in the Ports element of the XML file becomes the left-most input or output port, respectively, in the Machine Learning UX. Each input and output port may have an optional Description child element that specifies the text shown when you hover the mouse cursor over the port in the Machine Learning UI.

Ports Rules:

  • The maximum number of input ports is 8, and the maximum number of output ports is 8.

Input elements

Input ports allow you to pass data to your R function and workspace. The data types that are supported for input ports are as follows:

DataTable: This type is passed to your R function as a data.frame. In fact, any types (for example, CSV files or ARFF files) that are supported by Machine Learning and that are compatible with DataTable are converted to a data.frame automatically.

    <Input id="dataset1" name="Input 1" type="DataTable" isOptional="false">
        <Description>Input Dataset 1</Description>
    </Input>

The id attribute associated with each DataTable input port must have a unique value, and this value must match its corresponding named parameter in your R function. Optional DataTable ports that are not connected in an experiment have the value NULL passed to the R function, and optional Zip ports are ignored if the input is not connected. The isOptional attribute is optional for both the DataTable and Zip types and is false by default.
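
Since an unconnected optional DataTable port arrives as NULL, the entry point needs to handle that case. A minimal sketch follows, assuming a hypothetical entry point named AppendIfPresent whose second port is declared with isOptional="true".

```r
# Hypothetical entry point with an optional second input port.
# When the port is not connected, dataset2 arrives as NULL.
AppendIfPresent <- function(dataset1, dataset2 = NULL) {
    if (is.null(dataset2)) {
        return(dataset1)  # nothing to append
    }
    rbind(dataset1, dataset2)
}

only_first <- AppendIfPresent(data.frame(id = 1:2))
both       <- AppendIfPresent(data.frame(id = 1:2), data.frame(id = 3:4))

print(nrow(only_first))
print(nrow(both))
```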

Zip: Custom modules can accept a zip file as input. This input is unpacked into the R working directory of your function.

    <Input id="zippedData" name="Zip Input" type="Zip" isOptional="false">
        <Description>Zip files to be extracted to the R working directory.</Description>
    </Input>

For custom R modules, the id of a Zip port does not have to match any parameter of the R function. This is because the zip file is automatically extracted to the R working directory.
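
One way to picture this: the entry point takes no parameter for the Zip port and simply reads files from the working directory. The sketch below simulates that locally by writing a file into a temporary directory. The function name JoinWithLookup and the file name lookup.csv are assumptions for illustration, not names used by Studio (classic).

```r
# Hypothetical entry point for a module with a DataTable input and a Zip input.
# The Zip port has no matching parameter; its files are expected in the
# working directory. "lookup.csv" is an assumed file name for illustration.
JoinWithLookup <- function(dataset1) {
    lookup_path <- "lookup.csv"
    if (!file.exists(lookup_path)) {
        return(dataset1)  # Zip port not connected: nothing to join
    }
    lookup <- read.csv(lookup_path, stringsAsFactors = FALSE)
    merge(dataset1, lookup, by = "id")
}

# Local simulation: place the file where the unpacked zip content would be.
old_dir <- setwd(tempdir())
write.csv(data.frame(id = 1:2, label = c("x", "y")), "lookup.csv",
          row.names = FALSE)
joined <- JoinWithLookup(data.frame(id = 1:2, value = c(10, 20)))
unlink("lookup.csv")
setwd(old_dir)

print(joined)
```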

Input Rules:

  • The value of the id attribute of the Input element must be a valid R variable name.
  • The value of the id attribute of the Input element must not be longer than 64 characters.
  • The value of the name attribute of the Input element must not be longer than 64 characters.
  • The content of the Description element must not be longer than 128 characters.
  • The value of the type attribute of the Input element must be Zip or DataTable.
  • The value of the isOptional attribute of the Input element is not required (and is false by default when not specified); but if it is specified, it must be true or false.

Output elements

Standard output ports: Output ports are mapped to the return values from your R function, which can then be used by subsequent modules. DataTable is the only standard output port type supported currently. (Support for Learners and Transforms is forthcoming.) A DataTable output is defined as:

<Output id="dataset" name="Dataset" type="DataTable">
    <Description>Combined dataset</Description>
</Output>

For outputs in custom R modules, the value of the id attribute does not have to correspond with anything in the R script, but it must be unique. For a single module output, the return value from the R function must be a data.frame. To output more than one object of a supported data type, the appropriate output ports need to be specified in the XML definition file and the objects need to be returned as a list. The output objects are assigned to output ports from left to right, reflecting the order in which the objects are placed in the returned list.

For example, if you want to modify the Custom Add Rows module to output the original two datasets, dataset1 and dataset2, in addition to the new joined dataset, dataset (in the order, from left to right: dataset, dataset1, dataset2), then define the output ports in the CustomAddRows.xml file as follows:

<Ports> 
    <Output id="dataset" name="Dataset Out" type="DataTable"> 
        <Description>New Dataset</Description> 
    </Output> 
    <Output id="dataset1_out" name="Dataset 1 Out" type="DataTable"> 
        <Description>First Dataset</Description> 
    </Output> 
    <Output id="dataset2_out" name="Dataset 2 Out" type="DataTable"> 
        <Description>Second Dataset</Description> 
    </Output> 
    <Input id="dataset1" name="Dataset 1" type="DataTable"> 
        <Description>First Input Table</Description>
    </Input> 
    <Input id="dataset2" name="Dataset 2" type="DataTable"> 
        <Description>Second Input Table</Description> 
    </Input> 
</Ports> 

Then return the objects as a list, in the correct order, in CustomAddRows.R:

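CustomAddRows <- function(dataset1, dataset2, swap=FALSE) {
    # Build the combined dataset in the requested order.
    if (swap) {
        dataset <- rbind(dataset2, dataset1)
    } else {
        dataset <- rbind(dataset1, dataset2)
    }
    # The list order must match the order of the Output elements in the XML file.
    return (list(dataset, dataset1, dataset2))
}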
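
A quick local check can confirm that the returned list has one data frame per declared output port and that the order is the intended one. This is a sketch only; the sample inputs are made up for illustration.

```r
# Multi-output version of the function, repeated so this sketch runs on its own.
CustomAddRows <- function(dataset1, dataset2, swap = FALSE) {
    if (swap) {
        dataset <- rbind(dataset2, dataset1)
    } else {
        dataset <- rbind(dataset1, dataset2)
    }
    return(list(dataset, dataset1, dataset2))
}

# Hypothetical inputs with 2 and 3 rows.
outputs <- CustomAddRows(data.frame(id = 1:2), data.frame(id = 3:5))

# Every element should be a data.frame, in the order: combined, first, second.
all_frames <- all(vapply(outputs, is.data.frame, logical(1)))
row_counts <- vapply(outputs, nrow, integer(1))

print(all_frames)
print(row_counts)
```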

Visualization output: You can also specify an output port of type Visualization, which displays the output from the R graphics device and console output. This port is not part of the R function output and does not interfere with the order of the other output port types. To add a visualization port to a custom module, add an Output element with a value of Visualization for its type attribute:

<Output id="deviceOutput" name="View Port" type="Visualization">
  <Description>View the R console graphics device output.</Description>
</Output>
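
To give a sense of what ends up on such a port, the sketch below shows a hypothetical entry point, SummarizeColumn, that prints to the console and draws a plot while returning its data frame unchanged. The function name and the value column are assumptions. The pdf(NULL) call is only a local stand-in for the graphics device so that running the example does not write a plot file.

```r
# Hypothetical entry point: returns the data unchanged, while the printed
# summary goes to the console and the histogram to the graphics device.
SummarizeColumn <- function(dataset1) {
    print(summary(dataset1$value))                        # console output
    hist(dataset1$value, main = "Distribution of value")  # graphics output
    dataset1                                              # DataTable output
}

pdf(NULL)  # local stand-in device so that no plot file is written
result <- SummarizeColumn(data.frame(value = c(1, 2, 2, 3, 5)))
invisible(dev.off())
```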

Output Rules:

  • The value of the id attribute of the Output element must be a valid R variable name.
  • The value of the id attribute of the Output element must not be longer than 32 characters.
  • The value of the name attribute of the Output element must not be longer than 64 characters.
  • The value of the type attribute of the Output element must be DataTable or Visualization.

Arguments

Additional data can be passed to the R function via module parameters, which are defined in the Arguments element. These parameters appear in the rightmost properties pane of the Machine Learning UI when the module is selected. Arguments can be any of the supported types, or you can create a custom enumeration when needed. Similar to the Ports elements, Arguments elements can have an optional Description element that specifies the text that appears when you hover the mouse over the parameter name. Optional properties for a module, such as defaultValue, minValue, and maxValue, can be added to any argument as attributes of a Properties element. Valid properties for the Properties element depend on the argument type and are described with the supported argument types in the next section. Arguments with the isOptional property set to "true" do not require the user to enter a value. If a value is not provided for the argument, then the argument is not passed to the entry point function. Optional arguments of the entry point function need to be explicitly handled by the function, for example by assigning a default value of NULL in the entry point function definition. An optional argument only enforces the other argument constraints, such as min or max, if a value is provided by the user. As with inputs and outputs, it is critical that each parameter has a unique id value associated with it. In the quickstart example, the associated id/parameter was swap.
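
The handling of an optional argument can be sketched as follows. TakeRows and maxRows are hypothetical names chosen for illustration; the point is only that a NULL default lets the function run when the user leaves the argument empty and it is therefore not passed.

```r
# Hypothetical entry point: "maxRows" is an assumed optional int argument.
# If the user leaves it empty, it is not passed and the NULL default applies.
TakeRows <- function(dataset1, maxRows = NULL) {
    if (is.null(maxRows)) {
        return(dataset1)  # no limit requested
    }
    head(dataset1, maxRows)
}

input <- data.frame(id = 1:10)

unlimited <- TakeRows(input)
limited   <- TakeRows(input, maxRows = 3)

print(nrow(unlimited))
print(nrow(limited))
```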

Arg element

A module parameter is defined using the Arg child element of the Arguments section of the XML definition file. As with the child elements in the Ports section, the ordering of parameters in the Arguments section defines the layout encountered in the UX. The parameters appear from top to bottom in the UI in the same order in which they are defined in the XML file. The types supported by Machine Learning for parameters are listed here.

int – an integer (32-bit) type parameter.

<Arg id="intValue1" name="Int Param" type="int">
    <Properties min="0" max="100" default="0" />
    <Description>Integer Parameter</Description>
</Arg>
  • Optional Properties: min, max, default and isOptional

double – a double-precision floating-point type parameter.

<Arg id="doubleValue1" name="Double Param" type="double">
    <Properties min="0.000" max="0.999" default="0.3" />
    <Description>Double Parameter</Description>
</Arg>
  • Optional Properties: min, max, default and isOptional

bool – a Boolean parameter that is represented by a check box in the UX.

<Arg id="boolValue1" name="Boolean Param" type="bool">
    <Properties default="true" />
    <Description>Boolean Parameter</Description>
</Arg>
  • Optional Properties: default - false if not set

string: a standard string

<Arg id="stringValue1" name="My string Param" type="string">
    <Properties isOptional="true" />
    <Description>String Parameter 1</Description>
</Arg>    
  • Optional Properties: default and isOptional

ColumnPicker: a column selection parameter. This type renders in the UX as a column chooser. The Properties element is used here to specify the ID of the port from which columns are selected, where the target port type must be DataTable. The result of the column selection is passed to the R function as a list of strings containing the selected column names.

    <Arg id="colset" name="Column set" type="ColumnPicker">      
      <Properties portId="datasetIn1" allowedTypes="Numeric" default="NumericAll"/>
      <Description>Column set</Description>
    </Arg>
  • Required Properties: portId - matches the ID of an Input element with type DataTable.

  • Optional Properties:

    • allowedTypes - Filters the column types from which you can pick. Valid values include:

      • Numeric
      • Boolean
      • Categorical
      • String
      • Label
      • Feature
      • Score
      • All
    • default - Valid default selections for the column picker include:

      • None
      • NumericFeature
      • NumericLabel
      • NumericScore
      • NumericAll
      • BooleanFeature
      • BooleanLabel
      • BooleanScore
      • BooleanAll
      • CategoricalFeature
      • CategoricalLabel
      • CategoricalScore
      • CategoricalAll
      • StringFeature
      • StringLabel
      • StringScore
      • StringAll
      • AllLabel
      • AllFeature
      • AllScore
      • All
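
Inside the R function, the ColumnPicker result can be used to subset the input data frame. The sketch below assumes the selection arrives as a list of column-name strings, as described above. SelectColumns is a hypothetical entry point, and the parameter names datasetIn1 and colset simply reuse the ids from the XML example.

```r
# Hypothetical entry point: colset is assumed to arrive as a list of
# column-name strings chosen in the column picker.
SelectColumns <- function(datasetIn1, colset) {
    column_names <- unlist(colset)
    # drop = FALSE keeps a data.frame even when a single column is selected.
    datasetIn1[, column_names, drop = FALSE]
}

input <- data.frame(a = 1:3, b = c(0.5, 1.5, 2.5), c = c("x", "y", "z"))

picked <- SelectColumns(input, list("a", "b"))
single <- SelectColumns(input, list("b"))

print(names(picked))
print(class(single))
```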

DropDown: a user-specified enumerated (dropdown) list. The dropdown items are specified within the Properties element using an Item element. The id for each Item must be unique and a valid R variable name. The value of the name of an Item serves as both the text that you see and the value that is passed to the R function.

<Arg id="color" name="Color" type="DropDown">
  <Properties default="red">
    <Item id="red" name="Red Value"/>
    <Item id="green" name="Green Value"/>
    <Item id="blue" name="Blue Value"/>
  </Properties>
  <Description>Select a color.</Description>
</Arg>    
  • Optional Properties:
    • default - The value of the default property must correspond with an id value from one of the Item elements.
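
Because the function receives the name of the selected Item rather than its id, R code that branches on a DropDown value compares against the name strings. The following sketch uses a hypothetical helper, ColorToHex, with the Item names from the Color example; the hex codes are illustrative values only.

```r
# Hypothetical helper for the Color dropdown: the function receives the
# name of the selected Item ("Red Value"), not its id ("red").
ColorToHex <- function(color) {
    switch(color,
        "Red Value"   = "#FF0000",
        "Green Value" = "#00FF00",
        "Blue Value"  = "#0000FF",
        stop("Unexpected dropdown value: ", color)
    )
}

print(ColorToHex("Green Value"))
```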

Auxiliary Files

Any file that is placed in your custom module ZIP file is available for use during execution. Any directory structures present are preserved. This means that file sourcing works the same locally and in the Azure Machine Learning Studio (classic) execution.

Note

All files are extracted to the ‘src’ directory, so all paths should have the ‘src/’ prefix.

For example, say you want to remove any rows with NAs from the dataset, and also remove any duplicate rows, before CustomAddRows outputs it, and you've already written an R function that does that in a file RemoveDupNARows.R:

RemoveDupNARows <- function(dataFrame) {
    #Remove Duplicate Rows:
    dataFrame <- unique(dataFrame)
    #Remove Rows with NAs:
    finalDataFrame <- dataFrame[complete.cases(dataFrame),]
    return(finalDataFrame)
}

You can source the auxiliary file RemoveDupNARows.R in the CustomAddRows function:

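CustomAddRows <- function(dataset1, dataset2, swap=FALSE) {
    # Load the helper from the src directory where module files are extracted.
    source("src/RemoveDupNARows.R")
    if (swap) {
        dataset <- rbind(dataset2, dataset1)
    } else {
        dataset <- rbind(dataset1, dataset2)
    }
    # The call must use the same capitalization as the function definition.
    dataset <- RemoveDupNARows(dataset)
    return (dataset)
}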

Next, upload a zip file containing ‘CustomAddRows.R’, ‘CustomAddRows.xml’, and ‘RemoveDupNARows.R’ as a custom R module.
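
Before uploading, the src/ path can be exercised locally by recreating the layout in a temporary directory. The sketch below is one way to do that under assumptions: the directory name module_sim and the sample data are made up, and the helper file is written from within the script only so that the example is self-contained.

```r
# Local simulation of the module layout: a src directory under the
# working directory. "module_sim" is an assumed name for illustration.
work_dir <- file.path(tempdir(), "module_sim")
dir.create(file.path(work_dir, "src"), recursive = TRUE, showWarnings = FALSE)

# Write the helper file so that this sketch does not depend on other files.
writeLines(c(
    "RemoveDupNARows <- function(dataFrame) {",
    "    dataFrame <- unique(dataFrame)",
    "    finalDataFrame <- dataFrame[complete.cases(dataFrame),]",
    "    return(finalDataFrame)",
    "}"
), file.path(work_dir, "src", "RemoveDupNARows.R"))

CustomAddRows <- function(dataset1, dataset2, swap = FALSE) {
    source("src/RemoveDupNARows.R")  # resolved relative to the working directory
    if (swap) {
        dataset <- rbind(dataset2, dataset1)
    } else {
        dataset <- rbind(dataset1, dataset2)
    }
    RemoveDupNARows(dataset)
}

# Hypothetical inputs containing one duplicate row each and one NA.
first  <- data.frame(id = c(1, 1, NA), tag = c("a", "a", "b"))
second <- data.frame(id = c(2, 3, 3),  tag = c("c", "d", "d"))

old_dir <- setwd(work_dir)
result <- CustomAddRows(first, second)
setwd(old_dir)

print(result$id)  # duplicates and the NA row are gone
```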

Execution Environment

The execution environment for the R script uses the same version of R as the Execute R Script module and can use the same default packages. You can also add additional R packages to your custom module by including them in the custom module zip package. Just load them in your R script as you would in your own R environment.

Limitations of the execution environment include:

  • Non-persistent file system: Files written when the custom module is run are not persisted across multiple runs of the same module.
  • No network access