Enter Data Manually module

This article describes a module in Azure Machine Learning designer (preview).

Use the Enter Data Manually module to create a small dataset by typing values. The dataset can have multiple columns.

This module can be helpful in scenarios such as:

  • Generating a small set of values for testing.
  • Creating a short list of labels.
  • Typing a list of column names to insert in a dataset.

Create a dataset

  1. Add the Enter Data Manually module to your pipeline. You can find this module in the Data Input and Output category in Azure Machine Learning.

  2. For DataFormat, select one of the following options. These options determine how the data that you provide is parsed. The requirements for each format differ greatly, so be sure to read the related topics.

    • ARFF: Attribute-relation file format used by Weka.
    • CSV: Comma-separated values format. For more information, see Convert to CSV.
    • SVMLight: Format used by Vowpal Wabbit and other machine learning frameworks.
    • TSV: Tab-separated values format.

    If you choose a format and the data you provide does not meet that format's specifications, a runtime error occurs.

  3. Click inside the Data text box to start entering data. The following formats require special attention:

    • CSV: To create multiple columns, paste in comma-separated text, or type multiple columns by using commas between fields.

      If you select the HasHeader option, the first row of values is used as the column headings.

      If you deselect this option, the default column names (Col1, Col2, and so forth) are used. You can add or change column names later by using Edit Metadata.

    • TSV: To create multiple columns, paste in tab-separated text, or type multiple columns by using tabs between fields.

      If you select the HasHeader option, the first row of values is used as the column headings.

      If you deselect this option, the default column names (Col1, Col2, and so forth) are used. You can add or change column names later by using Edit Metadata.

    • ARFF: Paste in an existing ARFF format file. If you're typing values directly, be sure to add the optional header and the required attribute fields at the beginning of the data.

      For example, you can add the following header and attribute rows to a simple list. The column heading would be SampleText. Note that the String type is not supported.

      % Title: SampleText.ARFF  
      % Source: Enter Data module  
      @ATTRIBUTE SampleText NUMERIC  
      @DATA  
      <type first data row here>  
      
    • SVMLight: Type or paste in values by using the SVMLight format.

      For example, the following sample represents the first two lines of the Blood Donation dataset in SVMLight format:

      # features are [Recency], [Frequency], [Monetary], [Time]  
      1 1:2 2:50 3:12500 4:98   
      1 1:0 2:13 3:3250 4:28   
      

      When you run the Enter Data Manually module, these lines are converted to a dataset of columns and index values as follows:

      Col1      Col2   Col3      Col4      Labels
      0.00016   0.004  0.999961  0.00784   1
      0         0.004  0.999955  0.008615  1

  4. Press the Enter key after each row to start a new line.

    If you press Enter multiple times and add several empty trailing rows, the empty rows are removed or trimmed.

    If you create rows with missing values, you can always filter them out later.

  5. Connect the output port to other modules, and run the pipeline.

    To view the dataset, right-click the module and select Visualize.
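
The header handling and SVMLight conversion described in the steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the module's implementation: the function names `parse_delimited` and `parse_svmlight` are hypothetical, and the unit-length scaling is an inference from the sample output table (each row's values match the inputs divided by that row's Euclidean norm), which the article does not state explicitly.

```python
import csv
import io
import math


def parse_delimited(text, delimiter=",", has_header=True):
    """Parse typed CSV/TSV text into (column_names, rows). Hypothetical helper."""
    # Drop empty trailing lines, mirroring the trimming described in step 4.
    lines = text.rstrip("\r\n").splitlines()
    rows = list(csv.reader(io.StringIO("\n".join(lines)), delimiter=delimiter))
    if not rows:
        return [], []
    if has_header:
        # The first row supplies the column headings.
        return rows[0], rows[1:]
    # Without a header row, fall back to Col1, Col2, and so forth.
    names = [f"Col{i + 1}" for i in range(len(rows[0]))]
    return names, rows


def parse_svmlight(text):
    """Parse SVMLight lines into (label, dense feature list) pairs. Hypothetical helper."""
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        label, *pairs = line.split()
        features = {int(i): float(v) for i, v in (p.split(":") for p in pairs)}
        width = max(features, default=0)
        dense = [features.get(i, 0.0) for i in range(1, width + 1)]
        # Assumption: rows are scaled to unit Euclidean length, which is
        # consistent with the sample output table but not documented.
        norm = math.sqrt(sum(v * v for v in dense))
        if norm > 0:
            dense = [v / norm for v in dense]
        records.append((float(label), dense))
    return records


sample = """# features are [Recency], [Frequency], [Monetary], [Time]
1 1:2 2:50 3:12500 4:98
1 1:0 2:13 3:3250 4:28
"""
for label, row in parse_svmlight(sample):
    print([round(v, 6) for v in row], label)

print(parse_delimited("1,2\n3,4\n\n", has_header=False))
```

The sketch keeps all delimited values as strings and sizes each SVMLight row by its own highest feature index. Type inference and consistent column widths across rows are left out for brevity.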

Next steps

See the set of modules available to Azure Machine Learning.