Import your training data into Azure Machine Learning Studio (classic) from various data sources

APPLIES TO: ✔️ Machine Learning Studio (classic)   ✖️ Azure Machine Learning

To use your own data in Machine Learning Studio (classic) to develop and train a predictive analytics solution, you can use data from:

  • Local file - Load local data ahead of time from your hard drive to create a dataset module in your workspace
  • Online data sources - Use the Import Data module to access data from one of several online sources while your experiment is running
  • Machine Learning Studio (classic) experiment - Use data that was saved as a dataset in Machine Learning Studio (classic)
  • SQL Server database - Use data from a SQL Server database without having to copy data manually

Note

There are a number of sample datasets available in Machine Learning Studio (classic) that you can use for training data. For information on these, see Use the sample datasets in Azure Machine Learning Studio (classic).

Prepare data

Machine Learning Studio (classic) is designed to work with rectangular or tabular data, such as delimited text or structured data from a database, though in some circumstances non-rectangular data may be used.

It's best if your data is relatively clean before you import it into Studio (classic). For example, you'll want to take care of issues such as unquoted strings.

However, there are modules available in Studio (classic) that enable some manipulation of data within your experiment after you import your data. Depending on the machine learning algorithms you'll be using, you may need to decide how you'll handle data structural issues such as missing values and sparse data, and there are modules that can help with that. Look in the Data Transformation section of the module palette for modules that perform these functions.

At any point in your experiment, you can view or download the data that's produced by a module by clicking the output port. Depending on the module, there may be different download options available, or you may be able to visualize the data within your web browser in Studio (classic).

Supported data formats and data types

You can import a number of data types into your experiment, depending on what mechanism you use to import data and where it's coming from (a short sketch of the CSV header-naming convention follows this list):

  • Plain text (.txt)
  • Comma-separated values (CSV) with a header (.csv) or without (.nh.csv)
  • Tab-separated values (TSV) with a header (.tsv) or without (.nh.tsv)
  • Excel file
  • Azure table
  • Hive table
  • SQL database table
  • OData values
  • SVMLight data (.svmlight) (see the SVMLight definition for format information)
  • Attribute Relation File Format (ARFF) data (.arff) (see the ARFF definition for format information)
  • Zip file (.zip)
  • R object or workspace file (.RData)
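As a concrete illustration of the header-naming convention, here is a minimal sketch (using pandas, with hypothetical file names) that writes the same data once with a header row (.csv) and once without (.nh.csv):

```python
import pandas as pd

# A tiny example frame; the column names become the CSV header row.
df = pd.DataFrame({"age": [34, 29, 41], "income": [52000, 48000, 61000]})

# With a header row: use the plain .csv extension.
df.to_csv("training-data.csv", index=False, header=True)

# Without a header row: Studio (classic) expects the .nh.csv extension.
df.to_csv("training-data.nh.csv", index=False, header=False)
```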

If you import data in a format such as ARFF that includes metadata, Studio (classic) uses this metadata to define the heading and data type of each column.

If you import data in a format such as TSV or CSV that doesn't include this metadata, Studio (classic) infers the data type of each column by sampling the data. If the data also doesn't have column headings, Studio (classic) provides default names.
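To make the metadata-bearing case concrete, the hypothetical sketch below writes a minimal ARFF file; the @attribute lines are the metadata that supplies column names and types, so no sampling or default naming is needed:

```python
# A minimal, hypothetical ARFF file: the @attribute declarations carry
# the column headings and data types that Studio (classic) reads.
arff_sample = """@relation weather
@attribute temperature numeric
@attribute outlook {sunny, overcast, rainy}
@attribute play {yes, no}
@data
85,sunny,no
64,overcast,yes
"""

with open("weather.arff", "w", encoding="utf-8") as f:
    f.write(arff_sample)
```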

You can explicitly specify or change the headings and data types for columns using the Edit Metadata module.

The following data types are recognized by Studio (classic):

  • String
  • Integer
  • Double
  • Boolean
  • DateTime
  • TimeSpan

Studio uses an internal data type called data table to pass data between modules. You can explicitly convert your data into data table format using the Convert to Dataset module.

Any module that accepts formats other than data table will convert the data to data table silently before passing it to the next module.

If necessary, you can convert data table format back into CSV, TSV, ARFF, or SVMLight format using other conversion modules. Look in the Data Format Conversions section of the module palette for modules that perform these functions.
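These formats can also be produced outside Studio (classic) with ordinary tooling. For instance, scikit-learn can write and read SVMLight files; a minimal sketch, assuming scikit-learn and NumPy are installed:

```python
import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Two examples with three features each, plus binary labels.
X = np.array([[0.5, 0.0, 1.2], [0.0, 2.0, 0.0]])
y = np.array([1, 0])

# Write the sparse label/index:value representation to disk.
dump_svmlight_file(X, y, "sample.svmlight")

# Round-trip check: load_svmlight_file returns a sparse matrix and labels.
X2, y2 = load_svmlight_file("sample.svmlight")
print(X2.shape, y2)
```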

Data capacities

Modules in Machine Learning Studio (classic) support datasets of up to 10 GB of dense numerical data for common use cases. If a module takes more than one input, the 10 GB value is the total of all input sizes. You can sample larger datasets by using queries from Hive or Azure SQL Database, or you can use Learning with Counts preprocessing before you import the data.
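For example, one common way to sample a large table before import is a TOP … ORDER BY NEWID() query against Azure SQL Database. The sketch below runs such a query locally with pyodbc and pandas; the server, database, table, and driver names are placeholders, not values from this article:

```python
import pandas as pd
import pyodbc

# Placeholder connection details -- substitute your own server and database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:<your-server>.database.chinacloudapi.cn,1433;"
    "DATABASE=<your-database>;UID=<user>;PWD=<password>"
)

# Pull a random sample of 100,000 rows instead of the full table,
# keeping the dataset within Studio (classic)'s 10-GB guidance.
query = "SELECT TOP (100000) * FROM dbo.Training ORDER BY NEWID();"
sample = pd.read_sql(query, conn)
sample.to_csv("training-sample.csv", index=False)
```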

The following types of data can expand to larger datasets during feature normalization, and are limited to less than 10 GB:

  • Sparse
  • Categorical
  • Strings
  • Binary data

The following modules are limited to datasets of less than 10 GB:

  • Recommender modules
  • Synthetic Minority Oversampling Technique (SMOTE) module
  • Scripting modules: R, Python, SQL
  • Modules where the output data size can be larger than the input data size, such as Join or Feature Hashing
  • Cross-validation, Tune Model Hyperparameters, Ordinal Regression, and One-vs-All Multiclass, when the number of iterations is very large

For datasets that are larger than a couple of GBs, upload the data to Azure Storage or Azure SQL Database, or use Azure HDInsight, rather than uploading directly from a local file.
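As a hedged sketch of that upload step, the current azure-storage-blob package (v12) can push a local file into a container; the connection string, container, and file names below are placeholders:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder credentials -- use your own storage account connection string.
service = BlobServiceClient.from_connection_string("<your-connection-string>")
blob = service.get_blob_client(container="training-data", blob="big-dataset.csv")

# Stream the local file up to Blob storage so Import Data can read it later.
with open("big-dataset.csv", "rb") as f:
    blob.upload_blob(f, overwrite=True)
```

Note that, as described later in this article, the Import Data module currently reads only from storage accounts created with the Classic deployment model.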

You can find information about image data in the Import Images module reference.

Import from a local file

You can upload a data file from your hard drive to use as training data in Studio (classic). When you import a data file, you create a dataset module ready for use in experiments in your workspace.

To import data from a local hard drive, do the following:

  1. Click +NEW at the bottom of the Studio (classic) window.
  2. Select DATASET and FROM LOCAL FILE.
  3. In the Upload a new dataset dialog, browse to the file you want to upload.
  4. Enter a name, identify the data type, and optionally enter a description. A description is recommended: it allows you to record any characteristics of the data that you'll want to remember when using it in the future.
  5. The checkbox This is the new version of an existing dataset allows you to update an existing dataset with new data. To do so, click this checkbox and then enter the name of the existing dataset.

(Screenshot: the Upload a new dataset dialog)

Upload time depends on the size of your data and the speed of your connection to the service. If you know the file will take a long time, you can do other things inside Studio (classic) while you wait. However, closing the browser before the data upload is complete causes the upload to fail.

Once your data is uploaded, it's stored in a dataset module and is available to any experiment in your workspace.

When you're editing an experiment, you can find the datasets you've uploaded in the My Datasets list under the Saved Datasets list in the module palette. You can drag and drop the dataset onto the experiment canvas when you want to use it for further analytics and machine learning.

Import from online data sources

Using the Import Data module, your experiment can import data from various online data sources while the experiment is running.

Note

This article provides general information about the Import Data module. For more detailed information about the types of data you can access, formats, parameters, and answers to common questions, see the module reference topic for the Import Data module.

By using the Import Data module, you can access data from one of several online data sources while your experiment is running:

  • A web URL using HTTP
  • Hadoop using HiveQL
  • Azure Blob storage
  • Azure table
  • Azure SQL Database, SQL Managed Instance, or SQL Server
  • A data feed provider (currently OData)
  • Azure Cosmos DB

Because this training data is accessed while your experiment is running, it's only available in that experiment. By comparison, data that has been stored in a dataset module is available to any experiment in your workspace.

To access online data sources in your Studio (classic) experiment, add the Import Data module to your experiment. Then select Launch Import Data Wizard under Properties for step-by-step guided instructions to select and configure the data source. Alternatively, you can manually select Data source under Properties and supply the parameters needed to access the data.

The supported online data sources are itemized below, along with the supported file formats and the parameters used to access the data.

Important

Currently, the Import Data and Export Data modules can read and write data only from Azure storage created using the Classic deployment model. In other words, the new Azure Blob Storage account type that offers a hot or cool storage access tier is not yet supported.

Generally, any Azure storage accounts that you might have created before this service option became available should not be affected. If you need to create a new account, select Classic for the Deployment model, or use Resource Manager and select General purpose rather than Blob storage for Account kind.

For more information, see Azure Blob Storage: Hot and Cool Storage Tiers.

Supported online data sources

The Azure Machine Learning Studio (classic) Import Data module supports the following data sources:

Web URL via HTTP

Reads data in comma-separated values (CSV), tab-separated values (TSV), attribute-relation file format (ARFF), and Support Vector Machines (SVM-light) formats, from any web URL that uses HTTP. (A sketch for previewing a web-hosted CSV locally appears after this section.)

Parameters:
  • URL: Specifies the full name of the file, including the site URL and the file name, with any extension.
  • Data format: Specifies one of the supported data formats: CSV, TSV, ARFF, or SVM-light. If the data has a header row, it is used to assign column names.

Hadoop/HDFS

Reads data from distributed storage in Hadoop. You specify the data you want by using HiveQL, a SQL-like query language. HiveQL can also be used to aggregate data and perform data filtering before you add the data to Studio (classic).

Parameters:
  • Hive database query: Specifies the Hive query used to generate the data.
  • HCatalog server URI: Specifies the name of your cluster, using the format <your cluster name>.azurehdinsight.net.
  • Hadoop user account name: Specifies the Hadoop user account name used to provision the cluster.
  • Hadoop user account password: Specifies the credentials used when provisioning the cluster. For more information, see Create Hadoop clusters in HDInsight.
  • Location of output data: Specifies whether the data is stored in a Hadoop distributed file system (HDFS) or in Azure. If you store output data in HDFS, specify the HDFS server URI (be sure to use the HDInsight cluster name without the HTTPS:// prefix). If you store your output data in Azure, you must specify the Azure storage account name, storage access key, and storage container name.

SQL database

Reads data that is stored in Azure SQL Database, SQL Managed Instance, or in a SQL Server database running on an Azure virtual machine.

Parameters:
  • Database server name: Specifies the name of the server on which the database is running. For Azure SQL Database, enter the generated server name, which typically has the form <generated_identifier>.database.chinacloudapi.cn. For a SQL Server instance hosted on an Azure virtual machine, enter tcp:<Virtual Machine DNS Name>,1433.
  • Database name: Specifies the name of the database on the server.
  • Server user account name: Specifies a user name for an account that has access permissions for the database.
  • Server user account password: Specifies the password for the user account.
  • Database query: Enter a SQL statement that describes the data you want to read.

On-premises SQL database

Reads data that is stored in a SQL database.

Parameters:
  • Data gateway: Specifies the name of the Data Management Gateway installed on a computer where it can access your SQL Server database. For information about setting up the gateway, see Perform advanced analytics with Azure Machine Learning Studio (classic) using data from a SQL server.
  • Database server name: Specifies the name of the server on which the database is running.
  • Database name: Specifies the name of the database on the server.
  • Server user account name: Specifies a user name for an account that has access permissions for the database.
  • User name and password: Click Enter values to enter your database credentials. You can use Windows Integrated Authentication or SQL Server Authentication, depending on how your SQL Server instance is configured.
  • Database query: Enter a SQL statement that describes the data you want to read.

Azure Table

Reads data from the Table service in Azure Storage.

If you read large amounts of data infrequently, use the Azure Table service. It provides a flexible, non-relational (NoSQL), massively scalable, inexpensive, and highly available storage solution.

The options in Import Data change depending on whether you are accessing public information or a private storage account that requires login credentials. This is determined by the Authentication Type, which can have a value of "PublicOrSAS" or "Account", each of which has its own set of parameters.

Public or Shared Access Signature (SAS) URI parameters:
  • Table URI: Specifies the public or SAS URL for the table.
  • Rows to scan for property names: The values are TopN, to scan the specified number of rows, or ScanAll, to get all rows in the table. If the data is homogeneous and predictable, we recommend that you select TopN and enter a number for N; for large tables, this can result in quicker reading times. If the data is structured with sets of properties that vary based on the depth and position of the table, choose the ScanAll option to scan all rows; this ensures the integrity of your resulting property and metadata conversion.

Private storage account parameters:
  • Account name: Specifies the name of the account that contains the table to read.
  • Account key: Specifies the storage key associated with the account.
  • Table name: Specifies the name of the table that contains the data to read.
  • Rows to scan for property names: The values are TopN or ScanAll, with the same guidance as above.

Azure Blob Storage

Reads data stored in the Blob service in Azure Storage, including images, unstructured text, or binary data.

You can use the Blob service to publicly expose data, or to privately store application data. You can access your data from anywhere by using HTTP or HTTPS connections.

The options in the Import Data module change depending on whether you are accessing public information or a private storage account that requires login credentials. This is determined by the Authentication Type, which can have a value of either "PublicOrSAS" or "Account".

Public or Shared Access Signature (SAS) URI parameters:
  • URI: Specifies the public or SAS URL for the storage blob.
  • File format: Specifies the format of the data in the Blob service. The supported formats are CSV, TSV, and ARFF.

Private storage account parameters:
  • Account name: Specifies the name of the account that contains the blob you want to read.
  • Account key: Specifies the storage key associated with the account.
  • Path to container, directory, or blob: Specifies the name of the blob that contains the data to read.
  • Blob file format: Specifies the format of the data in the Blob service. The supported data formats are CSV, TSV, ARFF, CSV with a specified encoding, and Excel. If the format is CSV or TSV, be sure to indicate whether the file contains a header row. You can use the Excel option to read data from Excel workbooks: in the Excel data format option, indicate whether the data is in an Excel worksheet range or in an Excel table, and in the Excel sheet or embedded table option, specify the name of the sheet or table that you want to read from.

Data Feed Provider

Reads data from a supported feed provider. Currently only the Open Data Protocol (OData) format is supported.

Parameters:
  • Data content type: Specifies the OData format.
  • Source URL: Specifies the full URL for the data feed. For example, the following URL reads from the Northwind sample database: https://services.odata.org/northwind/northwind.svc/
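As mentioned above, before pointing the Web URL via HTTP option at a file, it can help to sanity-check the URL and format locally. A minimal sketch (the URL below is a placeholder):

```python
import pandas as pd

# Placeholder URL -- any HTTP-accessible CSV with a header row.
url = "https://example.com/data/training.csv"

# pandas reads directly over HTTP; a quick peek confirms the delimiter,
# header row, and column types before you configure Import Data.
preview = pd.read_csv(url)
print(preview.head())
print(preview.dtypes)
```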

Import from another experiment

There will be times when you'll want to take an intermediate result from one experiment and use it as part of another experiment. To do this, you save the output of a module as a dataset:

  1. Click the output of the module that you want to save as a dataset.
  2. Click Save as Dataset.
  3. When prompted, enter a name and a description that will allow you to identify the dataset easily.
  4. Click the OK checkmark.

When the save finishes, the dataset will be available for use within any experiment in your workspace. You can find it in the Saved Datasets list in the module palette.

Next steps

Deploying Azure Machine Learning Studio web services that use Import Data and Export Data modules