Load data into storage environments for analytics

The Team Data Science Process requires that data be ingested or loaded into a variety of storage environments so that it can be processed or analyzed in the most appropriate way at each stage of the process. Data destinations commonly used for processing include Azure Blob Storage, SQL Azure databases, SQL Server on Azure VM, HDInsight (Hadoop), and Azure Machine Learning.

The following articles describe how to ingest data into the various target environments where the data is stored and processed.

Technical and business needs, together with the initial location, format, and size of your data, determine the target environments into which the data must be ingested to achieve the goals of your analysis. It is not uncommon for a scenario to require moving data between several environments to accomplish the variety of tasks needed to construct a predictive model. This sequence of tasks can include, for example, data exploration, pre-processing, cleaning, down-sampling, and model training.