Load data into storage environments for analytics

The Team Data Science Process requires that data be ingested or loaded into the storage environment where it can be processed or analyzed most appropriately at each stage. Data destinations can include Azure Blob Storage, Azure SQL Database, SQL Server on an Azure VM, HDInsight (Hadoop), Azure Synapse Analytics, and Azure Machine Learning.

The following articles describe how to ingest data into various target environments where the data is stored and processed.

The best data ingestion plan depends on your technical and business needs, as well as on the initial location, format, and size of your data. It is not uncommon for the best plan to have several steps. This sequence of tasks can include, for example, data exploration, pre-processing, cleaning, down-sampling, and model training. Azure Data Factory is a recommended Azure resource for orchestrating data movement and transformation.
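To make the idea of a multi-step plan concrete, here is a minimal local sketch of two of the steps named above, cleaning and down-sampling, applied before data would be loaded into a destination. It uses only the Python standard library and does not call Azure Data Factory or any Azure service. The function names (`clean`, `down_sample`, `run_plan`) and the sample CSV are hypothetical and chosen purely for illustration.

```python
import csv
import io
import random


def clean(rows):
    """Drop rows in which any field is missing or blank."""
    return [
        r for r in rows
        if all(isinstance(v, str) and v.strip() for v in r.values())
    ]


def down_sample(rows, fraction, seed=0):
    """Keep a reproducible random subset of roughly `fraction` of the rows."""
    rng = random.Random(seed)
    # Keep at least one row when any exist, and never more than are available.
    k = min(len(rows), max(1, round(len(rows) * fraction)))
    return rng.sample(rows, k)


def run_plan(csv_text, fraction):
    """Hypothetical two-step plan: parse, clean, then down-sample."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return down_sample(clean(rows), fraction)


# Illustrative data only: the third record has a blank "value" field.
SAMPLE_CSV = "id,value\n1,10\n2,20\n3,\n4,40\n5,50\n"
subset = run_plan(SAMPLE_CSV, fraction=0.5)
```

In a real plan, each function would typically correspond to an activity in an orchestrated pipeline, and the result would be written to one of the destinations listed earlier rather than kept in memory.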