Sample data in Azure blob containers, SQL Server, and Hive tables

The following articles describe how to sample data that is stored in one of three different Azure locations: Azure blob containers, SQL Server, and Hive tables.

This sampling task is a step in the Team Data Science Process (TDSP).

Why sample data?

If the dataset you plan to analyze is large, it is usually a good idea to down-sample it to a smaller but still representative and more manageable size. Down-sampling makes data understanding, exploration, and feature engineering easier. Its role in the Cortana Analytics Process is to enable fast prototyping of the data processing functions and machine learning models.
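
As a rough illustration of down-sampling, the sketch below draws a uniform random subset from an in-memory list of records using only the Python standard library. The `down_sample` helper, its `fraction` and `seed` parameters, and the toy `records` data are assumptions made for this example; they are not part of any Azure service or of the articles referenced here, which cover sampling directly in each storage location.

```python
import random


def down_sample(rows, fraction, seed=0):
    """Return a uniform random sample holding roughly `fraction` of `rows`.

    Hypothetical helper for illustration only.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rows = list(rows)
    # Keep at least one row when the input is non-empty.
    k = max(1, round(len(rows) * fraction)) if rows else 0
    # A fixed seed makes the sample reproducible across runs.
    rng = random.Random(seed)
    return rng.sample(rows, k)


# Toy data standing in for a large dataset.
records = [{"id": i, "value": i * i} for i in range(1000)]
sample = down_sample(records, fraction=0.01)
print(len(sample))  # 10
```

Fixing the seed is a deliberate choice here: when prototyping, a reproducible sample makes it easier to compare results between runs.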