The Team Data Science Process lifecycle

The Team Data Science Process (TDSP) provides a recommended lifecycle that you can use to structure your data-science projects. The lifecycle outlines the complete set of steps that successful projects follow. If you use another data-science lifecycle, such as the Cross Industry Standard Process for Data Mining (CRISP-DM), Knowledge Discovery in Databases (KDD), or your organization's own custom process, you can still use the task-based TDSP.

This lifecycle is designed for data-science projects that are intended to ship as part of intelligent applications. These applications deploy machine learning or artificial intelligence models for predictive analytics. Exploratory data-science projects and ad hoc analytics projects can also benefit from this process, but for those projects some of the steps described here might not be needed.

Five lifecycle stages

The TDSP lifecycle is composed of five major stages that are executed iteratively. These stages are:

  1. Business understanding
  2. Data acquisition and understanding
  3. Modeling
  4. Deployment
  5. Customer acceptance

Here is a visual representation of the TDSP lifecycle:

[Figure: TDSP lifecycle]

The TDSP lifecycle is modeled as a sequence of iterated steps that provide guidance on the tasks needed to use predictive models. You deploy the predictive models in the production environment that you plan to use to build the intelligent applications. The goal of this process lifecycle is to keep moving a data-science project toward a clear engagement end point. Data science is an exercise in research and discovery. Communicating tasks to your team and your customers through a well-defined set of artifacts that use standardized templates helps to avoid misunderstandings. Using these templates also increases the chance of successfully completing a complex data-science project.

For each stage, we provide the following information:

  • Goals: The specific objectives.
  • How to do it: An outline of the specific tasks and guidance on how to complete them.
  • Artifacts: The deliverables and the support to produce them.
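
As an illustration only, the five stages and the three kinds of per-stage information can be sketched as a small data structure. This is not part of the TDSP itself: the names `Stage`, `StageGuide`, and `empty_guides` are hypothetical, and the guides are left blank for a team to fill in.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Stage(Enum):
    """The five TDSP lifecycle stages, in the order they are listed."""
    BUSINESS_UNDERSTANDING = 1
    DATA_ACQUISITION_AND_UNDERSTANDING = 2
    MODELING = 3
    DEPLOYMENT = 4
    CUSTOMER_ACCEPTANCE = 5


@dataclass
class StageGuide:
    """Hypothetical container for the information provided per stage."""
    stage: Stage
    goals: List[str] = field(default_factory=list)         # specific objectives
    how_to_do_it: List[str] = field(default_factory=list)  # tasks and guidance
    artifacts: List[str] = field(default_factory=list)     # deliverables


def empty_guides() -> Dict[Stage, StageGuide]:
    """Create one blank guide per stage, keyed by stage."""
    return {stage: StageGuide(stage) for stage in Stage}


guides = empty_guides()
for stage in guides:
    print(stage.value, stage.name)
```

Because the stages are executed iteratively, a project may revisit the same `StageGuide` several times; the sketch only fixes the vocabulary, not the order of work.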

Next steps

We provide full end-to-end walkthroughs that demonstrate all the steps in the process for specific scenarios. The Example walkthroughs article provides a list of the scenarios with links and thumbnail descriptions. The walkthroughs illustrate how to combine cloud tools, on-premises tools, and services into a workflow or pipeline to create an intelligent application.

For examples of how to execute steps in the TDSP with Azure Machine Learning Studio, see Use the TDSP with Azure Machine Learning.