Tasks for an individual contributor in the Team Data Science Process

This topic outlines the tasks that an individual contributor completes to set up a project in the Team Data Science Process (TDSP). The objective is to work in a collaborative team environment that standardizes on the TDSP, which is designed to help improve collaboration and team learning. For an outline of the personnel roles and their associated tasks that are handled by a data science team standardizing on the TDSP, see Team Data Science Process roles and tasks.

The following diagram shows the tasks that individual project contributors (data scientists) complete to set up their team environment. For instructions on how to execute a data science project under the TDSP, see Execution of data science projects.

[Diagram: Individual contributor tasks]

  • ProjectRepository is the repository your project team maintains to share project templates and assets.
  • TeamUtilities is the utilities repository your team maintains specifically for your team.
  • GroupUtilities is the repository your group maintains to share useful utilities across the entire group.

Note

This article uses Azure Repos and a Data Science Virtual Machine (DSVM) to set up a TDSP environment, because that is how TDSP is implemented at Microsoft. If your team uses other code hosting or development platforms, the individual contributor tasks are the same, but the way to complete them may differ.

Prerequisites

This tutorial assumes that your group manager, team lead, and project lead have set up the following resources and permissions:

  • The Azure DevOps organization for your data science unit
  • A project repository set up by your project lead to share project templates and assets
  • GroupUtilities and TeamUtilities repositories set up by the group manager and team lead, if applicable
  • Azure file storage set up for shared assets for your team or project, if applicable
  • Permissions for you to clone from and push back to your project repository

To clone repositories and modify content on your local machine or DSVM, or to mount Azure file storage to your DSVM, you need the items in this checklist:

  • An Azure subscription.
  • Git installed on your machine. If you're using a DSVM, Git is pre-installed. Otherwise, see the Platforms and tools appendix.
  • If you want to use a DSVM, a Windows or Linux DSVM created and configured in Azure. For more information and instructions, see the Data Science Virtual Machine Documentation.
  • For a Windows DSVM, Git Credential Manager (GCM) installed on your machine. In the README.md file, scroll down to the Download and Install section and select the latest installer. Download the .exe installer from the installer page and run it.
  • For a Linux DSVM, an SSH public key set up on your DSVM and added in Azure DevOps. For more information and instructions, see the Create SSH public key section in the Platforms and tools appendix.
  • The Azure file storage information for any Azure file storage you need to mount to your DSVM.
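
Two of the checklist items, Git and the SSH public key, can be checked from a terminal on a Linux machine or DSVM. The following is a minimal sketch, not part of the TDSP tooling: the helper names have_cmd and first_ssh_pubkey are hypothetical, and the sketch assumes that SSH keys are stored in the default ~/.ssh directory.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the first *.pub file found in the given
# directory (assumed default: ~/.ssh); return 1 if there is none.
first_ssh_pubkey() {
  local dir="${1:-$HOME/.ssh}" key
  for key in "$dir"/*.pub; do
    if [ -f "$key" ]; then
      printf '%s\n' "$key"
      return 0
    fi
  done
  return 1
}

# Report whether Git is installed.
if have_cmd git; then
  git --version
else
  echo "Git not found; see the Platforms and tools appendix"
fi

# Report whether an SSH public key exists (needed for a Linux DSVM).
if key=$(first_ssh_pubkey); then
  echo "SSH public key found: $key"
else
  echo "No SSH public key found in ~/.ssh"
fi
```

The check only confirms that a public key file exists locally. It does not confirm that the key has been added in Azure DevOps.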

Clone repositories

To work with repositories locally and push your changes to the shared team and project repositories, first clone the repositories to your local machine.

  1. In Azure DevOps, go to your team's project Summary page at https://<server name>/<organization name>/<team name>, for example, https://dev.azure.com/DataScienceUnit/MyTeam.

  2. Select Repos in the left navigation, and at the top of the page, select the repository you want to clone.

  3. On the repo page, select Clone at upper right.

  4. In the Clone repository dialog, select HTTPS for an HTTPS connection, or SSH for an SSH connection, and copy the clone URL under Command line to your clipboard.

    [Image: Clone repository dialog]

  5. On your local machine or DSVM, create the following directory:

    • For Windows: C:\GitRepos
    • For Linux: $HOME/GitRepos
  6. Change to the directory you created.

  7. In Git Bash (or a terminal on Linux), run the command git clone <clone URL> for each repository you want to clone.

    For example, the following command clones the TeamUtilities repository from the MyTeam project into a TeamUtilities folder in the current directory on your local machine.

    HTTPS connection:

    git clone https://DataScienceUnit@dev.azure.com/DataScienceUnit/MyTeam/_git/TeamUtilities
    

    SSH connection:

    git clone git@ssh.dev.azure.com:v3/DataScienceUnit/MyTeam/TeamUtilities
    
  8. Confirm that you can see the folders for the cloned repositories in your local project directory.

    [Image: Three local repository folders]
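
Steps 5 through 8 can be scripted on a Linux machine or DSVM. The following is a minimal sketch under stated assumptions: the function names (https_clone_url, ssh_clone_url, clone_all) are hypothetical, the organization and team values reuse this article's example names, and the URL formats are copied from the example commands above. In practice, copy the exact clone URL from the Clone repository dialog.

```shell
# Example values from this article; replace them with your own.
ORG="DataScienceUnit"
TEAM="MyTeam"
REPO_ROOT="$HOME/GitRepos"

# Hypothetical helper: build an HTTPS clone URL in the format of the
# article's example (organization, team project, repository).
https_clone_url() {
  printf 'https://%s@dev.azure.com/%s/%s/_git/%s\n' "$1" "$1" "$2" "$3"
}

# Hypothetical helper: build an SSH clone URL in the format of the
# article's example.
ssh_clone_url() {
  printf 'git@ssh.dev.azure.com:v3/%s/%s/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: create the local directory (step 5), clone each
# named repository into it over HTTPS (steps 6-7), then list the
# directory so you can confirm the folders exist (step 8).
clone_all() {
  local repo
  mkdir -p "$REPO_ROOT" || return 1
  for repo in "$@"; do
    git -C "$REPO_ROOT" clone "$(https_clone_url "$ORG" "$TEAM" "$repo")" || return 1
  done
  ls "$REPO_ROOT"
}

# Usage (requires network access and permissions on the repository):
#   clone_all TeamUtilities
```

The sketch uses git -C to run each clone inside $REPO_ROOT, so the calling shell's working directory does not change. To clone over SSH, swap https_clone_url for ssh_clone_url inside clone_all.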

Mount Azure file storage to your DSVM

If your team or project has shared assets in Azure file storage, mount the file storage to your local machine or DSVM. Follow the instructions at Mount Azure file storage on your local machine or DSVM.

Next steps

Here are links to detailed descriptions of the other roles and tasks defined by the Team Data Science Process: