Tasks for the team lead on a Team Data Science Process team

This article describes the tasks that a team lead completes for their data science team. The team lead's objective is to establish a collaborative team environment that standardizes on the Team Data Science Process (TDSP). The TDSP is designed to help improve collaboration and team learning.

The TDSP is an agile, iterative data science methodology for efficiently delivering predictive analytics solutions and intelligent applications. The process distills the best practices and structures from Microsoft and the industry. The goal is to implement data science initiatives successfully and to realize the full benefits of their analytics programs. For an outline of the personnel roles and associated tasks for a data science team standardizing on the TDSP, see Team Data Science Process roles and tasks.

A team lead manages a team consisting of several data scientists in the data science unit of an enterprise. Depending on the data science unit's size and structure, the group manager and the team lead might be the same person, or they could delegate their tasks to surrogates. The tasks themselves do not change.

The following diagram shows the workflow for the tasks the team lead completes to set up a team environment:

[Diagram: team lead task workflow]

  1. Create a team project in the group's organization in Azure DevOps.

  2. Rename the default team repository to TeamUtilities.

  3. Create a new TeamTemplate repository in the team project.

  4. Import the contents of the group's GroupUtilities and GroupProjectTemplate repositories into the TeamUtilities and TeamTemplate repositories, respectively.

  5. Set up security control by adding team members and configuring their permissions.

  6. If required, create team data and analytics resources:

    • Add team-specific utilities to the TeamUtilities repository.
    • Create Azure file storage to store data assets that can be useful for the entire team.
    • Mount the Azure file storage to the team lead's Data Science Virtual Machine (DSVM) and add data assets to it.

The following tutorial walks through the steps in detail.

Note

This article uses Azure DevOps and a DSVM to set up a TDSP team environment, because that is how Microsoft implements the TDSP. If your team uses other code hosting or development platforms, the team lead tasks are the same, but the way to complete them may be different.

Prerequisites

This tutorial assumes that your group manager has set up the following resources and permissions:

  • The Azure DevOps organization for your data unit
  • GroupProjectTemplate and GroupUtilities repositories, populated with the contents of the Microsoft TDSP team's ProjectTemplate and Utilities repositories
  • Permissions on your organization account for you to create projects and repositories for your team

To clone repositories and modify their content on your local machine or DSVM, or to set up Azure file storage and mount it to your DSVM, you need the following:

  • An Azure subscription.
  • Git installed on your machine. If you're using a DSVM, Git is pre-installed. Otherwise, see the Platforms and tools appendix.
  • If you want to use a DSVM, a Windows or Linux DSVM created and configured in Azure. For more information and instructions, see the Data Science Virtual Machine Documentation.
  • For a Windows DSVM, Git Credential Manager (GCM) installed on your machine. In the GCM README.md file, scroll down to the Download and Install section and select the latest installer. Download the .exe installer from the installer page and run it.
  • For a Linux DSVM, an SSH public key set up on your DSVM and added in Azure DevOps. For more information and instructions, see the Create SSH public key section in the Platforms and tools appendix.
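
The local prerequisites above can be spot-checked from a shell before you start. The following is a minimal sketch, not part of the TDSP tooling: have_cmd and check_prereqs are hypothetical helper names, and the SSH key file names are common defaults that may differ on your machine.

```shell
#!/bin/sh
# Sketch: check the local prerequisites listed above.
# have_cmd and check_prereqs are hypothetical helper names.

# Return 0 if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether Git and an SSH public key are present.
# The key file names below are common defaults (an assumption).
check_prereqs() {
    status=0
    if have_cmd git; then
        echo "git: found"
    else
        echo "git: missing"
        status=1
    fi
    if [ -f "$HOME/.ssh/id_rsa.pub" ] || [ -f "$HOME/.ssh/id_ed25519.pub" ]; then
        echo "ssh public key: found"
    else
        echo "ssh public key: not found (only needed for SSH access from a Linux DSVM)"
    fi
    return "$status"
}
```

Calling check_prereqs prints one line per check and returns a nonzero status only when Git is missing, because the SSH key matters only for the Linux DSVM case.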

Create a team project and repositories

In this section, you create the following resources in your group's Azure DevOps organization:

  • The MyTeam project in Azure DevOps
  • The TeamTemplate repository
  • The TeamUtilities repository

The names specified for the repositories and directories in this tutorial assume that you want to establish a separate project for your own team within your larger data science organization. However, the entire group can choose to work under a single project created by the group manager or organization administrator. Then, all the data science teams create repositories under this single project. This scenario might be valid for:

  • A small data science group that doesn't have multiple data science teams.
  • A larger data science group with multiple data science teams that nevertheless wants to optimize inter-team collaboration with activities such as group-level sprint planning.

If teams choose to have their team-specific repositories under a single group project, the team leads should create the repositories with names like <TeamName>Template and <TeamName>Utilities. For instance: TeamATemplate and TeamAUtilities.
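
As a small illustration of the naming pattern above, the following sketch derives both repository names from a team name. The function name team_repo_names is hypothetical and exists only for this example.

```shell
#!/bin/sh
# Sketch: print the <TeamName>Template and <TeamName>Utilities names
# for a team. team_repo_names is a hypothetical helper.
team_repo_names() {
    team="$1"
    if [ -z "$team" ]; then
        echo "usage: team_repo_names <TeamName>" >&2
        return 1
    fi
    # One repository name per line: template first, then utilities.
    printf '%sTemplate\n%sUtilities\n' "$team" "$team"
}
```

For example, team_repo_names TeamA prints TeamATemplate and TeamAUtilities on separate lines.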

In any case, team leads need to let their team members know which template and utilities repositories to set up and clone. Project leads should follow the project lead tasks for a data science team to create project repositories, whether under separate projects or a single project.

Create the MyTeam project

To create a separate project for your team:

  1. In your web browser, go to your group's Azure DevOps organization home page at https://<server name>/<organization name>, and select New project.

  2. In the Create project dialog, enter your team name, such as MyTeam, under Project name, and then select Advanced.

  3. Under Version control, select Git, and under Work item process, select Agile. Then select Create.

The team project Summary page opens at https://<server name>/<organization name>/<team name>.

Rename the MyTeam default repository to TeamUtilities

  1. On the MyTeam project Summary page, under What service would you like to start with?, select Repos.

  2. On the MyTeam repo page, select the MyTeam repository at the top of the page, and then select Manage repositories from the dropdown.

  3. On the Project Settings page, select the ... next to the MyTeam repository, and then select Rename repository.

  4. In the Rename the MyTeam repository popup, enter TeamUtilities, and then select Rename.

Create the TeamTemplate repository

  1. On the Project Settings page, select New repository.

    Or, select Repos from the left navigation of the MyTeam project Summary page, select a repository at the top of the page, and then select New repository from the dropdown.

  2. In the Create a new repository dialog, make sure Git is selected under Type. Enter TeamTemplate under Repository name, and then select Create.

  3. Confirm that you can see the two repositories, TeamUtilities and TeamTemplate, on your project settings page.

Import the contents of the group common repositories

To populate your team repositories with the contents of the group common repositories set up by your group manager:

  1. From your MyTeam project home page, select Repos in the left navigation. If you get a message that the MyTeam template is not found, select the link in Otherwise, navigate to your default TeamTemplate repository.

    The default TeamTemplate repository opens.

  2. On the TeamTemplate is empty page, select Import.

  3. In the Import a Git repository dialog, select Git as the Source type, and enter the URL for your group common template repository under Clone URL. The URL is https://<server name>/<organization name>/_git/<repository name>. For example: https://dev.azure.com/DataScienceUnit/GroupCommon/_git/GroupProjectTemplate.

  4. Select Import. The contents of your group template repository are imported into your team template repository.

  5. At the top of your project's Repos page, open the repository dropdown and select the TeamUtilities repository.

  6. Repeat the import process to import the contents of your group common utilities repository, for example GroupUtilities, into your TeamUtilities repository.

Each of your two team repositories now contains the files from the corresponding group common repository.
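
The steps above use the Azure DevOps web UI. As an alternative sketch, not the method this tutorial prescribes, the same copy can be done from a shell with standard Git commands: a bare clone of the group repository followed by a mirror push into the empty team repository. The helper name import_repo is hypothetical, and the sketch assumes your credentials already allow reading the source and writing to the target. Note that a mirror push overwrites any refs already in the target, so it is only suitable for a target that is empty or disposable.

```shell
#!/bin/sh
# Sketch: copy all branches and tags from a source Git repository into
# an empty target repository. import_repo is a hypothetical helper.
import_repo() {
    src_url="$1"
    dst_url="$2"
    work_dir="$(mktemp -d)" || return 1

    # A bare clone fetches branches and tags without a working tree.
    if ! git clone --quiet --bare "$src_url" "$work_dir/import.git"; then
        rm -rf "$work_dir"
        return 1
    fi

    # --mirror pushes every local ref and overwrites refs in the target.
    rc=0
    git -C "$work_dir/import.git" push --quiet --mirror "$dst_url" || rc=$?

    rm -rf "$work_dir"
    return "$rc"
}

# Example call (URLs follow the pattern shown in step 3; adjust to your organization):
#   import_repo \
#     "https://dev.azure.com/DataScienceUnit/GroupCommon/_git/GroupProjectTemplate" \
#     "https://dev.azure.com/DataScienceUnit/MyTeam/_git/TeamTemplate"
```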

Customize the contents of the team repositories

You can now customize the contents of your team repositories to meet your team's specific needs. You can modify files, change the directory structure, or add files and folders.

To modify, upload, or create files or folders directly in Azure DevOps:

  1. On the MyTeam project Summary page, select Repos.

  2. At the top of the page, select the repository you want to customize.

  3. In the repo directory structure, navigate to the folder or file you want to change.

    • To create new folders or files, select the arrow next to New.

    • To upload files, select Upload file(s).

    • To edit existing files, navigate to the file and then select Edit.

  4. After adding or editing files, select Commit.

To work with repositories on your local machine or DSVM, you first copy or clone the repositories to your local machine, and then commit and push your changes up to the shared team repositories.

To clone repositories:

  1. On the MyTeam project Summary page, select Repos, and at the top of the page, select the repository you want to clone.

  2. On the repo page, select Clone at upper right.

  3. In the Clone repository dialog, under Command line, select HTTPS for an HTTPS connection or SSH for an SSH connection, and copy the clone URL to your clipboard.

  4. On your local machine, create the following directories:

    • For Windows: C:\GitRepos\MyTeam
    • For Linux: $HOME/GitRepos/MyTeam
  5. Change to the directory you created.

  6. In Git Bash, run the command git clone <clone URL>, where <clone URL> is the URL you copied from the Clone dialog.

    For example, use one of the following commands to clone the TeamUtilities repository to the MyTeam directory on your local machine.

    HTTPS connection:

    git clone https://DataScienceUnit@dev.azure.com/DataScienceUnit/MyTeam/_git/TeamUtilities
    

    SSH connection:

    git clone git@ssh.dev.azure.com:v3/DataScienceUnit/MyTeam/TeamUtilities
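
The two example commands above follow a regular pattern, so the clone URL can be assembled from the organization, project, and repository names. The sketch below assumes repositories hosted on dev.azure.com, as in the examples; the function name azdo_clone_url is hypothetical, and the URL copied from the Clone dialog remains the authoritative value.

```shell
#!/bin/sh
# Sketch: build a clone URL in the same shape as the examples above.
# azdo_clone_url is a hypothetical helper; it assumes dev.azure.com hosting.
azdo_clone_url() {
    protocol="$1"
    org="$2"
    project="$3"
    repo="$4"
    case "$protocol" in
        https)
            printf 'https://%s@dev.azure.com/%s/%s/_git/%s\n' "$org" "$org" "$project" "$repo"
            ;;
        ssh)
            printf 'git@ssh.dev.azure.com:v3/%s/%s/%s\n' "$org" "$project" "$repo"
            ;;
        *)
            echo "unknown protocol: $protocol (use https or ssh)" >&2
            return 1
            ;;
    esac
}

# Example on Linux: create the local directory, then clone into it.
#   mkdir -p "$HOME/GitRepos/MyTeam" && cd "$HOME/GitRepos/MyTeam"
#   git clone "$(azdo_clone_url ssh DataScienceUnit MyTeam TeamUtilities)"
```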
    

After making whatever changes you want in the local clone of your repository, commit and push the changes to the shared team repositories.

Run the following Git Bash commands from your local GitRepos\MyTeam\TeamTemplate or GitRepos\MyTeam\TeamUtilities directory.

git add .
git commit -m "push from local"
git push

Note

If this is the first time you commit to a Git repository, you may need to configure the global parameters user.name and user.email before you run the git commit command. Run the following two commands:

git config --global user.name "<your name>"

git config --global user.email "<your email address>"

If you're committing to several Git repositories, use the same name and email address for all of them. Using the same name and email address is convenient when building Power BI dashboards to track your Git activities in multiple repositories.
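
To keep one identity across repositories, the global values can be set once and then left alone. The following sketch sets user.name and user.email only when they are not already configured; ensure_git_identity is a hypothetical helper name, and the name and address in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch: set the global Git identity only if it is not configured yet,
# so an existing identity is never overwritten.
# ensure_git_identity is a hypothetical helper.
ensure_git_identity() {
    name="$1"
    email="$2"
    if [ -z "$(git config --global --get user.name)" ]; then
        git config --global user.name "$name"
    fi
    if [ -z "$(git config --global --get user.email)" ]; then
        git config --global user.email "$email"
    fi
}

# Example (placeholder values):
#   ensure_git_identity "Your Name" "you@example.com"
```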

Add team members and configure permissions

To add members to the team:

  1. In Azure DevOps, from the MyTeam project home page, select Project settings from the left navigation.

  2. From the Project Settings left navigation, select Teams, then on the Teams page, select the MyTeam Team.

  3. On the Team Profile page, select Add.

  4. In the Add users and groups dialog, search for and select members to add to the team, and then select Save changes.

To configure permissions for team members:

  1. From the Project Settings left navigation, select Permissions.

  2. On the Permissions page, select the group you want to add members to.

  3. On the page for that group, select Members, and then select Add.

  4. In the Invite members popup, search for and select members to add to the group, and then select Save.

Create team data and analytics resources

This step is optional, but sharing data and analytics resources with your entire team has performance and cost benefits. Team members can execute their projects on the shared resources, save on budgets, and collaborate more efficiently. You can create Azure file storage and mount it on your DSVM to share with team members.

For information about sharing other resources with your team, such as Azure HDInsight Spark clusters, see Platforms and tools. That topic provides guidance from a data science perspective on selecting resources that are appropriate for your needs, and links to product pages and other relevant and useful tutorials.

Note

To avoid transmitting data across data centers, which might be slow and costly, make sure that your Azure resource group, storage account, and DSVM are all hosted in the same Azure region.
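
One way to verify the region requirement in the note above is to compare the location reported for each resource. The following is a sketch under stated assumptions: it requires the Azure CLI (az) with an active login, it assumes the storage account and DSVM live in the same resource group, and same_region and check_team_regions are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: confirm that a resource group, storage account, and DSVM
# report the same Azure region. Helper names are hypothetical.

# Succeed only when every region name passed in is identical.
same_region() {
    first="$1"
    for region in "$@"; do
        [ "$region" = "$first" ] || return 1
    done
    return 0
}

# Look up the three locations with the Azure CLI and compare them.
# Assumes the storage account and VM are in the given resource group.
check_team_regions() {
    rg="$1"
    storage="$2"
    vm="$3"
    rg_loc="$(az group show --name "$rg" --query location --output tsv)" || return 1
    sa_loc="$(az storage account show --name "$storage" --resource-group "$rg" --query location --output tsv)" || return 1
    vm_loc="$(az vm show --name "$vm" --resource-group "$rg" --query location --output tsv)" || return 1
    echo "resource group: $rg_loc, storage account: $sa_loc, DSVM: $vm_loc"
    same_region "$rg_loc" "$sa_loc" "$vm_loc"
}
```

A nonzero return from check_team_regions means either a lookup failed or at least one resource is in a different region.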

Create Azure file storage

  1. Run the following script to create Azure file storage for data assets that are useful for your entire team. The script prompts you for your Azure subscription information, so have that ready to enter.

    • On a Windows machine, run the script from the PowerShell command prompt:

      wget "https://raw.githubusercontent.com/Azure/Azure-MachineLearning-DataScience/master/Misc/TDSP/CreateFileShare.ps1" -outfile "CreateFileShare.ps1"
      .\CreateFileShare.ps1
      
    • On a Linux machine, run the script from the Linux shell:

      wget "https://raw.githubusercontent.com/Azure/Azure-MachineLearning-DataScience/master/Misc/TDSP/CreateFileShare.sh"
      bash CreateFileShare.sh
      
  2. Log in to your Microsoft Azure account when prompted, and select the subscription you want to use.

  3. Select the storage account to use, or create a new one under your selected subscription. You can use lowercase characters, numbers, and hyphens for the Azure file storage name.

  4. To facilitate mounting and sharing the storage, press Enter or enter Y to save the Azure file storage information into a text file in your current directory. You can check in this text file to your TeamTemplate repository, ideally under Docs\DataDictionaries, so all projects in your team can access it. You also need the file information to mount your Azure file storage to your Azure DSVM in the next section.
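
Step 3 restricts the Azure file storage name to lowercase characters, numbers, and hyphens. The sketch below checks a candidate name against exactly that character rule before you run the script. It does not check any other constraints Azure may apply, such as length limits, and valid_share_name is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: accept a name only if it is non-empty and consists solely of
# lowercase letters, digits, and hyphens (the rule stated in step 3).
# valid_share_name is a hypothetical helper.
valid_share_name() {
    # LC_ALL=C keeps the a-z range limited to ASCII lowercase letters.
    printf '%s\n' "$1" | LC_ALL=C grep -Eqx '[a-z0-9-]+'
}

# Example:
#   valid_share_name "team-data-01" && echo "name is acceptable"
```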

Mount Azure file storage on your local machine or DSVM

  1. To mount your Azure file storage to your local machine or DSVM, use the following script.

    • On a Windows machine, run the script from the PowerShell command prompt:

      wget "https://raw.githubusercontent.com/Azure/Azure-MachineLearning-DataScience/master/Misc/TDSP/AttachFileShare.ps1" -outfile "AttachFileShare.ps1"
      .\AttachFileShare.ps1
      
    • On a Linux machine, run the script from the Linux shell:

      wget "https://raw.githubusercontent.com/Azure/Azure-MachineLearning-DataScience/master/Misc/TDSP/AttachFileShare.sh"
      bash AttachFileShare.sh
      
  2. If you saved an Azure file storage information file in the previous section, press Enter or enter Y to continue. Then enter the complete path and name of the file you created.

    If you don't have an Azure file storage information file, enter n, and follow the instructions to enter your subscription, Azure storage account, and Azure file storage information.

  3. Enter the name of a local or TDSP drive to mount the file share on. The screen displays a list of existing drive names. Provide a drive name that doesn't already exist.

  4. Confirm that the new drive and storage are successfully mounted on your machine.
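
On a Linux machine, one way to perform the confirmation in step 4 is to look for the mount point in the kernel's mount table. This is a minimal sketch: is_mounted is a hypothetical helper, /mnt/teamshare in the usage comment is a placeholder path, and the check is an exact match on the mount point column, so paths containing spaces (which the mount table escapes) are not handled.

```shell
#!/bin/sh
# Sketch: succeed if the given directory is listed as a mount point.
# The optional second argument is the mount table to read; it defaults
# to /proc/mounts on Linux. is_mounted is a hypothetical helper.
is_mounted() {
    dir="$1"
    table="${2:-/proc/mounts}"
    # Column 2 of each mount table line is the mount point.
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$table"
}

# Example (placeholder path):
#   is_mounted /mnt/teamshare && echo "file share is mounted"
```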

Next steps

Here are links to detailed descriptions of the other roles and tasks defined by the Team Data Science Process: