Tutorial: Create a labeling project for multi-class image classification

This tutorial shows you how to manage the process of labeling (also referred to as tagging) images to be used as data for building machine learning models. Data labeling in Azure Machine Learning is in public preview.

To train a machine learning model to classify images, you need hundreds or even thousands of correctly labeled images. Azure Machine Learning helps you manage the progress of your private team of domain experts as they label your data.

In this tutorial, you'll use images of cats and dogs. Since each image is either a cat or a dog, this is a multi-class labeling project. You'll learn how to:

  • Create an Azure storage account and upload images to the account.
  • Create an Azure Machine Learning workspace.
  • Create a multi-class image labeling project.
  • Label your data. Either you or your labelers can perform this task.
  • Complete the project by reviewing and exporting the data.

Prerequisites

  • An Azure subscription. If you don't have an Azure subscription, create a trial account.

Create a workspace

An Azure Machine Learning workspace is a foundational resource in the cloud that you use to experiment, train, and deploy machine learning models. It ties your Azure subscription and resource group to an easily consumed object in the service.

There are many ways to create a workspace. In this tutorial, you create a workspace via the Azure portal, a web-based console for managing your Azure resources.

  1. Sign in to the Azure portal by using the credentials for your Azure subscription.

  2. In the upper-left corner of the Azure portal, select + Create a resource.


  3. Use the search bar to find Machine Learning.

  4. Select Machine Learning.

  5. In the Machine Learning pane, select Create to begin.

  6. Provide the following information to configure your new workspace:

    Field | Description
    Workspace name | Enter a unique name that identifies your workspace. In this example, we use docs-ws. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others.
    Subscription | Select the Azure subscription that you want to use.
    Resource group | Use an existing resource group in your subscription or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. In this example, we use docs-aml.
    Location | Select the location closest to your users and the data resources to create your workspace.
    Workspace edition | Select Basic as the workspace type for this tutorial. The workspace type (Basic or Enterprise) determines the features you can access and the pricing. Everything in this tutorial can be performed with either a Basic or Enterprise workspace.

  7. After you finish configuring the workspace, select Review + Create.

    Warning

    It can take several minutes to create your workspace in the cloud.

    When the process is finished, a deployment success message appears.

  8. To view the new workspace, select Go to resource.

Start a labeling project

Next, you manage the data labeling project in Azure Machine Learning studio, a consolidated interface with machine learning tools for data science practitioners of all skill levels. The studio is not supported on Internet Explorer.

  1. Sign in to Azure Machine Learning studio.

  2. Select your subscription and the workspace you created.

Create a datastore

Azure Machine Learning datastores are used to store connection information, like your subscription ID and token authorization. Here you use a datastore to connect to the storage account that contains the images for this tutorial.

  1. On the left side of your workspace, select Datastores.

  2. Select + New datastore.

  3. Fill out the form with these settings:

    Field | Description
    Datastore name | Give the datastore a name. Here we use labeling_tutorial.
    Datastore type | Select the type of storage. Here we use Azure Blob Storage, the preferred storage for images.
    Account selection method | Select Enter manually.
    URL | https://azureopendatastorage.blob.core.chinacloudapi.cn/openimagescontainer
    Authentication type | Select SAS token.
    Account key | ?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2025-03-25T04:51:17Z&st=2020-03-24T20:51:17Z&spr=https&sig=7D7SdkQidGT6pURQ9R4SUzWGxZ%2BHlNPCstoSRRVg8OY%3D

  4. Select Create to create the datastore.
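
The SAS token entered in the form is a URL query string, so its fields can be inspected before you rely on it. The following is a minimal sketch using only the Python standard library; the helper names (parse_sas, parse_sas_time, is_valid_at) are illustrative and not part of any Azure SDK. It reads the permissions (sp), start time (st), and expiry time (se) from the token shown above.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs

# SAS token copied from the datastore form in this tutorial.
SAS_TOKEN = (
    "?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl"
    "&se=2025-03-25T04:51:17Z&st=2020-03-24T20:51:17Z"
    "&spr=https&sig=7D7SdkQidGT6pURQ9R4SUzWGxZ%2BHlNPCstoSRRVg8OY%3D"
)


def parse_sas(token: str) -> dict:
    """Split a SAS query string into a flat {field: value} dict."""
    fields = parse_qs(token.lstrip("?"), strict_parsing=True)
    return {key: values[0] for key, values in fields.items()}


def parse_sas_time(value: str) -> datetime:
    """Parse the UTC timestamp format used by the st and se fields."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_valid_at(token: str, moment: datetime) -> bool:
    """Return True if `moment` falls inside the token's start/expiry window."""
    fields = parse_sas(token)
    return parse_sas_time(fields["st"]) <= moment <= parse_sas_time(fields["se"])


sas = parse_sas(SAS_TOKEN)
print("permissions:", sas["sp"])  # "rl" stands for read and list
print("expires:", sas["se"])
print("valid now:", is_valid_at(SAS_TOKEN, datetime.now(timezone.utc)))
```

The sp value rl grants read and list access only, which is enough for a labeling project that reads images. If the check reports that the token is no longer valid, the datastore cannot read the images with it.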

Create a labeling project

Now that you have access to the data you want to have labeled, create your labeling project.

  1. At the top of the page, select Projects.

  2. Select + Add project.


Project details

  1. Use the following input for the Project details form:

    Field | Description
    Project name | Give your project a name. Here we'll use tutorial-cats-n-dogs.
    Labeling task type | Select Image Classification Multi-class.

    Select Next to continue creating the project.

Select or create a dataset

  1. On the Select or create a dataset form, select the second choice, Create a dataset, then select the link From datastore.

  2. Use the following input for the Create dataset from datastore form:

    1. On the Basic info form, add a name. Here we'll use images-for-tutorial. Add a description if you wish. Then select Next.
    2. On the Datastore selection form, use the dropdown to select your previously created datastore, for example labeling_tutorial (Azure Blob Storage).
    3. Next, still on the Datastore selection form, select Browse and then select MultiClass - DogsCats. Select Save to use /MultiClass - DogsCats as the path.
    4. Select Next to confirm details and then Create to create the dataset.
    5. Select the circle next to the dataset name in the list, for example images-for-tutorial.
  3. Select Next to continue creating the project.

Incremental refresh

If you plan to add new images to your dataset, incremental refresh finds these new images and adds them to your project. When you enable this feature, the project periodically checks for new images. You won't be adding new images to the datastore for this tutorial, so leave this feature unchecked.
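
Conceptually, an incremental refresh compares the current contents of the datastore with what the project already knows about. The sketch below is a toy model of that idea, not the service's actual implementation; the function name find_new_images, the extension filter, and the file names are all assumptions made for illustration.

```python
from typing import Iterable, List, Set

# Assumed filter for image files; the tutorial does not specify one.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def find_new_images(known: Set[str], listing: Iterable[str]) -> List[str]:
    """Return image paths present in the listing but not yet in the project."""
    candidates = {p for p in listing if p.lower().endswith(IMAGE_EXTENSIONS)}
    return sorted(candidates - known)


# Hypothetical file names under the tutorial's dataset path.
known = {
    "MultiClass - DogsCats/cat_001.jpg",
    "MultiClass - DogsCats/dog_001.jpg",
}
listing = [
    "MultiClass - DogsCats/cat_001.jpg",
    "MultiClass - DogsCats/dog_001.jpg",
    "MultiClass - DogsCats/dog_002.jpg",
    "MultiClass - DogsCats/readme.txt",
]

new_images = find_new_images(known, listing)
known.update(new_images)  # the next periodic check starts from the updated set
print(new_images)
```

Running the check a second time against the same listing finds nothing new, which is the property that makes a periodic check cheap to repeat.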

Select Next to continue.

Label classes

  1. On the Label classes form, type a label name, then select +Add label to type the next label. For this project, the labels are Cat, Dog, and Uncertain.

  2. Select Next when you have added all the labels.

Labeling instructions

  1. On the Labeling instructions form, you can provide a link to a website that provides detailed instructions for your labelers. We'll leave it blank for this tutorial.

  2. You can also add a short description of the task directly on the form. Type Labeling tutorial - Cats & Dogs.

  3. Select Next.

  4. In the ML assisted labeling section, leave the checkbox unchecked. ML assisted labeling requires more data than you'll be using in this tutorial.

  5. Select Create project.

This page doesn't automatically refresh. After a pause, manually refresh the page until the project's status changes to Created.

Start labeling

You have now set up your Azure resources and configured a data labeling project. It's time to add labels to your data.

Tag the images

In this part of the tutorial, you'll switch roles from the project administrator to that of a labeler. Anyone who has contributor access to your workspace can become a labeler.

  1. In Machine Learning studio, select Data labeling on the left-hand side to find your project.

  2. Select the Label link for the project.

  3. Read the instructions, then select Tasks.

  4. Select a layout thumbnail on the right to set how many images are displayed, and labeled, at one time. You must label all these images before you can move on. Only switch layouts when you have a fresh page of unlabeled data. Switching layouts clears the page's in-progress tagging work.

  5. Select one or more images, then select a tag to apply to the selection. The tag appears below the image. Continue to select and tag all images on the page. To select all the displayed images simultaneously, select Select all. Select at least one image to apply a tag.

    Tip

    You can select the first nine tags by using the number keys on your keyboard.

  6. Once all the images on the page are tagged, select Submit to submit these labels.


  7. After you submit tags for the data at hand, Azure refreshes the page with a new set of images from the work queue.
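
The tagging steps above, including the number-key shortcut and the rule that at least one image must be selected, can be modeled in a few lines. This is a toy sketch for illustration only. It assumes the number keys follow the order in which the labels were added, and the function names build_hotkeys and apply_tag and the image names are hypothetical.

```python
# Labels in the order they were added on the Label classes form.
LABELS = ["Cat", "Dog", "Uncertain"]


def build_hotkeys(labels):
    """Map the keys "1" through "9" to the first nine labels, in order."""
    return {str(i): label for i, label in enumerate(labels[:9], start=1)}


def apply_tag(selection, key, hotkeys, tags):
    """Tag every selected image with the label bound to `key`."""
    if not selection:
        raise ValueError("select at least one image before applying a tag")
    label = hotkeys[key]
    for image in selection:
        tags[image] = label
    return tags


hotkeys = build_hotkeys(LABELS)
page_tags = {}
apply_tag(["img_01.jpg", "img_02.jpg"], "1", hotkeys, page_tags)  # press 1 for Cat
apply_tag(["img_03.jpg"], "2", hotkeys, page_tags)                # press 2 for Dog
print(page_tags)
```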

Complete the project

Now you'll switch roles back to the project administrator for the labeling project.

As the administrator, you may want to review the work of your labelers.

Review labeled data

  1. In Machine Learning studio, select Data labeling on the left-hand side to find your project.

  2. Select the project name link.

  3. The Dashboard shows you the progress of your project.

  4. At the top of the page, select Data.

  5. On the left side, select Labeled data to see your tagged images.

  6. When you disagree with a label, select the image and then select Reject at the bottom of the page. The tags are removed and the image is returned to the queue of unlabeled images.
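
The review flow, in which a rejected image loses its tag and goes back to the unlabeled queue, can be pictured with a small in-memory model. The class below is a hypothetical sketch, not how the service stores its work queue; the class name LabelingQueue and the image names are invented for the example.

```python
from collections import deque


class LabelingQueue:
    """Toy model of a labeling work queue with submit and reject."""

    def __init__(self, images):
        self.unlabeled = deque(images)  # images waiting for a labeler
        self.labeled = {}               # image name -> submitted tag

    def next_page(self, size):
        """Take up to `size` images from the front of the queue."""
        count = min(size, len(self.unlabeled))
        return [self.unlabeled.popleft() for _ in range(count)]

    def submit(self, tags):
        """Record the tags a labeler submitted for a page."""
        self.labeled.update(tags)

    def reject(self, image):
        """Remove the tag and put the image back in the unlabeled queue."""
        del self.labeled[image]
        self.unlabeled.append(image)


queue = LabelingQueue(["a.jpg", "b.jpg", "c.jpg"])
page = queue.next_page(2)
queue.submit({"a.jpg": "Cat", "b.jpg": "Dog"})
queue.reject("b.jpg")  # the administrator disagrees with this label
print(queue.labeled, list(queue.unlabeled))
```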

Export labeled data

You can export the label data for Machine Learning experimentation at any time. Users often export multiple times and train different models, rather than wait for all the images to be labeled.

Image labels can be exported in COCO format or as an Azure Machine Learning dataset. The dataset format makes it easy to use for training in Azure Machine Learning.
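
If you choose the COCO format, the export is a JSON file that can be read with standard tools. The sketch below assumes the file follows the usual COCO layout of images, annotations, and categories arrays; the exact fields the service writes may differ, and the file names and ids here are hand-written placeholders.

```python
import json
from collections import Counter

# A tiny hand-written COCO-style export; file names and ids are hypothetical.
COCO_EXPORT = """
{
  "images": [
    {"id": 1, "file_name": "cat_001.jpg"},
    {"id": 2, "file_name": "dog_001.jpg"},
    {"id": 3, "file_name": "dog_002.jpg"}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1},
    {"id": 2, "image_id": 2, "category_id": 2},
    {"id": 3, "image_id": 3, "category_id": 2}
  ],
  "categories": [
    {"id": 1, "name": "Cat"},
    {"id": 2, "name": "Dog"},
    {"id": 3, "name": "Uncertain"}
  ]
}
"""


def labels_by_file(coco: dict) -> dict:
    """Join annotations to image file names and category names."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    return {
        files[ann["image_id"]]: names[ann["category_id"]]
        for ann in coco["annotations"]
    }


coco = json.loads(COCO_EXPORT)
labels = labels_by_file(coco)
counts = Counter(labels.values())
print(labels)
print(counts)
```

Counting labels per class like this is a quick way to check class balance before training on a partial export.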

  1. In Machine Learning studio, select Data labeling on the left-hand side to find your project.

  2. Select the project name link.

  3. Select Export and choose Export as Azure ML Dataset.

    The status of the export appears just below the Export button.

  4. Once the labels are successfully exported, select Datasets on the left side to view the results.

Clean up resources

Important

The resources you created can be used as prerequisites to other Azure Machine Learning tutorials and how-to articles.

If you don't plan to use the resources you created, delete them so you don't incur any charges:

  1. In the Azure portal, select Resource groups on the far left.


  2. From the list, select the resource group you created.

  3. Select Delete resource group.

  4. Enter the resource group name. Then select Delete.

Next steps

In this tutorial, you labeled images. Now use your labeled data: