Convert to Image Directory

This article describes how to use the Convert to Image Directory module to convert an image dataset to the Image Directory data type, a standardized data format for image-related tasks such as image classification in Azure Machine Learning designer.

How to use Convert to Image Directory

  1. Prepare your image dataset first.

    For supervised learning, you need to specify the labels of the training dataset. The image dataset should have the following structure:

    Your_image_folder_name/Category_1/xxx.png
    Your_image_folder_name/Category_1/xxy.jpg
    Your_image_folder_name/Category_1/xxz.jpeg
    
    Your_image_folder_name/Category_2/123.png
    Your_image_folder_name/Category_2/nsdf3.png
    Your_image_folder_name/Category_2/asd932_.png
    

    The image dataset folder contains multiple subfolders, and each subfolder contains the images of one category. The subfolder names are used as the labels for tasks such as image classification. Refer to torchvision datasets for more information.

    Warning

    Labeled datasets exported from Data Labeling are currently not supported in the designer.

    Images with these extensions (in lowercase) are supported: '.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp'. You can also have multiple types of images in one folder. The category folders do not need to contain the same number of images.

    You can use either the folder itself or a compressed file with the extension '.zip', '.tar', '.gz', or '.bz2'. Compressed files are recommended for better performance.

    Figure: image sample dataset

    Note

    For inference, the image dataset folder only needs to contain unclassified images.

  2. Register the image dataset as a file dataset in your workspace, because the input of the Convert to Image Directory module must be a file dataset.

  3. Add the registered image dataset to the canvas. You can find your registered dataset in the Datasets category in the module list to the left of the canvas. The designer does not currently support visualizing image datasets.

    Warning

    You cannot use the Import Data module to import an image dataset, because the output type of the Import Data module is DataFrame Directory, which contains only file path strings.

  4. Add the Convert to Image Directory module to the canvas. You can find this module in the 'Computer Vision/Image Data Transformation' category in the module list. Connect it to the image dataset.

  5. Submit the pipeline. This module can run on either GPU or CPU.
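
Before registering the dataset, it can help to check locally that the folder follows the layout from step 1. The following is a minimal sketch using only the Python standard library, not part of the designer or of any Azure SDK. The function name `scan_image_folder` and the return shape are hypothetical, chosen for illustration. It treats each subfolder name as a label and flags files whose extension is not in the supported lowercase list given above.

```python
import tempfile
from pathlib import Path

# Extensions the article lists as supported (lowercase only).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".ppm", ".bmp",
                        ".pgm", ".tif", ".tiff", ".webp"}


def scan_image_folder(root):
    """Scan a folder laid out as root/<category>/<image file>.

    Returns (labels, unsupported):
      labels      maps each subfolder name to a sorted list of image file names
      unsupported lists files (relative to root) with an unlisted extension
    """
    root = Path(root)
    labels = {}
    unsupported = []
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        images = []
        for path in sorted(category_dir.iterdir()):
            if not path.is_file():
                continue
            # The suffix is compared as-is, so '.JPG' is reported as unsupported.
            if path.suffix in SUPPORTED_EXTENSIONS:
                images.append(path.name)
            else:
                unsupported.append(path.relative_to(root).as_posix())
        labels[category_dir.name] = images
    return labels, unsupported


if __name__ == "__main__":
    # Build a tiny throwaway dataset to demonstrate the check.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "Your_image_folder_name"
        (root / "Category_1").mkdir(parents=True)
        (root / "Category_1" / "xxx.png").touch()
        found, rejected = scan_image_folder(root)
        print(found, rejected)
```

For an inference-only dataset, which has no category subfolders, this sketch would return an empty mapping, so it applies to the supervised-learning layout only.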

Results

The output of the Convert to Image Directory module is in Image Directory format and can be connected to other image-related modules whose input port format is also Image Directory.

Figure: Convert to Image Directory output

Technical notes

Expected inputs

Name            Type                      Description
Input dataset   Any Directory, Zip File   Input dataset
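
Because the input can be a Zip file and compressed files are recommended, the image folder can be packed before it is registered. The sketch below uses only Python's standard `zipfile` module. The function name `zip_image_folder` is hypothetical, and keeping the top-level folder name inside the archive is an assumption, since this article does not state which archive layout the module expects.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_image_folder(folder, archive_path):
    """Pack every file under folder into a .zip archive; return the file count.

    Entries are stored relative to the folder's parent, so the archive holds
    paths such as Your_image_folder_name/Category_1/xxx.png (an assumption).
    """
    folder = Path(folder)
    archive_path = Path(archive_path)
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and skip the archive itself if it was
            # created inside the folder being packed.
            if not path.is_file() or path.resolve() == archive_path.resolve():
                continue
            archive.write(path, path.relative_to(folder.parent).as_posix())
            count += 1
    return count


if __name__ == "__main__":
    # Pack a tiny throwaway dataset and list the archive contents.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "Your_image_folder_name"
        (root / "Category_1").mkdir(parents=True)
        (root / "Category_1" / "xxx.png").write_bytes(b"placeholder")
        target = Path(tmp) / "Your_image_folder_name.zip"
        print(zip_image_folder(root, target))
        with zipfile.ZipFile(target) as archive:
            print(archive.namelist())
```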

Output

Name                     Type              Description
Output image directory   Image Directory   Output image directory

Next steps

See the set of modules available to Azure Machine Learning.