Convert to Image Directory

This article describes how to use the Convert to Image Directory module to convert an image dataset to the 'Image Directory' data type, a standardized data format for image-related tasks such as image classification in Azure Machine Learning designer (preview).

How to use Convert to Image Directory

  1. Add the Convert to Image Directory module to your experiment. You can find this module in the 'Computer Vision/Image Data Transformation' category in the module list.

  2. Register an image dataset and connect it to the module input port. Make sure the input dataset contains images. The following dataset formats are supported:

    • A compressed file with one of these extensions: '.zip', '.tar', '.gz', '.bz2'.
    • A folder containing images. We strongly recommend compressing the folder first and then using the compressed file as the dataset.

    Warning

    You cannot use the Import Data module to import an image dataset, because its output type is DataFrame Directory, which contains only file path strings.

    Note

    If you use the image dataset in supervised learning, labels are required. For an image classification task, the module can generate labels as the image 'category' in its output if the dataset is organized in torchvision ImageFolder format. Otherwise, only the images are saved, without labels. The following example shows how to organize an image dataset so that labels are generated: use each image category as a subfolder name. For more information, see torchvision datasets.

    root/dog/xxx.png
    root/dog/xxy.png
    root/dog/xxz.png
    
    root/cat/123.png
    root/cat/nsdf3.png
    root/cat/asd932_.png
    
  3. Submit the pipeline.
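
The dataset preparation described in step 2 can be tried out locally before the dataset is registered. The following is a minimal Python sketch, not part of the module itself: it reads the category of each image from its subfolder name, in the way the ImageFolder layout implies, and compresses the folder into a '.zip' file. The function names `category_labels` and `compress_dataset` are hypothetical helpers introduced for illustration, and keeping the top-level folder inside the archive is an assumption rather than a documented requirement of the module.

```python
import shutil
from pathlib import Path


def category_labels(root):
    """Map each file under root to its category (the first-level subfolder name)."""
    root = Path(root)
    labels = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image in sorted(p for p in class_dir.rglob("*") if p.is_file()):
            # Key is the path relative to root, e.g. "dog/xxx.png".
            labels[image.relative_to(root).as_posix()] = class_dir.name
    return labels


def compress_dataset(root, archive_base):
    """Create <archive_base>.zip containing the dataset folder and return its path.

    Assumption: the top-level folder name is kept inside the archive,
    so entries look like "root/dog/xxx.png".
    """
    root = Path(root).resolve()
    return shutil.make_archive(
        str(archive_base),
        "zip",
        root_dir=str(root.parent),
        base_dir=root.name,
    )
```

For example, calling `category_labels("root")` on the layout shown in step 2 would map `dog/xxx.png` to `dog` and `cat/123.png` to `cat`, and `compress_dataset("root", "images")` would write `images.zip`, which could then be registered as the dataset.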

Results

The output of the Convert to Image Directory module is in Image Directory format and can be connected to other image-related modules whose input port format is also Image Directory.

Technical notes

Expected inputs

Name | Type | Description
Input dataset | Any Directory, Zip File | Input dataset
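
As an illustration of the accepted inputs, the check below tests whether a local path is either a folder or a file with one of the compressed extensions listed in step 2. It is a hypothetical local pre-check written for this article, not something the module exposes, and it does not verify that the folder or archive actually contains images.

```python
from pathlib import Path

# Extensions listed as supported in step 2 of this article.
SUPPORTED_ARCHIVE_SUFFIXES = (".zip", ".tar", ".gz", ".bz2")


def is_supported_input(path):
    """Return True for an existing folder or a file name with a supported extension."""
    p = Path(path)
    if p.is_dir():
        return True
    # Only the last suffix is checked, so "images.tar.gz" matches ".gz".
    return p.suffix.lower() in SUPPORTED_ARCHIVE_SUFFIXES
```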

Output

Name | Type | Description
Output image directory | Image Directory | Output image directory

Next steps

See the set of modules available to Azure Machine Learning.