CLI (v2) Automated ML image classification job YAML schema

2025/07/31

The source JSON schema can be found at https://azuremlsdk2.blob.core.chinacloudapi.cn/preview/0.0.1/autoMLImageClassificationJob.schema.json.

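Note

The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed to work only with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.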

YAML syntax

Key	Type	Description	Allowed values	Default value
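`$schema`	string	The YAML schema. If you author the YAML file with the Azure Machine Learning VS Code extension, including `$schema` at the top of the file enables schema and resource completions.
`type`	const	Required. The type of job.	`automl`	`automl`
`task`	const	Required. The type of AutoML task.	`image_classification`	`image_classification`
`name`	string	Name of the job. Must be unique across all jobs in the workspace. If omitted, Azure Machine Learning autogenerates a GUID for the name.
`display_name`	string	Display name of the job in the studio UI. It doesn't have to be unique within the workspace. If omitted, Azure Machine Learning autogenerates a human-readable adjective-noun identifier for the display name.
`experiment_name`	string	Experiment name to organize the job under. Each job's run record is organized under the corresponding experiment in the studio's "Experiments" tab. If omitted, Azure Machine Learning defaults it to the name of the working directory where the job was created.
`description`	string	Description of the job.
`tags`	object	Dictionary of tags for the job.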
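`compute`	string	Name of the compute target to execute the job on. This compute can be either a reference to an existing compute in the workspace (using the `azureml:<compute_name>` syntax) or `local` to designate local execution. For more information on compute for AutoML image jobs, see the Compute to run experiment section. Note: jobs in a pipeline don't support `local` as `compute`.		`local`
`log_verbosity`	number	The level of log verbosity.	`not_set`, `debug`, `info`, `warning`, `error`, `critical`	`info`
`primary_metric`	string	The metric that AutoML optimizes for model selection.	`accuracy`	`accuracy`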
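`target_column_name`	string	Required. The name of the column to predict. It must always be specified. This parameter applies to `training_data` and `validation_data`.
`training_data`	object	Required. The data to be used within the job. It should contain both the training feature columns and a target column. The `training_data` parameter must always be provided. For more information on keys and their descriptions, see the Training or validation data section. For an example, see the Consume data section.
`validation_data`	object	The validation data to be used within the job. It should contain both the training features and a label column (optionally, a sample weights column). If `validation_data` is specified, then the `training_data` and `target_column_name` parameters must also be specified. For more information on keys and their descriptions, see the Training or validation data section. For an example, see the Consume data section.
`validation_data_size`	float	The fraction of the data to hold out for validation when user validation data isn't specified.	A value in the range (0.0, 1.0)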
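`limits`	object	Dictionary of limit configurations for the job. The key is the name of the limit within the context of the job, and the value is the limit value. For more information, see the Configure your experiment settings section.
`training_parameters`	object	Dictionary containing training parameters for the job. Provide an object with keys from the following sections: Model agnostic hyperparameters, and Image classification (multi-class and multi-label) specific hyperparameters. For an example, see the Supported model architectures section.
`sweep`	object	Dictionary containing sweep parameters for the job. It has two keys: `sampling_algorithm` (required) and `early_termination`. For more information and an example, see the Sampling methods for the sweep and Early termination policies sections.
`search_space`	object	Dictionary of the hyperparameter search space. The key is the name of the hyperparameter, and the value is the parameter expression. The possible hyperparameters are those that can be specified for the `training_parameters` key. For an example, see the Sweeping hyperparameters for your model section.
`search_space.<hyperparameter>`	object	There are two types of hyperparameters. Discrete hyperparameters are specified as a `choice` among discrete values; `choice` can be one or more comma-separated values, a `range` object, or any arbitrary `list` object. Advanced discrete hyperparameters can also be specified using a distribution: `randint`, `qlognormal`, `qnormal`, `qloguniform`, `quniform`. Continuous hyperparameters are specified as a distribution over a continuous range of values; the currently supported distributions are `lognormal`, `normal`, `loguniform`, `uniform`. For the set of possible expressions to use, see Parameter expressions.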
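`outputs`	object	Dictionary of output configurations of the job. The key is the name of the output within the context of the job, and the value is the output configuration.
`outputs.best_model`	object	Dictionary of output configurations for the best model. For more information, see Best model output configuration.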
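
As a rough illustration of how `limits`, `training_parameters`, `sweep`, and `search_space` fit together, the fragment below sketches a small sweep over one model family. The key names come from the table above and from the examples later in this article; the model name, trial counts, and numeric ranges are placeholder assumptions chosen for illustration, not recommended settings.

```yaml
# Sketch only: every value below is an illustrative assumption.
limits:
  timeout_minutes: 60          # upper bound on the whole job
  max_trials: 4                # total number of trials to run
  max_concurrent_trials: 2     # trials allowed to run at the same time

training_parameters:
  early_stopping: true         # fixed for every trial, not swept

sweep:
  sampling_algorithm: random   # required key
  early_termination:           # optional key
    type: bandit
    evaluation_interval: 2     # how often the policy is applied
    slack_factor: 0.2          # allowed slack relative to the best trial
    delay_evaluation: 6        # intervals to wait before the first check

search_space:
  - model_name:                # discrete hyperparameter
      type: choice
      values: [resnet50]
    learning_rate:             # continuous hyperparameter
      type: uniform
      min_value: 0.001
      max_value: 0.01
```

This fragment isn't a complete job definition: required top-level keys such as `type`, `task`, `target_column_name`, and `training_data` still have to be supplied alongside it.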

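Training or validation data

Key	Type	Description	Allowed values	Default value
`description`	string	The detailed information that describes this input data.
`path`	string	The path can be a `file` path, a `folder` path, or a `pattern` for paths. `pattern` specifies a search pattern to allow globbing (`*` and `**`) of files and folders containing data. Supported URI types are `azureml`, `https`, `wasbs`, `abfss`, and `adl`. For more information on how to use the `azureml://` URI format, see Core YAML syntax. This is the URI of the location of the artifact file. If this URI doesn't have a scheme (for example, http:, azureml:, and so on), it's considered a local reference, and the file it points to is uploaded to the default workspace blob storage when the entity is created.
`mode`	string	Dataset delivery mechanism.	`direct`	`direct`
`type`	const	To generate computer vision models, you need to bring labeled image data as input for model training in the form of an MLTable.	mltable	mltable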
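
The following fragment sketches how the keys above might be combined for `training_data` and `validation_data`. The local folder name is taken from the examples in this article; the description text, the datastore name `workspaceblobstore`, and the folder under it are assumptions for illustration and should be replaced with values from your own workspace.

```yaml
# Sketch only: names and paths are illustrative assumptions.
training_data:
  description: Labeled images used for training   # hypothetical text
  type: mltable              # the only allowed value
  mode: direct               # the only allowed value, also the default
  # No URI scheme, so this is treated as a local folder and uploaded
  # to the default workspace blob storage when the job is created.
  path: data/training-mltable-folder

validation_data:
  type: mltable
  # azureml:// URI pointing at a folder on a datastore
  # (assumed datastore name and folder).
  path: azureml://datastores/workspaceblobstore/paths/validation-mltable-folder
```

Both inputs are expected to point at a folder that holds an MLTable definition for the labeled images.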

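Best model output configuration

Key	Type	Description	Allowed values	Default value
`type`	string	Required. Type of the best model. AutoML allows only MLflow models.	`mlflow_model`	`mlflow_model`
`path`	string	Required. URI of the location where the model artifact files are stored. If this URI doesn't have a scheme (for example, http:, azureml:, and so on), it's considered a local reference, and the file it points to is uploaded to the default workspace blob storage when the entity is created.
`storage_uri`	string	The HTTP URL of the model. Use this URL with `az storage copy -s THIS_URL -d DESTINATION_PATH --recursive` to download the data.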

Remarks

The `az ml job` command can be used for managing Azure Machine Learning jobs.

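Examples

Examples are available in the examples GitHub repository. Examples relevant to image classification jobs are shown below.

YAML: AutoML image classification job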

$schema: https://azuremlsdk2.blob.core.chinacloudapi.cn/preview/0.0.1/autoMLJob.schema.json

type: automl

experiment_name: dpv2-cli-automl-image-classification-experiment
description: A multi-class Image classification job using fridge items dataset

compute: azureml:gpu-cluster

task: image_classification
log_verbosity: debug
primary_metric: accuracy

target_column_name: label
training_data:
  # Update the path, if prepare_data.py is using data_path other than "./data"
  path: data/training-mltable-folder
  type: mltable
validation_data:
  # Update the path, if prepare_data.py is using data_path other than "./data"
  path: data/validation-mltable-folder
  type: mltable

limits:
  timeout_minutes: 60
  max_trials: 10
  max_concurrent_trials: 2

training_parameters:
  early_stopping: True
  evaluation_frequency: 1

sweep:
  sampling_algorithm: random
  early_termination:
    type: bandit
    evaluation_interval: 2
    slack_factor: 0.2
    delay_evaluation: 6

search_space:
  - model_name:
      type: choice
      values: [vitb16r224, vits16r224]
    learning_rate:
      type: uniform
      min_value: 0.001
      max_value: 0.01
    number_of_epochs:
      type: choice
      values: [15, 30]

  - model_name:
      type: choice
      values: [seresnext, resnet50]
    learning_rate:
      type: uniform
      min_value: 0.001
      max_value: 0.01
    layers_to_freeze:
      type: choice
      values: [0, 2]

YAML: AutoML image classification pipeline job

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

description: Pipeline using AutoML Image Multiclass Classification task

display_name: pipeline-with-image-classification
experiment_name: pipeline-with-automl

settings:
  default_compute: azureml:gpu-cluster

inputs:
  image_multiclass_classification_training_data:
    type: mltable
    # Update the path, if prepare_data.py is using data_path other than "./data"
    path: data/training-mltable-folder
  image_multiclass_classification_validation_data:
    type: mltable
    # Update the path, if prepare_data.py is using data_path other than "./data"
    path: data/validation-mltable-folder

jobs:
  image_multiclass_classification_node:
    type: automl
    task: image_classification
    log_verbosity: info
    primary_metric: accuracy
    limits:
      timeout_minutes: 180
      max_trials: 10
      max_concurrent_trials: 2
    target_column_name: label
    training_data: ${{parent.inputs.image_multiclass_classification_training_data}}
    validation_data: ${{parent.inputs.image_multiclass_classification_validation_data}}
    sweep:
      sampling_algorithm: random
      early_termination:
        type: bandit
        evaluation_interval: 2
        slack_factor: 0.2
        delay_evaluation: 6
    search_space:
      - model_name:
          type: choice
          values: [vitb16r224, vits16r224]
        learning_rate:
          type: uniform
          min_value: 0.001
          max_value: 0.01
        number_of_epochs:
          type: choice
          values: [15, 30]

      - model_name:
          type: choice
          values: [seresnext, resnet50]
        learning_rate:
          type: uniform
          min_value: 0.001
          max_value: 0.01
        layers_to_freeze:
          type: choice
          values: [0, 2]
    training_parameters:
      early_stopping: True
      evaluation_frequency: 1
    # currently need to specify outputs "mlflow_model" explicitly to reference it in following nodes
    outputs:
      best_model:
        type: mlflow_model
  register_model_node:
    type: command
    component: file:./components/component_register_model.yaml
    inputs:
      model_input_path: ${{parent.jobs.image_multiclass_classification_node.outputs.best_model}}
      model_base_name: fridge_items_multiclass_classification_model

Next steps

Install and use the CLI (v2)

通过