CLI (v2) command component YAML schema

2025/07/31

The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json.

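Note

The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed to work only with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.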

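YAML syntax

Key	Type	Description	Allowed values	Default value
`$schema`	string	The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, including `$schema` at the top of your file enables you to invoke schema and resource completions.
`type`	const	The type of component.	`command`	`command`
`name`	string	Required. Name of the component. Must start with a lowercase letter. Allowed characters are lowercase letters, numbers, and underscores (_). Maximum length is 255 characters.
`version`	string	Version of the component. If omitted, Azure Machine Learning autogenerates a version.
`display_name`	string	Display name of the component in the studio UI. Doesn't need to be unique within the workspace.
`description`	string	Description of the component.
`tags`	object	Dictionary of tags for the component.
`is_deterministic`	boolean	Whether the component produces the same output for the same input data. Usually set this to `false` for components that load data from external sources, such as importing data from a URL, because the data at the URL might change over time.		`true`
`command`	string	Required. The command to execute.
`code`	string	Local path to the source code directory to be uploaded and used for the component.
`environment`	string or object	Required. The environment to use for the component. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. To reference an existing environment, use the `azureml:<environment-name>:<environment-version>` syntax. To define an environment inline, follow the Environment schema, excluding the `name` and `version` properties, which inline environments don't support.
`distribution`	object	The distribution configuration for distributed training scenarios. One of MpiConfiguration, PyTorchConfiguration, or TensorFlowConfiguration.
`resources.instance_count`	integer	The number of nodes to use for the job.		`1`
`inputs`	object	Dictionary of component inputs. The key is a name for the input within the context of the component, and the value is the component input definition. Inputs can be referenced in `command` by using the `${{ inputs.<input_name> }}` expression.
`inputs.<input_name>`	object	The component input definition. See Component inputs for the set of configurable properties.
`outputs`	object	Dictionary of component outputs. The key is a name for the output within the context of the component, and the value is the component output definition. Outputs can be referenced in `command` by using the `${{ outputs.<output_name> }}` expression.
`outputs.<output_name>`	object	The component output definition. See Component outputs for the set of configurable properties.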
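To illustrate two rules from the table, the sketch below checks a component `name` against the documented naming constraints and splits the short `azureml:<environment-name>:<environment-version>` environment reference. Both helpers (`is_valid_component_name`, `parse_environment_reference`) are hypothetical and aren't part of the Azure Machine Learning SDK or CLI. The service performs its own validation, which may differ.

```python
import re

# Documented rule for `name`: starts with a lowercase letter; only lowercase
# letters, digits, and underscores; at most 255 characters in total.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]{0,254}")


def is_valid_component_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rule from the table."""
    return NAME_PATTERN.fullmatch(name) is not None


def parse_environment_reference(ref: str) -> tuple:
    """Split an `azureml:<environment-name>:<environment-version>` reference.

    Hypothetical helper: it handles only this short form, not registry URIs
    such as `azureml://registries/...` or label-based references.
    """
    prefix, sep, rest = ref.partition(":")
    if prefix != "azureml" or not sep:
        raise ValueError(f"not an azureml: reference: {ref!r}")
    name, sep, version = rest.partition(":")
    if not name or not sep or not version:
        raise ValueError(f"expected azureml:<name>:<version>, got {ref!r}")
    return name, version
```

For example, `hello_python_world` passes the name check, while `Hello_Python_World` (the display name used later) does not, which is why `display_name` is a separate field.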

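Distribution configurations

MpiConfiguration

Key	Type	Description	Allowed values
`type`	const	Required. The distribution type.	`mpi`
`process_count_per_instance`	integer	Required. The number of processes per node to launch for the job.

PyTorchConfiguration

Key	Type	Description	Allowed values	Default value
`type`	const	Required. The distribution type.	`pytorch`
`process_count_per_instance`	integer	The number of processes per node to launch for the job.		`1`

TensorFlowConfiguration

Key	Type	Description	Allowed values	Default value
`type`	const	Required. The distribution type.	`tensorflow`
`worker_count`	integer	The number of workers to launch for the job.		Defaults to `resources.instance_count`.
`parameter_server_count`	integer	The number of parameter servers to launch for the job.		`0`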
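The defaults in the three tables above can be summarized as a small resolution function. This is a hypothetical sketch that only mirrors the documented defaults; it assumes the `distribution` block has already been loaded into a Python dict and that `instance_count` carries the value of `resources.instance_count`.

```python
def resolve_distribution(distribution: dict, instance_count: int = 1) -> dict:
    """Return a copy of `distribution` with the documented defaults filled in.

    `instance_count` stands in for `resources.instance_count`, which
    defaults to 1.
    """
    resolved = dict(distribution)  # leave the caller's dict unchanged
    kind = resolved.get("type")
    if kind == "mpi":
        # MPI has no default: process_count_per_instance is required.
        if "process_count_per_instance" not in resolved:
            raise ValueError("mpi requires process_count_per_instance")
    elif kind == "pytorch":
        resolved.setdefault("process_count_per_instance", 1)
    elif kind == "tensorflow":
        # worker_count falls back to the node count; no parameter servers.
        resolved.setdefault("worker_count", instance_count)
        resolved.setdefault("parameter_server_count", 0)
    else:
        raise ValueError(f"unknown distribution type: {kind!r}")
    return resolved
```

For instance, a TensorFlow job on four nodes with no explicit counts resolves to four workers and zero parameter servers.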

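Component inputs

Key	Type	Description	Allowed values	Default value
`type`	string	Required. The type of component input. Learn more about data access.	`number`, `integer`, `boolean`, `string`, `uri_file`, `uri_folder`, `mltable`, `mlflow_model`
`description`	string	Description of the input.
`default`	number, integer, boolean, or string	The default value for the input.
`optional`	boolean	Whether the input is optional. If set to `true`, the part of `command` that uses this input must be wrapped in `$[[ ]]`.		`false`
`min`	integer or number	The minimum accepted value for the input. Can be specified only if the `type` field is `number` or `integer`.
`max`	integer or number	The maximum accepted value for the input. Can be specified only if the `type` field is `number` or `integer`.
`enum`	array	The list of allowed values for the input. Applicable only if the `type` field is `string`.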
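A minimal sketch of how the properties in this table constrain a value supplied for an input, assuming the input definition has been loaded into a Python dict. `check_input_value` is hypothetical, and treating a `default` as satisfying a non-optional input is an assumption of this sketch rather than documented behavior.

```python
def check_input_value(definition: dict, value=None):
    """Return the effective value of one component input, or raise ValueError.

    Mirrors the Component inputs table: `default` fills in a missing value,
    `optional` lets the input be omitted, `min`/`max` apply to number and
    integer inputs, and `enum` applies to string inputs.
    """
    if value is None:
        value = definition.get("default")
    if value is None:
        if definition.get("optional", False):
            return None  # an omitted optional input
        raise ValueError("required input has no value and no default")

    kind = definition["type"]
    if kind in ("number", "integer"):
        numeric = (int,) if kind == "integer" else (int, float)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, numeric):
            raise ValueError(f"expected {kind}, got {value!r}")
        low, high = definition.get("min"), definition.get("max")
        if low is not None and value < low:
            raise ValueError(f"{value} is below min {low}")
        if high is not None and value > high:
            raise ValueError(f"{value} is above max {high}")
    elif kind == "boolean" and not isinstance(value, bool):
        raise ValueError(f"expected boolean, got {value!r}")
    elif kind == "string":
        if not isinstance(value, str):
            raise ValueError(f"expected string, got {value!r}")
        allowed = definition.get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} is not one of {allowed}")
    # Other types (uri_file, uri_folder, mltable, mlflow_model) pass through.
    return value
```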

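Component outputs

Key	Type	Description	Allowed values	Default value
`type`	string	Required. The type of component output.	`uri_file`, `uri_folder`, `mltable`, `mlflow_model`
`description`	string	Description of the output.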

Remarks

The `az ml component` command can be used for managing Azure Machine Learning components.

Examples

Command component examples are available in the examples GitHub repository. Select examples are shown below.

YAML: Hello World command component

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: hello_python_world
display_name: Hello_Python_World
version: 1

code: ./src

environment: 
  image: python

command: >-
  python hello.py

YAML: Component with different input types

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: train_data_component_cli
display_name: train_data
description: An example train component
tags:
  author: azureml-sdk-team
version: 9
type: command
inputs:
  training_data: 
    type: uri_folder
  max_epocs:
    type: integer
    optional: true
  learning_rate: 
    type: number
    default: 0.01
    optional: true
  learning_rate_schedule: 
    type: string
    default: time-based
    optional: true
outputs:
  model_output:
    type: uri_folder
code: ./train_src
environment: azureml://registries/azureml/environments/sklearn-1.0/labels/latest
command: >-
  python train.py 
  --training_data ${{inputs.training_data}} 
  $[[--max_epocs ${{inputs.max_epocs}}]]
  $[[--learning_rate ${{inputs.learning_rate}}]]
  $[[--learning_rate_schedule ${{inputs.learning_rate_schedule}}]]
  --model_output ${{outputs.model_output}}

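Define optional inputs in the command line

When an input is set to `optional: true`, wrap the part of the command line that uses it in `$[[ ]]`, for example `$[[--input1 ${{inputs.input1}}]]`. As a result, the command line at runtime can differ depending on which inputs are supplied.

If you specify only the required `training_data` and `model_output` parameters, the command line looks like this: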

python train.py --training_data some_input_path --learning_rate 0.01 --learning_rate_schedule time-based --model_output some_output_path

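Because no values were specified for them at runtime, `learning_rate` and `learning_rate_schedule` use their default values, while `--max_epocs` is omitted entirely because that input has no default.

If all inputs and outputs are given values at runtime, the command line looks like this: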

python train.py --training_data some_input_path --max_epocs 10 --learning_rate 0.01 --learning_rate_schedule time-based --model_output some_output_path
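The expansion behavior shown above can be sketched in Python. This is a minimal illustration, not Azure Machine Learning's actual implementation: it assumes a `$[[ ... ]]` segment is kept only when every `${{ ... }}` reference inside it has a value, and that defaults have already been merged into the values. `render_command`, `TRAIN_COMMAND`, and `DEFAULTS` are hypothetical names.

```python
import re

# `${{inputs.<name>}}` or `${{outputs.<name>}}`, allowing inner spaces.
REF = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")
# One `$[[ ... ]]` optional segment; non-greedy so adjacent segments stay separate.
OPTIONAL = re.compile(r"\$\[\[(.*?)\]\]", re.DOTALL)

# The command from the train_data_component_cli example, as folded by `>-`.
TRAIN_COMMAND = (
    "python train.py --training_data ${{inputs.training_data}} "
    "$[[--max_epocs ${{inputs.max_epocs}}]] "
    "$[[--learning_rate ${{inputs.learning_rate}}]] "
    "$[[--learning_rate_schedule ${{inputs.learning_rate_schedule}}]] "
    "--model_output ${{outputs.model_output}}"
)

# Defaults declared for the optional inputs in that example.
DEFAULTS = {
    "inputs.learning_rate": 0.01,
    "inputs.learning_rate_schedule": "time-based",
}


def render_command(command: str, values: dict) -> str:
    """Expand `command` using `values` keyed as "inputs.<name>"/"outputs.<name>".

    A `$[[ ... ]]` segment is kept (with references substituted) only if
    every reference inside it has a value; otherwise it is dropped.
    A reference outside any segment must have a value, else KeyError.
    """

    def substitute(text):
        def lookup(match):
            key = f"{match.group(1)}.{match.group(2)}"
            if key not in values:
                raise KeyError(f"no value for required reference {key}")
            return str(values[key])

        return REF.sub(lookup, text)

    def expand_optional(match):
        body = match.group(1)
        keys = [f"{kind}.{name}" for kind, name in REF.findall(body)]
        return substitute(body) if all(k in values for k in keys) else ""

    expanded = OPTIONAL.sub(expand_optional, command)
    # Substitute the remaining references, then collapse every whitespace run
    # to a single space (a simplification that also affects values).
    return " ".join(substitute(expanded).split())
```

Calling `render_command(TRAIN_COMMAND, {**DEFAULTS, "inputs.training_data": "some_input_path", "outputs.model_output": "some_output_path"})` yields the first command line above; adding `"inputs.max_epocs": 10` yields the second.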

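Common errors and recommendations

The following table lists common errors that occur when defining a component, along with the corresponding recommendations.

Key	Errors	Recommendations
command	1. Only optional inputs can be placed in `$[[]]`. 2. Using `\` to create a new line isn't supported in commands. 3. Inputs or outputs aren't found.	1. Check that every input or output used in the command is defined in the `inputs` and `outputs` sections, and use the correct format: `$[[]]` for optional inputs and `${{}}` for required ones. 2. Don't use `\` to create a new line.
environment	1. No definition exists for environment `{envName}` version `{envVersion}`. 2. No environment exists for name `{envName}`, version `{envVersion}`. 3. Couldn't find the asset with ID `{envAssetId}`.	1. Make sure the environment name and version referenced in the component definition exist. 2. If you reference a registered environment, you must specify its version.
inputs/outputs	1. Input/output names conflict with system-reserved parameters. 2. Duplicate input or output names.	1. Don't use any of the following reserved parameters as input/output names: `path`, `ld_library_path`, `user`, `logname`, `home`, `pwd`, `shell`. 2. Make sure input and output names aren't duplicated.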
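Several of the errors in the table can be caught before submitting a component with a small local linter. This sketch is hypothetical and is not how Azure Machine Learning validates components. It interprets "duplicate names" as a name declared in both `inputs` and `outputs`, compares reserved names case-insensitively, and flags required inputs inside `$[[ ]]` as well as optional inputs outside it; all of these are assumptions.

```python
import re

REF = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")
OPTIONAL = re.compile(r"\$\[\[(.*?)\]\]", re.DOTALL)
# Reserved names listed in the table above.
RESERVED_NAMES = {"path", "ld_library_path", "user", "logname", "home", "pwd", "shell"}


def lint_component(command: str, inputs: dict, outputs: dict) -> list:
    """Return messages for problems resembling the common errors above."""
    problems = []
    sections = {"inputs": inputs, "outputs": outputs}

    # Reserved parameter names (case-insensitive comparison is an assumption).
    for section, spec in sections.items():
        for name in spec:
            if name.lower() in RESERVED_NAMES:
                problems.append(f"{section}.{name} uses a reserved name")

    # A name declared as both an input and an output counts as a duplicate here.
    for name in sorted(set(inputs) & set(outputs)):
        problems.append(f"{name} is declared as both an input and an output")

    # A backslash followed by whitespace or end of string suggests a line continuation.
    if re.search(r"\\(\s|$)", command):
        problems.append("command uses a backslash to create a new line")

    # Every reference must point at a declared input or output.
    for kind, name in REF.findall(command):
        if name not in sections[kind]:
            problems.append(f"{kind}.{name} is referenced but not defined")

    # Required inputs belong outside $[[ ]]; optional inputs belong inside it.
    inside = " ".join(OPTIONAL.findall(command))
    outside = OPTIONAL.sub(" ", command)
    for kind, name in REF.findall(inside):
        if kind == "inputs" and name in inputs and not inputs[name].get("optional", False):
            problems.append(f"required input {name} is inside $[[ ]]")
    for kind, name in REF.findall(outside):
        if kind == "inputs" and inputs.get(name, {}).get("optional", False):
            problems.append(f"optional input {name} is outside $[[ ]]")
    return problems
```

Run against the train_data_component_cli example, the sketch reports no problems.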
