CLI (v2) command job YAML schema

Article
10/19/2023

The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/commandJob.schema.json.

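Note

The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed to work only with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.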

YAML syntax

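Key	Type	Description	Allowed values	Default value
`$schema`	string	The YAML schema. If you author the YAML file with the Azure Machine Learning VS Code extension, including `$schema` at the top of the file enables schema and resource completions.
`type`	const	The type of job.	`command`	`command`
`name`	string	Name of the job. Must be unique across all jobs in the workspace. If omitted, Azure Machine Learning autogenerates a GUID for the name.
`display_name`	string	Display name of the job in the studio UI. It does not have to be unique within the workspace. If omitted, Azure Machine Learning autogenerates a human-readable adjective-noun identifier for the display name.
`experiment_name`	string	Experiment name to organize the job under. Each job's run record is organized under the corresponding experiment in the studio's "Experiments" tab. If omitted, Azure Machine Learning defaults to the name of the working directory where the job was created.
`description`	string	Description of the job.
`tags`	object	Dictionary of tags for the job.
`command`	string	Required (if not using the `component` field). The command to execute.
`code`	string	Local path to the source code directory to be uploaded and used for the job.
`environment`	string or object	Required (if not using the `component` field). The environment to use for the job. This can be either a reference to an existing versioned environment in the workspace or an inline environment specification. To reference an existing environment, use the `azureml:<environment_name>:<environment_version>` syntax or `azureml:<environment_name>@latest` (to reference the latest version of an environment). To define an environment inline, follow the environment schema. Exclude the `name` and `version` properties, because inline environments do not support them.
`environment_variables`	object	Dictionary of environment variable key-value pairs to set on the process where the command is executed.
`distribution`	object	The distribution configuration for distributed training scenarios. One of MpiConfiguration, PyTorchConfiguration, or TensorFlowConfiguration.
`compute`	string	Name of the compute target to execute the job on. This can be either a reference to an existing compute in the workspace (using the `azureml:<compute_name>` syntax) or `local` to designate local execution. Note: jobs in a pipeline do not support `local` as `compute`.		`local`
`resources.instance_count`	integer	The number of nodes to use for the job.		`1`
`resources.instance_type`	string	The instance type to use for the job. If omitted, this defaults to the default instance type of the Kubernetes cluster. For more information, see Create and select Kubernetes instance types.
`resources.shm_size`	string	The size of the Docker container's shared memory block. It must be in the format `<number><unit>`, where the number is greater than 0 and the unit is one of `b` (bytes), `k` (kilobytes), `m` (megabytes), or `g` (gigabytes).		`2g`
`limits.timeout`	integer	The maximum time in seconds that the job is allowed to run. When this limit is reached, the system cancels the job.
`inputs`	object	Dictionary of inputs to the job. The key is a name for the input within the context of the job, and the value is the input value. Inputs can be referenced in the `command` with the `${{ inputs.<input_name> }}` expression.
`inputs.<input_name>`	number, integer, boolean, string, or object	Either a literal value (of type number, integer, boolean, or string) or an object containing a job input data specification.
`outputs`	object	Dictionary of output configurations of the job. The key is a name for the output within the context of the job, and the value is the output configuration. Outputs can be referenced in the `command` with the `${{ outputs.<output_name> }}` expression.
`outputs.<output_name>`	object	You can leave the object empty, in which case the output is of type `uri_folder` by default and Azure Machine Learning system-generates an output location for it. Files are written to the output directory through a read-write mount. To specify a different mode for the output, provide an object containing the job output specification.
`identity`	object	The identity used for data access. It can be UserIdentityConfiguration, ManagedIdentityConfiguration, or None. With UserIdentityConfiguration, the identity of the job submitter is used to access input data and write results to the output folder. Otherwise, the managed identity of the compute target is used.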
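
The dotted keys in the table, such as `resources.shm_size` and `limits.timeout`, are written as nested mappings in the job file. The following is a minimal sketch of that nesting, not a recommended configuration. The script `train.py`, the `src` folder, and the `cpu-cluster` compute name are assumptions for illustration only.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py        # hypothetical script inside the code folder
code: src
environment:
  image: library/python:latest
compute: azureml:cpu-cluster    # hypothetical compute name
resources:
  instance_count: 1             # same as the default, shown for clarity
  shm_size: 4g                  # <number><unit>, replaces the 2g default
limits:
  timeout: 3600                 # seconds, the job is cancelled once this is reached
```

Omitting `resources` and `limits` entirely leaves the defaults listed in the table in effect.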

Distribution configurations

MpiConfiguration

Key	Type	Description	Allowed values
`type`	const	Required. Distribution type.	`mpi`
`process_count_per_instance`	integer	Required. The number of processes per node to launch for the job.

PyTorchConfiguration

Key	Type	Description	Allowed values	Default value
`type`	const	Required. Distribution type.	`pytorch`
`process_count_per_instance`	integer	The number of processes per node to launch for the job.		`1`

TensorFlowConfiguration

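Key	Type	Description	Allowed values	Default value
`type`	const	Required. Distribution type.	`tensorflow`
`worker_count`	integer	The number of workers to launch for the job.		Defaults to `resources.instance_count`.
`parameter_server_count`	integer	The number of parameter servers to launch for the job.		`0`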
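
As an illustrative fragment only, a TensorFlowConfiguration that sets both counts explicitly might look like the following. The values are placeholders, and this article does not describe how workers and parameter servers are placed on nodes, so treat the pairing with `instance_count` as an assumption.

```yaml
resources:
  instance_count: 2
distribution:
  type: tensorflow
  worker_count: 2               # would default to resources.instance_count if omitted
  parameter_server_count: 1     # default is 0
```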

Job inputs

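Key	Type	Description	Allowed values	Default value
`type`	string	The type of job input. Specify `uri_file` for input data that points to a single file source, or `uri_folder` for input data that points to a folder source.	`uri_file`, `uri_folder`, `mlflow_model`, `custom_model`	`uri_folder`
`path`	string	The path to the data to use as input. It can be specified in a few ways: - A local path to the data source file or folder, for example `path: ./iris.csv`. The data is uploaded during job submission. - A URI of a cloud path to the file or folder to use as the input. Supported URI types are `azureml`, `https`, `wasbs`, `abfss`, and `adl`. For more information on the `azureml://` URI format, see Core YAML syntax. - An existing registered Azure Machine Learning data asset to use as the input. To reference a registered data asset, use the `azureml:<data_name>:<data_version>` syntax or `azureml:<data_name>@latest` (to reference the latest version of that data asset), for example `path: azureml:cifar10-data:1` or `path: azureml:cifar10-data@latest`.
`mode`	string	The mode in which the data is delivered to the compute target. For a read-only mount (`ro_mount`), the data is consumed as a mount path. A folder is mounted as a folder and a file is mounted as a file. Azure Machine Learning resolves the input to the mount path. For `download` mode, the data is downloaded to the compute target. Azure Machine Learning resolves the input to the downloaded path. If you only want the URL of the storage location of the data artifacts rather than mounting or downloading the data itself, use `direct` mode. It passes in the URL of the storage location as the job input. In this case, you are fully responsible for handling the credentials to access the storage. The `eval_mount` and `eval_download` modes are specific to MLTable, and either mount the data as a path or download the data to the compute target. For more information, see Access data in a job.	`ro_mount`, `download`, `direct`, `eval_download`, `eval_mount`	`ro_mount`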
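
To make the `type`, `path`, and `mode` keys concrete, here is a hedged fragment with two inputs. The input names `training_data` and `iris_csv` are hypothetical, and the data asset `cifar10-data` and local file `./iris.csv` are borrowed from the `path` description above. Whether they exist in a given workspace or folder is an assumption.

```yaml
command: |
  ls ${{inputs.training_data}}
  head ${{inputs.iris_csv}}
inputs:
  training_data:                        # hypothetical input name
    type: uri_folder
    path: azureml:cifar10-data@latest   # registered data asset, latest version
    mode: download                      # copy the data to the compute target
  iris_csv:                             # hypothetical input name
    type: uri_file
    path: ./iris.csv                    # local file, uploaded at job submission
    mode: ro_mount                      # the default, written out explicitly
```

In both cases the `${{inputs.<input_name>}}` expression resolves to a path on the compute target: the downloaded path for `download` and the mount path for `ro_mount`.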

Job outputs

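Key	Type	Description	Allowed values	Default value
`type`	string	The type of job output. For the default `uri_folder` type, the output corresponds to a folder.	`uri_folder`, `mlflow_model`, `custom_model`	`uri_folder`
`mode`	string	The mode in which output files are delivered to the destination storage. For read-write mount mode (`rw_mount`), the output directory is a mounted directory. For upload mode, the written files are uploaded at the end of the job.	`rw_mount`, `upload`	`rw_mount`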
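
As a small sketch of a non-default output specification, the job below writes to a named output and asks for upload mode instead of the default read-write mount. The output name `hello_output` is illustrative.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world" > ${{outputs.hello_output}}/helloworld.txt
outputs:
  hello_output:          # illustrative output name
    type: uri_folder     # the default type, written out explicitly
    mode: upload         # files are uploaded when the job ends
environment:
  image: python
```

Leaving `hello_output:` empty instead would fall back to `uri_folder` with `rw_mount`.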

Identity configurations

UserIdentityConfiguration

Key	Type	Description	Allowed values
`type`	const	Required. Identity type.	`user_identity`

ManagedIdentityConfiguration

Key	Type	Description	Allowed values
`type`	const	Required. Identity type.	`managed` or `managed_identity`
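
A minimal sketch of how an identity configuration might be attached to a job follows. It assumes a datastore path and a compute named `cpu-cluster` that are placeholders here, and it assumes the submitting user has access to that storage.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: ls ${{inputs.data_dir}}
inputs:
  data_dir:
    type: uri_folder
    path: azureml://datastores/workspaceblobstore/paths/example-data/
environment:
  image: library/python:latest
compute: azureml:cpu-cluster   # hypothetical compute name
identity:
  type: user_identity          # use the job submitter's identity for data access
```

Replacing the value with `managed` or `managed_identity` selects ManagedIdentityConfiguration instead, in which case the managed identity of the compute target is used.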

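Remarks

The az ml job command can be used for managing Azure Machine Learning jobs.

Examples

Examples are available in the examples GitHub repository. Several are shown below.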

YAML: hello world

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world"
environment:
  image: library/python:latest

YAML: display name, experiment name, description, and tags

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world"
environment:
  image: library/python:latest
tags:
  hello: world
display_name: hello-world-example
experiment_name: hello-world-example
description: |
  # Azure Machine Learning "hello world" job

  This is a "hello world" job running in the cloud via Azure Machine Learning!

  ## Description

  Markdown is supported in the studio for job descriptions! You can edit the description there or via CLI.

YAML: environment variables

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo $hello_env_var
environment:
  image: library/python:latest
environment_variables:
  hello_env_var: "hello world"

YAML: source code

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: ls
code: src
environment:
  image: library/python:latest

YAML: literal inputs

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: |
  echo ${{inputs.hello_string}}
  echo ${{inputs.hello_number}}
environment:
  image: library/python:latest
inputs:
  hello_string: "hello world"
  hello_number: 42

YAML: write to default outputs

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world" > ./outputs/helloworld.txt
environment:
  image: library/python:latest

YAML: write to named data output

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world" > ${{outputs.hello_output}}/helloworld.txt
outputs:
  hello_output:
environment:
  image: python

YAML: datastore URI file input

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: |
  echo "--iris-csv: ${{inputs.iris_csv}}"
  python hello-iris.py --iris-csv ${{inputs.iris_csv}}
code: src
inputs:
  iris_csv:
    type: uri_file 
    path: azureml://datastores/workspaceblobstore/paths/example-data/iris.csv
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest

YAML: datastore URI folder input

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: |
  ls ${{inputs.data_dir}}
  echo "--iris-csv: ${{inputs.data_dir}}/iris.csv"
  python hello-iris.py --iris-csv ${{inputs.data_dir}}/iris.csv
code: src
inputs:
  data_dir:
    type: uri_folder 
    path: azureml://datastores/workspaceblobstore/paths/example-data/
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest

YAML: URI file input

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: |
  echo "--iris-csv: ${{inputs.iris_csv}}"
  python hello-iris.py --iris-csv ${{inputs.iris_csv}}
code: src
inputs:
  iris_csv:
    type: uri_file 
    path: https://azuremlexamples.blob.core.chinacloudapi.cn/datasets/iris.csv
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest

YAML: URI folder input

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: |
  ls ${{inputs.data_dir}}
  echo "--iris-csv: ${{inputs.data_dir}}/iris.csv"
  python hello-iris.py --iris-csv ${{inputs.data_dir}}/iris.csv
code: src
inputs:
  data_dir:
    type: uri_folder 
    path: wasbs://datasets@azuremlexamples.blob.core.chinacloudapi.cn/
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest

YAML: notebook via papermill

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: |
  pip install ipykernel papermill
  papermill hello-notebook.ipynb outputs/out.ipynb -k python
code: src
environment:
  image: library/python:latest

YAML: basic Python model training

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: src
command: >-
  python main.py 
  --iris-csv ${{inputs.iris_csv}}
  --C ${{inputs.C}}
  --kernel ${{inputs.kernel}}
  --coef0 ${{inputs.coef0}}
inputs:
  iris_csv: 
    type: uri_file
    path: wasbs://datasets@azuremlexamples.blob.core.chinacloudapi.cn/iris.csv
  C: 0.8
  kernel: "rbf"
  coef0: 0.1
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
compute: azureml:cpu-cluster
display_name: sklearn-iris-example
experiment_name: sklearn-iris-example
description: Train a scikit-learn SVM on the Iris dataset.

YAML: basic R model training with local Docker build context

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: >
  Rscript train.R 
  --data_folder ${{inputs.iris}}
code: src
inputs:
  iris: 
    type: uri_file
    path: https://azuremlexamples.blob.core.chinacloudapi.cn/datasets/iris.csv
environment:
  build:
    path: docker-context
compute: azureml:cpu-cluster
display_name: r-iris-example
experiment_name: r-iris-example
description: Train an R model on the Iris dataset.

YAML: distributed PyTorch

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: src
command: >-
  python train.py 
  --epochs ${{inputs.epochs}}
  --learning-rate ${{inputs.learning_rate}}
  --data-dir ${{inputs.cifar}}
inputs:
  epochs: 1
  learning_rate: 0.2
  cifar:
    type: uri_folder
    path: azureml:cifar-10-example@latest
environment: azureml:AzureML-pytorch-1.9-ubuntu18.04-py37-cuda11-gpu@latest
compute: azureml:gpu-cluster
distribution:
  type: pytorch 
  process_count_per_instance: 1
resources:
  instance_count: 2
display_name: pytorch-cifar-distributed-example
experiment_name: pytorch-cifar-distributed-example
description: Train a basic convolutional neural network (CNN) with PyTorch on the CIFAR-10 dataset, distributed via PyTorch.

YAML: distributed TensorFlow

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: src
command: >-
  python train.py 
  --epochs ${{inputs.epochs}}
  --model-dir ${{inputs.model_dir}}
inputs:
  epochs: 1
  model_dir: outputs/keras-model
environment: azureml:AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu@latest
compute: azureml:gpu-cluster
resources:
  instance_count: 2
distribution:
  type: tensorflow
  worker_count: 2
display_name: tensorflow-mnist-distributed-example
experiment_name: tensorflow-mnist-distributed-example
description: Train a basic neural network with TensorFlow on the MNIST dataset, distributed via TensorFlow.

YAML: distributed MPI

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: src
command: >-
  python train.py
  --epochs ${{inputs.epochs}}
inputs:
  epochs: 1
environment: azureml:AzureML-tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu@latest
compute: azureml:gpu-cluster
resources:
  instance_count: 2
distribution:
  type: mpi
  process_count_per_instance: 1
display_name: tensorflow-mnist-distributed-horovod-example
experiment_name: tensorflow-mnist-distributed-horovod-example
description: Train a basic neural network with TensorFlow on the MNIST dataset, distributed via Horovod.

Next steps

Install and use the CLI (v2)