Use Azure Batch CLI templates and file transfer

With the Azure Batch extension to the Azure CLI, you can run Batch jobs without writing code.

Create and use JSON template files with the Azure CLI to create Batch pools, jobs, and tasks. Use CLI extension commands to upload job input files to the storage account associated with the Batch account, and to download job output files.

Overview

An extension to the Azure CLI lets users who are not developers use Batch end-to-end. With only CLI commands, you can create a pool, upload input data, create jobs and associated tasks, and download the resulting output data. No additional code is required. Run the CLI commands directly or integrate them into scripts.

Batch templates build on the existing Batch support in the Azure CLI for using JSON files to specify property values when creating pools, jobs, tasks, and other items. Batch templates add the following capabilities:

  • Parameters can be defined. When the template is used, only the parameter values are specified to create the item; all other item property values are specified in the template body. A user who understands Batch and the applications to be run by Batch can create templates, specifying pool, job, and task property values. A user less familiar with Batch or the applications only needs to specify values for the defined parameters.

  • Job task factories create one or more tasks associated with a job. They avoid the need to create many task definitions and significantly simplify job submission.

Jobs typically use input data files and produce output data files. By default, a storage account is associated with each Batch account. You can transfer files to and from this storage account using the CLI, with no coding and no storage credentials.

For example, ffmpeg is a popular application that processes audio and video files. The following steps use the Azure Batch CLI to invoke ffmpeg to transcode source video files to different resolutions.

  • Create a pool template. The user creating the template knows how to call the ffmpeg application and what it requires; they specify the appropriate OS, VM size, how ffmpeg is installed (from an application package or using a package manager, for example), and other pool property values. Parameters are defined so that when the template is used, only the pool ID and number of VMs need to be specified.

  • Create a job template. The user creating the template knows how ffmpeg needs to be invoked to transcode source video to a different resolution and specifies the task command line; they also know that there is a folder containing the source video files, with one task required per input file.

  • An end user with a set of video files to transcode first creates a pool using the pool template, specifying only the pool ID and number of VMs required. They then upload the source files to transcode. Next, they submit a job using the job template, specifying only the pool ID and the location of the uploaded source files. The Batch job is created, with one task generated per input file. Finally, they download the transcoded output files.
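The end-user steps above can be sketched as a sequence of four CLI invocations. The short Python listing below only assembles and prints the commands; it does not run them. The template and parameter file names are assumptions: they match the example names used in this article, except job-parameters.json, which is a hypothetical name chosen for illustration, and passing --parameters to the job command is assumed to work the same way as for the pool command.

```python
# A minimal sketch of the end-to-end workflow as CLI argument lists.
# Nothing is executed; the commands are only printed for inspection.
# job-parameters.json is a hypothetical file name used for illustration.
WORKFLOW = [
    # 1. Create the pool from the pool template.
    ["az", "batch", "pool", "create",
     "--template", "pool-ffmpeg.json",
     "--parameters", "pool-parameters.json"],
    # 2. Upload the source videos to the input file group.
    ["az", "batch", "file", "upload",
     "--local-path", r"c:\source_videos\*.mp4",
     "--file-group", "ffmpeg-input"],
    # 3. Submit the job from the job template.
    ["az", "batch", "job", "create",
     "--template", "job-ffmpeg.json",
     "--parameters", "job-parameters.json"],
    # 4. Download the transcoded files from the output file group.
    ["az", "batch", "file", "download",
     "--file-group", "ffmpeg-output",
     "--local-path", r"c:\output_lowres_videos"],
]

for step, command in enumerate(WORKFLOW, start=1):
    print(f"step {step}: {' '.join(command)}")
```

In a real script, each list could be handed to a process runner once the job has finished between steps 3 and 4; that waiting logic is left out of this sketch.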

Installation

To install the Azure Batch CLI extension, first install the Azure CLI 2.0.

Install the latest version of the Batch extension using the following Azure CLI command:

az extension add --name azure-batch-cli-extensions

For more information about the Batch CLI extension and additional installation options, see the GitHub repo.

To use the CLI extension features, you need an Azure Batch account and, for the commands that transfer files to and from storage, a linked storage account.

To log in to a Batch account with the Azure CLI, see Manage Batch resources with Azure CLI.

Templates

Azure Batch templates are similar to Azure Resource Manager templates in functionality and syntax. They are JSON files that contain item property names and values, but add the following main concepts:

  • Parameters

    • Allow property values to be specified in a body section, with only parameter values needing to be supplied when the template is used. For example, the complete definition for a pool could be placed in the body, with a single parameter defined for the pool ID; only a pool ID string then needs to be supplied to create a pool.

    • The template body can be authored by someone with knowledge of Batch and the applications to be run by Batch; only values for the author-defined parameters must be supplied when the template is used. A user without in-depth Batch or application knowledge can therefore use the templates.

  • Variables

    • Allow simple or complex parameter values to be specified in one place and used in one or more places in the template body. Variables can simplify the template and reduce its size, and make it more maintainable by providing one location in which to change properties.

  • Higher-level constructs

    • Some higher-level constructs are available in templates that are not yet available in the Batch APIs. For example, a task factory can be defined in a job template that creates multiple tasks for the job from a common task definition. These constructs avoid the need to write code that dynamically creates multiple JSON files, such as one file per task, or to create script files that install applications via a package manager.

    • At some point, these constructs may be added to the Batch service and made available in the Batch APIs, UIs, and so on.
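To make the parameter concept concrete, here is a minimal Python sketch of how a reference such as [parameters('poolId')] in a template body could be replaced with a supplied value. It illustrates the idea only and is not the extension's actual implementation; the resolve function and the PARAM_REF pattern are hypothetical names introduced for this example.

```python
import json
import re

# Matches a parameter reference such as [parameters('poolId')].
PARAM_REF = re.compile(r"\[parameters\('([^']+)'\)\]")


def resolve(node, values):
    """Recursively replace parameter references in a parsed template body."""
    if isinstance(node, dict):
        return {key: resolve(child, values) for key, child in node.items()}
    if isinstance(node, list):
        return [resolve(child, values) for child in node]
    if isinstance(node, str):
        whole = PARAM_REF.fullmatch(node)
        if whole:
            # The whole string is a single reference: keep the value's type.
            return values[whole.group(1)]
        # The reference is embedded in a longer string: substitute as text.
        return PARAM_REF.sub(lambda m: str(values[m.group(1)]), node)
    return node


# A trimmed-down body; only the two referenced properties come from parameters.
body = {
    "id": "[parameters('poolId')]",
    "vmSize": "STANDARD_D3_V2",
    "targetDedicatedNodes": "[parameters('nodeCount')]",
}
resolved = resolve(body, {"poolId": "mypool", "nodeCount": 2})
print(json.dumps(resolved, indent=2))
```

The split between a full-string reference and an embedded reference is a design choice of this sketch: it lets an integer parameter stay an integer, while references inside a command line are rendered as text.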

Pool templates

Pool templates support the standard template capabilities of parameters and variables. They also support the following higher-level construct:

  • Package references

    • Optionally allow software to be copied to pool nodes by using package managers. You specify the package manager and the package ID. By declaring one or more packages, you avoid creating a script that gets the required packages, installing the script, and running the script on each pool node.

The following is an example of a template that creates a pool of Linux VMs with ffmpeg installed. To use it, supply only a pool ID string and the number of VMs in the pool:

{
    "parameters": {
        "nodeCount": {
            "type": "int",
            "metadata": {
                "description": "The number of pool nodes"
            }
        },
        "poolId": {
            "type": "string",
            "metadata": {
                "description": "The pool ID"
            }
        }
    },
    "pool": {
        "type": "Microsoft.Batch/batchAccounts/pools",
        "apiVersion": "2016-12-01",
        "properties": {
            "id": "[parameters('poolId')]",
            "virtualMachineConfiguration": {
                "imageReference": {
                    "publisher": "Canonical",
                    "offer": "UbuntuServer",
                    "sku": "16.04-LTS",
                    "version": "latest"
                },
                "nodeAgentSKUId": "batch.node.ubuntu 16.04"
            },
            "vmSize": "STANDARD_D3_V2",
            "targetDedicatedNodes": "[parameters('nodeCount')]",
            "enableAutoScale": false,
            "maxTasksPerNode": 1,
            "packageReferences": [
                {
                    "type": "aptPackage",
                    "id": "ffmpeg"
                }
            ]
        }
    }
}

If the template file is named pool-ffmpeg.json, invoke the template as follows:

az batch pool create --template pool-ffmpeg.json

The CLI prompts you to provide values for the poolId and nodeCount parameters. You can also supply the parameters in a JSON file. For example:

{
  "poolId": {
    "value": "mypool"
  },
  "nodeCount": {
    "value": 2
  }
}

If the parameters JSON file is named pool-parameters.json, invoke the template as follows:

az batch pool create --template pool-ffmpeg.json --parameters pool-parameters.json
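When parameter values come from a script rather than from hand-edited files, the parameters JSON can be generated. The following Python sketch builds the same structure as the example above, wrapping each value in a "value" object. The build_parameters helper is a hypothetical name introduced here, not part of the CLI.

```python
import json


def build_parameters(**values):
    """Wrap each parameter value in the {"value": ...} shape shown above."""
    return {name: {"value": value} for name, value in values.items()}


parameters = build_parameters(poolId="mypool", nodeCount=2)

# Print the document; redirect the output to pool-parameters.json to use it
# with the --parameters option.
print(json.dumps(parameters, indent=2))
```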

Job templates

Job templates support the standard template capabilities of parameters and variables. They also support the following higher-level construct:

  • Task factory

    • Creates multiple tasks for a job from one task definition. Three types of task factory are supported: parametric sweep, task per file, and task collection.

The following is an example of a template that creates a job to transcode MP4 video files with ffmpeg to one of two lower resolutions. It creates one task per source video file. See File groups and file transfer for more about file groups for job input and output.

{
    "parameters": {
        "poolId": {
            "type": "string",
            "metadata": {
                "description": "The name of Azure Batch pool which runs the job"
            }
        },
        "jobId": {
            "type": "string",
            "metadata": {
                "description": "The name of Azure Batch job"
            }
        },
        "resolution": {
            "type": "string",
            "defaultValue": "428x240",
            "allowedValues": [
                "428x240",
                "854x480"
            ],
            "metadata": {
                "description": "Target video resolution"
            }
        }
    },
    "job": {
        "type": "Microsoft.Batch/batchAccounts/jobs",
        "apiVersion": "2016-12-01",
        "properties": {
            "id": "[parameters('jobId')]",
            "constraints": {
                "maxWallClockTime": "PT5H",
                "maxTaskRetryCount": 1
            },
            "poolInfo": {
                "poolId": "[parameters('poolId')]"
            },
            "taskFactory": {
                "type": "taskPerFile",
                "source": { 
                    "fileGroup": "ffmpeg-input"
                },
                "repeatTask": {
                    "commandLine": "ffmpeg -i {fileName} -y -s [parameters('resolution')] -strict -2 {fileNameWithoutExtension}_[parameters('resolution')].mp4",
                    "resourceFiles": [
                        {
                            "blobSource": "{url}",
                            "filePath": "{fileName}"
                        }
                    ],
                    "outputFiles": [
                        {
                            "filePattern": "{fileNameWithoutExtension}_[parameters('resolution')].mp4",
                            "destination": {
                                "autoStorage": {
                                    "path": "{fileNameWithoutExtension}_[parameters('resolution')].mp4",
                                    "fileGroup": "ffmpeg-output"
                                }
                            },
                            "uploadOptions": {
                                "uploadCondition": "TaskSuccess"
                            }
                        }
                    ]
                }
            },
            "onAllTasksComplete": "terminatejob"
        }
    }
}

If the template file is named job-ffmpeg.json, invoke the template as follows:

az batch job create --template job-ffmpeg.json

As before, the CLI prompts you to provide values for the parameters. You can also supply the parameters in a JSON file.
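The taskPerFile factory in the job template above produces one task per file in the ffmpeg-input file group. The Python sketch below illustrates that expansion for the command line only. It is not the extension's implementation: expand_tasks is a hypothetical helper, the task IDs are invented for illustration, and the [parameters('resolution')] reference is written as {resolution} so that plain string formatting can fill it in.

```python
from pathlib import PurePosixPath

# Mirrors the allowedValues list of the resolution parameter in the template.
ALLOWED_RESOLUTIONS = ("428x240", "854x480")

# The repeatTask command line, with the parameter reference as {resolution}.
COMMAND = (
    "ffmpeg -i {fileName} -y -s {resolution} -strict -2 "
    "{fileNameWithoutExtension}_{resolution}.mp4"
)


def expand_tasks(file_names, resolution="428x240"):
    """Return one task definition per input file name."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    tasks = []
    for index, name in enumerate(file_names):
        tasks.append({
            "id": str(index),  # hypothetical task ID scheme
            "commandLine": COMMAND.format(
                fileName=name,
                fileNameWithoutExtension=PurePosixPath(name).stem,
                resolution=resolution,
            ),
        })
    return tasks


tasks = expand_tasks(["intro.mp4", "demo.mp4"], resolution="854x480")
for task in tasks:
    print(task["commandLine"])
```

The check against ALLOWED_RESOLUTIONS reflects the role of allowedValues in the template: a value outside the list is rejected before any task is created.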

Use templates in Batch Explorer

You can upload a Batch CLI template to the Batch Explorer desktop application (formerly called BatchLabs) to create a Batch pool or job. You can also select from predefined pool and job templates in the Batch Explorer Gallery.

To upload a template:

  1. In Batch Explorer, select Gallery > Local templates.

  2. Select, or drag and drop, a local pool or job template.

  3. Select Use this template, and follow the on-screen prompts.

File groups and file transfer

Most jobs and tasks require input files and produce output files. Usually, input files are transferred from the client to the node, and output files from the node back to the client. The Azure Batch CLI extension abstracts away file transfer and uses the storage account that you can associate with each Batch account.

A file group equates to a container that is created in the Azure storage account. The file group may have subfolders.

The Batch CLI extension provides commands to upload files from the client to a specified file group, and to download files from a specified file group to the client.

az batch file upload --local-path c:\source_videos\*.mp4 --file-group ffmpeg-input

az batch file download --file-group ffmpeg-output --local-path c:\output_lowres_videos

Pool and job templates allow files stored in file groups to be specified for copying onto pool nodes, or for copying from pool nodes back to a file group. For example, in the job template shown previously, the file group ffmpeg-input is specified for the task factory as the location of the source video files that are copied down to the node for transcoding. The file group ffmpeg-output is the location to which the transcoded output files are copied from the node running each task.

Summary

Template and file transfer support have currently been added only to the Azure CLI. The goal is to expand the audience that can use Batch to users who do not need to develop code using the Batch APIs, such as researchers and IT users. Without coding, users with knowledge of Azure, Batch, and the applications to be run by Batch can create templates for pool and job creation. With template parameters, users without detailed knowledge of Batch and the applications can use the templates.

Try out the Batch extension for the Azure CLI and send us any feedback or suggestions, either in the comments for this article or via the Batch Community repo.

Next steps

  • Detailed installation and usage documentation, samples, and source code are available in the Azure GitHub repo.

  • Learn more about using Batch Explorer to create and manage Batch resources.