Jobs and tasks in Azure Batch

In Azure Batch, a task represents a unit of computation. A job is a collection of these tasks. The sections below describe jobs and tasks in more detail, and how they are used in an Azure Batch workflow.

Jobs

A job is a collection of tasks. It manages how computation is performed by its tasks on the compute nodes in a pool.

A job specifies the pool in which the work is to be run. You can create a new pool for each job, or use one pool for many jobs. For jobs that are associated with a job schedule, you can likewise create a pool for each job, or one pool for all of the schedule's jobs.

Job priority

You can assign an optional job priority to jobs that you create. The Batch service uses the priority value of the job to determine the order of job scheduling within an account (this is not to be confused with a scheduled job). Priority values range from -1000 to 1000, with -1000 being the lowest priority and 1000 being the highest. To update the priority of a job, call the Update the properties of a job operation (Batch REST), or modify the CloudJob.Priority property (Batch .NET).

Within the same account, higher-priority jobs have scheduling precedence over lower-priority jobs. A job with a higher priority value in one account does not have scheduling precedence over a job with a lower priority value in a different account. Tasks in lower-priority jobs that are already running are not preempted.

Job scheduling across pools is independent. Between different pools, a higher-priority job is not guaranteed to be scheduled first if its associated pool is short of idle nodes. In the same pool, jobs with the same priority level have an equal chance of being scheduled.
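As a minimal sketch of the priority range described above, the following Python builds the JSON body that a client might send to the Update the properties of a job operation. The helper name `priority_update_body` is hypothetical, and the sketch assumes the REST property is named `priority`; it only validates the range and does not call the service.

```python
import json

# Range stated in the text above: -1000 (lowest) to 1000 (highest).
MIN_PRIORITY, MAX_PRIORITY = -1000, 1000


def priority_update_body(priority: int) -> str:
    """Return a JSON body that sets a job's priority (hypothetical helper)."""
    if not MIN_PRIORITY <= priority <= MAX_PRIORITY:
        raise ValueError(
            f"priority must be between {MIN_PRIORITY} and {MAX_PRIORITY}"
        )
    return json.dumps({"priority": priority})


print(priority_update_body(500))
```

Validating on the client side is a design choice in this sketch; the service performs its own validation.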

Job constraints

You can use job constraints to specify certain limits for your jobs:

  • You can set a maximum wallclock time, so that if a job runs for longer than the specified maximum wallclock time, the job and all of its tasks are terminated.
  • You can specify the maximum number of task retries as a constraint, including whether a task is always retried or never retried. Retrying a task means that if the task fails, it is requeued to run again.
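The two constraints above can be sketched as a small Python helper that produces a REST-style constraints object. This is an illustration under assumptions: the property names `maxWallClockTime` and `maxTaskRetryCount`, the ISO 8601 duration format, and the convention that -1 means "always retry" and 0 means "never retry" are taken from the Batch REST API as I understand it; `iso8601_duration` and `job_constraints` are hypothetical helper names.

```python
from datetime import timedelta


def iso8601_duration(delta: timedelta) -> str:
    """Format a non-negative timedelta as an ISO 8601 duration (whole seconds)."""
    total = int(delta.total_seconds())
    if total < 0:
        raise ValueError("duration must not be negative")
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    date_part = f"{days}D" if days else ""
    time_part = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    if not date_part and not time_part:
        return "PT0S"
    return "P" + date_part + ("T" + time_part if time_part else "")


def job_constraints(max_wall_clock: timedelta, max_task_retries: int) -> dict:
    """Build a constraints object; -1 is assumed to mean unlimited retries."""
    if max_task_retries < -1:
        raise ValueError("max_task_retries must be -1 or greater")
    return {
        "maxWallClockTime": iso8601_duration(max_wall_clock),
        "maxTaskRetryCount": max_task_retries,
    }


print(job_constraints(timedelta(hours=2, minutes=30), 3))
```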

Job manager tasks and automatic termination

Your client application can add tasks to a job, or you can specify a job manager task. A job manager task contains the information that is necessary to create the required tasks for a job, and it runs on one of the compute nodes in the pool. The job manager task is handled specially by Batch: it is queued as soon as the job is created, and it is restarted if it fails. A job manager task is required for jobs that are created by a job schedule, because it is the only way to define the tasks before the job is instantiated.

By default, jobs remain in the active state when all tasks within the job are complete. You can change this behavior so that the job is automatically terminated when all tasks in the job are complete. Set the job's onAllTasksComplete property (OnAllTasksComplete in Batch .NET) to terminatejob to automatically terminate the job when all of its tasks are in the completed state.

The Batch service considers a job with no tasks to have all of its tasks completed. Therefore, this option is most commonly used with a job manager task. If you want to use automatic job termination without a job manager, you should initially set a new job's onAllTasksComplete property to noaction, then set it to terminatejob only after you've finished adding tasks to the job.
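The ordering described above (start with noaction, add tasks, then switch to terminatejob) can be illustrated with a toy in-memory model. `JobModel` is a hypothetical stand-in written for this sketch, not part of any Batch SDK; it only mimics the rule that a job with no tasks counts as having all of its tasks completed.

```python
class JobModel:
    """Toy stand-in for a Batch job (hypothetical; not the Batch SDK)."""

    def __init__(self, on_all_tasks_complete: str = "noaction"):
        self.on_all_tasks_complete = on_all_tasks_complete
        self.tasks = {}  # task id -> "active" or "completed"
        self.state = "active"
        self._evaluate()

    def add_task(self, task_id: str) -> None:
        if self.state != "active":
            raise RuntimeError("cannot add tasks to a job that is not active")
        self.tasks[task_id] = "active"

    def complete_task(self, task_id: str) -> None:
        self.tasks[task_id] = "completed"
        self._evaluate()

    def set_on_all_tasks_complete(self, value: str) -> None:
        self.on_all_tasks_complete = value
        self._evaluate()

    def _evaluate(self) -> None:
        # all() over an empty collection is True, mirroring the rule that
        # a job with no tasks is treated as having all tasks completed.
        all_done = all(s == "completed" for s in self.tasks.values())
        if self.on_all_tasks_complete == "terminatejob" and all_done:
            self.state = "completed"


# Setting terminatejob up front ends the empty job before tasks are added.
early = JobModel("terminatejob")
print(early.state)

# Recommended order without a job manager: noaction, add tasks, then switch.
safe = JobModel("noaction")
safe.add_task("task1")
safe.add_task("task2")
safe.set_on_all_tasks_complete("terminatejob")
print(safe.state)
safe.complete_task("task1")
safe.complete_task("task2")
print(safe.state)
```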

Scheduled jobs

Job schedules enable you to create recurring jobs within the Batch service. A job schedule specifies when to run jobs and includes the specifications for the jobs to be run. You can specify the duration of the schedule (how long and when the schedule is in effect) and how frequently jobs are created during the scheduled period.

Tasks

A task is a unit of computation that is associated with a job. It runs on a node. Tasks are assigned to a node for execution, or are queued until a node becomes free. Put simply, a task runs one or more programs or scripts on a compute node to perform the work you need done.

When you create a task, you can specify:

  • The command line for the task. This is the command line that runs your application or script on the compute node.

    It is important to note that the command line does not run under a shell. Therefore, it cannot natively take advantage of shell features like environment variable expansion (this includes the PATH). To take advantage of such features, you must invoke the shell in the command line, such as by launching cmd.exe on Windows nodes or /bin/sh on Linux:

    cmd /c MyTaskApplication.exe %MY_ENV_VAR%

    /bin/sh -c "MyTaskApplication $MY_ENV_VAR"

    If your tasks need to run an application or script that is not in the node's PATH, or need to reference environment variables, invoke the shell explicitly in the task command line.

  • Resource files that contain the data to be processed. These files are automatically copied to the node from Blob storage in an Azure Storage account before the task's command line is executed. For more information, see Start task and Files and directories.

  • The environment variables that are required by your application. For more information, see Environment settings for tasks.

  • The constraints under which the task should execute. For example, constraints include the maximum time that the task is allowed to run, the maximum number of times a failed task should be retried, and the maximum time that files in the task's working directory are retained.

  • Application packages to deploy to the compute node on which the task is scheduled to run. Application packages provide simplified deployment and versioning of the applications that your tasks run. Task-level application packages are especially useful in shared-pool environments, where different jobs are run on one pool, and the pool is not deleted when a job is completed. If your job has fewer tasks than nodes in the pool, task application packages can minimize data transfer since your application is deployed only to the nodes that run tasks.

  • A container image reference in Docker Hub or a private registry, plus additional settings to create a Docker container in which the task runs on the node. You specify this information only if the pool is set up with a container configuration.
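To illustrate the command line, environment variable, and constraint items in the list above, here is a hedged Python sketch that wraps a command in a shell and assembles a REST-style task body. The helper names `shell_command_line` and `task_body` are hypothetical; the property names (`commandLine`, `environmentSettings`, `constraints`, `maxTaskRetryCount`) are assumed to match the Batch REST API, and the Linux branch assumes the node splits quoted arguments POSIX-style.

```python
import shlex


def shell_command_line(command: str, windows: bool) -> str:
    """Wrap a command so that shell features such as variable expansion work."""
    if windows:
        return f"cmd /c {command}"
    # shlex.quote keeps the whole command as the single argument of "sh -c".
    return f"/bin/sh -c {shlex.quote(command)}"


def task_body(task_id, command, windows, environment=None, max_retries=0):
    """Assemble a REST-style task definition (hypothetical helper)."""
    return {
        "id": task_id,
        "commandLine": shell_command_line(command, windows),
        "environmentSettings": [
            {"name": name, "value": value}
            for name, value in (environment or {}).items()
        ],
        "constraints": {"maxTaskRetryCount": max_retries},
    }


print(shell_command_line("MyTaskApplication.exe %MY_ENV_VAR%", windows=True))
print(shell_command_line("MyTaskApplication $MY_ENV_VAR", windows=False))
print(task_body("task1", "MyTaskApplication $MY_ENV_VAR", windows=False,
                environment={"MY_ENV_VAR": "42"}, max_retries=2))
```

Quoting the whole command matters on Linux because `sh -c` treats only its first argument as the command string.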

Note

The maximum lifetime of a task, from when it is added to the job to when it completes, is 180 days. Completed tasks persist for 7 days; data for tasks not completed within the maximum lifetime is not accessible.

In addition to tasks you define to perform computation on a node, several special tasks are also provided by the Batch service:

Start task

By associating a start task with a pool, you can prepare the operating environment of its nodes. For example, you can perform actions such as installing the applications that your tasks run, or starting background processes. The start task runs every time a node starts, for as long as the node remains in the pool. This includes when the node is first added to the pool and when it is restarted or reimaged.

A primary benefit of the start task is that it can contain all the information necessary to configure a compute node and install the applications required for task execution. Therefore, increasing the number of nodes in a pool is as simple as specifying the new target node count. The start task provides the information needed for the Batch service to configure the new nodes and get them ready for accepting tasks.

As with any Azure Batch task, you can specify a list of resource files in Azure Storage, in addition to a command line to be executed. The Batch service first copies the resource files to the node from Azure Storage, and then runs the command line. For a pool start task, the file list typically contains the task application and its dependencies.

However, the start task could also include reference data to be used by all tasks that are running on the compute node. For example, a start task's command line could perform a robocopy operation to copy application files (which were specified as resource files and downloaded to the node) from the start task's working directory to the shared folder, and then run an MSI or setup.exe.

It is typically desirable for the Batch service to wait for the start task to complete before considering the node ready to be assigned tasks, but you can configure this.

If a start task fails on a compute node, then the state of the node is updated to reflect the failure, and the node is not assigned any tasks. A start task can fail if there is an issue copying its resource files from storage, or if the process executed by its command line returns a nonzero exit code.

If you add or update the start task for an existing pool, you must reboot its compute nodes for the start task to be applied to the nodes.
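As a rough sketch of the pieces a start task brings together (a command line, resource files, and whether the service waits for completion), the following Python assembles a REST-style start task object. The helper `start_task_settings` is hypothetical, the URL and file names are placeholders, and the property names (`commandLine`, `resourceFiles`, `httpUrl`, `filePath`, `waitForSuccess`, `maxTaskRetryCount`) are assumed to match the Batch REST API.

```python
def start_task_settings(command_line, resource_files,
                        wait_for_success=True, max_retries=0):
    """Assemble a REST-style start task definition (hypothetical helper).

    resource_files is a list of (source URL, path on the node) pairs.
    """
    if not command_line.strip():
        raise ValueError("a start task needs a command line")
    return {
        "commandLine": command_line,
        "resourceFiles": [
            {"httpUrl": url, "filePath": path} for url, path in resource_files
        ],
        # When true, the node is assumed not to receive tasks until the
        # start task has completed successfully.
        "waitForSuccess": wait_for_success,
        "maxTaskRetryCount": max_retries,
    }


settings = start_task_settings(
    "/bin/sh -c 'sh install.sh'",  # placeholder install script
    [("https://example.invalid/install.sh", "install.sh")],  # placeholder URL
    max_retries=1,
)
print(settings)
```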

Note

Batch limits the total size of a start task, which includes resource files and environment variables. If you need to reduce the size of a start task, you can use one of two approaches:

  1. You can use application packages to distribute applications or data across each node in your Batch pool. For more information about application packages, see Deploy applications to compute nodes with Batch application packages.

  2. You can manually create a zipped archive containing your application files. Upload your zipped archive to Azure Storage as a blob. Specify the zipped archive as a resource file for your start task. In the start task's command line, unzip the archive before running the rest of the command.

    To unzip the archive, you can use the archiving tool of your choice. You will need to include the tool that you use to unzip the archive as a resource file for the start task.
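The first step of the second approach, creating the zipped archive on the client, might look like the following Python sketch using the standard zipfile module. The function name `build_archive` and the file names are hypothetical; uploading the archive to Azure Storage and unzipping it on the node are not shown.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def build_archive(source_dir: Path, archive_path: Path) -> List[str]:
    """Zip every file under source_dir and return the stored entry names.

    archive_path should live outside source_dir so the archive does not
    try to include itself.
    """
    names = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(source_dir).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names


with tempfile.TemporaryDirectory() as tmp:
    app_dir = Path(tmp) / "app"
    (app_dir / "config").mkdir(parents=True)
    (app_dir / "MyTaskApplication.exe").write_bytes(b"placeholder binary")
    (app_dir / "config" / "settings.json").write_text("{}")
    print(build_archive(app_dir, Path(tmp) / "app.zip"))
```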

Job manager task

You typically use a job manager task to control and/or monitor job execution. For example, job manager tasks are often used to create and submit the tasks for a job, determine additional tasks to run, and determine when work is complete.

However, a job manager task is not restricted to these activities. It is a full-fledged task that can perform any actions that are required for the job. For example, a job manager task might download a file that is specified as a parameter, analyze the contents of that file, and submit additional tasks based on those contents.

A job manager task is started before all other tasks. It provides the following features:

  • It is automatically submitted as a task by the Batch service when the job is created.
  • It is scheduled to execute before the other tasks in a job.
  • Its associated node is the last to be removed from a pool when the pool is being downsized.
  • Its termination can be tied to the termination of all tasks in the job.
  • A job manager task is given the highest priority when it needs to be restarted. If an idle node is not available, the Batch service might terminate one of the other running tasks in the pool to make room for the job manager task to run.
  • A job manager task in one job does not have priority over the tasks of other jobs. Across jobs, only job-level priorities are observed.

Job preparation and release tasks

Batch provides job preparation tasks for pre-job execution setup, and job release tasks for post-job maintenance or cleanup.

A job preparation task runs on all compute nodes that are scheduled to run tasks, before any of the other job tasks are executed. For example, you can use a job preparation task to copy data that is shared by all tasks, but is unique to the job.

When a job has completed, a job release task runs on each node in the pool that executed at least one task. For example, a job release task can delete data that was copied by the job preparation task, or it can compress and upload diagnostic log data.

Both job preparation and release tasks allow you to specify a command line to run when the task is invoked. They offer features like file download, elevated execution, custom environment variables, maximum execution duration, retry count, and file retention time.

For more information on job preparation and release tasks, see Run job preparation and completion tasks on Azure Batch compute nodes.

Multi-instance task

A multi-instance task is a task that is configured to run on more than one compute node simultaneously. With multi-instance tasks, you can enable high-performance computing scenarios that require a group of compute nodes that are allocated together to process a single workload, such as Message Passing Interface (MPI).

For a detailed discussion on running MPI jobs in Batch by using the Batch .NET library, see Use multi-instance tasks to run Message Passing Interface (MPI) applications in Azure Batch.

Task dependencies

Task dependencies, as the name implies, allow you to specify that a task depends on the completion of other tasks before its execution. This feature provides support for situations in which a "downstream" task consumes the output of an "upstream" task, or when an upstream task performs some initialization that is required by a downstream task.

To use this feature, you must first enable task dependencies on your Batch job. Then, for each task that depends on another (or many others), you specify the tasks that it depends on.

With task dependencies, you can configure scenarios like the following:

  • taskB depends on taskA (taskB will not begin execution until taskA has completed).
  • taskC depends on both taskA and taskB.
  • taskD depends on a range of tasks, such as tasks 1 through 10, before it executes.
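The three scenarios above can be sketched as REST-style task definitions in Python. The helper `depends_on` and the job, pool, and command names are hypothetical; the property names `usesTaskDependencies`, `dependsOn`, `taskIds`, and `taskIdRanges` are assumed to match the Batch REST API, and a range dependency is assumed to require task IDs that are string forms of integers (here "1" through "10").

```python
def depends_on(task_ids=(), id_range=None):
    """Build a dependsOn object from explicit IDs and/or an inclusive ID range."""
    spec = {}
    if task_ids:
        spec["taskIds"] = list(task_ids)
    if id_range is not None:
        start, end = id_range
        if start > end:
            raise ValueError("range start must not exceed range end")
        spec["taskIdRanges"] = [{"start": start, "end": end}]
    return spec


# Dependencies must be enabled on the job before tasks can declare them.
job = {
    "id": "dependency-demo",            # hypothetical job ID
    "poolInfo": {"poolId": "demo-pool"},  # hypothetical pool ID
    "usesTaskDependencies": True,
}

tasks = [
    {"id": "taskA", "commandLine": "cmd /c echo taskA"},
    {"id": "taskB", "commandLine": "cmd /c echo taskB",
     "dependsOn": depends_on(task_ids=["taskA"])},
    {"id": "taskC", "commandLine": "cmd /c echo taskC",
     "dependsOn": depends_on(task_ids=["taskA", "taskB"])},
    {"id": "taskD", "commandLine": "cmd /c echo taskD",
     "dependsOn": depends_on(id_range=(1, 10))},
]

for task in tasks:
    print(task["id"], task.get("dependsOn", {}))
```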

For more details, see Task dependencies in Azure Batch and the TaskDependencies code sample in the azure-batch-samples GitHub repository.

Environment settings for tasks

Each task executed by the Batch service has access to environment variables that the service sets on compute nodes. This includes environment variables defined by the Batch service (service-defined) and custom environment variables that you can define for your tasks. The applications and scripts your tasks execute have access to these environment variables during execution.

You can set custom environment variables at the task or job level by populating the environment settings property for these entities. For more details, see the Add a task to a job operation (Batch REST API), or the CloudTask.EnvironmentSettings and CloudJob.CommonEnvironmentSettings properties in Batch .NET.

Your client application or service can obtain a task's environment variables, both service-defined and custom, by using the Get information about a task operation (Batch REST) or by accessing the CloudTask.EnvironmentSettings property (Batch .NET). Processes executing on a compute node can access these and other environment variables on the node, for example, by using the familiar %VARIABLE_NAME% (Windows) or $VARIABLE_NAME (Linux) syntax.
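From inside a task, a script can read these variables like any other process environment. The Python sketch below reads one service-defined variable (AZ_BATCH_TASK_ID) and one custom variable; the custom name INPUT_CONTAINER and the helper `describe_task` are hypothetical and used only for illustration.

```python
import os


def describe_task(environ=os.environ) -> str:
    """Summarize the task context from environment variables."""
    # AZ_BATCH_TASK_ID is service-defined; it is absent when this script
    # runs outside of Batch, so a fallback is supplied.
    task_id = environ.get("AZ_BATCH_TASK_ID", "<not running under Batch>")
    # INPUT_CONTAINER is a hypothetical custom variable set on the task or job.
    input_container = environ.get("INPUT_CONTAINER", "<unset>")
    return f"task={task_id} input={input_container}"


print(describe_task())
print(describe_task({"AZ_BATCH_TASK_ID": "task1", "INPUT_CONTAINER": "input"}))
```

Passing the environment as a parameter is a design choice in this sketch that makes the function easy to exercise outside a compute node.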

You can find a full list of all service-defined environment variables in Compute node environment variables.

Next steps