Create task dependencies to run tasks that depend on other tasks

You can define task dependencies to run a task or set of tasks only after a parent task has completed. Scenarios where task dependencies are useful include:

  • MapReduce-style workloads in the cloud.
  • Jobs whose data processing tasks can be expressed as a directed acyclic graph (DAG).
  • Pre-rendering and post-rendering processes, where each task must complete before the next task can begin.
  • Any other job in which downstream tasks depend on the output of upstream tasks.

With Batch task dependencies, you can create tasks that are scheduled for execution on compute nodes after one or more parent tasks have completed. For example, you can create a job that renders each frame of a 3D movie with separate, parallel tasks. The final task (the "merge task") merges the rendered frames into the complete movie only after all frames have been successfully rendered.

By default, dependent tasks are scheduled for execution only after the parent task has completed successfully. You can specify a dependency action to override the default behavior and run tasks when the parent task fails. See the Dependency actions section for details.

You can create tasks that depend on other tasks in a one-to-one or one-to-many relationship. You can also create a range dependency, where a task depends on the completion of a group of tasks within a specified range of task IDs. You can combine these three basic scenarios to create many-to-many relationships.

Task dependencies with Batch .NET

This article discusses how to configure task dependencies by using the Batch .NET library. It first shows how to enable task dependencies on your jobs, and then demonstrates how to configure a task with dependencies. It also describes how to specify a dependency action to run dependent tasks if the parent task fails. Finally, it discusses the dependency scenarios that Batch supports.

Enable task dependencies

To use task dependencies in your Batch application, you must first configure the job to use task dependencies. In Batch .NET, enable task dependencies on your CloudJob by setting its UsesTaskDependencies property to true:

CloudJob unboundJob = batchClient.JobOperations.CreateJob("job001",
    new PoolInformation { PoolId = "pool001" });

// IMPORTANT: This is REQUIRED for using task dependencies.
unboundJob.UsesTaskDependencies = true;

In the preceding code snippet, "batchClient" is an instance of the BatchClient class.

Create dependent tasks

To create a task that depends on the completion of one or more parent tasks, specify that the task "depends on" the other tasks. In Batch .NET, configure the CloudTask.DependsOn property with an instance of the TaskDependencies class:

// Task 'Flowers' depends on completion of both 'Rain' and 'Sun'
// before it is run.
new CloudTask("Flowers", "cmd.exe /c echo Flowers")
{
    DependsOn = TaskDependencies.OnIds("Rain", "Sun")
},

This code snippet creates a dependent task with the task ID "Flowers". The "Flowers" task depends on the "Rain" and "Sun" tasks, so it is scheduled to run on a compute node only after both "Rain" and "Sun" have completed successfully.

Note

By default, a task is considered to have completed successfully when it is in the completed state and its exit code is 0. In Batch .NET, this means that the CloudTask.State property value is Completed and the ExitCode property of the task's TaskExecutionInformation is 0. To change this behavior, see the Dependency actions section.
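The default success rule in this note can be written as a small check. The sketch below is illustrative only: TaskSuccess and IsSuccessful are hypothetical names that are not part of the Batch .NET library, and the code assumes that the Microsoft.Azure.Batch NuGet package is referenced.

```csharp
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Common;

// Hypothetical helper (not part of Batch .NET) that mirrors the default rule:
// a task succeeded if it is in the Completed state and its exit code is 0.
public static class TaskSuccess
{
    // Works on plain values, so the rule can be checked without a Batch account.
    public static bool IsSuccessful(TaskState? state, int? exitCode) =>
        state == TaskState.Completed && exitCode == 0;

    // Convenience overload for a task that was retrieved from the Batch service.
    // ExecutionInformation can be null for a task that has not been scheduled yet.
    public static bool IsSuccessful(CloudTask task) =>
        IsSuccessful(task.State, task.ExecutionInformation?.ExitCode);
}
```

A task that is still running, or that completed with a nonzero exit code, yields false, which matches the case where dependent tasks are not scheduled by default.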

Dependency scenarios

There are three basic task dependency scenarios that you can use in Azure Batch: one-to-one, one-to-many, and task ID range dependency. These can be combined to provide a fourth scenario, many-to-many.

Scenario        Example
One-to-one      taskB depends on taskA. taskB will not be scheduled for execution until taskA has completed successfully. (Diagram: one-to-one task dependency)
One-to-many     taskC depends on both taskA and taskB. taskC will not be scheduled for execution until both taskA and taskB have completed successfully. (Diagram: one-to-many task dependency)
Task ID range   taskD depends on a range of tasks. taskD will not be scheduled for execution until the tasks with IDs 1 through 10 have completed successfully. (Diagram: task ID range dependency)

Tip

You can create many-to-many relationships, such as where tasks C, D, E, and F each depend on tasks A and B. This is useful, for example, in parallelized preprocessing scenarios where your downstream tasks depend on the output of multiple upstream tasks.
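The many-to-many relationship in this tip might be built as follows. This is a minimal sketch that assumes the Microsoft.Azure.Batch NuGet package is referenced; the ManyToMany class and BuildTasks method are hypothetical names used only for illustration, and the echo command lines are placeholders.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Batch;

// Hypothetical helper: builds tasks C, D, E, and F so that each one
// depends on both task A and task B.
public static class ManyToMany
{
    public static List<CloudTask> BuildTasks()
    {
        string[] parents = { "A", "B" };
        string[] children = { "C", "D", "E", "F" };
        var tasks = new List<CloudTask>();

        // Parent tasks have no dependencies.
        foreach (string id in parents)
        {
            tasks.Add(new CloudTask(id, $"cmd.exe /c echo {id}"));
        }

        // Every child task waits for all of the parent tasks.
        foreach (string id in children)
        {
            tasks.Add(new CloudTask(id, $"cmd.exe /c echo {id}")
            {
                DependsOn = TaskDependencies.OnIds(parents)
            });
        }

        return tasks;
    }
}
```

The returned list can then be added to a job that has UsesTaskDependencies set to true.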

In the examples in this section, a dependent task runs only after the parent tasks complete successfully. This is the default behavior for a dependent task. To run a dependent task after a parent task fails, specify a dependency action that overrides the default behavior. See the Dependency actions section for details.

One-to-one

In a one-to-one relationship, a task depends on the successful completion of one parent task. To create the dependency, provide a single task ID to the TaskDependencies.OnId static method when you populate the DependsOn property of CloudTask.

// Task 'taskA' doesn't depend on any other tasks
new CloudTask("taskA", "cmd.exe /c echo taskA"),

// Task 'taskB' depends on completion of task 'taskA'
new CloudTask("taskB", "cmd.exe /c echo taskB")
{
    DependsOn = TaskDependencies.OnId("taskA")
},
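One-to-one dependencies can also be chained, for example for the pre-rendering and post-rendering processes mentioned earlier, where each stage must finish before the next one starts. The following sketch assumes the Microsoft.Azure.Batch NuGet package is referenced; DependencyChain, BuildChain, and the stage names are hypothetical and serve only as an illustration.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Batch;

// Hypothetical helper: turns an ordered list of stage IDs into tasks in which
// each task depends on the task immediately before it.
public static class DependencyChain
{
    public static List<CloudTask> BuildChain(IReadOnlyList<string> stageIds)
    {
        var tasks = new List<CloudTask>();
        for (int i = 0; i < stageIds.Count; i++)
        {
            var task = new CloudTask(stageIds[i], $"cmd.exe /c echo {stageIds[i]}");

            // The first stage has no parent; every later stage waits for its predecessor.
            if (i > 0)
            {
                task.DependsOn = TaskDependencies.OnId(stageIds[i - 1]);
            }

            tasks.Add(task);
        }
        return tasks;
    }
}
```

For example, BuildChain(new[] { "prerender", "render", "postrender" }) produces three tasks in which "render" depends on "prerender" and "postrender" depends on "render".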

One-to-many

In a one-to-many relationship, a task depends on the completion of multiple parent tasks. To create the dependency, provide a collection of task IDs to the TaskDependencies.OnIds static method when you populate the DependsOn property of CloudTask.

// 'Rain' and 'Sun' don't depend on any other tasks
new CloudTask("Rain", "cmd.exe /c echo Rain"),
new CloudTask("Sun", "cmd.exe /c echo Sun"),

// Task 'Flowers' depends on completion of both 'Rain' and 'Sun'
// before it is run.
new CloudTask("Flowers", "cmd.exe /c echo Flowers")
{
    DependsOn = TaskDependencies.OnIds("Rain", "Sun")
},
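When the parent IDs are not known at compile time, the collection passed to OnIds can be built at run time. The sketch below assumes the Microsoft.Azure.Batch NuGet package is referenced and relies on the OnIds overload that accepts a collection of strings; FanIn and BuildFanIn are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.Batch;

// Hypothetical helper: creates one task per parent ID plus a single child task
// that depends on all of the parents.
public static class FanIn
{
    public static List<CloudTask> BuildFanIn(IEnumerable<string> parentIds, string childId)
    {
        // Materialize once so the sequence is not enumerated twice.
        List<string> parents = parentIds.ToList();

        List<CloudTask> tasks = parents
            .Select(id => new CloudTask(id, $"cmd.exe /c echo {id}"))
            .ToList();

        tasks.Add(new CloudTask(childId, $"cmd.exe /c echo {childId}")
        {
            DependsOn = TaskDependencies.OnIds(parents)
        });

        return tasks;
    }
}
```

Calling BuildFanIn(new[] { "Rain", "Sun" }, "Flowers") yields the same three tasks as the snippet above.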

Task ID range

In a dependency on a range of parent tasks, a task depends on the completion of tasks whose IDs lie within a range. To create the dependency, provide the first and last task IDs in the range to the TaskDependencies.OnIdRange static method when you populate the DependsOn property of CloudTask.

Important

When you use task ID ranges for your dependencies, only tasks with IDs representing integer values are selected by the range. For example, the range 1..10 selects tasks 3 and 7, but not 5flamingoes.

Leading zeroes are not significant when evaluating range dependencies. Tasks with the string identifiers 4, 04, and 004 are all within the range and are all treated as task 4, so the first one to complete satisfies the dependency.

Every task in the range must satisfy the dependency, either by completing successfully or by completing with a failure that is mapped to a dependency action set to Satisfy. See the Dependency actions section for details.
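The selection rule described above can be modeled in a few lines. This is only a model of the documented behavior, not the Batch service's implementation; RangeModel and IsSelectedByRange are hypothetical names, and the code uses nothing beyond the .NET base class library.

```csharp
using System.Globalization;

// Hypothetical model of how a task ID range selects tasks:
// only IDs made up entirely of digits count, and leading zeroes are ignored.
public static class RangeModel
{
    public static bool IsSelectedByRange(string taskId, int start, int end)
    {
        // NumberStyles.None accepts digits only, so "5flamingoes" is rejected,
        // while "04" and "004" both parse to 4.
        return int.TryParse(taskId, NumberStyles.None, CultureInfo.InvariantCulture, out int value)
            && value >= start
            && value <= end;
    }
}
```

With the range 1..10, the IDs "3", "7", "04", and "004" are selected under this model, while "5flamingoes" and "11" are not.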

// Tasks 1, 2, and 3 don't depend on any other tasks. Because
// we will be using them for a task range dependency, we must
// specify string representations of integers as their ids.
new CloudTask("1", "cmd.exe /c echo 1"),
new CloudTask("2", "cmd.exe /c echo 2"),
new CloudTask("3", "cmd.exe /c echo 3"),

// Task 4 depends on a range of tasks, 1 through 3
new CloudTask("4", "cmd.exe /c echo 4")
{
    // To use a range of tasks, their ids must be integer values.
    // Note that we pass integers as parameters to OnIdRange,
    // but the task ids (above) are string representations of integers.
    DependsOn = TaskDependencies.OnIdRange(1, 3)
},
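A range dependency fits the movie-rendering example from the introduction, where a merge task waits for every frame task. The following is a sketch under stated assumptions: the Microsoft.Azure.Batch NuGet package is referenced, RenderJob and BuildRenderTasks are hypothetical names, and the echo command lines stand in for real rendering and merging commands.

```csharp
using System.Collections.Generic;
using System.Globalization;
using Microsoft.Azure.Batch;

// Hypothetical helper: one task per frame with integer-valued IDs "1".."N",
// plus a merge task that depends on the whole ID range.
public static class RenderJob
{
    public static List<CloudTask> BuildRenderTasks(int frameCount)
    {
        var tasks = new List<CloudTask>();

        for (int frame = 1; frame <= frameCount; frame++)
        {
            // Frame task IDs must be string representations of integers
            // so that the range dependency can select them.
            string id = frame.ToString(CultureInfo.InvariantCulture);
            tasks.Add(new CloudTask(id, $"cmd.exe /c echo render frame {id}"));
        }

        // The merge task becomes eligible only after tasks 1 through frameCount
        // have satisfied the dependency.
        tasks.Add(new CloudTask("merge", "cmd.exe /c echo merge")
        {
            DependsOn = TaskDependencies.OnIdRange(1, frameCount)
        });

        return tasks;
    }
}
```

The merge task uses a non-integer ID here, which is fine because it is not itself a member of the range.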

Dependency actions

By default, a dependent task or set of tasks runs only after a parent task has completed successfully. In some scenarios, you may want to run dependent tasks even if the parent task fails. You can override the default behavior by specifying a dependency action. A dependency action specifies whether a dependent task is eligible to run, based on the success or failure of the parent task.

For example, suppose that a dependent task is awaiting data from the completion of the upstream task. If the upstream task fails, the dependent task may still be able to run using older data. In this case, a dependency action can specify that the dependent task is eligible to run despite the failure of the parent task.

A dependency action is based on an exit condition for the parent task. You can specify a dependency action for any of the following exit conditions; for .NET, see the ExitConditions class for details:

  • When a pre-processing error occurs.
  • When a file upload error occurs. If the task exits with an exit code that was specified via exitCodes or exitCodeRanges, and then encounters a file upload error, the action specified by the exit code takes precedence.
  • When the task exits with an exit code defined by the ExitCodes property.
  • When the task exits with an exit code that falls within a range specified by the ExitCodeRanges property.
  • In the default case: when the task exits with an exit code not defined by ExitCodes or ExitCodeRanges, exits with a pre-processing error while the PreProcessingError property is not set, or fails with a file upload error while the FileUploadError property is not set.

To specify a dependency action in .NET, set the ExitOptions.DependencyAction property for the exit condition. The DependencyAction property takes one of two values:

  • Setting the DependencyAction property to Satisfy indicates that dependent tasks are eligible to run if the parent task exits with a specified error.
  • Setting the DependencyAction property to Block indicates that dependent tasks are not eligible to run.

The default setting for the DependencyAction property is Satisfy for exit code 0, and Block for all other exit conditions.

The following code snippet sets the DependencyAction property for a parent task. If the parent task exits with a pre-processing error, or with one of the specified exit codes, the dependent task is blocked. If the parent task exits with any other nonzero exit code, the dependent task is eligible to run.

// Task A is the parent task.
new CloudTask("A", "cmd.exe /c echo A")
{
    // Specify exit conditions for task A and their dependency actions.
    ExitConditions = new ExitConditions
    {
        // If task A exits with a pre-processing error, block any downstream tasks (in this example, task B).
        PreProcessingError = new ExitOptions
        {
            DependencyAction = DependencyAction.Block
        },
        // If task A exits with the specified error codes, block any downstream tasks (in this example, task B).
        ExitCodes = new List<ExitCodeMapping>
        {
            new ExitCodeMapping(10, new ExitOptions() { DependencyAction = DependencyAction.Block }),
            new ExitCodeMapping(20, new ExitOptions() { DependencyAction = DependencyAction.Block })
        },
        // If task A succeeds or fails with any other error, any downstream tasks become eligible to run 
        // (in this example, task B).
        Default = new ExitOptions
        {
            DependencyAction = DependencyAction.Satisfy
        }
    }
},
// Task B depends on task A. Whether it becomes eligible to run depends on how task A exits.
new CloudTask("B", "cmd.exe /c echo B")
{
    DependsOn = TaskDependencies.OnId("A")
},
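The same approach applies to the other exit conditions listed earlier. The sketch below complements the snippet above by using ExitCodeRanges and FileUploadError. It assumes the Microsoft.Azure.Batch NuGet package is referenced; ExitRangeExample and BuildParent are hypothetical names, and the exit code range 1 through 9 is an arbitrary choice for illustration.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Common;

// Hypothetical helper: a parent task whose exit codes 1 through 9 are treated
// as tolerable failures, while a file upload error blocks dependent tasks.
public static class ExitRangeExample
{
    public static CloudTask BuildParent()
    {
        return new CloudTask("A", "cmd.exe /c echo A")
        {
            ExitConditions = new ExitConditions
            {
                // Exit codes 1..9: dependent tasks are still eligible to run.
                ExitCodeRanges = new List<ExitCodeRangeMapping>
                {
                    new ExitCodeRangeMapping(1, 9,
                        new ExitOptions { DependencyAction = DependencyAction.Satisfy })
                },
                // A file upload error blocks dependent tasks.
                FileUploadError = new ExitOptions
                {
                    DependencyAction = DependencyAction.Block
                }
            }
        };
    }
}
```

Because no Default option is set in this sketch, any other nonzero exit code falls back to the default setting described above, which is Block.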

Code sample

The TaskDependencies sample project is one of the Azure Batch code samples on GitHub. This Visual Studio solution demonstrates:

  • How to enable task dependencies on a job.
  • How to create tasks that depend on other tasks.
  • How to execute those tasks on a pool of compute nodes.

Next steps

Application deployment

The application packages feature of Batch provides an easy way to both deploy and version the applications that your tasks execute on compute nodes.

Installing applications and staging data

See Installing applications and staging data on Batch compute nodes in the Azure Batch forum for an overview of methods for preparing your nodes to run tasks. Written by one of the Azure Batch team members, this post is a good primer on the different ways to copy applications, task input data, and other files to your compute nodes.