Run tasks concurrently to maximize usage of Batch compute nodes

You can maximize resource usage on a smaller number of compute nodes in your pool by running more than one task simultaneously on each node.

While some scenarios work best with all of a node's resources dedicated to a single task, certain workloads may see shorter job times and lower costs when multiple tasks share those resources:

  • Minimize data transfer for tasks that are able to share data. You can dramatically reduce data transfer charges by copying shared data to a smaller number of nodes, then executing tasks in parallel on each node. This especially applies if the data to be copied to each node must be transferred between geographic regions.
  • Maximize memory usage for tasks that require a large amount of memory, but only for short periods and at variable times during execution. You can employ fewer, but larger, compute nodes with more memory to handle such spikes efficiently. Each of these nodes runs multiple tasks in parallel, and each task takes advantage of the node's plentiful memory at a different time.
  • Mitigate node number limits when inter-node communication is required within a pool. Currently, pools configured for inter-node communication are limited to 50 compute nodes. If each node in such a pool is able to execute tasks in parallel, a greater number of tasks can be executed simultaneously.
  • Replicate an on-premises compute cluster, such as when you first move a compute environment to Azure. If your current on-premises solution executes multiple tasks per compute node, you can increase the maximum number of node tasks to more closely mirror that configuration.

Example scenario

As an example, imagine a task application with CPU and memory requirements such that Standard_D1 nodes are sufficient. However, in order to finish the job in the required time, 1,000 of these nodes are needed.

Instead of using Standard_D1 nodes that have 1 CPU core, you could use Standard_D14 nodes that have 16 cores each, and enable parallel task execution. This means that roughly 16 times fewer nodes could be used: instead of 1,000 nodes, only 63 would be required. If large application files or reference data are required for each node, job duration and efficiency improve further, since the data is copied to only 63 nodes.
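
The node count in this scenario is a ceiling division of the number of concurrent tasks by the task slots on each node. The following is a minimal sketch of that arithmetic; the helper name NodesNeeded is an assumption made for illustration and is not part of the Batch .NET library.

```csharp
using System;

// Hypothetical helper (not a Batch API): nodes needed when each node
// can run `slotsPerNode` tasks at once. Integer ceiling division is used
// because a partially filled node still counts as a whole node.
static int NodesNeeded(int concurrentTasks, int slotsPerNode) =>
    (concurrentTasks + slotsPerNode - 1) / slotsPerNode;

Console.WriteLine(NodesNeeded(1000, 1));   // Standard_D1, one task per node: prints 1000
Console.WriteLine(NodesNeeded(1000, 16));  // Standard_D14, 16 tasks per node: prints 63
```

Rounding up matters here: 1,000 / 16 is 62.5, and the half-used node must still be allocated.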

Enable parallel task execution

You configure compute nodes for parallel task execution at the pool level. With the Batch .NET library, set the CloudPool.TaskSlotsPerNode property when you create a pool. If you're using the Batch REST API, set the taskSlotsPerNode element in the request body during pool creation.

Note

You can set the taskSlotsPerNode element and TaskSlotsPerNode property only at pool creation time. They can't be modified after a pool has been created.

Azure Batch allows you to set task slots per node up to four times (4x) the number of node cores. For example, if the pool is configured with nodes of size "Large" (four cores), then taskSlotsPerNode may be set to 16. However, regardless of how many cores the node has, you can't have more than 256 task slots per node. For details on the number of cores for each of the node sizes, see Sizes for Cloud Services. For more information on service limits, see Quotas and limits for the Azure Batch service.
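
The two limits in this note combine into a single upper bound. The sketch below only restates the rule as described here (four slots per core, capped at 256); MaxTaskSlotsPerNode is a hypothetical helper name, not a Batch API.

```csharp
using System;

// Hypothetical helper (not a Batch API): the upper bound described in the note,
// four task slots per core, never more than 256 per node.
static int MaxTaskSlotsPerNode(int cores) => Math.Min(4 * cores, 256);

Console.WriteLine(MaxTaskSlotsPerNode(4));    // four-core "Large" node: prints 16
Console.WriteLine(MaxTaskSlotsPerNode(16));   // 16-core node: prints 64
Console.WriteLine(MaxTaskSlotsPerNode(128));  // the 256 cap applies: prints 256
```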

Tip

Be sure to take into account the taskSlotsPerNode value when you construct an autoscale formula for your pool. For example, a formula that evaluates $RunningTasks could be dramatically affected by an increase in tasks per node. For more information, see Automatically scale compute nodes in an Azure Batch pool.

Specify task distribution

When enabling concurrent tasks, it's important to specify how you want the tasks to be distributed across the nodes in the pool.

By using the CloudPool.TaskSchedulingPolicy property, you can specify that tasks should be assigned evenly across all nodes in the pool ("spreading"). Or you can specify that as many tasks as possible should be assigned to each node before tasks are assigned to another node in the pool ("packing").

As an example, consider the pool of Standard_D14 nodes (in the example above) that is configured with a CloudPool.TaskSlotsPerNode value of 16. If the CloudPool.TaskSchedulingPolicy is configured with a ComputeNodeFillType of Pack, it would maximize usage of all 16 cores of each node and allow an autoscaling pool to remove unused nodes (nodes without any tasks assigned) from the pool. This minimizes resource usage and saves money.
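
To see why packing leaves whole nodes idle while spreading does not, the following toy model assigns single-slot tasks to nodes under each policy. It is a simplified sketch for illustration only: the Assign function and its logic are assumptions, and the real Batch scheduler may behave differently in detail.

```csharp
using System;
using System.Linq;

// Toy model (not the Batch scheduler): place `tasks` single-slot tasks on
// `nodes` nodes that have `slots` task slots each.
// pack == true : fill the first node that still has a free slot.
// pack == false: choose the least-loaded node that still has a free slot.
static int[] Assign(int tasks, int nodes, int slots, bool pack)
{
    var load = new int[nodes];
    for (int t = 0; t < tasks; t++)
    {
        int target = -1;
        for (int n = 0; n < nodes; n++)
        {
            if (load[n] >= slots) continue;            // node is full
            if (pack) { target = n; break; }           // first node with room
            if (target == -1 || load[n] < load[target]) target = n;
        }
        if (target == -1) break;                       // no free slot anywhere
        load[target]++;
    }
    return load;
}

int[] packed = Assign(tasks: 32, nodes: 4, slots: 16, pack: true);
int[] spread = Assign(tasks: 32, nodes: 4, slots: 16, pack: false);

Console.WriteLine(string.Join(",", packed));   // prints 16,16,0,0
Console.WriteLine(string.Join(",", spread));   // prints 8,8,8,8
Console.WriteLine(packed.Count(x => x == 0));  // idle nodes under packing: prints 2
```

With 32 tasks on four 16-slot nodes, packing leaves two nodes with no tasks at all, which is what lets an autoscaling pool remove them.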

Define variable slots per task

A task can be defined with the CloudTask.RequiredSlots property, which specifies how many slots it requires to run on a compute node. The default value is 1. You can set variable task slots if your tasks have different weights with regard to resource usage on the compute node. This lets each compute node run a reasonable number of concurrent tasks without overwhelming system resources like CPU or memory.

For example, for a pool with the property taskSlotsPerNode = 8, you can submit CPU-intensive tasks that need multiple cores with requiredSlots = 8, while other tasks can be set to requiredSlots = 1. When this mixed workload is scheduled, the CPU-intensive tasks run exclusively on their compute nodes, while other tasks can run concurrently (up to eight tasks at once) on other nodes. This helps you balance your workload across compute nodes and improve resource usage efficiency.
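
The mixed workload above can be pictured with a small first-fit placement model. This is only a sketch: the Place function is hypothetical, and first-fit is an assumed stand-in for whatever placement logic the Batch service actually applies.

```csharp
using System;
using System.Collections.Generic;

// Toy first-fit model (not the Batch scheduler): each task goes to the first
// node that has at least `need` free slots. Returns the node index chosen for
// each task, or -1 when no single node can currently hold the task.
static List<int> Place(int[] requiredSlots, int nodes, int slotsPerNode)
{
    var free = new int[nodes];
    Array.Fill(free, slotsPerNode);
    var placement = new List<int>();
    foreach (int need in requiredSlots)
    {
        int node = Array.FindIndex(free, f => f >= need);
        if (node >= 0) free[node] -= need;   // reserve the slots on that node
        placement.Add(node);
    }
    return placement;
}

// Two CPU-intensive tasks (8 slots each) mixed with eight light tasks (1 slot each),
// on three nodes with taskSlotsPerNode = 8.
int[] workload = { 8, 1, 1, 1, 1, 8, 1, 1, 1, 1 };
Console.WriteLine(string.Join(",", Place(workload, nodes: 3, slotsPerNode: 8)));
// prints 0,1,1,1,1,2,1,1,1,1
```

In this run the two 8-slot tasks each occupy a node by themselves (nodes 0 and 2), while all eight 1-slot tasks share node 1.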

Tip

When using variable task slots, it's possible that large tasks with more required slots can temporarily fail to be scheduled because not enough slots are available on any single compute node, even when there are still idle slots on some nodes. You can raise the job priority for these tasks to increase their chance to compete for available slots on nodes.

The Batch service emits the TaskScheduleFailEvent when it fails to schedule a task to run, and keeps retrying the scheduling until the required slots become available. You can listen to that event to detect potential task scheduling issues and mitigate accordingly.
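
The situation described in this tip comes down to the difference between the total number of idle slots and the largest number of idle slots on one node. The sketch below illustrates that distinction; CanSchedule and the sample numbers are assumptions for illustration, not Batch APIs or measured data.

```csharp
using System;
using System.Linq;

// Hypothetical check (not a Batch API): a task fits only if ONE node has
// enough free slots. Idle slots on different nodes cannot be combined.
static bool CanSchedule(int[] freeSlotsPerNode, int requiredSlots) =>
    freeSlotsPerNode.Any(free => free >= requiredSlots);

int[] freeSlots = { 2, 3, 1 };                  // idle slots on three nodes
Console.WriteLine(freeSlots.Sum());             // six idle slots in total: prints 6
Console.WriteLine(CanSchedule(freeSlots, 4));   // no node has four free: prints False
Console.WriteLine(CanSchedule(freeSlots, 3));   // the second node fits: prints True
```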

Note

Do not specify a task's requiredSlots to be greater than the pool's taskSlotsPerNode. This will result in the task never being able to run. The Batch service doesn't currently validate this conflict when you submit tasks, because a job may not have a pool bound at submission time, or it could be moved to a different pool by disabling and re-enabling it.
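
Because the service doesn't validate this conflict, one option is to check it in your own code before submitting tasks, when you already know which pool the job will run on. The following is a minimal sketch of such a guard; EnsureTaskFitsPool is a hypothetical helper, and how you obtain the two values is left to your application.

```csharp
using System;

// Hypothetical pre-submission guard (not part of Batch .NET): reject a task
// whose requiredSlots could never be satisfied by the target pool.
static void EnsureTaskFitsPool(int requiredSlots, int taskSlotsPerNode)
{
    if (requiredSlots < 1)
        throw new ArgumentOutOfRangeException(
            nameof(requiredSlots), "requiredSlots must be at least 1.");
    if (requiredSlots > taskSlotsPerNode)
        throw new ArgumentException(
            $"requiredSlots ({requiredSlots}) exceeds taskSlotsPerNode " +
            $"({taskSlotsPerNode}); the task would never run.");
}

EnsureTaskFitsPool(requiredSlots: 2, taskSlotsPerNode: 4);   // passes silently

try
{
    EnsureTaskFitsPool(requiredSlots: 8, taskSlotsPerNode: 4);
}
catch (ArgumentException ex)
{
    Console.WriteLine(ex.Message);   // reports the conflict instead of submitting
}
```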

Batch .NET example

The following Batch .NET API code snippets show how to create a pool with multiple task slots per node and how to submit a task with required slots.

Create a pool with multiple task slots per node

This code snippet shows a request to create a pool that contains four nodes, with four task slots allowed per node. It specifies a task scheduling policy that fills each node with tasks before assigning tasks to another node in the pool.

For more information on adding pools by using the Batch .NET API, see BatchClient.PoolOperations.CreatePool.

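// Define a pool of four dedicated nodes.
CloudPool pool =
    batchClient.PoolOperations.CreatePool(
        poolId: "mypool",
        targetDedicatedComputeNodes: 4,
        virtualMachineSize: "standard_d1_v2",
        cloudServiceConfiguration: new CloudServiceConfiguration(osFamily: "5"));

// Allow up to four task slots on each node.
pool.TaskSlotsPerNode = 4;
// Fill each node with tasks before assigning tasks to the next node.
pool.TaskSchedulingPolicy = new TaskSchedulingPolicy(ComputeNodeFillType.Pack);
// Submit the pool definition to the Batch service.
pool.Commit();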

Create a task with required slots

This code snippet creates a task with a non-default requiredSlots value. This task will only run when there are enough free slots available on a compute node.

CloudTask task = new CloudTask(taskId, taskCommandLine)
{
    RequiredSlots = 2
};

List compute nodes with counts for running tasks and slots

This code snippet lists all compute nodes in the pool, and prints out the counts for running tasks and task slots per node.

ODATADetailLevel nodeDetail = new ODATADetailLevel(selectClause: "id,runningTasksCount,runningTaskSlotsCount");
IPagedEnumerable<ComputeNode> nodes = batchClient.PoolOperations.ListComputeNodes(poolId, nodeDetail);

await nodes.ForEachAsync(node =>
{
    Console.WriteLine(node.Id + " :");
    Console.WriteLine($"RunningTasks = {node.RunningTasksCount}, RunningTaskSlots = {node.RunningTaskSlotsCount}");

}).ConfigureAwait(continueOnCapturedContext: false);

List task counts for the job

This code snippet gets task counts for the job, which include both the task count and the task slot count for each task state.

TaskCountsResult result = await batchClient.JobOperations.GetJobTaskCountsAsync(jobId);

Console.WriteLine("\t\tActive\tRunning\tCompleted");
Console.WriteLine($"TaskCounts:\t{result.TaskCounts.Active}\t{result.TaskCounts.Running}\t{result.TaskCounts.Completed}");
Console.WriteLine($"TaskSlotCounts:\t{result.TaskSlotCounts.Active}\t{result.TaskSlotCounts.Running}\t{result.TaskSlotCounts.Completed}");

Batch REST example

The following Batch REST API code snippets show how to create a pool with multiple task slots per node and how to submit a task with required slots.

Create a pool with multiple task slots per node

This snippet shows a request to create a pool that contains two large nodes with a maximum of four tasks per node.

For more information on adding pools by using the REST API, see Add a pool to an account.

{
  "odata.metadata":"https://myaccount.myregion.batch.chinacloudapi.cn/$metadata#pools/@Element",
  "id":"mypool",
  "vmSize":"large",
  "cloudServiceConfiguration": {
    "osFamily":"4",
    "targetOSVersion":"*"
  },
  "targetDedicatedComputeNodes":2,
  "taskSlotsPerNode":4,
  "enableInterNodeCommunication":true
}

Create a task with required slots

This snippet shows a request to add a task with a non-default requiredSlots value. This task will only run when there are enough free slots available on the compute node.

{
  "id": "taskId",
  "commandLine": "bash -c 'echo hello'",
  "userIdentity": {
    "autoUser": {
      "scope": "task",
      "elevationLevel": "nonadmin"
    }
  },
  "requiredSlots": 2
}

Code sample on GitHub

The ParallelTasks project on GitHub illustrates the use of the CloudPool.TaskSlotsPerNode property.

This C# console application uses the Batch .NET library to create a pool with one or more compute nodes. It executes a configurable number of tasks on those nodes to simulate a variable load. Output from the application shows which nodes executed each task. The application also provides a summary of the job parameters and duration.

As an example, below is the summary portion of the output from two different runs of the ParallelTasks sample application. Job durations shown here don't include pool creation time, since each job was submitted to a previously created pool whose compute nodes were in the Idle state at submission time.

The first execution of the sample application shows that with a single node in the pool and the default setting of one task per node, the job duration is over 30 minutes.

Nodes: 1
Node size: large
Task slots per node: 1
Max slots per task: 1
Tasks: 32
Duration: 00:30:01.4638023

The second run of the sample shows a significant decrease in job duration. This is because the pool was configured with four tasks per node, allowing parallel task execution to complete the job in a little over a quarter of the time.

Nodes: 1
Node size: large
Task slots per node: 4
Max slots per task: 1
Tasks: 32
Duration: 00:08:48.2423500
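
The two durations reported above can be compared directly. The short sketch below parses them and computes the speedup and the parallel efficiency relative to the ideal 4x; the variable names are chosen for illustration only.

```csharp
using System;
using System.Globalization;

// Durations copied from the two sample runs above.
TimeSpan oneSlot = TimeSpan.Parse("00:30:01.4638023", CultureInfo.InvariantCulture);
TimeSpan fourSlots = TimeSpan.Parse("00:08:48.2423500", CultureInfo.InvariantCulture);

// Speedup: how many times faster the four-slot run finished.
double speedup = oneSlot.TotalSeconds / fourSlots.TotalSeconds;
// Efficiency: the share of the ideal 4x speedup that was actually achieved.
double efficiency = speedup / 4;

Console.WriteLine($"Speedup: {speedup:F2}x");
Console.WriteLine($"Efficiency: {efficiency:P0}");
```

For these two runs the speedup works out to roughly 3.4x, or about 85% of the ideal fourfold improvement.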

Next steps

  • Try the Batch Explorer Heat Map. Batch Explorer is a free, feature-rich, standalone client tool to help create, debug, and monitor Azure Batch applications. When you're executing the ParallelTasks sample application, the Batch Explorer Heat Map feature lets you easily visualize the execution of parallel tasks on each node.
  • Explore Azure Batch samples on GitHub.
  • Learn more about Batch task dependencies.