Use multi-instance tasks to run Message Passing Interface (MPI) applications in Batch

Multi-instance tasks allow you to run an Azure Batch task on multiple compute nodes simultaneously. These tasks enable high performance computing scenarios like Message Passing Interface (MPI) applications in Batch. In this article, you learn how to execute multi-instance tasks using the Batch .NET library.

Note

While the examples in this article focus on Batch .NET, MS-MPI, and Windows compute nodes, the multi-instance task concepts discussed here are applicable to other platforms and technologies (Python and Intel MPI on Linux nodes, for example).

Multi-instance task overview

In Batch, each task is normally executed on a single compute node: you submit multiple tasks to a job, and the Batch service schedules each task for execution on a node. However, by configuring a task's multi-instance settings, you tell Batch to instead create one primary task and several subtasks that are then executed on multiple nodes.

When you submit a task with multi-instance settings to a job, Batch performs several steps unique to multi-instance tasks:

  1. The Batch service creates one primary task and several subtasks based on the multi-instance settings. The total number of tasks (primary plus all subtasks) matches the number of instances (compute nodes) you specify in the multi-instance settings.
  2. Batch designates one of the compute nodes as the master, and schedules the primary task to execute on the master. It schedules the subtasks to execute on the remainder of the compute nodes allocated to the multi-instance task, one subtask per node.
  3. The primary and all subtasks download any common resource files you specify in the multi-instance settings.
  4. After the common resource files have been downloaded, the primary and subtasks execute the coordination command you specify in the multi-instance settings. The coordination command is typically used to prepare nodes for executing the task. This can include starting background services (such as Microsoft MPI's smpd.exe) and verifying that the nodes are ready to process inter-node messages.
  5. The primary task executes the application command on the master node after the coordination command has been completed successfully by the primary and all subtasks. The application command is the command line of the multi-instance task itself, and is executed only by the primary task. In an MS-MPI-based solution, this is where you execute your MPI-enabled application using mpiexec.exe.
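
The ordering described in the steps above can be pictured with a small model. This is only an illustrative sketch, not Batch service code or a Batch .NET API: the class and method names are hypothetical, and the loop runs instances one after another, whereas on a real pool each instance runs on its own node.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative model only: NOT Batch service code and NOT part of Batch .NET.
public static class MultiInstanceFlowSketch
{
    // Takes the outcome of the coordination command for each instance
    // (index 0 is the primary, true means it exited successfully) and
    // returns a log of the steps that would run, in order.
    public static List<string> Run(IReadOnlyList<bool> coordinationSucceeded)
    {
        var log = new List<string>();

        for (int id = 0; id < coordinationSucceeded.Count; id++)
        {
            string role = id == 0 ? "primary" : "subtask";
            log.Add($"{role} {id}: download common resource files");
            string outcome = coordinationSucceeded[id] ? "succeeded" : "failed";
            log.Add($"{role} {id}: coordination command {outcome}");
        }

        // The application command runs only on the primary, and only after
        // every instance has completed the coordination command successfully.
        if (coordinationSucceeded.All(ok => ok))
        {
            log.Add("primary 0: application command");
        }

        return log;
    }

    public static void Main()
    {
        // Three instances: one primary and two subtasks, all succeeding.
        foreach (string line in Run(new[] { true, true, true }))
        {
            Console.WriteLine(line);
        }
    }
}
```

Passing a `false` for any instance leaves the application command out of the log, which mirrors the rule that the primary waits for every coordination command to succeed.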

Note

Though it is functionally distinct, the "multi-instance task" is not a unique task type like the StartTask or JobPreparationTask. The multi-instance task is simply a standard Batch task (CloudTask in Batch .NET) whose multi-instance settings have been configured. In this article, we refer to this as the multi-instance task.

Requirements for multi-instance tasks

Multi-instance tasks require a pool with inter-node communication enabled, and with concurrent task execution disabled. To disable concurrent task execution, set the CloudPool.MaxTasksPerComputeNode property to 1.

Note

Batch limits the size of a pool that has inter-node communication enabled.

This code snippet shows how to create a pool for multi-instance tasks using the Batch .NET library.

CloudPool myCloudPool =
    myBatchClient.PoolOperations.CreatePool(
        poolId: "MultiInstanceSamplePool",
        targetDedicatedComputeNodes: 3,
        virtualMachineSize: "standard_d1_v2",
        cloudServiceConfiguration: new CloudServiceConfiguration(osFamily: "5"));

// Multi-instance tasks require inter-node communication, and those nodes
// must run only one task at a time.
myCloudPool.InterComputeNodeCommunicationEnabled = true;
myCloudPool.MaxTasksPerComputeNode = 1;

Note

If you try to run a multi-instance task in a pool with inter-node communication disabled, or with a maxTasksPerNode value greater than 1, the task is never scheduled; it remains indefinitely in the "active" state.

Use a StartTask to install MPI

To run MPI applications with a multi-instance task, you first need to install an MPI implementation (MS-MPI or Intel MPI, for example) on the compute nodes in the pool. This is a good time to use a StartTask, which executes whenever a node joins a pool or is restarted. This code snippet creates a StartTask that specifies the MS-MPI setup package as a resource file. The start task's command line is executed after the resource file is downloaded to the node. In this case, the command line performs an unattended install of MS-MPI.

// Create a StartTask for the pool which we use for installing MS-MPI on
// the nodes as they join the pool (or when they are restarted).
StartTask startTask = new StartTask
{
    CommandLine = "cmd /c MSMpiSetup.exe -unattend -force",
    ResourceFiles = new List<ResourceFile> { new ResourceFile("https://mystorageaccount.blob.core.chinacloudapi.cn/mycontainer/MSMpiSetup.exe", "MSMpiSetup.exe") },
    UserIdentity = new UserIdentity(new AutoUserSpecification(elevationLevel: ElevationLevel.Admin)),
    WaitForSuccess = true
};
myCloudPool.StartTask = startTask;

// Commit the fully configured pool to the Batch service to actually create
// the pool and its compute nodes.
await myCloudPool.CommitAsync();

Remote direct memory access (RDMA)

Look for the sizes specified as "RDMA capable" in the following articles:

Note

To take advantage of RDMA on Linux compute nodes, you must use Intel MPI on the nodes.

Create a multi-instance task with Batch .NET

Now that we've covered the pool requirements and MPI package installation, let's create the multi-instance task. In this snippet, we create a standard CloudTask, then configure its MultiInstanceSettings property. As mentioned earlier, the multi-instance task is not a distinct task type, but a standard Batch task configured with multi-instance settings.

// Create the multi-instance task. Its command line is the "application command"
// and will be executed *only* by the primary, and only after the primary and
// subtasks execute the CoordinationCommandLine.
CloudTask myMultiInstanceTask = new CloudTask(id: "mymultiinstancetask",
    commandline: "cmd /c mpiexec.exe -wdir %AZ_BATCH_TASK_SHARED_DIR% MyMPIApplication.exe");

// Configure the task's MultiInstanceSettings. The CoordinationCommandLine will be executed by
// the primary and all subtasks.
myMultiInstanceTask.MultiInstanceSettings =
    new MultiInstanceSettings(numberOfNodes)
    {
        CoordinationCommandLine = @"cmd /c start cmd /c ""%MSMPI_BIN%\smpd.exe"" -d",
        CommonResourceFiles = new List<ResourceFile>
        {
            new ResourceFile(
                "https://mystorageaccount.blob.core.chinacloudapi.cn/mycontainer/MyMPIApplication.exe",
                "MyMPIApplication.exe")
        }
    };

// Submit the task to the job. Batch will take care of splitting it into subtasks and
// scheduling them for execution on the nodes.
await myBatchClient.JobOperations.AddTaskAsync("mybatchjob", myMultiInstanceTask);

Primary task and subtasks

When you create the multi-instance settings for a task, you specify the number of compute nodes that are to execute the task. When you submit the task to a job, the Batch service creates one primary task and enough subtasks that together match the number of nodes you specified.

These tasks are assigned an integer id in the range of 0 to numberOfInstances - 1. The task with id 0 is the primary task, and all other ids are subtasks. For example, if you create the following multi-instance settings for a task, the primary task would have an id of 0, and the subtasks would have ids 1 through 9.

int numberOfNodes = 10;
myMultiInstanceTask.MultiInstanceSettings = new MultiInstanceSettings(numberOfNodes);

Master node

When you submit a multi-instance task, the Batch service designates one of the compute nodes as the "master" node, and schedules the primary task to execute on the master node. The subtasks are scheduled to execute on the remainder of the nodes allocated to the multi-instance task.

Coordination command

The coordination command is executed by both the primary and subtasks.

The invocation of the coordination command is blocking: Batch does not execute the application command until the coordination command has returned successfully for the primary and all subtasks. The coordination command should therefore start any required background services, verify that they are ready for use, and then exit. For example, this coordination command for a solution using MS-MPI version 7 starts the SMPD service on the node, then exits:

cmd /c start cmd /c ""%MSMPI_BIN%\smpd.exe"" -d

Note the use of start in this coordination command. This is required because the smpd.exe application does not return immediately after execution. Without the use of the start command, this coordination command would not return, and would therefore block the application command from running.
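
The "start the service, verify that it is ready, then exit" pattern can also be written as a small program. The following is a minimal sketch under stated assumptions, not the article's method: the class name, the argument layout, and the choice to probe a TCP port for readiness are all hypothetical. Whether the port you probe matches the one your MPI daemon listens on, and whether a process started this way behaves like one launched with start on a real node, would need to be verified.

```csharp
using System;
using System.Diagnostics;
using System.Net.Sockets;
using System.Threading;

// Hypothetical coordination helper: launch a background service without
// waiting for it, poll until it accepts TCP connections, then exit.
public static class CoordinationSketch
{
    // Polls until a TCP connection to host:port succeeds or the timeout elapses.
    public static bool WaitForPort(string host, int port, TimeSpan timeout)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (elapsed.Elapsed < timeout)
        {
            try
            {
                using (var client = new TcpClient())
                {
                    client.Connect(host, port);
                    return true; // something is listening
                }
            }
            catch (SocketException)
            {
                Thread.Sleep(200); // not listening yet; retry shortly
            }
        }
        return false;
    }

    public static int Main(string[] args)
    {
        // args[0]: path to the service executable (for example, smpd.exe)
        // args[1]: TCP port to probe (an assumption; use your service's port)
        if (args.Length < 2 || !int.TryParse(args[1], out int port))
        {
            Console.Error.WriteLine("usage: CoordinationSketch <serviceExe> <port>");
            return 2;
        }

        // Start the service and return immediately instead of waiting for it.
        Process.Start(new ProcessStartInfo(args[0], "-d") { UseShellExecute = false });

        // Exit code 0 signals success; a non-zero code signals failure.
        return WaitForPort("localhost", port, TimeSpan.FromSeconds(30)) ? 0 : 1;
    }
}
```

The important property is that the program returns an exit code promptly in both cases, so the application command is never blocked by a coordination step that fails to return.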

Application command

Once the primary task and all subtasks have finished executing the coordination command, the multi-instance task's command line is executed by the primary task only. We call this the application command to distinguish it from the coordination command.

For MS-MPI applications, use the application command to execute your MPI-enabled application with mpiexec.exe. For example, here is an application command for a solution using MS-MPI version 7:

cmd /c ""%MSMPI_BIN%\mpiexec.exe"" -c 1 -wdir %AZ_BATCH_TASK_SHARED_DIR% MyMPIApplication.exe

Note

Because MS-MPI's mpiexec.exe uses the CCP_NODES variable by default (see Environment variables), the example application command line above does not reference it explicitly.

Environment variables

Batch creates several environment variables specific to multi-instance tasks on the compute nodes allocated to a multi-instance task. Your coordination and application command lines can reference these environment variables, as can the scripts and programs they execute.

The following environment variables are created by the Batch service for use by multi-instance tasks:

  • CCP_NODES
  • AZ_BATCH_NODE_LIST
  • AZ_BATCH_HOST_LIST
  • AZ_BATCH_MASTER_NODE
  • AZ_BATCH_TASK_SHARED_DIR
  • AZ_BATCH_IS_CURRENT_NODE_MASTER

For full details on these and the other Batch compute node environment variables, including their contents and visibility, see Compute node environment variables.
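
As an illustration of how a program launched by the coordination or application command might read these variables, here is a minimal sketch. It assumes that AZ_BATCH_NODE_LIST is a semicolon-separated list of node addresses and that AZ_BATCH_IS_CURRENT_NODE_MASTER holds the text "true" on the master node. Check both assumptions against the environment variables reference before relying on them. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper for reading multi-instance environment variables.
public static class MultiInstanceEnvironment
{
    // Assumption: the value looks like "10.0.0.4;10.0.0.5".
    public static string[] ParseNodeList(string value)
    {
        return string.IsNullOrWhiteSpace(value)
            ? Array.Empty<string>()
            : value.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Assumption: the value is "true" on the master node.
    public static bool IsMaster(string value)
    {
        return string.Equals(value, "true", StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        string[] nodes = ParseNodeList(
            Environment.GetEnvironmentVariable("AZ_BATCH_NODE_LIST"));
        bool isMaster = IsMaster(
            Environment.GetEnvironmentVariable("AZ_BATCH_IS_CURRENT_NODE_MASTER"));
        string sharedDir =
            Environment.GetEnvironmentVariable("AZ_BATCH_TASK_SHARED_DIR");

        Console.WriteLine("Nodes allocated: {0}", nodes.Length);
        Console.WriteLine("Running on master: {0}", isMaster);
        Console.WriteLine("Task shared directory: {0}", sharedDir ?? "(not set)");
    }
}
```

Outside a Batch compute node the variables are unset, so the program reports zero nodes rather than failing.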

Tip

The Batch Linux MPI code sample contains an example of how several of these environment variables can be used. The coordination-cmd Bash script downloads common application and input files from Azure Storage, enables a Network File System (NFS) share on the master node, and configures the other nodes allocated to the multi-instance task as NFS clients.

Resource files

There are two sets of resource files to consider for multi-instance tasks: common resource files that all tasks download (both primary and subtasks), and the resource files specified for the multi-instance task itself, which only the primary task downloads.

You can specify one or more common resource files in the multi-instance settings for a task. These common resource files are downloaded from Azure Storage into each node's task shared directory by the primary and all subtasks. You can access the task shared directory from application and coordination command lines by using the AZ_BATCH_TASK_SHARED_DIR environment variable. The AZ_BATCH_TASK_SHARED_DIR path is identical on every node allocated to the multi-instance task, so you can share a single coordination command between the primary and all subtasks. Batch does not "share" the directory in a remote access sense, but you can use it as a mount or share point as mentioned earlier in the tip on environment variables.

Resource files that you specify for the multi-instance task itself are downloaded to the task's working directory, AZ_BATCH_TASK_WORKING_DIR, by default. As mentioned, in contrast to common resource files, only the primary task downloads resource files specified for the multi-instance task itself.

Important

Always use the environment variables AZ_BATCH_TASK_SHARED_DIR and AZ_BATCH_TASK_WORKING_DIR to refer to these directories in your command lines. Do not attempt to construct the paths manually.
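
The same advice applies to programs that run inside the task. The sketch below resolves file paths from the two environment variables instead of hard-coding a directory layout. The class name is hypothetical, and input.dat is a made-up example of a task-level resource file.

```csharp
using System;
using System.IO;

// Hypothetical helper that resolves files through the Batch-provided
// environment variables rather than a manually constructed path.
public static class TaskDirectories
{
    public static string Resolve(string variableName, string fileName)
    {
        string root = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrEmpty(root))
        {
            throw new InvalidOperationException(
                variableName + " is not set; is this running inside a Batch task?");
        }
        return Path.Combine(root, fileName);
    }

    public static void Main()
    {
        try
        {
            // Common resource file, downloaded by the primary and all subtasks.
            Console.WriteLine(Resolve("AZ_BATCH_TASK_SHARED_DIR", "MyMPIApplication.exe"));
            // Task-level resource file (hypothetical), downloaded by the primary only.
            Console.WriteLine(Resolve("AZ_BATCH_TASK_WORKING_DIR", "input.dat"));
        }
        catch (InvalidOperationException ex)
        {
            Console.Error.WriteLine(ex.Message);
        }
    }
}
```

Failing loudly when a variable is missing is a deliberate choice here: it avoids silently falling back to a guessed path, which is exactly what the guidance above warns against.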

Task lifetime

The lifetime of the primary task controls the lifetime of the entire multi-instance task. When the primary exits, all of the subtasks are terminated. The exit code of the primary is the exit code of the task, and is therefore used to determine the success or failure of the task for retry purposes.

If any of the subtasks fail (for example, by exiting with a non-zero return code), the entire multi-instance task fails. The multi-instance task is then terminated and retried, up to its retry limit.

When you delete a multi-instance task, the primary and all subtasks are also deleted by the Batch service. All subtask directories and their files are deleted from the compute nodes, just as for a standard task.

TaskConstraints for a multi-instance task, such as the MaxTaskRetryCount, MaxWallClockTime, and RetentionTime properties, are honored as they are for a standard task, and apply to the primary and all subtasks. However, if you change the RetentionTime property after adding the multi-instance task to the job, this change is applied only to the primary task. All of the subtasks continue to use the original RetentionTime.

A compute node's recent task list reflects the id of a subtask if the recent task was part of a multi-instance task.

Obtain information about subtasks

To obtain information on subtasks by using the Batch .NET library, call the CloudTask.ListSubtasks method. This method returns information on all subtasks, and information about the compute node that executed the tasks. From this information, you can determine each subtask's root directory, the pool id, its current state, exit code, and more. You can use this information in combination with the PoolOperations.GetNodeFile method to obtain the subtask's files. Note that this method does not return information for the primary task (id 0).

Note

Unless otherwise stated, Batch .NET methods that operate on the multi-instance CloudTask itself apply only to the primary task. For example, when you call the CloudTask.ListNodeFiles method on a multi-instance task, only the primary task's files are returned.

The following code snippet shows how to obtain subtask information, as well as request file contents from the nodes on which they executed.

// Obtain the job and the multi-instance task from the Batch service
CloudJob boundJob = batchClient.JobOperations.GetJob("mybatchjob");
CloudTask myMultiInstanceTask = boundJob.GetTask("mymultiinstancetask");

// Now obtain the list of subtasks for the task
IPagedEnumerable<SubtaskInformation> subtasks = myMultiInstanceTask.ListSubtasks();

// Asynchronously iterate over the subtasks and print their stdout and stderr
// output if the subtask has completed
await subtasks.ForEachAsync(async (subtask) =>
{
    Console.WriteLine("subtask: {0}", subtask.Id);
    Console.WriteLine("exit code: {0}", subtask.ExitCode);

    if (subtask.State == SubtaskState.Completed)
    {
        ComputeNode node =
            await batchClient.PoolOperations.GetComputeNodeAsync(subtask.ComputeNodeInformation.PoolId,
                                                                 subtask.ComputeNodeInformation.ComputeNodeId);

        NodeFile stdOutFile = await node.GetNodeFileAsync(subtask.ComputeNodeInformation.TaskRootDirectory + "\\" + Constants.StandardOutFileName);
        NodeFile stdErrFile = await node.GetNodeFileAsync(subtask.ComputeNodeInformation.TaskRootDirectory + "\\" + Constants.StandardErrorFileName);
        string stdOut = await stdOutFile.ReadAsStringAsync();
        string stdErr = await stdErrFile.ReadAsStringAsync();

        Console.WriteLine("node: {0}:", node.Id);
        Console.WriteLine("stdout.txt: {0}", stdOut);
        Console.WriteLine("stderr.txt: {0}", stdErr);
    }
    else
    {
        Console.WriteLine("\tSubtask {0} is in state {1}", subtask.Id, subtask.State);
    }
});

Code sample

The MultiInstanceTasks code sample on GitHub demonstrates how to use a multi-instance task to run an MS-MPI application on Batch compute nodes. Follow the steps in Preparation and Execution to run the sample.

Preparation

  1. Follow the first two steps in How to compile and run a simple MS-MPI program. This satisfies the prerequisites for the following step.
  2. Build a Release version of the MPIHelloWorld sample MPI program. This is the program that will be run on compute nodes by the multi-instance task.
  3. Create a zip file containing MPIHelloWorld.exe (which you built in step 2) and MSMpiSetup.exe (which you downloaded in step 1). You'll upload this zip file as an application package in the next step.
  4. Use the Azure portal to create a Batch application called "MPIHelloWorld", and specify the zip file you created in the previous step as version "1.0" of the application package. See Upload and manage applications for more information.

Tip

Build a Release version of MPIHelloWorld.exe so that you don't have to include any additional dependencies (for example, msvcp140d.dll or vcruntime140d.dll) in your application package.

Execution

  1. Download the azure-batch-samples repository from GitHub.

  2. Open the MultiInstanceTasks solution in Visual Studio 2017. The MultiInstanceTasks.sln solution file is located in:

    azure-batch-samples\CSharp\ArticleProjects\MultiInstanceTasks\

  3. Enter your Batch and Storage account credentials in AccountSettings.settings in the Microsoft.Azure.Batch.Samples.Common project.

  4. Build and run the MultiInstanceTasks solution to execute the MPI sample application on compute nodes in a Batch pool.

  5. Optional: Use the Azure portal or Batch Explorer to examine the sample pool, job, and task ("MultiInstanceSamplePool", "MultiInstanceSampleJob", "MultiInstanceSampleTask") before you delete the resources.

Tip

You can download Visual Studio Community for free if you do not have Visual Studio.

Output from MultiInstanceTasks.exe is similar to the following:

Creating pool [MultiInstanceSamplePool]...
Creating job [MultiInstanceSampleJob]...
Adding task [MultiInstanceSampleTask] to job [MultiInstanceSampleJob]...
Awaiting task completion, timeout in 00:30:00...

Main task [MultiInstanceSampleTask] is in state [Completed] and ran on compute node [tvm-1219235766_1-20161017t162002z]:
---- stdout.txt ----
Rank 2 received string "Hello world" from Rank 0
Rank 1 received string "Hello world" from Rank 0

---- stderr.txt ----

Main task completed, waiting 00:00:10 for subtasks to complete...

---- Subtask information ----
subtask: 1
        exit code: 0
        node: tvm-1219235766_3-20161017t162002z
        stdout.txt:
        stderr.txt:
subtask: 2
        exit code: 0
        node: tvm-1219235766_2-20161017t162002z
        stdout.txt:
        stderr.txt:

Delete job? [yes] no: yes
Delete pool? [yes] no: yes

Sample complete, hit ENTER to exit...

Next steps