Run job preparation and job release tasks on Batch compute nodes

An Azure Batch job often requires some form of setup before its tasks run, and post-job maintenance once its tasks are completed. You might need to download common task input data to your compute nodes, or upload task output data to Azure Storage after the job completes. You can use job preparation and job release tasks to perform these operations.

What are job preparation and release tasks?

Before a job's tasks run, the job preparation task runs on every compute node that is scheduled to run at least one task. Once the job is completed, the job release task runs on each node in the pool that executed at least one task. As with normal Batch tasks, you can specify a command line to be invoked when a job preparation or release task runs.

Job preparation and release tasks offer familiar Batch task features such as file download (resource files), elevated execution, custom environment variables, maximum execution duration, retry count, and file retention time.
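As a rough illustration of those features, the following sketch builds a job preparation task that runs elevated, sets an environment variable, and limits its duration, retention time, and retry count. The class name, method name, environment variable name and value, and the specific time limits are hypothetical choices made for this example; the snippet assumes the Microsoft.Azure.Batch NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Common;

static class JobPrepFeaturesSketch
{
    // Builds a job preparation task that uses several common task features.
    // All literal values below are illustrative assumptions.
    public static JobPreparationTask BuildPrepTask()
    {
        return new JobPreparationTask
        {
            CommandLine = "cmd /c echo Preparing job %AZ_BATCH_JOB_ID%",

            // Elevated execution: run as an auto-user with admin rights.
            UserIdentity = new UserIdentity(
                new AutoUserSpecification(
                    scope: AutoUserScope.Pool,
                    elevationLevel: ElevationLevel.Admin)),

            // Custom environment variable visible to the command line
            // (hypothetical name and value).
            EnvironmentSettings = new List<EnvironmentSetting>
            {
                new EnvironmentSetting("DATA_SET", "daily-risk")
            },

            // Maximum execution duration, file retention time, retry count.
            Constraints = new TaskConstraints(
                maxWallClockTime: TimeSpan.FromMinutes(10),
                retentionTime: TimeSpan.FromHours(1),
                maxTaskRetryCount: 2)
        };
    }
}
```

The returned object can be assigned to a job's CloudJob.JobPreparationTask property before the job is committed.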

In the following sections, you'll learn how to use the JobPreparationTask and JobReleaseTask classes found in the Batch .NET library.

Tip

Job preparation and release tasks are especially helpful in "shared pool" environments, in which a pool of compute nodes persists between job runs and is used by many jobs.

When to use job preparation and release tasks

Job preparation and job release tasks are a good fit for the following situations:

Download common task data

Batch jobs often require a common set of data as input for the job's tasks. For example, in daily risk analysis calculations, market data is job-specific, yet common to all tasks in the job. This market data, often several gigabytes in size, should be downloaded to each compute node only once so that any task that runs on the node can use it. Use a job preparation task to download this data to each node before the job's other tasks run.
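One possible way to express this download is with a resource file on the job preparation task. This is a minimal sketch, not the sample's code: the class and method names, the marketDataSasUrl parameter (assumed to be a readable URL, for example a blob URL with a SAS token), and the file name market-data.zip are all hypothetical. It assumes the Microsoft.Azure.Batch NuGet package is referenced.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Batch;

static class CommonDataSketch
{
    // Builds a job preparation task that downloads one shared input file
    // per node and copies it to the node's shared directory.
    // marketDataSasUrl is a hypothetical, caller-supplied URL.
    public static JobPreparationTask BuildDownloadTask(string marketDataSasUrl)
    {
        return new JobPreparationTask
        {
            // Batch downloads resource files into the task's working
            // directory before the command line runs.
            ResourceFiles = new List<ResourceFile>
            {
                ResourceFile.FromUrl(marketDataSasUrl, "market-data.zip")
            },

            // Copy the downloaded file where the job's tasks can find it.
            CommandLine =
                "cmd /c copy market-data.zip %AZ_BATCH_NODE_SHARED_DIR%"
        };
    }
}
```

Because the preparation task runs once per node for the job, each node downloads the file a single time regardless of how many of the job's tasks it later runs.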

Delete job and task output

In a "shared pool" environment, where a pool's compute nodes are not decommissioned between jobs, you may need to delete job data between runs. You might need to conserve disk space on the nodes, or satisfy your organization's security policies. Use a job release task to delete data that was downloaded by a job preparation task, or generated during task execution.

Log retention

You might want to keep a copy of the log files that your tasks generate, or the crash dump files that failed applications can generate. In such cases, use a job release task to compress this data and upload it to an Azure Storage account.

Tip

Another way to persist logs and other job and task output data is to use the Azure Batch File Conventions library.

Job preparation task

Before a job's tasks run, Batch executes the job preparation task on each compute node scheduled to run a task. By default, Batch waits for the job preparation task to complete before running the tasks scheduled to execute on the node. However, you can configure the service not to wait. If the node restarts, the job preparation task runs again. You can also disable this behavior. If a job has both a job preparation task and a job manager task configured, the job preparation task runs before the job manager task, just as it does for all other tasks. The job preparation task always runs first.
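In Batch .NET, the two behaviors described above (waiting for the preparation task, and rerunning it after a node restart) map to the WaitForSuccess and RerunOnComputeNodeRebootAfterSuccess properties of JobPreparationTask. The sketch below turns both off; the class and method names are hypothetical, and the snippet assumes the Microsoft.Azure.Batch NuGet package is referenced.

```csharp
using Microsoft.Azure.Batch;

static class PrepBehaviorSketch
{
    // Builds a job preparation task that does not block the job's tasks
    // and is not rerun after a node restart.
    public static JobPreparationTask BuildNonBlockingPrepTask(string commandLine)
    {
        return new JobPreparationTask
        {
            CommandLine = commandLine,

            // Let the job's tasks start on the node without waiting for
            // the preparation task to complete successfully.
            WaitForSuccess = false,

            // Do not run the preparation task again if the node restarts
            // after the task has already succeeded.
            RerunOnComputeNodeRebootAfterSuccess = false
        };
    }
}
```

Leaving both properties unset keeps the default behavior described in the paragraph above, which is usually the safer choice when tasks depend on data that the preparation task stages.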

The job preparation task is executed only on nodes that are scheduled to run a task. This prevents the unnecessary execution of a preparation task on a node that is not assigned a task, which can happen when a job has fewer tasks than the pool has nodes. It also applies when concurrent task execution is enabled, which leaves some nodes idle if the task count is lower than the total number of possible concurrent tasks. By not running the job preparation task on idle nodes, you can spend less money on data transfer charges.

Note

JobPreparationTask differs from CloudPool.StartTask in that JobPreparationTask executes at the start of each job, whereas StartTask executes only when a compute node first joins a pool or restarts.

Job release task

Once a job is marked as completed, the job release task is executed on each node in the pool that executed at least one task. You mark a job as completed by issuing a terminate request. The Batch service then sets the job state to terminating, terminates any active or running tasks associated with the job, and runs the job release task. The job then moves to the completed state.

Note

Job deletion also executes the job release task. However, if a job has already been terminated, the release task is not run a second time if the job is later deleted.

Job release tasks can run for a maximum of 15 minutes before being terminated by the Batch service. For more information, see the REST API reference documentation.
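If you want a release task to stop well before the 15-minute limit, you can set a shorter limit on the task itself. The following is a minimal sketch using the MaxWallClockTime and RetentionTime properties of JobReleaseTask; the class and method names and the chosen durations are hypothetical, and the snippet assumes the Microsoft.Azure.Batch NuGet package is referenced.

```csharp
using System;
using Microsoft.Azure.Batch;

static class ReleaseTaskSketch
{
    // Builds a job release task with an explicit time limit that is
    // shorter than the 15-minute service maximum.
    public static JobReleaseTask BuildCleanupTask()
    {
        return new JobReleaseTask
        {
            // Delete the file written by the job preparation task.
            CommandLine =
                "cmd /c del %AZ_BATCH_NODE_SHARED_DIR%\\shared_file.txt",

            // Illustrative limit: stop the release task after 5 minutes.
            MaxWallClockTime = TimeSpan.FromMinutes(5),

            // Illustrative retention: keep the release task's directory
            // on the node for one hour after it completes.
            RetentionTime = TimeSpan.FromHours(1)
        };
    }
}
```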

Job prep and release tasks with Batch .NET

To use a job preparation task, assign a JobPreparationTask object to your job's CloudJob.JobPreparationTask property. Similarly, initialize a JobReleaseTask and assign it to your job's CloudJob.JobReleaseTask property to set the job's release task.

In this code snippet, myBatchClient is an instance of BatchClient, and myPool is an existing pool within the Batch account.

// Create the CloudJob for CloudPool "myPool"
CloudJob myJob =
    myBatchClient.JobOperations.CreateJob(
        "JobPrepReleaseSampleJob",
        new PoolInformation() { PoolId = "myPool" });

// Specify the command lines for the job preparation and release tasks
string jobPrepCmdLine =
    "cmd /c echo %AZ_BATCH_NODE_ID% > %AZ_BATCH_NODE_SHARED_DIR%\\shared_file.txt";
string jobReleaseCmdLine =
    "cmd /c del %AZ_BATCH_NODE_SHARED_DIR%\\shared_file.txt";

// Assign the job preparation task to the job
myJob.JobPreparationTask =
    new JobPreparationTask { CommandLine = jobPrepCmdLine };

// Assign the job release task to the job
myJob.JobReleaseTask =
    new JobReleaseTask { CommandLine = jobReleaseCmdLine };

await myJob.CommitAsync();

As mentioned earlier, the release task is executed when a job is terminated or deleted. Terminate a job with JobOperations.TerminateJobAsync. Delete a job with JobOperations.DeleteJobAsync. You typically terminate or delete a job when its tasks are completed, or when a timeout that you've defined has been reached.

// Terminate the job to mark it as Completed; this will initiate the
// Job Release Task on any node that executed job tasks. Note that the
// Job Release Task is also executed when a job is deleted, thus you
// need not call Terminate if you typically delete jobs after task completion.
await myBatchClient.JobOperations.TerminateJobAsync("JobPrepReleaseSampleJob");

Code sample on GitHub

To see job preparation and release tasks in action, check out the JobPrepRelease sample project on GitHub. This console application does the following:

  1. Creates a pool with two nodes.
  2. Creates a job with job preparation, release, and standard tasks.
  3. Runs the job preparation task, which first writes the node ID to a text file in a node's "shared" directory.
  4. Runs a task on each node that writes its task ID to the same text file.
  5. Once all tasks are completed (or the timeout is reached), prints the contents of each node's text file to the console.
  6. When the job is completed, runs the job release task to delete the file from the node.
  7. Prints the exit codes of the job preparation and release tasks for each node on which they executed.
  8. Pauses execution to allow confirmation of job and/or pool deletion.

Output from the sample application is similar to the following:

Attempting to create pool: JobPrepReleaseSamplePool
Created pool JobPrepReleaseSamplePool with 2 nodes
Checking for existing job JobPrepReleaseSampleJob...
Job JobPrepReleaseSampleJob not found, creating...
Submitting tasks and awaiting completion...
All tasks completed.

Contents of shared\job_prep_and_release.txt on tvm-2434664350_1-20160623t173951z:
-------------------------------------------
tvm-2434664350_1-20160623t173951z tasks:
  task001
  task004
  task005
  task006

Contents of shared\job_prep_and_release.txt on tvm-2434664350_2-20160623t173951z:
-------------------------------------------
tvm-2434664350_2-20160623t173951z tasks:
  task008
  task002
  task003
  task007

Waiting for job JobPrepReleaseSampleJob to reach state Completed
...

tvm-2434664350_1-20160623t173951z:
  Prep task exit code:    0
  Release task exit code: 0

tvm-2434664350_2-20160623t173951z:
  Prep task exit code:    0
  Release task exit code: 0

Delete job? [yes] no
yes
Delete pool? [yes] no
yes

Sample complete, hit ENTER to exit...

Note

Because nodes in a new pool are created and started at different times (some nodes are ready for tasks before others), you may see different output. Specifically, because the tasks complete quickly, one of the pool's nodes may execute all of the job's tasks. If this occurs, you will notice that no job preparation or release task exists for the node that executed no tasks.
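The per-node exit codes shown in the output can be read with JobOperations.ListJobPreparationAndReleaseTaskStatus. The following is a sketch of that step rather than the sample's exact code; the class and method names are hypothetical, and the snippet assumes the Microsoft.Azure.Batch NuGet package is referenced and that batchClient is an authenticated BatchClient.

```csharp
using System;
using Microsoft.Azure.Batch;

static class PrepReleaseStatusSketch
{
    // Prints the job preparation and release task exit codes for each
    // node that the service reports for the given job.
    public static void PrintExitCodes(BatchClient batchClient, string jobId)
    {
        foreach (JobPreparationAndReleaseTaskExecutionInformation info in
                 batchClient.JobOperations
                     .ListJobPreparationAndReleaseTaskStatus(jobId))
        {
            Console.WriteLine($"{info.ComputeNodeId}:");

            // Either entry can be absent, for example when the release
            // task has not run yet, so check for null before reading.
            if (info.JobPreparationTaskExecutionInformation != null)
            {
                Console.WriteLine("  Prep task exit code:    " +
                    info.JobPreparationTaskExecutionInformation.ExitCode);
            }

            if (info.JobReleaseTaskExecutionInformation != null)
            {
                Console.WriteLine("  Release task exit code: " +
                    info.JobReleaseTaskExecutionInformation.ExitCode);
            }
        }
    }
}
```

ExitCode is nullable, so a task that is still running prints an empty value rather than a number.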

Inspect job preparation and release tasks in the Azure portal

When you run the sample application, you can use the Azure portal to view the properties of the job and its tasks, or even download the shared text file that is modified by the job's tasks.

The screenshot below shows the Preparation tasks blade in the Azure portal after a run of the sample application. Navigate to the JobPrepReleaseSampleJob properties after your tasks have completed (but before deleting your job and pool) and click Preparation tasks or Release tasks to view their properties.

Job preparation properties in the Azure portal

Next steps

Application packages

In addition to the job preparation task, you can also use the application packages feature of Batch to prepare compute nodes for task execution. This feature is especially useful for deploying applications that do not require running an installer, applications that contain many (100+) files, or applications that require strict version control.

Installing applications and staging data

This MSDN forum post provides an overview of several methods of preparing your nodes for running tasks:

Installing applications and staging data on Batch compute nodes

Written by one of the Azure Batch team members, it discusses several techniques that you can use to deploy applications and data to compute nodes.