Persist job and task output

A task running in Azure Batch may produce output data when it runs. Task output data often needs to be stored for retrieval by other tasks in the job, the client application that executed the job, or both. Tasks can write output data to the file system of a Batch compute node, but all data on the node is lost when the node is reimaged or leaves the pool. Tasks may also have a file retention period, after which files created by the task are deleted. For these reasons, it's important to persist any task output that you'll need later to a data store such as Azure Storage.

For storage account options in Batch, see the Batch feature overview.

Some common examples of task output include:

  • Files created when the task processes input data.
  • Log files associated with task execution.

This article describes various options for persisting task output.

Options for persisting output

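Depending on your scenario, there are a few different approaches you can take to persist task output:

  • Use the Batch service API.
  • Use the Batch File Conventions library for .NET.
  • Implement the Batch File Conventions standard in your own application.
  • Implement a custom file movement solution.

The following sections briefly describe each approach, followed by general design considerations for persisting output.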

Use the Batch service API

The Batch service supports specifying output files in Azure Storage for task data when you add a task to a job or add a collection of tasks to a job.

For more information on persisting task output with the Batch service API, see Persist task data to Azure Storage with the Batch service API.
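As a rough illustration of what specifying output files looks like, the sketch below builds the JSON body for adding a task with two output file entries. The field names (`outputFiles`, `filePattern`, `destination`, `containerUrl`, `path`, `uploadOptions`, `uploadCondition`) are written as recalled from the Batch REST schema and should be checked against the API reference. The helper `make_output_file`, the task ID, the file patterns, and the container URL are hypothetical placeholders.

```python
import json

# Placeholder: a container URL that would normally carry a SAS token with write access.
CONTAINER_URL = "https://myaccount.blob.core.windows.net/outputs?<sas-token>"

ALLOWED_CONDITIONS = {"taskSuccess", "taskFailure", "taskCompletion"}


def make_output_file(file_pattern, container_url, blob_prefix,
                     upload_condition="taskCompletion"):
    """Build one entry for a task's outputFiles list (hypothetical helper)."""
    if upload_condition not in ALLOWED_CONDITIONS:
        raise ValueError(f"unknown upload condition: {upload_condition}")
    return {
        # Which files in the task's directory to upload.
        "filePattern": file_pattern,
        # Where in Azure Storage the matched files should go.
        "destination": {
            "container": {"containerUrl": container_url, "path": blob_prefix}
        },
        # When the upload should happen, relative to the task's outcome.
        "uploadOptions": {"uploadCondition": upload_condition},
    }


task_body = {
    "id": "task-001",
    "commandLine": "cmd /c doMyWork.exe",
    "outputFiles": [
        # Upload logs whether the task succeeds or fails.
        make_output_file("../std*.txt", CONTAINER_URL, "task-001/logs"),
        # Upload results only when the task succeeds.
        make_output_file("output/*.csv", CONTAINER_URL, "task-001/results",
                         upload_condition="taskSuccess"),
    ],
}

print(json.dumps(task_body, indent=2))
```

Splitting logs and results into separate entries lets each one carry its own upload condition, so logs are still captured when a task fails.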

Use the Batch File Conventions library for .NET

Batch defines an optional set of conventions for naming task output files in Azure Storage. The File Conventions standard determines the names of the destination container and blob path in Azure Storage for a given output file based on the names of the job and task.

It's up to you whether to use the File Conventions standard for naming your output data files. You can also name the destination container and blob however you wish. If you do use the File Conventions standard for naming output files, your output files are available for viewing in the Azure portal.

Developers building Batch solutions with C# and .NET can use the File Conventions library for .NET to persist task data to an Azure Storage account according to the Batch File Conventions standard. The File Conventions library handles moving output files to Azure Storage and naming destination containers and blobs in a well-known way.

For more information on persisting task output with the File Conventions library for .NET, see Persist job and task data to Azure Storage with the Batch File Conventions library for .NET.

Implement the Batch File Conventions standard

If you are using a language other than .NET, you can implement the Batch File Conventions standard in your own application.

You may want to implement the File Conventions naming standard yourself when you want a proven naming scheme, or when you want to view task output in the Azure portal.
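To make the idea of deriving storage names from job and task names concrete, here is a deliberately simplified sketch. It is not the published standard: the `job-` prefix, the `$TaskOutput` kind name, and the normalization steps are assumptions for illustration, and both function names are hypothetical. Take the exact rules, including how long or invalid job IDs are handled, from the File Conventions standard itself.

```python
import re


def container_name_for_job(job_id: str) -> str:
    """Derive a container name from a job ID (simplified, assumed rules)."""
    name = "job-" + job_id.lower()
    # Container names allow only lowercase letters, digits, and hyphens.
    name = re.sub(r"[^a-z0-9-]", "-", name)
    # Consecutive hyphens are not allowed, so collapse them.
    name = re.sub(r"-{2,}", "-", name).strip("-")
    # Container names must be 3 to 63 characters long.
    if not 3 <= len(name) <= 63:
        raise ValueError(f"cannot derive a valid container name from {job_id!r}")
    return name


def blob_path_for_task_output(task_id: str, kind: str, file_name: str) -> str:
    """Build a blob path that groups a task's files by kind (assumed layout)."""
    return f"{task_id}/${kind}/{file_name}"


container = container_name_for_job("My_Job:42")
blob_path = blob_path_for_task_output("task-001", "TaskOutput", "results.csv")
print(container)
print(blob_path)
```

Because both names are computed from the job and task IDs alone, any client that knows those IDs can locate the output without a separate lookup table.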

Implement a custom file movement solution

You can also implement your own complete file movement solution. Use this approach when:

  • You want to persist task data to a data store other than Azure Storage. To upload files to a data store like Azure SQL or Azure DataLake, you can create a custom script or executable that uploads to that location. You can then call it on the command line after running your primary executable. For example, on a Windows node, you might call these two commands: doMyWork.exe && uploadMyFilesToSql.exe
  • You want to perform check-pointing or early upload of initial results.
  • You want to maintain granular control over error handling. For example, you may want to implement your own solution if you want to use task dependency actions to take certain upload actions based on specific task exit codes. For more information on task dependency actions, see Create task dependencies to run tasks that depend on other tasks.
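The "primary executable, then upload" pattern from the first bullet can also be written as a small wrapper script, which gives a natural place for exit-code handling. The sketch below is one possible shape, not a prescribed approach: `run_then_upload` is a hypothetical helper, and the three commands are stand-ins that use the Python interpreter in place of real executables such as doMyWork.exe.

```python
import subprocess
import sys


def run_then_upload(primary_cmd, upload_cmd):
    """Run the primary command, then the upload command only if it succeeded.

    Returns the exit code of the first command that fails, or 0 if both succeed,
    mirroring the behavior of `primary && upload` on the command line.
    """
    primary = subprocess.run(primary_cmd)
    if primary.returncode != 0:
        # Skip the upload and surface the failure as the wrapper's result.
        return primary.returncode
    upload = subprocess.run(upload_cmd)
    return upload.returncode


# Stand-in commands so the sketch runs anywhere Python is installed.
work_ok = [sys.executable, "-c", "print('work done')"]
work_fail = [sys.executable, "-c", "import sys; sys.exit(3)"]
upload = [sys.executable, "-c", "print('uploaded')"]

print(run_then_upload(work_ok, upload))
print(run_then_upload(work_fail, upload))
```

A wrapper like this can be extended to branch on specific exit codes, for example uploading partial results or diagnostics when the primary step fails.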

Design considerations for persisting output

When designing your Batch solution, consider the following factors related to job and task outputs.

  • Compute node lifetime: Compute nodes are often transient, especially in autoscale-enabled pools. Output from a task that runs on a node is available only while the node exists, and only within the file retention period you've set for the task. If a task produces output that may be needed after the task is complete, the task must upload its output files to a durable store such as Azure Storage.

  • Output storage: Azure Storage is recommended as a data store for task output, but you can use any durable storage. Writing task output to Azure Storage is integrated into the Batch service API. If you use another form of durable storage, you'll need to write the application logic to persist task output yourself.

  • Output retrieval: You can retrieve task output directly from the compute nodes in your pool, or from Azure Storage or another data store if you have persisted task output. To retrieve a task's output directly from a compute node, you need the file name and its output location on the node. If you persist task output to Azure Storage, you need the full path to the file in Azure Storage to download the output files with the Azure Storage SDK.

  • Viewing output: When you navigate to a Batch task in the Azure portal and select Files on node, you are presented with all files associated with the task, not just the output files you're interested in. Again, files on compute nodes are available only while the node exists and only within the file retention time you've set for the task. To view task output that you've persisted to Azure Storage, you can use the Azure portal or an Azure Storage client application such as Azure Storage Explorer. To view output data in Azure Storage with the portal or another tool, you must know the file's location and navigate to it directly.
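The output retrieval point above notes that downloading from Azure Storage requires the full path to the file. As a minimal sketch, the helper below assembles a blob URL from its parts. `blob_url` is a hypothetical name, the account, container, and blob path are placeholders, and the default endpoint suffix is an assumption that applies to the global Azure cloud; other Azure environments use different suffixes.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_path: str,
             endpoint_suffix: str = "blob.core.windows.net") -> str:
    """Compose the full URL of a blob from its account, container, and path."""
    # quote() keeps "/" as a path separator and percent-encodes characters
    # such as spaces that are not safe in a URL path.
    return f"https://{account}.{endpoint_suffix}/{container}/{quote(blob_path)}"


url = blob_url("myaccount", "job-my-job-42", "task-001/output/results 1.csv")
print(url)
```

Recording the container name and blob path alongside the task, or deriving them from a naming convention, avoids having to browse storage to find a given output file later.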

Next steps