Persist job and task data to Azure Storage with the Batch File Conventions library for .NET

A task running in Azure Batch may produce output data when it runs. Task output data often needs to be stored for retrieval by other tasks in the job, the client application that executed the job, or both. Tasks can write output data to the file system of a Batch compute node, but all data on the node is lost when the node is reimaged or leaves the pool. Tasks may also have a file retention period, after which files created by the task are deleted. For these reasons, it's important to persist any task output that you'll need later to a data store such as Azure Storage.

For storage account options in Batch, see the Batch feature overview.

One way to persist task data is to use the Azure Batch File Conventions library for .NET. The File Conventions library simplifies the process of storing task output data to Azure Storage and retrieving it. You can use the library in both task code and client code: in task code to persist files, and in client code to list and retrieve them. Your task code can also use the library to retrieve the output of upstream tasks, such as in a task dependencies scenario.

To retrieve output files with the File Conventions library, you locate the files for a given job or task by listing them by ID and purpose. You don't need to know the names or locations of the files. For example, you can use the library to list all intermediate files for a given task, or to get a preview file for a given job.

Tip

Starting with version 2017-05-01, the Batch service API supports persisting output data to Azure Storage for tasks and job manager tasks that run on pools created with the virtual machine configuration. The Batch service API provides a simple way to persist output from within the code that creates a task, and it serves as an alternative to the File Conventions library. You can modify your Batch client applications to persist output without needing to update the application that your task is running. For more information, see Persist task data to Azure Storage with the Batch service API.

When do I use the File Conventions library to persist task output?

Azure Batch provides more than one way to persist task output. The File Conventions library is best suited to these scenarios:

  • You can easily modify the code for the application that your task is running to persist files using the File Conventions library.
  • You want to stream data to Azure Storage while the task is still running.
  • You want to persist data from pools created with either the cloud service configuration or the virtual machine configuration.
  • Your client application or other tasks in the job need to locate and download task output files by ID or by purpose.
  • You want to view task output in the Azure portal.

If your scenario differs from those listed above, you may need to consider a different approach. For more information on other options for persisting task output, see Persist job and task output to Azure Storage.

What is the Batch File Conventions standard?

The Batch File Conventions standard provides a naming scheme for the destination containers and blob paths to which your output files are written. Files persisted to Azure Storage that adhere to the File Conventions standard are automatically available for viewing in the Azure portal. The portal is aware of the naming convention and so can display files that adhere to it.

The File Conventions library for .NET automatically names your storage containers and task output files according to the File Conventions standard. The library also provides methods to query output files in Azure Storage by job ID, task ID, or purpose.

If you are developing with a language other than .NET, you can implement the File Conventions standard yourself in your application. For more information, see Implement the Batch File Conventions standard.
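To make the idea of a naming scheme concrete, the following is a minimal sketch of how a container name and blob paths could be derived. It is not taken from this article: the `ConventionPaths` class is hypothetical, and the specific layout shown (a `job-` prefix on the lowercased job ID, and a `$`-prefixed kind segment in the blob path) is an assumption based on the File Conventions README. The sketch also handles only the simple case in which the job ID is already usable as a container name, so check the README before relying on any of it.

```csharp
using System;

// Hypothetical helper for illustration; not part of the File Conventions library.
public static class ConventionPaths
{
    // ASSUMPTION: the container is named "job-" plus the lowercased job ID.
    // Simple case only: IDs that are too long or contain characters that are
    // invalid in container names need extra handling that is omitted here.
    public static string ContainerName(string jobId)
    {
        return "job-" + jobId.ToLowerInvariant();
    }

    // ASSUMPTION: task outputs live under "{taskId}/${kind}/{fileName}".
    public static string TaskBlobPath(string taskId, string kind, string fileName)
    {
        return taskId + "/$" + kind + "/" + fileName;
    }

    // ASSUMPTION: job outputs live under "${kind}/{fileName}".
    public static string JobBlobPath(string kind, string fileName)
    {
        return "$" + kind + "/" + fileName;
    }
}
```

Under these assumptions, `ConventionPaths.TaskBlobPath("task-109", "TaskPreview", "frame_low_res.jpg")` produces `task-109/$TaskPreview/frame_low_res.jpg`. Grouping blobs by kind in the path is what would let a client list outputs by purpose without knowing file names.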

To persist output data to Azure Storage using the File Conventions library, you must first link an Azure Storage account to your Batch account. If you haven't done so already, link a Storage account to your Batch account by using the Azure portal:

  1. Navigate to your Batch account in the Azure portal.
  2. Under Settings, select Storage Account.
  3. If you do not already have a Storage account associated with your Batch account, click Storage Account (None).
  4. Select a Storage account from the list for your subscription. For best performance, use an Azure Storage account that is in the same region as the Batch account where your tasks are running.

Persist output data

To persist job and task output data with the File Conventions library, create a container in Azure Storage, then save the output to the container. Use the Azure Storage client library for .NET in your task code to upload the task output to the container.

For more information about working with containers and blobs in Azure Storage, see Get started with Azure Blob storage using .NET.

Warning

All job and task outputs persisted with the File Conventions library are stored in the same container. If a large number of tasks try to persist files at the same time, Azure Storage throttling limits may be enforced. For more information about throttling limits, see Performance and scalability checklist for Blob storage.

Create storage container

To persist task output to Azure Storage, first create a container by calling CloudJob.PrepareOutputStorageAsync. This extension method takes a CloudStorageAccount object as a parameter. It creates a container named according to the File Conventions standard, so that its contents are discoverable by the Azure portal and by the retrieval methods discussed later in the article.

You typically place the code that creates the container in your client application, that is, the application that creates your pools, jobs, and tasks.

CloudJob job = batchClient.JobOperations.CreateJob(
    "myJob",
    new PoolInformation { PoolId = "myPool" });

// Create reference to the linked Azure Storage account
CloudStorageAccount linkedStorageAccount =
    new CloudStorageAccount(myCredentials, true);

// Create the blob storage container for the outputs
await job.PrepareOutputStorageAsync(linkedStorageAccount);

Store task outputs

Now that you've prepared a container in Azure Storage, tasks can save output to the container by using the TaskOutputStorage class found in the File Conventions library.

In your task code, first create a TaskOutputStorage object. Then, when the task has completed its work, call the TaskOutputStorage.SaveAsync method to save its output to Azure Storage.

// Create reference to the linked Azure Storage account (true = use HTTPS)
CloudStorageAccount linkedStorageAccount = new CloudStorageAccount(myCredentials, true);
string jobId = Environment.GetEnvironmentVariable("AZ_BATCH_JOB_ID");
string taskId = Environment.GetEnvironmentVariable("AZ_BATCH_TASK_ID");

TaskOutputStorage taskOutputStorage = new TaskOutputStorage(
    linkedStorageAccount, jobId, taskId);

/* Code to process data and produce output file(s) */

await taskOutputStorage.SaveAsync(TaskOutputKind.TaskOutput, "frame_full_res.jpg");
await taskOutputStorage.SaveAsync(TaskOutputKind.TaskPreview, "frame_low_res.jpg");

The kind parameter of the TaskOutputStorage.SaveAsync method categorizes the persisted files. There are four predefined TaskOutputKind types: TaskOutput, TaskPreview, TaskLog, and TaskIntermediate. You can also define custom categories of output.

These output types allow you to specify which type of outputs to list when you later query for the persisted outputs of a given task. In other words, when you list the outputs for a task, you can filter the list on one of the output types. For example, "Give me the preview output for task 109." More on listing and retrieving outputs appears in Retrieve output data later in the article.
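As an illustration of a custom category, the sketch below saves a file under a kind other than the four predefined ones. It assumes that the library exposes a static TaskOutputKind.Custom method taking the category name, and that a TaskOutputStorage object has already been created as shown earlier. The class name, the "Diagnostics" category, and the file name are all hypothetical.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Batch.Conventions.Files;

public static class CustomKindExample
{
    // Saves a file under a custom output category.
    // ASSUMPTION: TaskOutputKind.Custom(string) is available in the library
    // version you reference; verify against the library reference documentation.
    public static async Task SaveDiagnosticsAsync(TaskOutputStorage taskOutputStorage)
    {
        // "Diagnostics" is an arbitrary, illustrative category name.
        TaskOutputKind diagnostics = TaskOutputKind.Custom("Diagnostics");

        // "diagnostics.json" is a hypothetical file in the task's working directory.
        await taskOutputStorage.SaveAsync(diagnostics, "diagnostics.json");
    }
}
```

A client would then need to pass the same custom kind when listing outputs in order to retrieve only the files in that category.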

Tip

The output kind also determines where in the Azure portal a particular file appears: TaskOutput-categorized files appear under Task output files, and TaskLog files appear under Task logs.

Store job outputs

In addition to storing task outputs, you can store the outputs associated with an entire job. For example, in the merge task of a movie rendering job, you could persist the fully rendered movie as a job output. When your job is completed, your client application can list and retrieve the outputs for the job without needing to query the individual tasks.

Store job output by calling the JobOutputStorage.SaveAsync method, specifying the JobOutputKind and the file name:

// Create a JobOutputStorage object for the job in the linked Storage account
JobOutputStorage jobOutputStorage = new JobOutputStorage(linkedStorageAccount, jobId);

await jobOutputStorage.SaveAsync(JobOutputKind.JobOutput, "mymovie.mp4");
await jobOutputStorage.SaveAsync(JobOutputKind.JobPreview, "mymovie_preview.mp4");

As with the TaskOutputKind type for task outputs, you use the JobOutputKind type to categorize a job's persisted files. This parameter allows you to later query for (list) a specific type of output. The JobOutputKind type includes both output and preview categories, and supports creating custom categories.

Store task logs

In addition to persisting a file to durable storage when a task or job completes, you may need to persist files that are updated during the execution of a task, such as log files or stdout.txt and stderr.txt. For this purpose, the Azure Batch File Conventions library provides the TaskOutputStorage.SaveTrackedAsync method. With SaveTrackedAsync, you can track updates to a file on the node (at an interval that you specify) and persist those updates to Azure Storage.

In the following code snippet, we use SaveTrackedAsync to update stdout.txt in Azure Storage every 15 seconds during the execution of the task:

TimeSpan stdoutFlushDelay = TimeSpan.FromSeconds(3);
string logFilePath = Path.Combine(
    Environment.GetEnvironmentVariable("AZ_BATCH_TASK_DIR"), "stdout.txt");

// The primary task logic is wrapped in a using statement that sends updates to
// the stdout.txt blob in Storage every 15 seconds while the task code runs.
using (ITrackedSaveOperation stdout =
        await taskOutputStorage.SaveTrackedAsync(
        TaskOutputKind.TaskLog,
        logFilePath,
        "stdout.txt",
        TimeSpan.FromSeconds(15)))
{
    /* Code to process data and produce output file(s) */

    // We are tracking the disk file to save our standard output, but the
    // node agent may take up to 3 seconds to flush the stdout stream to
    // disk. So give the file a moment to catch up.
    await Task.Delay(stdoutFlushDelay);
}

The commented section Code to process data and produce output file(s) is a placeholder for the code that your task would normally run. For example, you might have code that downloads data from Azure Storage and performs a transformation or calculation on it. The important part of this snippet is that it demonstrates how you can wrap such code in a using block to periodically update a file with SaveTrackedAsync.

The node agent is a program that runs on each node in the pool and provides the command-and-control interface between the node and the Batch service. The Task.Delay call is required at the end of this using block to ensure that the node agent has time to flush the contents of standard output to the stdout.txt file on the node. Without this delay, it is possible to miss the last few seconds of output. This delay may not be required for all files.

Note

When you enable file tracking with SaveTrackedAsync, only appends to the tracked file are persisted to Azure Storage. Use this method only for tracking non-rotating log files or other files that are written to with append operations to the end of the file.
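To see why append-only tracking misses other kinds of edits, consider the following self-contained sketch. It is not the library's implementation. The hypothetical AppendTracker class simply remembers how many bytes it has already read and, on each call, returns only the bytes that lie beyond that offset. This is one plausible way an append tracker might behave, and the library's actual mechanism may differ.

```csharp
using System;
using System.IO;

// Illustrative only: a hypothetical tracker, not part of the File Conventions library.
public sealed class AppendTracker
{
    private readonly string _path;
    private long _offset; // number of bytes already forwarded

    public AppendTracker(string path)
    {
        _path = path;
    }

    // Returns the bytes appended since the previous call (empty if none).
    public byte[] ReadNewBytes()
    {
        // FileShare.ReadWrite lets another process keep writing to the file.
        using (var stream = new FileStream(
            _path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            if (stream.Length <= _offset)
            {
                // Nothing was appended. An in-place rewrite or a rotated
                // (truncated) file also lands here and goes unnoticed.
                return Array.Empty<byte>();
            }

            stream.Seek(_offset, SeekOrigin.Begin);
            var buffer = new byte[stream.Length - _offset];
            int read = 0;
            while (read < buffer.Length)
            {
                int n = stream.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;
                read += n;
            }

            _offset += read;
            if (read < buffer.Length) Array.Resize(ref buffer, read);
            return buffer;
        }
    }
}
```

With a tracker like this, text appended to a log is picked up on the next call, while a file that is overwritten from the start with content of the same or shorter length yields nothing new. That is the behavior the note above warns about for rotating or rewritten files.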

Retrieve output data

When you retrieve your persisted output using the Azure Batch File Conventions library, you do so in a task- and job-centric manner. You can request the output for a given task or job without needing to know its path in Azure Storage, or even its file name. Instead, you request output files by task ID or job ID.

The following code snippet iterates through a job's tasks, prints some information about the output files for each task, and then downloads the files from Storage.

// List each task's persisted outputs and download them from Storage
foreach (CloudTask task in myJob.ListTasks())
{
    foreach (OutputFileReference output in
        task.OutputStorage(storageAccount).ListOutputs(
            TaskOutputKind.TaskOutput))
    {
        Console.WriteLine($"output file: {output.FilePath}");

        // Await the download rather than blocking the thread with Wait()
        await output.DownloadToFileAsync(
            $"{myJob.Id}-{output.FilePath}",
            System.IO.FileMode.Create);
    }
}
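Job-level outputs can be retrieved in the same way. The sketch below assumes that JobOutputStorage offers a ListOutputs method that mirrors the task-level call used above and returns an enumerable of OutputFileReference objects, as in the library version this article targets. The JobOutputDownloader class and the targetDirectory parameter are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Batch.Conventions.Files;

public static class JobOutputDownloader
{
    // Downloads every file saved with JobOutputKind.JobOutput into targetDirectory.
    // ASSUMPTION: ListOutputs returns a synchronous enumerable here; newer
    // versions of the library may expose a different return type.
    public static async Task DownloadJobOutputsAsync(
        JobOutputStorage jobOutputStorage, string targetDirectory)
    {
        Directory.CreateDirectory(targetDirectory);

        foreach (OutputFileReference output in
            jobOutputStorage.ListOutputs(JobOutputKind.JobOutput))
        {
            Console.WriteLine($"job output file: {output.FilePath}");

            // Keep only the file name so the local path stays inside targetDirectory.
            string localPath = Path.Combine(
                targetDirectory, Path.GetFileName(output.FilePath));

            await output.DownloadToFileAsync(localPath, FileMode.Create);
        }
    }
}
```

A client application could call this once the job completes, passing a JobOutputStorage created for the job and the linked Storage account, instead of iterating over the individual tasks.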

View output files in the Azure portal

The Azure portal displays task output files and logs that are persisted to a linked Azure Storage account using the Batch File Conventions standard. You can implement these conventions yourself in a language of your choice, or you can use the File Conventions library in your .NET applications.

To enable the display of your output files in the portal, you must satisfy the following requirements:

  1. Link an Azure Storage account to your Batch account.
  2. Adhere to the predefined naming conventions for Storage containers and files when persisting outputs. You can find the definition of these conventions in the File Conventions library README. If you use the Azure Batch File Conventions library to persist your output, your files are persisted according to the File Conventions standard.

To view task output files and logs in the Azure portal, navigate to the task whose output you are interested in, then click either Saved output files or Saved logs. This image shows the Saved output files for the task with ID "007":

Task outputs blade in the Azure portal

Code sample

The PersistOutputs sample project is one of the Azure Batch code samples on GitHub. This Visual Studio solution demonstrates how to use the Azure Batch File Conventions library to persist task output to durable storage. To run the sample, follow these steps:

  1. Open the project in Visual Studio 2019.
  2. Add your Batch and Storage account credentials to AccountSettings.settings in the Microsoft.Azure.Batch.Samples.Common project.
  3. Build (but do not run) the solution. Restore any NuGet packages if prompted.
  4. Use the Azure portal to upload an application package for PersistOutputsTask. Include PersistOutputsTask.exe and its dependent assemblies in the .zip package, set the application ID to "PersistOutputsTask", and set the application package version to "1.0".
  5. Start (run) the PersistOutputs project.
  6. When prompted to choose the persistence technology to use for running the sample, enter 1 to run the sample using the File Conventions library to persist task output.

Next steps

Get the Batch File Conventions library for .NET

The Batch File Conventions library for .NET is available on NuGet. The library extends the CloudJob and CloudTask classes with new methods. Also see the reference documentation for the File Conventions library.

The source code for the File Conventions library is available on GitHub in the Microsoft Azure SDK for .NET repository.

Explore other approaches for persisting output data