Persist job and task data to Azure Storage with the Batch File Conventions library for .NET

A task running in Azure Batch may produce output data when it runs. Task output data often needs to be stored for retrieval by other tasks in the job, the client application that executed the job, or both. Tasks can write output data to the file system of a Batch compute node, but all data on the node is lost when the node is reimaged or leaves the pool. Tasks may also have a file retention period, after which files created by the task are deleted. For these reasons, it's important to persist task output that you'll need later to a data store such as Azure Storage.

For storage account options in Batch, see the Batch feature overview.

One way to persist task data is to use the Azure Batch File Conventions library for .NET. The File Conventions library simplifies the process of storing task output data to Azure Storage and retrieving it. You can use the File Conventions library in both task and client code: in task code to persist files, and in client code to list and retrieve them. Your task code can also use the library to retrieve the output of upstream tasks, such as in a task dependencies scenario.

To retrieve output files with the File Conventions library, you can locate the files for a given job or task by listing them by ID and purpose. You don't need to know the names or locations of the files. For example, you can use the File Conventions library to list all intermediate files for a given task, or get a preview file for a given job.

Tip

Starting with version 2017-05-01, the Batch service API supports persisting output data to Azure Storage for tasks and job manager tasks that run on pools created with the virtual machine configuration. The Batch service API provides a simple way to persist output from within the code that creates a task and serves as an alternative to the File Conventions library. You can modify your Batch client applications to persist output without needing to update the application that your task is running. For more information, see Persist task data to Azure Storage with the Batch service API.

When do I use the File Conventions library to persist task output?

Azure Batch provides more than one way to persist task output. The File Conventions library is best suited to these scenarios:

  • You can easily modify the code for the application that your task is running to persist files using the File Conventions library.
  • You want to stream data to Azure Storage while the task is still running.
  • You want to persist data from pools created with either the cloud service configuration or the virtual machine configuration.
  • Your client application or other tasks in the job need to locate and download task output files by ID or by purpose.
  • You want to view task output in the Azure portal.

If your scenario differs from those listed above, you may need to consider a different approach. For more information on other options for persisting task output, see Persist job and task output to Azure Storage.

What is the Batch File Conventions standard?

The Batch File Conventions standard provides a naming scheme for the destination containers and blob paths to which your output files are written. Files persisted to Azure Storage that adhere to the File Conventions standard are automatically available for viewing in the Azure portal. The portal is aware of the naming convention and so can display files that adhere to it.

The File Conventions library for .NET automatically names your storage containers and task output files according to the File Conventions standard. The File Conventions library also provides methods to query output files in Azure Storage according to job ID, task ID, or purpose.

If you are developing with a language other than .NET, you can implement the File Conventions standard yourself in your application. For more information, see Implement the Batch File Conventions standard.
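To give a feel for what such a naming scheme involves, here is a minimal sketch of the blob-path part only. It assumes the layout described in the File Conventions library README, where task outputs are stored under "{taskId}/${kind}/{fileName}" and job outputs under "${kind}/{fileName}". The class and method names are hypothetical, the backslash normalization is a choice made for this sketch, and container naming is not covered. Check the README before relying on any of it.

```csharp
using System;

// Hypothetical helper that sketches the blob-path layout assumed above.
// It is not part of the File Conventions library.
public static class ConventionPaths
{
    // Task outputs: "{taskId}/${kind}/{fileName}"
    public static string TaskBlobPath(string taskId, string kind, string fileName)
    {
        if (string.IsNullOrEmpty(taskId))
        {
            throw new ArgumentException("A task ID is required.", nameof(taskId));
        }
        return taskId + "/" + KindSegment(kind) + "/" + NormalizeFileName(fileName);
    }

    // Job outputs: "${kind}/{fileName}"
    public static string JobBlobPath(string kind, string fileName)
    {
        return KindSegment(kind) + "/" + NormalizeFileName(fileName);
    }

    // The kind (for example "TaskOutput" or "JobPreview") is prefixed with '$'.
    private static string KindSegment(string kind)
    {
        if (string.IsNullOrEmpty(kind))
        {
            throw new ArgumentException("An output kind is required.", nameof(kind));
        }
        return "$" + kind;
    }

    // Assumption for this sketch: Windows-style separators become blob-style ones.
    private static string NormalizeFileName(string fileName)
    {
        if (string.IsNullOrEmpty(fileName))
        {
            throw new ArgumentException("A file name is required.", nameof(fileName));
        }
        return fileName.Replace('\\', '/');
    }
}
```

With this sketch, ConventionPaths.TaskBlobPath("task-109", "TaskPreview", "frame_low_res.jpg") returns "task-109/$TaskPreview/frame_low_res.jpg". Because the purpose is part of the path, listing the outputs of one kind for one task amounts to listing blobs under a single prefix.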

To persist output data to Azure Storage using the File Conventions library, you must first link an Azure Storage account to your Batch account. If you haven't done so already, link a Storage account to your Batch account by using the Azure portal:

  1. Navigate to your Batch account in the Azure portal.
  2. Under Settings, select Storage Account.
  3. If you do not already have a Storage account associated with your Batch account, click Storage Account (None).
  4. Select a Storage account from the list for your subscription. For best performance, use an Azure Storage account that is in the same region as the Batch account where your tasks are running.

Persist output data

To persist job and task output data with the File Conventions library, create a container in Azure Storage, then save the output to the container. Use the Azure Storage client library for .NET in your task code to upload the task output to the container.

For more information about working with containers and blobs in Azure Storage, see Get started with Azure Blob storage using .NET.

Warning

All job and task outputs persisted with the File Conventions library are stored in the same container. If a large number of tasks try to persist files at the same time, Azure Storage throttling limits may be enforced. For more information about throttling limits, see Performance and scalability checklist for Blob storage.

Create storage container

To persist task output to Azure Storage, first create a container by calling CloudJob.PrepareOutputStorageAsync. This extension method takes a CloudStorageAccount object as a parameter. It creates a container named according to the File Conventions standard, so that its contents are discoverable by the Azure portal and the retrieval methods discussed later in the article.

You typically place the code to create a container in your client application, that is, the application that creates your pools, jobs, and tasks.

CloudJob job = batchClient.JobOperations.CreateJob(
    "myJob",
    new PoolInformation { PoolId = "myPool" });

// Create reference to the linked Azure Storage account
CloudStorageAccount linkedStorageAccount =
    new CloudStorageAccount(myCredentials, true);

// Create the blob storage container for the outputs
await job.PrepareOutputStorageAsync(linkedStorageAccount);

Store task outputs

Now that you've prepared a container in Azure Storage, tasks can save output to the container by using the TaskOutputStorage class found in the File Conventions library.

In your task code, first create a TaskOutputStorage object. Then, when the task has completed its work, call the TaskOutputStorage.SaveAsync method to save its output to Azure Storage.

// The second argument (useHttps) matches the client-side snippet above
CloudStorageAccount linkedStorageAccount = new CloudStorageAccount(myCredentials, true);
string jobId = Environment.GetEnvironmentVariable("AZ_BATCH_JOB_ID");
string taskId = Environment.GetEnvironmentVariable("AZ_BATCH_TASK_ID");

TaskOutputStorage taskOutputStorage = new TaskOutputStorage(
    linkedStorageAccount, jobId, taskId);

/* Code to process data and produce output file(s) */

await taskOutputStorage.SaveAsync(TaskOutputKind.TaskOutput, "frame_full_res.jpg");
await taskOutputStorage.SaveAsync(TaskOutputKind.TaskPreview, "frame_low_res.jpg");

The kind parameter of the TaskOutputStorage.SaveAsync method categorizes the persisted files. There are four predefined TaskOutputKind types: TaskOutput, TaskPreview, TaskLog, and TaskIntermediate. You can also define custom categories of output.
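The custom categories mentioned above can be created with the static TaskOutputKind.Custom method. The following is a minimal sketch rather than a recommended pattern: the category name "Checkpoint", the class name, and the method name are hypothetical, and the TaskOutputStorage instance is assumed to have been created as in the preceding snippet.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Batch.Conventions.Files;

public static class CustomKindExample
{
    // "Checkpoint" is a hypothetical category name chosen for illustration.
    private static readonly TaskOutputKind CheckpointKind =
        TaskOutputKind.Custom("Checkpoint");

    // Saves a file from the task's working directory under the custom category.
    public static Task SaveCheckpointAsync(
        TaskOutputStorage taskOutputStorage, string relativePath)
    {
        return taskOutputStorage.SaveAsync(CheckpointKind, relativePath);
    }
}
```

Files saved this way can later be listed by passing the same custom kind to the listing methods, in the same way as the predefined kinds.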

These output types allow you to specify which type of outputs to list when you later query Batch for the persisted outputs of a given task. In other words, when you list the outputs for a task, you can filter the list on one of the output types. For example, "Give me the preview output for task 109." More on listing and retrieving outputs appears in Retrieve output data later in the article.

Tip

The output kind also determines where in the Azure portal a particular file appears: TaskOutput-categorized files appear under Task output files, and TaskLog files appear under Task logs.

Store job outputs

In addition to storing task outputs, you can store the outputs associated with an entire job. For example, in the merge task of a movie rendering job, you could persist the fully rendered movie as a job output. When your job is completed, your client application can list and retrieve the outputs for the job, and does not need to query the individual tasks.

Store job output by calling the JobOutputStorage.SaveAsync method, and specify the JobOutputKind and filename:

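// Construct the job output storage from the linked account and the job ID.
// If you already hold a CloudJob, job.OutputStorage(linkedStorageAccount)
// gives you a JobOutputStorage as well.
JobOutputStorage jobOutputStorage = new JobOutputStorage(linkedStorageAccount, jobId);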

await jobOutputStorage.SaveAsync(JobOutputKind.JobOutput, "mymovie.mp4");
await jobOutputStorage.SaveAsync(JobOutputKind.JobPreview, "mymovie_preview.mp4");

As with the TaskOutputKind type for task outputs, you use the JobOutputKind type to categorize a job's persisted files. This parameter allows you to later query for (list) a specific type of output. The JobOutputKind type includes both output and preview categories, and supports creating custom categories.

Store task logs

In addition to persisting a file to durable storage when a task or job completes, you may need to persist files that are updated during the execution of a task, such as log files or stdout.txt and stderr.txt. For this purpose, the Azure Batch File Conventions library provides the TaskOutputStorage.SaveTrackedAsync method. With SaveTrackedAsync, you can track updates to a file on the node (at an interval that you specify) and persist those updates to Azure Storage.

In the following code snippet, we use SaveTrackedAsync to update stdout.txt in Azure Storage every 15 seconds during the execution of the task:

TimeSpan stdoutFlushDelay = TimeSpan.FromSeconds(3);
string logFilePath = Path.Combine(
    Environment.GetEnvironmentVariable("AZ_BATCH_TASK_DIR"), "stdout.txt");

// The primary task logic is wrapped in a using statement that sends updates to
// the stdout.txt blob in Storage every 15 seconds while the task code runs.
using (ITrackedSaveOperation stdout =
    await taskOutputStorage.SaveTrackedAsync(
        TaskOutputKind.TaskLog,
        logFilePath,
        "stdout.txt",
        TimeSpan.FromSeconds(15)))
{
    /* Code to process data and produce output file(s) */

    // We are tracking the disk file to save our standard output, but the
    // node agent may take up to 3 seconds to flush the stdout stream to
    // disk. So give the file a moment to catch up.
    await Task.Delay(stdoutFlushDelay);
}

The commented section Code to process data and produce output file(s) is a placeholder for the code that your task would normally perform. For example, you might have code that downloads data from Azure Storage and performs transformation or calculation on it. The important part of this snippet is demonstrating how you can wrap such code in a using block to periodically update a file with SaveTrackedAsync.

The node agent is a program that runs on each node in the pool and provides the command-and-control interface between the node and the Batch service. The Task.Delay call is required at the end of this using block to ensure that the node agent has time to flush the contents of standard out to the stdout.txt file on the node. Without this delay, it is possible to miss the last few seconds of output. This delay may not be required for all files.

Note

When you enable file tracking with SaveTrackedAsync, only appends to the tracked file are persisted to Azure Storage. Use this method only for tracking non-rotating log files or other files that are written to with append operations to the end of the file.
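To see why the append-only restriction matters, consider a simplified tracker that remembers how many bytes it has already read and picks up only what follows. This sketch is purely illustrative and is not the File Conventions library's implementation; the AppendTracker class and its members are hypothetical names.

```csharp
using System;
using System.IO;

// Illustrative only: returns the bytes appended to a file since the last call.
public sealed class AppendTracker
{
    private readonly string _path;
    private long _offset; // number of bytes already returned to the caller

    public AppendTracker(string path)
    {
        _path = path;
    }

    // Reads from the remembered offset to the current end of the file.
    // Returns an empty array when the file has not grown.
    public byte[] ReadNewBytes()
    {
        using (var stream = new FileStream(
            _path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            if (stream.Length <= _offset)
            {
                return Array.Empty<byte>();
            }

            stream.Seek(_offset, SeekOrigin.Begin);
            var buffer = new byte[stream.Length - _offset];
            int read = 0;
            while (read < buffer.Length)
            {
                int n = stream.Read(buffer, read, buffer.Length - read);
                if (n == 0)
                {
                    break; // end of file reached earlier than expected
                }
                read += n;
            }

            _offset += read;
            if (read < buffer.Length)
            {
                Array.Resize(ref buffer, read);
            }
            return buffer;
        }
    }
}
```

A tracker of this shape never revisits bytes before its offset, so in-place edits go unnoticed, and a file that is rotated or truncated to a shorter length yields nothing until it grows past the old offset again. That is the kind of behavior the note above warns about.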

Retrieve output data

When you retrieve your persisted output using the Azure Batch File Conventions library, you do so in a task- and job-centric manner. You can request the output for a given task or job without needing to know its path in Azure Storage, or even its file name. Instead, you can request output files by task or job ID.

The following code snippet iterates through a job's tasks, prints some information about the output files for each task, and then downloads its files from Storage.

foreach (CloudTask task in myJob.ListTasks())
{
    foreach (OutputFileReference output in
        task.OutputStorage(storageAccount).ListOutputs(
            TaskOutputKind.TaskOutput))
    {
        Console.WriteLine($"output file: {output.FilePath}");

        // Await the download rather than blocking the thread with Wait()
        await output.DownloadToFileAsync(
            $"{jobId}-{output.FilePath}",
            System.IO.FileMode.Create);
    }
}

View output files in the Azure portal

The Azure portal displays task output files and logs that are persisted to a linked Azure Storage account using the Batch File Conventions standard. You can implement these conventions yourself in a language of your choice, or you can use the File Conventions library in your .NET applications.

To enable the display of your output files in the portal, you must satisfy the following requirements:

  1. Link an Azure Storage account to your Batch account.
  2. Adhere to the predefined naming conventions for Storage containers and files when persisting outputs. You can find the definition of these conventions in the File Conventions library README. If you use the Azure Batch File Conventions library to persist your output, your files are persisted according to the File Conventions standard.

To view task output files and logs in the Azure portal, navigate to the task whose output you are interested in, then click either Saved output files or Saved logs. This image shows the Saved output files for the task with ID "007":

Task outputs blade in the Azure portal

Code sample

The PersistOutputs sample project is one of the Azure Batch code samples on GitHub. This Visual Studio solution demonstrates how to use the Azure Batch File Conventions library to persist task output to durable storage. To run the sample, follow these steps:

  1. Open the project in Visual Studio 2019.
  2. Add your Batch and Storage account credentials to AccountSettings.settings in the Microsoft.Azure.Batch.Samples.Common project.
  3. Build (but do not run) the solution. Restore any NuGet packages if prompted.
  4. Use the Azure portal to upload an application package for PersistOutputsTask. Include PersistOutputsTask.exe and its dependent assemblies in the .zip package, set the application ID to "PersistOutputsTask", and set the application package version to "1.0".
  5. Start (run) the PersistOutputs project.
  6. When prompted to choose the persistence technology to use for running the sample, enter 1 to run the sample using the File Conventions library to persist task output.

Next steps

Get the Batch File Conventions library for .NET

The Batch File Conventions library for .NET is available on NuGet. The library extends the CloudJob and CloudTask classes with new methods. Also see the reference documentation for the File Conventions library.

The source code for the File Conventions library is available on GitHub in the Microsoft Azure SDK for .NET repository.

Explore other approaches for persisting output data