Tutorial: Run a parallel workload with Azure Batch using the .NET API

Use Azure Batch to run large-scale parallel batch jobs efficiently in Azure. This tutorial walks through a C# example of running a parallel workload with Batch. You learn a common Batch application workflow and how to interact programmatically with Batch and Storage resources. You learn how to:

  • Add an application package to your Batch account
  • Authenticate with Batch and Storage accounts
  • Upload input files to Storage
  • Create a pool of compute nodes to run an application
  • Create a job and tasks to process input files
  • Monitor task execution
  • Retrieve output files

In this tutorial, you convert MP4 media files to MP3 format in parallel using the ffmpeg open-source tool.

If you don't have an Azure subscription, create a trial account before you begin.


Sign in to Azure

Sign in to the Azure portal at https://portal.azure.cn.

Add an application package

Use the Azure portal to add ffmpeg to your Batch account as an application package. Application packages help you manage task applications and their deployment to the compute nodes in your pool.

  1. In the Azure portal, click All services > Batch accounts, and click the name of your Batch account.
  2. Click Applications > Add.
  3. For Application id, enter ffmpeg, and for the package version, enter 4.3.1. Select the ffmpeg zip file you downloaded previously, and then click OK. The ffmpeg application package is added to your Batch account.


Get account credentials

For this example, you need to provide credentials for your Batch and Storage accounts. A straightforward way to get the necessary credentials is in the Azure portal. (You can also get these credentials using the Azure APIs or command-line tools.)

  1. Select All services > Batch accounts, and then select the name of your Batch account.

  2. To see the Batch credentials, select Keys. Copy the values of Batch account, URL, and Primary access key to a text editor.

  3. To see the Storage account name and keys, select Storage account. Copy the values of Storage account name and Key1 to a text editor.

Download and run the sample

Download the sample

Download or clone the sample app from GitHub. To clone the sample app repo with a Git client, use the following command:

git clone https://github.com/Azure-Samples/batch-dotnet-ffmpeg-tutorial.git

Navigate to the directory that contains the Visual Studio solution file BatchDotNetFfmpegTutorial.sln.

Open the solution file in Visual Studio, and update the credential strings in Program.cs with the values you obtained for your accounts. For example:

// Batch account credentials
private const string BatchAccountName = "mybatchaccount";
private const string BatchAccountKey  = "xxxxxxxxxxxxxxxxE+yXrRvJAqT9BlXwwo1CwF+SwAYOxxxxxxxxxxxxxxxx43pXi/gdiATkvbpLRl3x14pcEQ==";
private const string BatchAccountUrl  = "https://mybatchaccount.mybatchregion.batch.chinacloudapi.cn";

// Storage account credentials
private const string StorageAccountName = "mystorageaccount";
private const string StorageAccountKey  = "xxxxxxxxxxxxxxxxy4/xxxxxxxxxxxxxxxxfwpbIC5aAWA8wDu+AFXZB827Mt9lybZB1nUcQbQiUrkPtilK5BQ==";


To simplify the example, the Batch and Storage account credentials appear in clear text. In practice, we recommend that you restrict access to the credentials and refer to them in your code using environment variables or a configuration file. For examples, see the Azure Batch code samples repo.
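As a minimal sketch of the environment-variable approach, the helper below reads a required setting and fails fast when it is missing. The variable names (BATCH_ACCOUNT_NAME and so on) are hypothetical, and because the values are read at run time, the const fields in Program.cs would become static readonly fields.

```csharp
using System;

// Read a required setting from an environment variable and fail fast if it is missing.
// The variable names used with this helper are hypothetical; use whatever your deployment defines.
static string GetRequiredSetting(string variableName)
{
    string? value = Environment.GetEnvironmentVariable(variableName);
    if (string.IsNullOrWhiteSpace(value))
    {
        throw new InvalidOperationException($"Environment variable '{variableName}' is not set.");
    }
    return value;
}

// Example usage in Program.cs (hypothetical names):
// private static readonly string BatchAccountName = GetRequiredSetting("BATCH_ACCOUNT_NAME");
// private static readonly string BatchAccountKey  = GetRequiredSetting("BATCH_ACCOUNT_KEY");
// private static readonly string BatchAccountUrl  = GetRequiredSetting("BATCH_ACCOUNT_URL");
```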

Also, make sure that the ffmpeg application package reference in the solution matches the ID and version of the ffmpeg package that you uploaded to your Batch account.

const string appPackageId = "ffmpeg";
const string appPackageVersion = "4.3.1";
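The ID and version matter because, on a Windows node, Batch exposes an application package's install directory through an environment variable named AZ_BATCH_APP_PACKAGE_<id>#<version>, and this sample's task command line locates ffmpeg through that variable. The sketch below (the helper name is hypothetical) builds the reference the same way Program.cs does; if the command line names an ID or version the node didn't install, the variable is undefined and the path to ffmpeg.exe does not resolve.

```csharp
using System;

// Build the %...% reference that a cmd.exe command line on a Windows node
// uses to reach an application package's install directory.
static string AppPackageDirReference(string appPackageId, string appPackageVersion) =>
    $"%AZ_BATCH_APP_PACKAGE_{appPackageId}#{appPackageVersion}%";

// The values must match the package uploaded to the Batch account.
Console.WriteLine(AppPackageDirReference("ffmpeg", "4.3.1"));
// prints %AZ_BATCH_APP_PACKAGE_ffmpeg#4.3.1%
```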

Build and run the sample project

Build and run the application in Visual Studio, or at the command line with the dotnet build and dotnet run commands. After running the application, review the code to learn what each part of the application does. For example, in Visual Studio:

  • Right-click the solution in Solution Explorer and click Build Solution.

  • Confirm the restoration of any NuGet packages, if you're prompted. If you need to download missing packages, ensure the NuGet Package Manager is installed.

Then run the application. The console output is similar to the following. While the pool's compute nodes start, execution pauses at the message Monitoring all tasks for 'Completed' state, timeout in 00:30:00...

Sample start: 11/19/2018 3:20:21 PM

Container [input] created.
Container [output] created.
Uploading file LowPriVMs-1.mp4 to container [input]...
Uploading file LowPriVMs-2.mp4 to container [input]...
Uploading file LowPriVMs-3.mp4 to container [input]...
Uploading file LowPriVMs-4.mp4 to container [input]...
Uploading file LowPriVMs-5.mp4 to container [input]...
Creating pool [WinFFmpegPool]...
Creating job [WinFFmpegJob]...
Adding 5 tasks to job [WinFFmpegJob]...
Monitoring all tasks for 'Completed' state, timeout in 00:30:00...
Success! All tasks completed successfully within the specified timeout period.
Deleting container [input]...

Sample end: 11/19/2018 3:29:36 PM
Elapsed time: 00:09:14.3418742

Go to your Batch account in the Azure portal to monitor the pool, compute nodes, job, and tasks. For example, to see a heat map of the compute nodes in your pool, click Pools > WinFFmpegPool.

When tasks are running, the heat map is similar to the following:


Typical execution time is approximately 10 minutes when you run the application in its default configuration. Pool creation takes the most time.

Retrieve output files

You can use the Azure portal to download the output MP3 files generated by the ffmpeg tasks.

  1. Click All services > Storage accounts, and then click the name of your storage account.
  2. Click Blobs > output.
  3. Right-click one of the output MP3 files and then click Download. Follow the prompts in your browser to open or save the file.


Although not shown in this sample, you can also download the files programmatically from the compute nodes or from the storage container.

Review the code

The following sections break down the sample application into the steps that it performs to process a workload in the Batch service. Refer to the file Program.cs in the solution while you read the rest of this article, since not every line of code in the sample is discussed.

Authenticate Blob and Batch clients

To interact with the linked storage account, the app uses the Azure Storage Client Library for .NET. It creates a reference to the account with CloudStorageAccount, using shared key authentication. Then, it creates a CloudBlobClient.

// Construct the Storage account connection string
string storageConnectionString = String.Format("DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}",
                                StorageAccountName, StorageAccountKey);

// Retrieve the storage account
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(storageConnectionString);

CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
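The connection string above doesn't set EndpointSuffix, so the storage client library falls back to the global Azure endpoints (core.windows.net). If your storage account is in Azure China, as the portal and Batch URLs in this article suggest, you likely need EndpointSuffix=core.chinacloudapi.cn as well. A minimal sketch of a builder that appends the suffix only when one is supplied (the helper name is hypothetical):

```csharp
using System;

// Assemble a Storage connection string. When endpointSuffix is omitted, the client
// library uses its default (global) endpoints; for Azure China pass "core.chinacloudapi.cn".
static string BuildStorageConnectionString(string accountName, string accountKey, string? endpointSuffix = null)
{
    string connectionString =
        $"DefaultEndpointsProtocol=https;AccountName={accountName};AccountKey={accountKey}";
    if (!string.IsNullOrEmpty(endpointSuffix))
    {
        connectionString += $";EndpointSuffix={endpointSuffix}";
    }
    return connectionString;
}

Console.WriteLine(BuildStorageConnectionString("mystorageaccount", "<key>", "core.chinacloudapi.cn"));
// prints DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=<key>;EndpointSuffix=core.chinacloudapi.cn
```

The result can be passed to CloudStorageAccount.Parse exactly as in the snippet above.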

The app creates a BatchClient object to create and manage pools, jobs, and tasks in the Batch service. The Batch client in the sample uses shared key authentication. Batch also supports authentication through Azure Active Directory, to authenticate individual users or an unattended application.

BatchSharedKeyCredentials sharedKeyCredentials = new BatchSharedKeyCredentials(BatchAccountUrl, BatchAccountName, BatchAccountKey);

using (BatchClient batchClient = BatchClient.Open(sharedKeyCredentials))
{
    // Create the pool, job, and tasks with batchClient inside this block;
    // the client is disposed when the block exits.
}
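Shared key credentials combine three values copied from the portal, so pairing the wrong account name with a URL is an easy mistake. A hypothetical helper that checks, before opening the client, that the first label of the URL's host matches the account name, assuming the https://<account>.<region>.batch.chinacloudapi.cn format shown in Program.cs:

```csharp
using System;

// Check that the first DNS label of the Batch account URL matches the account name.
// Assumes the https://<account>.<region>.batch.chinacloudapi.cn format used in Program.cs.
static bool UrlMatchesAccount(string batchAccountUrl, string batchAccountName)
{
    if (!Uri.TryCreate(batchAccountUrl, UriKind.Absolute, out Uri? uri) || uri.Scheme != Uri.UriSchemeHttps)
    {
        return false;
    }
    string firstLabel = uri.Host.Split('.')[0];
    return string.Equals(firstLabel, batchAccountName, StringComparison.OrdinalIgnoreCase);
}

Console.WriteLine(UrlMatchesAccount("https://mybatchaccount.mybatchregion.batch.chinacloudapi.cn", "mybatchaccount"));
// prints True
```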

Upload input files

The app passes the blobClient object to the CreateContainerIfNotExistAsync method to create a storage container for the input files (MP4 format) and a container for the task output.

CreateContainerIfNotExistAsync(blobClient, inputContainerName);
CreateContainerIfNotExistAsync(blobClient, outputContainerName);

Then, files are uploaded to the input container from the local InputFiles folder. The files in storage are defined as Batch ResourceFile objects that Batch can later download to compute nodes.

Two methods in Program.cs are involved in uploading the files:

  • UploadFilesToContainerAsync: Returns a collection of ResourceFile objects and internally calls UploadResourceFileToContainerAsync to upload each file that is passed in the inputFilePaths parameter.
  • UploadResourceFileToContainerAsync: Uploads each file as a blob to the input container. After uploading the file, it obtains a shared access signature (SAS) for the blob and returns a ResourceFile object to represent it.

string inputPath = Path.Combine(Environment.CurrentDirectory, "InputFiles");

// Collect the MP4 files in the InputFiles folder (subfolders are not searched).
List<string> inputFilePaths = new List<string>(Directory.GetFileSystemEntries(inputPath, "*.mp4",
    SearchOption.TopDirectoryOnly));

// Upload each file and get back a ResourceFile that references its blob.
List<ResourceFile> inputFiles = await UploadFilesToContainerAsync(
    blobClient,
    inputContainerName,
    inputFilePaths);

For details about uploading files as blobs to a storage account with .NET, see Upload, download, and list blobs using .NET.
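To make the file-selection step concrete, here is a sketch that lists the *.mp4 files directly inside a folder, as the code above does, and pairs each with a blob name. Naming each blob after its file is an assumption for illustration; the sample's UploadResourceFileToContainerAsync decides the actual names, and the upload itself is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// List the *.mp4 files directly inside inputFolder (no subfolders) and pair
// each with a blob name. Using the file name as the blob name is an assumption.
static List<(string LocalPath, string BlobName)> GetUploadPlan(string inputFolder) =>
    Directory.GetFiles(inputFolder, "*.mp4", SearchOption.TopDirectoryOnly)
        .OrderBy(path => path, StringComparer.Ordinal)
        .Select(path => (LocalPath: path, BlobName: Path.GetFileName(path)))
        .ToList();

// Example: print the plan for the sample's InputFiles folder, if it exists.
string inputPath = Path.Combine(Environment.CurrentDirectory, "InputFiles");
if (Directory.Exists(inputPath))
{
    foreach (var (localPath, blobName) in GetUploadPlan(inputPath))
    {
        Console.WriteLine($"{localPath} -> {blobName}");
    }
}
```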

Create a pool of compute nodes

Next, the sample creates a pool of compute nodes in the Batch account with a call to CreatePoolIfNotExistAsync. This method, defined in Program.cs, uses the BatchClient.PoolOperations.CreatePool method to set the number of nodes, VM size, and a pool configuration. Here, a VirtualMachineConfiguration object specifies an ImageReference to a Windows Server image published in the Azure Marketplace. Batch supports a wide range of VM images in the Azure Marketplace, as well as custom VM images.

Be sure to check your node quotas. See Batch service quotas and limits for instructions on how to create a quota request.

The ffmpeg application is deployed to the compute nodes by adding an ApplicationPackageReference to the pool configuration.

The CommitAsync method submits the pool to the Batch service.

ImageReference imageReference = new ImageReference(
    publisher: "MicrosoftWindowsServer",
    offer: "WindowsServer",
    sku: "2016-Datacenter-smalldisk",
    version: "latest");

VirtualMachineConfiguration virtualMachineConfiguration =
    new VirtualMachineConfiguration(
    imageReference: imageReference,
    nodeAgentSkuId: "batch.node.windows amd64");

pool = batchClient.PoolOperations.CreatePool(
    poolId: poolId,
    targetDedicatedComputeNodes: DedicatedNodeCount,
    targetLowPriorityComputeNodes: LowPriorityNodeCount,
    virtualMachineSize: PoolVMSize,
    virtualMachineConfiguration: virtualMachineConfiguration);

// Deploy the ffmpeg application package to every node in the pool.
pool.ApplicationPackageReferences = new List<ApplicationPackageReference>
{
    new ApplicationPackageReference
    {
        ApplicationId = appPackageId,
        Version = appPackageVersion
    }
};

await pool.CommitAsync();
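Pool size and your node quota together determine how many tasks run at once. As a rough back-of-the-envelope sketch, assuming each node runs one task at a time (the Batch default) and that every requested node is allocated, the number of scheduling rounds for the sample's five tasks is a ceiling division. The helper and the node counts below are illustrative, not the sample's settings; real timing also depends on node startup and on low-priority nodes being preempted.

```csharp
using System;

// Estimate how many rounds ("waves") of tasks a pool needs to process taskCount tasks.
// Assumes all requested nodes are allocated and each node runs taskSlotsPerNode tasks at a time.
static int EstimateWaves(int taskCount, int dedicatedNodes, int lowPriorityNodes, int taskSlotsPerNode = 1)
{
    int concurrentTasks = (dedicatedNodes + lowPriorityNodes) * taskSlotsPerNode;
    if (concurrentTasks <= 0)
    {
        throw new ArgumentException("The pool must have at least one node and one task slot.");
    }
    return (taskCount + concurrentTasks - 1) / concurrentTasks; // ceiling division
}

Console.WriteLine(EstimateWaves(taskCount: 5, dedicatedNodes: 0, lowPriorityNodes: 5)); // prints 1
Console.WriteLine(EstimateWaves(taskCount: 5, dedicatedNodes: 2, lowPriorityNodes: 0)); // prints 3
```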

Create a job

A Batch job specifies a pool to run tasks on and optional settings such as a priority and schedule for the work. The sample creates a job with a call to CreateJobAsync. This method, defined in Program.cs, uses the BatchClient.JobOperations.CreateJob method to create a job on your pool.

The CommitAsync method submits the job to the Batch service. Initially the job has no tasks.

CloudJob job = batchClient.JobOperations.CreateJob();
job.Id = JobId;
job.PoolInformation = new PoolInformation { PoolId = PoolId };

await job.CommitAsync();
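Job and pool IDs must follow the Batch ID rules: letters, digits, hyphens, and underscores, at most 64 characters. A hypothetical client-side check is sketched below; it treats "letters" as ASCII letters, which is an assumption. The service still has the final say, for example when a job with the same ID is left over from an earlier run in the same account.

```csharp
using System;
using System.Text.RegularExpressions;

// Client-side check of a job or pool ID: ASCII letters, digits, hyphens, and
// underscores, 1 to 64 characters. \z anchors at the very end of the string,
// so a trailing newline is rejected.
static bool IsValidBatchId(string id) =>
    Regex.IsMatch(id, @"^[A-Za-z0-9_-]{1,64}\z");

Console.WriteLine(IsValidBatchId("WinFFmpegJob"));   // prints True
Console.WriteLine(IsValidBatchId("WinFFmpeg Job"));  // prints False
```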

Create tasks

The sample creates tasks in the job with a call to the AddTasksAsync method, which creates a list of CloudTask objects. Each CloudTask runs ffmpeg to process an input ResourceFile object using a CommandLine property. ffmpeg was previously installed on each node when the pool was created. Here, the command line runs ffmpeg to convert each input MP4 (video) file to an MP3 (audio) file.

For each task, the sample also creates an OutputFile object that describes the MP3 file the command line produces. Each task's output files (one, in this case) are uploaded to a container in the linked storage account, using the task's OutputFiles property. Earlier in the sample, a shared access signature URL (outputContainerSasUrl) was obtained to provide write access to the output container. Note the upload condition set on the outputFile object: an output file from a task is uploaded to the container only after the task has completed successfully (OutputFileUploadCondition.TaskSuccess). See the full code sample on GitHub for further implementation details.

Then, the sample adds the tasks to the job with the AddTaskAsync method, which queues them to run on the compute nodes.

// Create a collection to hold the tasks added to the job.
List<CloudTask> tasks = new List<CloudTask>();

for (int i = 0; i < inputFiles.Count; i++)
{
    string taskId = String.Format("Task{0}", i);

    // Define task command line to convert each input file.
    string appPath = String.Format("%AZ_BATCH_APP_PACKAGE_{0}#{1}%", appPackageId, appPackageVersion);
    string inputMediaFile = inputFiles[i].FilePath;
    string outputMediaFile = String.Format("{0}{1}",
        System.IO.Path.GetFileNameWithoutExtension(inputMediaFile),
        ".mp3");
    string taskCommandLine = String.Format("cmd /c {0}\\ffmpeg-4.3.1-2020-09-21-full_build\\bin\\ffmpeg.exe -i {1} {2}", appPath, inputMediaFile, outputMediaFile);

    // Create a cloud task (with the task ID and command line)
    CloudTask task = new CloudTask(taskId, taskCommandLine);
    task.ResourceFiles = new List<ResourceFile> { inputFiles[i] };

    // Task output file: upload the MP3 to the output container only if the task succeeds.
    List<OutputFile> outputFileList = new List<OutputFile>();
    OutputFileBlobContainerDestination outputContainer = new OutputFileBlobContainerDestination(outputContainerSasUrl);
    OutputFile outputFile = new OutputFile(outputMediaFile,
       new OutputFileDestination(outputContainer),
       new OutputFileUploadOptions(OutputFileUploadCondition.TaskSuccess));
    outputFileList.Add(outputFile);
    task.OutputFiles = outputFileList;

    tasks.Add(task);
}

// Add tasks as a collection
await batchClient.JobOperations.AddTaskAsync(jobId, tasks);
return tasks;
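To check the command line each task will run without a Batch account, the string-building part of the loop can be pulled into a pure function. This sketch (the function name is hypothetical) mirrors the formatting in Program.cs; the ffmpeg-4.3.1-2020-09-21-full_build folder name comes from the sample and depends on the zip file you uploaded.

```csharp
using System;
using System.IO;

// Build the output file name and the cmd.exe command line for one input file,
// mirroring the String.Format calls in the task loop.
static (string OutputMediaFile, string CommandLine) BuildFfmpegCommand(string appPath, string inputMediaFile)
{
    string outputMediaFile = Path.GetFileNameWithoutExtension(inputMediaFile) + ".mp3";
    string commandLine =
        $"cmd /c {appPath}\\ffmpeg-4.3.1-2020-09-21-full_build\\bin\\ffmpeg.exe -i {inputMediaFile} {outputMediaFile}";
    return (outputMediaFile, commandLine);
}

var (mp3Name, taskCommandLine) = BuildFfmpegCommand("%AZ_BATCH_APP_PACKAGE_ffmpeg#4.3.1%", "LowPriVMs-1.mp4");
Console.WriteLine(mp3Name);  // prints LowPriVMs-1.mp3
Console.WriteLine(taskCommandLine);
```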

Monitor tasks

When Batch adds tasks to a job, the service automatically queues and schedules them for execution on compute nodes in the associated pool. Based on the settings you specify, Batch handles all task queuing, scheduling, retrying, and other task administration duties.

There are many approaches to monitoring task execution. This sample defines a MonitorTasks method that reports only on task completion and on the failure or success state of each task. The MonitorTasks code specifies an ODATADetailLevel to efficiently select only minimal information about the tasks. Then, it creates a TaskStateMonitor, which provides helper utilities for monitoring task states. In MonitorTasks, the sample waits for all tasks to reach TaskState.Completed within a time limit. Then it terminates the job and reports on any tasks that completed but may have encountered a failure, such as a non-zero exit code.

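TaskStateMonitor taskStateMonitor = batchClient.Utilities.CreateTaskStateMonitor();
try
{
    // Wait until every task in addedTasks reaches the Completed state, or until the timeout elapses.
    await taskStateMonitor.WhenAll(addedTasks, TaskState.Completed, timeout);
}
catch (TimeoutException)
{
    // Not all tasks completed within the timeout.
    return false;
}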
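TaskStateMonitor.WhenAll throws TimeoutException when the tasks don't reach the target state in time, which the code above turns into a false result. The same wait-or-give-up shape can be sketched locally with plain .NET Task objects; this is an analogy for illustration, not how TaskStateMonitor is implemented, and the helper name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Wait for all work to finish, or return false if the timeout elapses first.
static async Task<bool> WhenAllWithin(IEnumerable<Task> work, TimeSpan timeout)
{
    Task all = Task.WhenAll(work);
    using var timerCancellation = new CancellationTokenSource();
    Task delay = Task.Delay(timeout, timerCancellation.Token);

    Task finished = await Task.WhenAny(all, delay);
    timerCancellation.Cancel(); // stop the timer once either side has finished

    if (finished != all)
    {
        return false; // timed out; the work may still be running
    }
    await all; // rethrow any exception from the work
    return true;
}

bool done = await WhenAllWithin(new[] { Task.Delay(50) }, TimeSpan.FromSeconds(5));
Console.WriteLine(done); // prints True
```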

Clean up resources

After it runs the tasks, the app automatically deletes the input storage container it created, and gives you the option to delete the Batch pool and job. The BatchClient's JobOperations and PoolOperations classes both have corresponding delete methods, which are called if you confirm deletion. Although you're not charged for jobs and tasks themselves, you are charged for compute nodes. Thus, we recommend that you allocate pools only as needed. When you delete the pool, all task output on the nodes is deleted. However, the output files remain in the storage account.

When no longer needed, delete the resource group, Batch account, and storage account. To do so in the Azure portal, select the resource group for the Batch account and click Delete resource group.

Next steps

In this tutorial, you learned how to:

  • Add an application package to your Batch account
  • Authenticate with Batch and Storage accounts
  • Upload input files to Storage
  • Create a pool of compute nodes to run an application
  • Create a job and tasks to process input files
  • Monitor task execution
  • Retrieve output files

For more examples of using the .NET API to schedule and process Batch workloads, see the samples on GitHub.