What is Azure Batch?

Use Azure Batch to run large-scale parallel and high-performance computing (HPC) batch jobs efficiently in Azure. Azure Batch creates and manages a pool of compute nodes (virtual machines), installs the applications you want to run, and schedules jobs to run on the nodes. There is no cluster or job scheduler software to install, manage, or scale. Instead, you use Batch APIs and tools, command-line scripts, or the Azure portal to configure, manage, and monitor your jobs.

Developers can use Batch as a platform service to build SaaS applications or client apps that require large-scale execution. For example, you can use Batch to build a service that runs Monte Carlo risk simulations for a financial services company, or a service that processes many images.

There is no additional charge for using Batch. You pay only for the underlying resources consumed, such as the virtual machines, storage, and networking.

Run parallel workloads

Batch works well with intrinsically parallel (also known as "embarrassingly parallel") workloads. Intrinsically parallel workloads are those where the applications can run independently, and each instance completes part of the work. While the applications are executing, they might access some common data, but they do not communicate with other instances of the application. Intrinsically parallel workloads can therefore run at a large scale, limited only by the amount of compute resources available to run applications simultaneously.

Some examples of intrinsically parallel workloads you can bring to Batch:

  • Financial risk modeling using Monte Carlo simulations
  • VFX and 3D image rendering
  • Image analysis and processing
  • Media transcoding
  • Genetic sequence analysis
  • Optical character recognition (OCR)
  • Data ingestion, processing, and ETL operations
  • Software test execution
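
As a minimal local sketch of what "intrinsically parallel" means, the following Python splits a toy Monte Carlo risk estimate into independent tasks. Each task gets its own seed, shares no state with the others, and returns a partial result that is combined at the end. The return model (normal, 5% mean, 20% standard deviation), the 10% loss threshold, and the function names are assumptions made for illustration. The thread pool is only a stand-in for a pool of compute nodes and does not use any Batch API.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_losses(seed, trials):
    """One independent task: count simulated years that lose more than 10%."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    breaches = 0
    for _ in range(trials):
        annual_return = rng.gauss(0.05, 0.20)  # assumed toy return model
        if annual_return < -0.10:
            breaches += 1
    return breaches


def estimate_breach_probability(num_tasks=8, trials_per_task=20_000):
    """Run the tasks side by side and merge their partial counts."""
    seeds = range(num_tasks)
    # The thread pool stands in for compute nodes; tasks never talk to each other.
    with ThreadPoolExecutor(max_workers=4) as pool:
        partial_counts = list(
            pool.map(lambda seed: simulate_losses(seed, trials_per_task), seeds)
        )
    return sum(partial_counts) / (num_tasks * trials_per_task)


probability = estimate_breach_probability()
print(f"Estimated probability of a >10% loss: {probability:.3f}")
```

Because no task depends on another, adding more workers (or, in Batch, more nodes) shortens the wall-clock time without changing the result. That independence is what lets this kind of workload scale out.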

You can also use Batch to run tightly coupled workloads, in which the applications you run need to communicate with each other rather than running independently. Tightly coupled applications normally use the Message Passing Interface (MPI) API. You can run your tightly coupled workloads with Batch using Microsoft MPI or Intel MPI.

Some examples of tightly coupled workloads:

  • Finite element analysis
  • Fluid dynamics
  • Multi-node AI training

Many tightly coupled jobs can be run in parallel using Batch. For example, you can run multiple simulations of a liquid flowing through a pipe, each with a different pipe width.

Additional Batch capabilities

Higher-level, workload-specific capabilities are also available for Azure Batch:

  • Batch supports large-scale rendering workloads with rendering tools including Autodesk Maya, 3ds Max, Arnold, and V-Ray.
  • R users can install the doAzureParallel R package to easily scale out the execution of R algorithms on Batch pools.

You can also run Batch jobs as part of a larger Azure workflow, managed by tools such as Azure Data Factory, to transform data.

How it works

A common scenario for Batch involves scaling out intrinsically parallel work, such as the rendering of images for 3D scenes, on a pool of compute nodes. This pool of compute nodes can be your "render farm" that provides tens, hundreds, or even thousands of cores to your rendering job.

The following diagram shows steps in a common Batch workflow, with a client application or hosted service using Batch to run a parallel workload.

[Diagram: Batch solution walkthrough]

1. Upload input files and the applications to process those files to your Azure Storage account. The input files can be any data that your application processes, such as financial modeling data or video files to be transcoded. The application files can include scripts or applications that process the data, such as a media transcoder.

2. Create a Batch pool of compute nodes in your Batch account, a job to run the workload on the pool, and tasks in the job. Pool nodes are the VMs that execute your tasks. Specify properties such as the number and size of the nodes, a Windows or Linux VM image, and an application to install when the nodes join the pool. Manage the cost and size of the pool by automatically scaling the number of nodes as the workload changes.

   When you add tasks to a job, the Batch service automatically schedules the tasks for execution on the compute nodes in the pool. Each task uses the application that you uploaded to process the input files.

3. Download input files and the applications to Batch. Before each task executes, it can download the input data that it is to process to the assigned compute node. If the application isn't already installed on the pool nodes, it can be downloaded at this point instead. When the downloads from Azure Storage complete, the task executes on the assigned node.

4. Monitor task execution. As the tasks run, query Batch to monitor the progress of the job and its tasks. Your client application or service communicates with the Batch service over HTTPS. Because you may be monitoring thousands of tasks running on thousands of compute nodes, be sure to query the Batch service efficiently.

5. Upload task output. As the tasks complete, they can upload their result data to Azure Storage. You can also retrieve files directly from the file system on a compute node.

6. Download output files. When your monitoring detects that the tasks in your job have completed, your client application or service can download the output data for further processing.
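
To make the relationship between storage, pool, job, and tasks concrete, the six steps above can be modeled in plain Python. This is a toy in-memory sketch, not the Batch SDK: the `Task` and `Job` classes, the `run_job` function, the dictionary standing in for Azure Storage, and the round-based scheduler are all hypothetical names and simplifications introduced here for illustration.

```python
from collections import Counter, deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    task_id: str
    input_file: str
    state: str = "active"  # toy lifecycle: active -> running -> completed
    node: Optional[str] = None


@dataclass
class Job:
    job_id: str
    tasks: list = field(default_factory=list)

    def task_counts(self):
        """Summarize task states in one pass instead of inspecting each task."""
        return Counter(task.state for task in self.tasks)


def run_job(job, node_ids, storage, app):
    """Toy scheduler: in each round, every node takes one queued task."""
    snapshots = []
    queue = deque(job.tasks)
    while queue:
        running = []
        for node_id in node_ids:  # step 2: tasks are scheduled onto pool nodes
            if not queue:
                break
            task = queue.popleft()
            task.node, task.state = node_id, "running"
            running.append(task)
        snapshots.append(dict(job.task_counts()))  # step 4: monitor progress
        for task in running:
            data = storage[task.input_file]  # step 3: download input to the node
            storage["output/" + task.task_id] = app(data)  # step 5: upload output
            task.state = "completed"
    snapshots.append(dict(job.task_counts()))
    return snapshots


# Step 1: "upload" input files (a dict stands in for the storage account).
storage = {f"input/{i}.txt": f"frame {i}" for i in range(5)}

# Step 2: a two-node pool, one job, and one task per input file.
job = Job("render", [Task(f"task{i}", f"input/{i}.txt") for i in range(5)])
snapshots = run_job(job, ["node0", "node1"], storage, str.upper)

# Step 6: the client downloads the outputs once monitoring shows completion.
outputs = {name: data for name, data in storage.items() if name.startswith("output/")}
print(snapshots[-1], len(outputs))
```

The `task_counts` summary reflects the advice in step 4: when there are thousands of tasks, asking for aggregate state counts is cheaper than fetching every task on each poll.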

Keep in mind that this is just one way to use Batch, and this scenario describes only some of its features. For example, you can execute multiple tasks in parallel on each compute node. You can also use job preparation and completion tasks to prepare the nodes for your jobs, then clean up afterward.

See the Batch feature overview for developers for more detailed information about pools, nodes, jobs, and tasks, and the many API features that you can use while building your Batch application.

Next steps

Get started with Azure Batch with one of these quickstarts: