What is Azure Batch?

Use Azure Batch to run large-scale parallel batch jobs efficiently in Azure. Azure Batch creates and manages a pool of compute nodes (virtual machines), installs the applications you want to run, and schedules jobs to run on the nodes. There is no cluster or job scheduler software to install, manage, or scale. Instead, you use Batch APIs and tools, command-line scripts, or the Azure portal to configure, manage, and monitor your jobs.

Developers can use Batch as a platform service to build SaaS applications or client apps where large-scale execution is required. For example, you can build a service with Batch to run a Monte Carlo risk simulation for a financial services company, or a service to process many images.

There is no additional charge for using Batch. You pay only for the underlying resources consumed, such as the virtual machines, storage, and networking.

Run parallel workloads

Batch works well with intrinsically parallel (also known as "embarrassingly parallel") workloads. Intrinsically parallel workloads consist of applications that can run independently, with each instance completing part of the work. While the applications are executing, they might access some common data, but they don't communicate with other instances of the application. Intrinsically parallel workloads can therefore run at a large scale, determined by the amount of compute resources available to run applications simultaneously.

Some examples of intrinsically parallel workloads you can bring to Batch:

  • Financial risk modeling using Monte Carlo simulations
  • VFX and 3D image rendering
  • Image analysis and processing
  • Media transcoding
  • Genetic sequence analysis
  • Optical character recognition (OCR)
  • Data ingestion, processing, and ETL operations
  • Software test execution
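
As a rough local illustration of an intrinsically parallel workload, the sketch below splits a Monte Carlo estimate of pi across independent workers. Each worker gets its own seed and sample count, shares no state with the others, and returns a partial count that is combined at the end. The thread pool only stands in for a pool of compute nodes. The function names and sample sizes are assumptions made for illustration, and no Batch API is called.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed: int, samples: int) -> int:
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)  # private generator: no state shared between workers
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(workers: int = 4, samples_per_worker: int = 50_000) -> float:
    """Run independent workers and combine their partial counts."""
    seeds = range(workers)
    sample_counts = [samples_per_worker] * workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_hits = list(pool.map(count_hits, seeds, sample_counts))
    total_samples = workers * samples_per_worker
    return 4.0 * sum(partial_hits) / total_samples


if __name__ == "__main__":
    print(f"pi is approximately {estimate_pi():.3f}")
```

Because the workers never communicate, raising the worker count only changes how the samples are divided. That independence is the property that lets this kind of workload scale with the compute resources available.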

You can also use Batch to run tightly coupled workloads, where the applications you run need to communicate with each other rather than running independently. Tightly coupled applications normally use the Message Passing Interface (MPI) API. You can run your tightly coupled workloads with Batch using Microsoft MPI or Intel MPI. You can also improve application performance by using GPU-optimized VM sizes.

Some examples of tightly coupled workloads:

  • Finite element analysis
  • Fluid dynamics
  • Multi-node AI training

Many tightly coupled jobs can be run in parallel using Batch. For example, you can perform multiple simulations of a liquid flowing through a pipe, each with a different pipe width.
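
The pipe-width example amounts to a parameter sweep: each simulation is tightly coupled internally, but the simulations are independent of one another. The sketch below builds one task description per pipe width. The `SimulationTask` type, the `pipe_flow_sim` command, and its flag are hypothetical placeholders rather than part of Batch or of any real solver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationTask:
    task_id: str
    command_line: str
    node_count: int  # nodes that cooperate on this single tightly coupled run


def build_pipe_sweep(widths_mm, nodes_per_run=4):
    """Create one independent task description per pipe width."""
    tasks = []
    for index, width in enumerate(widths_mm):
        tasks.append(
            SimulationTask(
                task_id=f"pipe-flow-{index:03d}",
                # "pipe_flow_sim" and its flag are hypothetical placeholders.
                command_line=f"pipe_flow_sim --pipe-width-mm {width}",
                node_count=nodes_per_run,
            )
        )
    return tasks


if __name__ == "__main__":
    for task in build_pipe_sweep([10, 20, 40]):
        print(task.task_id, "->", task.command_line)
```

Each description could then be submitted as a separate task, so the simulations run side by side while the processes within a single simulation communicate with each other.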

Additional Batch capabilities

Higher-level, workload-specific capabilities are also available for Azure Batch:

  • Batch supports large-scale rendering workloads with rendering tools including Autodesk Maya, 3ds Max, Arnold, and V-Ray.
  • R users can install the doAzureParallel R package to easily scale out the execution of R algorithms on Batch pools.

You can also run Batch jobs as part of a larger Azure workflow to transform data, managed by tools such as Azure Data Factory.

How it works

A common scenario for Batch involves scaling out intrinsically parallel work, such as the rendering of images for 3D scenes, on a pool of compute nodes. This pool can be your "render farm" that provides tens, hundreds, or even thousands of cores to your rendering job.

The following diagram shows steps in a common Batch workflow, with a client application or hosted service using Batch to run a parallel workload.

Diagram of the steps in a Batch solution.

Step and description

1. Upload input files and the applications to process those files to your Azure Storage account.
   The input files can be any data that your application processes, such as financial modeling data, or video files to be transcoded. The application files can include scripts or applications that process the data, such as a media transcoder.

2. Create a Batch pool of compute nodes in your Batch account, a job to run the workload on the pool, and tasks in the job.
   Compute nodes are the VMs that execute your tasks. Specify properties for your pool, such as the number and size of the nodes, a Windows or Linux VM image, and an application to install when the nodes join the pool. Manage the cost and size of the pool by automatically scaling the number of nodes as the workload changes.

   When you add tasks to a job, the Batch service automatically schedules the tasks for execution on the compute nodes in the pool. Each task uses the application that you uploaded to process the input files.

3. Download input files and the applications to Batch.
   Before each task executes, it can download the input data that it will process to the assigned node. If the application isn't already installed on the pool nodes, it can be downloaded here instead. When the downloads from Azure Storage complete, the task executes on the assigned node.

4. Monitor task execution.
   As the tasks run, query Batch to monitor the progress of the job and its tasks. Your client application or service communicates with the Batch service over HTTPS. Because you may be monitoring thousands of tasks running on thousands of compute nodes, be sure to query the Batch service efficiently.

5. Upload task output.
   As the tasks complete, they can upload their result data to Azure Storage. You can also retrieve files directly from the file system on a compute node.

6. Download output files.
   When your monitoring detects that the tasks in your job have completed, your client application or service can download the output data for further processing.
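
The six steps above can be mimicked locally to make the flow of data concrete. The sketch below is a stand-in only: a dictionary plays the role of the Azure Storage account, a thread pool plays the role of the Batch pool, and a word-count function plays the role of the uploaded application. None of these names come from the Batch APIs, and a real client would call the Batch service over HTTPS instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# In-memory stand-in for an Azure Storage account (an assumption for this sketch).
storage = {"input": {}, "output": {}}


def word_count(text: str) -> int:
    """The 'application' that processes each input file."""
    return len(text.split())


def run_task(file_name: str) -> str:
    """One task: fetch its input (step 3), process it, store the result (step 5)."""
    data = storage["input"][file_name]
    storage["output"][file_name] = word_count(data)
    return file_name


def run_job(input_files: dict, node_count: int = 3) -> dict:
    # Step 1: upload the input files.
    storage["input"].update(input_files)

    # Step 2: create a pool, then a job made of one task per input file.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        futures = [pool.submit(run_task, name) for name in input_files]

        # Step 4: monitor progress as tasks complete.
        for done, future in enumerate(as_completed(futures), start=1):
            print(f"{done}/{len(futures)} tasks complete: {future.result()}")

    # Step 6: download the output for further processing.
    return {name: storage["output"][name] for name in input_files}


if __name__ == "__main__":
    print(run_job({"a.txt": "one two three", "b.txt": "four five"}))
```

The monitoring loop reacts to completions as they arrive instead of re-checking every task on each pass, which reflects the advice in step 4 to query efficiently when there are many tasks.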

Keep in mind that the workflow described above is just one way to use Batch, and there are many other features and options. For example, you can execute multiple tasks in parallel on each compute node. Or you can use job preparation and completion tasks to prepare the nodes for your jobs, then clean up afterward.
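
To see what running multiple tasks per node means for throughput, the following back-of-the-envelope sketch computes how many tasks a pool can run at once and how many rounds a job would need. It assumes that every task occupies exactly one slot and that all tasks take about the same time. The function and parameter names are illustrative and are not taken from the source.

```python
import math


def parallel_capacity(node_count: int, task_slots_per_node: int) -> int:
    """Number of tasks the pool can execute at the same time."""
    return node_count * task_slots_per_node


def waves_needed(task_count: int, node_count: int, task_slots_per_node: int) -> int:
    """Rounds of execution needed if all tasks take roughly equal time."""
    capacity = parallel_capacity(node_count, task_slots_per_node)
    return math.ceil(task_count / capacity)


if __name__ == "__main__":
    # 100 tasks on 10 nodes: one slot per node versus four slots per node.
    print(waves_needed(100, 10, 1))
    print(waves_needed(100, 10, 4))
```

Under these assumptions, adding slots per node shortens a job without adding nodes, provided each node has enough cores and memory for the extra concurrent tasks.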

See Batch service workflow and resources for an overview of features such as pools, nodes, jobs, and tasks. Also see the latest Batch service updates.

Next steps

Get started with Azure Batch with one of these quickstarts: