Quickstart: Use Python to create a Batch pool and run a job

2025-10-24

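This quickstart shows you how to get started with Azure Batch by running an app that uses the Azure Batch libraries for Python. The Python app:

Uploads several input data files to an Azure Storage blob container to use for Batch task processing.
Creates a pool of two virtual machines (VMs), or compute nodes, running the Ubuntu 22.04 LTS OS.
Creates a job and three tasks to run on the nodes. Each task processes one of the input files by using a Bash shell command line.
Displays the output files that the tasks return.

After you complete this quickstart, you understand the key concepts of the Batch service and are ready to use Batch with more realistic, larger scale workloads.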

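Prerequisites

An Azure account with an active subscription. If you don't have one, create a trial subscription.
A Batch account with a linked Azure Storage account. You can create the accounts by using any of the following methods: Azure CLI | Azure portal | Bicep | ARM template | Terraform.
Python version 3.8 or later, which includes the pip package manager.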

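Run the app

To complete this quickstart, you download or clone the Python app, provide your account values, run the app, and verify the output.

Download or clone the app

Download or clone the Azure Batch Python Quickstart app from GitHub. Use the following command to clone the app repository with a Git client: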
```
git clone https://github.com/Azure-Samples/batch-python-quickstart.git
```
Switch to the batch-python-quickstart/src folder, and install the required packages by using pip.
```
pip install -r requirements.txt
```

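Provide your account information

The Python app needs your Batch and Storage account names, account key values, and Batch account endpoint. You can get this information from the Azure portal, Azure APIs, or command-line tools.

To get your account information from the Azure portal:

From the Azure Search bar, search for and select your Batch account name.
On your Batch account page, select Keys from the left navigation.
On the Keys page, copy the following values:

Batch account
Account endpoint
Primary access key
Storage account name
Key1

In your downloaded Python app, edit the following strings in the config.py file to supply the values you copied.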

BATCH_ACCOUNT_NAME = '<batch account>'
BATCH_ACCOUNT_KEY = '<primary access key>'
BATCH_ACCOUNT_URL = '<account endpoint>'
STORAGE_ACCOUNT_NAME = '<storage account name>'
STORAGE_ACCOUNT_KEY = '<key1>'

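Important

Exposing account keys in the app source isn't recommended for production usage. Restrict access to credentials, and refer to them in your code by using variables or a configuration file. It's best to store Batch and Storage account keys in Azure Key Vault.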
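As one way to keep keys out of the source, the values could be read from environment variables instead of being hard-coded in config.py. The following is a minimal sketch under that assumption. The `load_settings` helper and the use of environment variables are illustrative and not part of the quickstart sample; the variable names simply mirror the config.py strings.

```python
import os

# Assumed environment variable names, mirroring the strings in config.py.
REQUIRED_SETTINGS = (
    "BATCH_ACCOUNT_NAME",
    "BATCH_ACCOUNT_KEY",
    "BATCH_ACCOUNT_URL",
    "STORAGE_ACCOUNT_NAME",
    "STORAGE_ACCOUNT_KEY",
)


def load_settings(environ=os.environ):
    """Return the account settings as a dict, failing fast if any are missing.

    Hypothetical helper: the sample itself reads these values from config.py.
    """
    missing = [name for name in REQUIRED_SETTINGS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_SETTINGS}
```

Failing fast on a missing value surfaces a configuration mistake before any pool is created and starts incurring charges.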

Run the app and view output

Run the app to see the Batch workflow in action.

python python_quickstart_client.py

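Typical run time is approximately three minutes. Initial pool node setup takes the most time.

The app returns output similar to the following example: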

Sample start: 11/26/2012 4:02:54 PM

Uploading file taskdata0.txt to container [input]...
Uploading file taskdata1.txt to container [input]...
Uploading file taskdata2.txt to container [input]...
Creating pool [PythonQuickstartPool]...
Creating job [PythonQuickstartJob]...
Adding 3 tasks to job [PythonQuickstartJob]...
Monitoring all tasks for 'Completed' state, timeout in 00:30:00...

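There's a pause at Monitoring all tasks for 'Completed' state, timeout in 00:30:00... while the pool's compute nodes start. As tasks are created, Batch queues them to run on the pool. As soon as the first compute node is available, the first task runs on the node. You can monitor node, task, and job status from your Batch account page in the Azure portal.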
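The monitoring step amounts to polling task states until every task is completed or a timeout expires. The following sketch shows that pattern in isolation. The `wait_for_completion` function and its `list_task_states` callable are hypothetical stand-ins and are not taken from the sample; in a real app the callable would wrap a call to the Batch service.

```python
import datetime
import time


def wait_for_completion(list_task_states, timeout, poll_seconds=1.0):
    """Poll until every task reports 'completed' or the timeout expires.

    list_task_states is a hypothetical callable that returns the current
    state of each task as a list of strings. timeout is a timedelta.
    """
    deadline = time.monotonic() + timeout.total_seconds()
    while time.monotonic() < deadline:
        states = list_task_states()
        if all(state == "completed" for state in states):
            return True
        time.sleep(poll_seconds)
    raise RuntimeError(f"Tasks did not complete within {timeout}")
```

A 30-minute limit like the one in the sample output would correspond to `timeout=datetime.timedelta(minutes=30)`.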

After each task completes, you see output similar to the following example:

Printing task output...
Task: Task0
Node: tvm-2850684224_3-20171205t000401z
Standard output:
Batch processing began with mainframe computers and punch cards. Today it still plays a central role...

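Review the code

Review the code to understand the steps in the Azure Batch Python Quickstart.

Create service clients and upload resource files

The app creates a BlobServiceClient object to interact with the Storage account.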

blob_service_client = BlobServiceClient(
        account_url=f"https://{config.STORAGE_ACCOUNT_NAME}.{config.STORAGE_ACCOUNT_DOMAIN}/",
        credential=config.STORAGE_ACCOUNT_KEY
    )

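The app uses the blob_service_client reference to create a container in the Storage account and upload data files to the container. The files in storage are defined as Batch ResourceFile objects that Batch can later download to compute nodes.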

input_file_paths = [os.path.join(sys.path[0], 'taskdata0.txt'),
                    os.path.join(sys.path[0], 'taskdata1.txt'),
                    os.path.join(sys.path[0], 'taskdata2.txt')]

input_files = [
    upload_file_to_container(blob_service_client, input_container_name, file_path)
    for file_path in input_file_paths]

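The app creates a BatchServiceClient object to create and manage pools, jobs, and tasks in the Batch account. The Batch client uses shared key authentication. Batch also supports Microsoft Entra authentication.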
```
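from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials

# Authenticate to the Batch account with its name and primary access key.
credentials = SharedKeyCredentials(config.BATCH_ACCOUNT_NAME,
                                   config.BATCH_ACCOUNT_KEY)

# The later snippets use this same client under the name batch_service_client.
batch_client = BatchServiceClient(
    credentials,
    batch_url=config.BATCH_ACCOUNT_URL)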
```

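Create a pool of compute nodes

To create a Batch pool, the app uses the PoolAddParameter class to set the number of nodes, VM size, and pool configuration. The following VirtualMachineConfiguration object specifies an ImageReference to an Ubuntu Server 22.04 LTS Azure Marketplace image. Batch supports a wide range of Linux and Windows Server Marketplace images, and also supports custom VM images.

POOL_NODE_COUNT and POOL_VM_SIZE are defined constants. The app creates a pool of two size Standard_DS1_v2 nodes. This size offers a good balance of performance versus cost for this quickstart.

The pool.add method submits the pool to the Batch service.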

new_pool = batchmodels.PoolAddParameter(
    id=pool_id,
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="canonical",
            # "jammy" is the Ubuntu 22.04 offer; it must match the SKU and node agent.
            offer="0001-com-ubuntu-server-jammy",
            sku="22_04-lts",
            version="latest"
        ),
        node_agent_sku_id="batch.node.ubuntu 22.04"),
    vm_size=config.POOL_VM_SIZE,
    target_dedicated_nodes=config.POOL_NODE_COUNT
)
batch_service_client.pool.add(new_pool)

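Create a Batch job

A Batch job is a logical grouping of one or more tasks. The job includes settings common to the tasks, such as priority and the pool to run tasks on.

The app uses the JobAddParameter class to create a job on the pool. The job.add method adds the job to the specified Batch account. Initially the job has no tasks.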

job = batchmodels.JobAddParameter(
    id=job_id,
    pool_info=batchmodels.PoolInformation(pool_id=pool_id))

batch_service_client.job.add(job)

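Create tasks

Batch provides several ways to deploy apps and scripts to compute nodes. This app creates a list of task objects by using the TaskAddParameter class. Each task processes an input file by using a command_line parameter to specify an app or script.

The following script processes the input resource_files objects by running the Bash shell cat command to display the text files. The app then uses the task.add_collection method to add each task to the job, which queues the tasks to run on the compute nodes.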

tasks = []

for idx, input_file in enumerate(resource_input_files):
    command = f"/bin/bash -c \"cat {input_file.file_path}\""
    tasks.append(batchmodels.TaskAddParameter(
        id=f'Task{idx}',
        command_line=command,
        resource_files=[input_file]
    )
    )

batch_service_client.task.add_collection(job_id, tasks)
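A Batch task command line doesn't run under a shell by default, which is why the sample invokes /bin/bash -c explicitly. If a file path could contain spaces or shell metacharacters, the command string could be built with quoting from the standard library. The following is a sketch of that idea, not the sample's code: `build_cat_command` is a hypothetical helper, and the assumption that Batch passes the single-quoted argument through to Bash unchanged should be verified before relying on it.

```python
import shlex


def build_cat_command(file_path):
    """Return a /bin/bash -c command line that prints file_path to stdout.

    Hypothetical helper: quotes the path for the inner shell command, then
    quotes the whole inner command as the single argument to bash -c.
    """
    inner = "cat " + shlex.quote(file_path)
    return "/bin/bash -c " + shlex.quote(inner)
```

For a simple name such as taskdata0.txt, this produces `/bin/bash -c 'cat taskdata0.txt'`, which is equivalent to the command that the sample builds with escaped double quotes.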

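View task output

The app monitors task state to make sure the tasks complete. When each task runs successfully, the output of the task command is written to the stdout.txt file. The app then displays the stdout.txt file for each completed task.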

tasks = batch_service_client.task.list(job_id)

# Resolve the encoding once, before it is first used.
if text_encoding is None:
    text_encoding = DEFAULT_ENCODING

# Re-wrap the console streams once so the task output prints in that encoding.
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding=text_encoding)
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding=text_encoding)

for task in tasks:
    node_id = batch_service_client.task.get(job_id, task.id).node_info.node_id
    print(f"Task: {task.id}")
    print(f"Node: {node_id}")

    # Download the task's stdout.txt file from the compute node.
    stream = batch_service_client.file.get_from_task(
        job_id, task.id, config.STANDARD_OUT_FILE_NAME)

    file_text = _read_stream_as_string(stream, text_encoding)

    print("Standard output:")
    print(file_text)
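The snippet calls a _read_stream_as_string helper that isn't shown here. A helper of this kind typically joins the downloaded byte chunks and decodes them to text. The following sketch illustrates one possible shape. It is an assumption rather than the sample's actual implementation, and the "utf-8" default is also assumed.

```python
import io

DEFAULT_ENCODING = "utf-8"  # assumed default; the sample defines its own constant


def read_stream_as_string(stream, encoding=None):
    """Join an iterable of byte chunks and decode the result to text."""
    if encoding is None:
        encoding = DEFAULT_ENCODING
    with io.BytesIO() as buffer:
        for chunk in stream:
            buffer.write(chunk)
        return buffer.getvalue().decode(encoding)
```

Decoding only after all chunks are joined avoids errors when a multibyte character is split across two chunks.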

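Clean up resources

The app automatically deletes the storage container it creates, and gives you the option to delete the Batch pool and job. Pools and nodes incur charges while the nodes are running, even if they aren't running jobs. If you no longer need the pool, delete it.

When you no longer need your Batch resources, you can delete the resource group that contains them. In the Azure portal, select Delete resource group at the top of the resource group page. On the Delete a resource group screen, enter the resource group name, and then select Delete.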
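Deleting the job and pool from code could look roughly like the following sketch. The `delete_batch_resources` function is a hypothetical helper that isn't part of the sample. It assumes a client object that exposes job.delete and pool.delete operations, as the BatchServiceClient used earlier does.

```python
def delete_batch_resources(batch_client, job_id, pool_id, delete_pool=True):
    """Delete the job, and optionally the pool, through the given client.

    Hypothetical helper; batch_client is assumed to expose job.delete(job_id)
    and pool.delete(pool_id).
    """
    batch_client.job.delete(job_id)
    if delete_pool:
        # The pool is what incurs VM charges, so it is deleted by default.
        batch_client.pool.delete(pool_id)
```

Keeping the pool deletion optional makes it possible to reuse a running pool for another job and skip the node startup time.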

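Next steps

In this quickstart, you ran an app that uses the Batch Python API to create a Batch pool, nodes, job, and tasks. The app uploaded resource files to a storage container, ran tasks on the nodes, and displayed output from the nodes.

Now that you understand the key concepts of the Batch service, you're ready to use Batch with more realistic, larger scale workloads. To learn more about Azure Batch and walk through a parallel workload with a real-world application, continue to the Batch Python tutorial.

Process a parallel workload with Python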

通过