Provision Linux compute nodes in Batch pools

You can use Azure Batch to run parallel compute workloads on both Linux and Windows virtual machines. This article details how to create pools of Linux compute nodes in the Batch service by using both the Batch Python and Batch .NET client libraries.

Note

Application packages are supported on all Batch pools created after 5 July 2017. They are supported on Batch pools created between 10 March 2016 and 5 July 2017 only if the pool was created using a Cloud Service configuration. Batch pools created prior to 10 March 2016 do not support application packages. For more information about using application packages to deploy your applications to your Batch nodes, see Deploy applications to compute nodes with Batch application packages.

Virtual machine configuration

When you create a pool of compute nodes in Batch, you have two options for selecting the node size and operating system: Cloud Services Configuration and Virtual Machine Configuration.

Cloud Services Configuration provides Windows compute nodes only. Available compute node sizes are listed in Sizes for Cloud Services, and available operating systems are listed in the Azure Guest OS releases and SDK compatibility matrix. When you create a pool that contains Azure Cloud Services nodes, you specify the node size and the OS family, both of which are described in those articles. For pools of Windows compute nodes, Cloud Services is most commonly used.

Virtual Machine Configuration provides both Linux and Windows images for compute nodes. Available compute node sizes are listed in Sizes for virtual machines in Azure (Linux) and Sizes for virtual machines in Azure (Windows). When you create a pool that contains Virtual Machine Configuration nodes, you must specify the size of the nodes, the virtual machine image reference, and the Batch node agent SKU to be installed on the nodes.

Virtual machine image reference

The Batch service uses virtual machine scale sets to provide compute nodes in the Virtual Machine Configuration. You can specify an image from the Azure Marketplace, or provide a custom image that you have prepared. For more details about custom images, see Create a pool with a custom image.

When you configure a virtual machine image reference, you specify the properties of the virtual machine image. The following properties are required when you create a virtual machine image reference:

Image reference property    Example
Publisher                   Canonical
Offer                       UbuntuServer
SKU                         14.04.4-LTS
Version                     latest

Tip

You can learn more about these properties and how to list Marketplace images in Navigate and select Linux virtual machine images in Azure with CLI or PowerShell. Note that not all Marketplace images are currently compatible with Batch. For more information, see Node agent SKU.

Node agent SKU

The Batch node agent is a program that runs on each node in the pool and provides the command-and-control interface between the node and the Batch service. There are different implementations of the node agent, known as SKUs, for different operating systems. Essentially, when you create a Virtual Machine Configuration, you first specify the virtual machine image reference, and then you specify the node agent to install on the image. Typically, each node agent SKU is compatible with multiple virtual machine images. Here are a few examples of node agent SKUs:

  • batch.node.ubuntu 14.04
  • batch.node.centos 7
  • batch.node.windows amd64

Important

Not all virtual machine images that are available in the Marketplace are compatible with the currently available Batch node agents. Use the Batch SDKs to list the available node agent SKUs and the virtual machine images with which they are compatible. See the List of virtual machine images section later in this article for more information and examples of how to retrieve a list of valid images at runtime.

Create a Linux pool: Batch Python

The following code snippet shows an example of how to use the Azure Batch Client Library for Python to create a pool of Ubuntu Server compute nodes. Reference documentation for the Batch Python module can be found at azure.batch package on Read the Docs.

This snippet creates an ImageReference explicitly and specifies each of its properties (publisher, offer, SKU, version). In production code, however, we recommend that you use the [list_supported_images][py_list_supported_images] method to determine and select from the available image and node agent SKU combinations at runtime.

# Import the required modules from the
# Azure Batch Client Library for Python
import azure.batch.batch_service_client as batch
import azure.batch.batch_auth as batchauth
import azure.batch.models as batchmodels

# Specify Batch account credentials
account = "<batch-account-name>"
key = "<batch-account-key>"
batch_url = "<batch-account-url>"

# Pool settings
pool_id = "LinuxNodesSamplePoolPython"
vm_size = "STANDARD_D2_V3"
node_count = 1

# Initialize the Batch client
creds = batchauth.SharedKeyCredentials(account, key)
client = batch.BatchServiceClient(creds, batch_url)

# Create the unbound pool
new_pool = batchmodels.PoolAddParameter(id=pool_id, vm_size=vm_size)
new_pool.target_dedicated_nodes = node_count

# Configure the start task for the pool and run it as an
# elevated (admin) auto-user
start_task = batchmodels.StartTask(
    command_line="printenv AZ_BATCH_NODE_STARTUP_DIR",
    user_identity=batchmodels.UserIdentity(
        auto_user=batchmodels.AutoUserSpecification(
            scope=batchmodels.AutoUserScope.pool,
            elevation_level=batchmodels.ElevationLevel.admin)))
new_pool.start_task = start_task

# Create an ImageReference which specifies the Marketplace
# virtual machine image to install on the nodes.
ir = batchmodels.ImageReference(
    publisher="Canonical",
    offer="UbuntuServer",
    sku="18.04-LTS",
    version="latest")

# Create the VirtualMachineConfiguration, specifying
# the VM image reference and the Batch node agent to
# be installed on the node.
vmc = batchmodels.VirtualMachineConfiguration(
    image_reference=ir,
    node_agent_sku_id="batch.node.ubuntu 18.04")

# Assign the virtual machine configuration to the pool
new_pool.virtual_machine_configuration = vmc

# Create pool in the Batch service
client.pool.add(new_pool)

As mentioned previously, we recommend that instead of creating the ImageReference explicitly, you use the [list_supported_images][py_list_supported_images] method to dynamically select from the currently supported node agent/Marketplace image combinations. The following Python snippet shows how to use this method.

# Get the list of supported images from the Batch service
images = client.account.list_supported_images()

# Obtain the desired image reference
image = None
for img in images:
    if (img.image_reference.publisher.lower() == "canonical" and
            img.image_reference.offer.lower() == "ubuntuserver" and
            img.image_reference.sku.lower() == "18.04-lts"):
        image = img
        break

if image is None:
    raise RuntimeError('invalid image reference for desired configuration')

# Create the VirtualMachineConfiguration, specifying the VM image
# reference and the Batch node agent to be installed on the node.
vmc = batchmodels.VirtualMachineConfiguration(
    image_reference=image.image_reference,
    node_agent_sku_id=image.node_agent_sku_id)
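
The selection loop above can also be packaged as a small reusable helper. The following is a minimal sketch rather than part of the Batch SDK: select_image and the stand-in entries are hypothetical names introduced here for illustration, and the stand-ins only mimic the shape of the objects that list_supported_images returns (an image_reference with publisher, offer, and sku, plus a node_agent_sku_id) so that the sketch runs without a Batch account.

```python
from types import SimpleNamespace


def select_image(images, publisher, offer, sku):
    """Return the first entry matching publisher/offer/sku, ignoring case.

    `images` is any iterable of objects shaped like the entries returned by
    list_supported_images(): each has `image_reference` (with publisher,
    offer, sku) and `node_agent_sku_id`.
    """
    wanted = (publisher.lower(), offer.lower(), sku.lower())
    for img in images:
        ref = img.image_reference
        if (ref.publisher.lower(), ref.offer.lower(), ref.sku.lower()) == wanted:
            return img
    raise LookupError(
        "no supported image matches {}/{}/{}".format(publisher, offer, sku))


def _stand_in(publisher, offer, sku, agent):
    # Hypothetical stand-in for one supported-image entry (not an SDK type).
    return SimpleNamespace(
        image_reference=SimpleNamespace(
            publisher=publisher, offer=offer, sku=sku),
        node_agent_sku_id=agent)


# Illustrative data only; a real list comes from the Batch service.
sample_images = [
    _stand_in("OpenLogic", "CentOS", "7.6", "batch.node.centos 7"),
    _stand_in("Canonical", "UbuntuServer", "18.04-LTS",
              "batch.node.ubuntu 18.04"),
]

chosen = select_image(sample_images, "canonical", "ubuntuserver", "18.04-lts")
print(chosen.node_agent_sku_id)
```

With a real client, the same helper could presumably be called as select_image(client.account.list_supported_images(), "Canonical", "UbuntuServer", "18.04-LTS"), and the result passed to VirtualMachineConfiguration as shown above.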

Create a Linux pool: Batch .NET

The following code snippet shows an example of how to use the Batch .NET client library to create a pool of Ubuntu Server compute nodes. You can find the Batch .NET reference documentation on docs.microsoft.com.

The following code snippet uses the PoolOperations.[ListSupportedImages][net_list_supported_images] method to select from the list of currently supported Marketplace image and node agent SKU combinations. This technique is desirable because the list of supported combinations may change from time to time. Most commonly, supported combinations are added.

// Pool settings
const string poolId = "LinuxNodesSamplePoolDotNet";
const string vmSize = "STANDARD_D2_V3";
const int nodeCount = 1;

// Obtain a collection of all supported images, each paired
// with its node agent SKU. This allows us to select from a
// list of supported VM image/node agent combinations.
List<ImageInformation> images =
    batchClient.PoolOperations.ListSupportedImages().ToList();

// Find the appropriate image information
ImageInformation image = null;
foreach (var img in images)
{
    if (img.ImageReference.Publisher == "Canonical" &&
        img.ImageReference.Offer == "UbuntuServer" &&
        img.ImageReference.Sku == "18.04-LTS")
    {
        image = img;
        break;
    }
}

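// Fail fast if the service does not list the requested image
if (image == null)
{
    throw new InvalidOperationException(
        "No supported image matches the desired configuration.");
}

// Create the VirtualMachineConfiguration for use when actually
// creating the pool
VirtualMachineConfiguration virtualMachineConfiguration =
    new VirtualMachineConfiguration(image.ImageReference, image.NodeAgentSkuId);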

// Create the unbound pool object using the VirtualMachineConfiguration
// created above
CloudPool pool = batchClient.PoolOperations.CreatePool(
    poolId: poolId,
    virtualMachineSize: vmSize,
    virtualMachineConfiguration: virtualMachineConfiguration,
    targetDedicatedComputeNodes: nodeCount);

// Commit the pool to the Batch service
await pool.CommitAsync();

Although the previous snippet uses the PoolOperations.[ListSupportedImages][net_list_supported_images] method to dynamically list and select from supported image and node agent SKU combinations (recommended), you can also configure an ImageReference explicitly:

ImageReference imageReference = new ImageReference(
    publisher: "Canonical",
    offer: "UbuntuServer",
    sku: "18.04-LTS",
    version: "latest");

List of virtual machine images

To obtain the list of all Marketplace virtual machine images that the Batch service supports, along with their corresponding node agents, use [list_supported_images][py_list_supported_images] (Python), [ListSupportedImages][net_list_supported_images] (Batch .NET), or the corresponding API in the SDK for your language.
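
To get an overview of which images share a node agent, the returned list can be grouped by node agent SKU. The sketch below is illustrative only: group_by_node_agent and the stand-in entries are hypothetical and not part of any Batch SDK, and the sample data is invented to mirror the shape of the supported-image entries (image_reference plus node_agent_sku_id).

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_node_agent(images):
    """Map each node agent SKU ID to a sorted list of 'publisher:offer:sku'."""
    groups = defaultdict(list)
    for img in images:
        ref = img.image_reference
        groups[img.node_agent_sku_id].append(
            "{}:{}:{}".format(ref.publisher, ref.offer, ref.sku))
    return {agent: sorted(refs) for agent, refs in groups.items()}


def _stand_in(publisher, offer, sku, agent):
    # Hypothetical stand-in for one supported-image entry (not an SDK type).
    return SimpleNamespace(
        image_reference=SimpleNamespace(
            publisher=publisher, offer=offer, sku=sku),
        node_agent_sku_id=agent)


# Illustrative data only; a real list comes from the Batch service.
sample_images = [
    _stand_in("Canonical", "UbuntuServer", "18.04-LTS",
              "batch.node.ubuntu 18.04"),
    _stand_in("OpenLogic", "CentOS", "7.6", "batch.node.centos 7"),
    _stand_in("OpenLogic", "CentOS-HPC", "7.4", "batch.node.centos 7"),
]

grouped = group_by_node_agent(sample_images)
for agent in sorted(grouped):
    print(agent)
    for image_name in grouped[agent]:
        print("    " + image_name)
```

Grouping this way reflects the point made earlier that one node agent SKU is typically compatible with several images.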

Connect to Linux nodes using SSH

During development or while troubleshooting, you may find it necessary to sign in to the nodes in your pool. Unlike Windows compute nodes, you cannot use Remote Desktop Protocol (RDP) to connect to Linux nodes. Instead, the Batch service enables SSH access on each node for remote connection.

The following Python code snippet creates a user on each node in a pool, which is required for remote connection. It then prints the secure shell (SSH) connection information for each node.

import datetime
import getpass
import azure.batch.batch_service_client as batch
import azure.batch.batch_auth as batchauth
import azure.batch.models as batchmodels

# Specify your own account credentials
batch_account_name = ''
batch_account_key = ''
batch_account_url = ''

# Specify the ID of an existing pool containing Linux nodes
# currently in the 'idle' state
pool_id = ''

# Specify the username and prompt for a password
username = 'linuxuser'
password = getpass.getpass()

# Create a BatchClient
credentials = batchauth.SharedKeyCredentials(
    batch_account_name,
    batch_account_key
)
batch_client = batch.BatchServiceClient(
    credentials,
    batch_url=batch_account_url
)

# Create the user that will be added to each node in the pool
user = batchmodels.ComputeNodeUser(name=username)
user.password = password
user.is_admin = True
user.expiry_time = \
    (datetime.datetime.today() + datetime.timedelta(days=30)).isoformat()

# Get the list of nodes in the pool
nodes = batch_client.compute_node.list(pool_id)

# Add the user to each node in the pool and print
# the connection information for the node
for node in nodes:
    # Add the user to the node
    batch_client.compute_node.add_user(pool_id, node.id, user)

    # Obtain SSH login information for the node
    login = batch_client.compute_node.get_remote_login_settings(pool_id,
                                                                node.id)

    # Print the connection info for the node
    print("{0} | {1} | {2} | {3}".format(node.id,
                                         node.state,
                                         login.remote_login_ip_address,
                                         login.remote_login_port))

Here is sample output from the previous code for a pool that contains four Linux nodes:

Password:
tvm-1219235766_1-20160414t192511z | ComputeNodeState.idle | 13.91.7.57 | 50000
tvm-1219235766_2-20160414t192511z | ComputeNodeState.idle | 13.91.7.57 | 50003
tvm-1219235766_3-20160414t192511z | ComputeNodeState.idle | 13.91.7.57 | 50002
tvm-1219235766_4-20160414t192511z | ComputeNodeState.idle | 13.91.7.57 | 50001
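
Each IP address and port pair in this output can be turned into an ssh command line for the user created on the nodes. The helper below is a small sketch, not SDK functionality: ssh_command is a hypothetical name, and the endpoints are copied from the sample output above.

```python
def ssh_command(username, ip_address, port):
    """Build an OpenSSH command line for one node's remote login endpoint."""
    return "ssh {}@{} -p {}".format(username, ip_address, port)


# IP address and ports taken from the sample output above.
endpoints = [("13.91.7.57", 50000), ("13.91.7.57", 50003)]

for ip_address, port in endpoints:
    print(ssh_command("linuxuser", ip_address, port))
```

In the earlier loop, the same helper could be fed login.remote_login_ip_address and login.remote_login_port directly.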

Instead of a password, you can specify an SSH public key when you create a user on a node. In the Python SDK, use the ssh_public_key parameter on ComputeNodeUser. In .NET, use the ComputeNodeUser.SshPublicKey property.
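
As a rough illustration of the Python option, the sketch below assembles the keyword arguments for ComputeNodeUser in one place. The node_user_kwargs helper is hypothetical, the key string is a placeholder rather than a real key, and the rule that at least one credential must be supplied is this helper's own choice. The SDK import is left out so that the sketch runs without azure-batch installed.

```python
import datetime


def node_user_kwargs(username, days_valid=30, password=None,
                     ssh_public_key=None):
    """Build keyword arguments intended for batchmodels.ComputeNodeUser."""
    if password is None and ssh_public_key is None:
        # Policy of this sketch: refuse to create a user with no credential.
        raise ValueError("supply a password, an SSH public key, or both")
    expiry = (datetime.datetime.now(datetime.timezone.utc)
              + datetime.timedelta(days=days_valid))
    kwargs = {"name": username, "is_admin": True, "expiry_time": expiry}
    if password is not None:
        kwargs["password"] = password
    if ssh_public_key is not None:
        kwargs["ssh_public_key"] = ssh_public_key
    return kwargs


# Placeholder key text; in practice, read your own public key file.
placeholder_key = "ssh-rsa PLACEHOLDER-KEY-DATA linuxuser@example"

kwargs = node_user_kwargs("linuxuser", ssh_public_key=placeholder_key)
print(sorted(kwargs))
```

The resulting dictionary would then be passed as batchmodels.ComputeNodeUser(**kwargs) and added to each node with add_user, as in the earlier snippet.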

Pricing

Azure Batch is built on Azure Cloud Services and Azure Virtual Machines technology. The Batch service itself is offered at no cost, which means you are charged only for the compute resources that your Batch solutions consume, along with any associated costs. When you choose Cloud Services Configuration, you are charged based on the Cloud Services pricing structure. When you choose Virtual Machine Configuration, you are charged based on the Virtual Machines pricing structure.

If you deploy applications to your Batch nodes using application packages, you are also charged for the Azure Storage resources that your application packages consume.

Next steps

The Python code samples in the azure-batch-samples repository on GitHub contain scripts that show you how to perform common Batch operations, such as pool, job, and task creation. The README that accompanies the Python samples has details about how to install the required packages.