Provision Linux compute nodes in Batch pools

You can use Azure Batch to run parallel compute workloads on both Linux and Windows virtual machines. This article details how to create pools of Linux compute nodes in the Batch service by using both the Batch Python and Batch .NET client libraries.

Virtual Machine Configuration

When you create a pool of compute nodes in Batch, you have two options from which to select the node size and operating system: Cloud Services Configuration and Virtual Machine Configuration. Most pools of Windows compute nodes use Cloud Services Configuration, which specifies that the pool is composed of Azure Cloud Services nodes. These pools provide only Windows compute nodes.

In contrast, Virtual Machine Configuration specifies that the pool is composed of Azure VMs, which may be created from either Linux or Windows images. When you create a pool with Virtual Machine Configuration, you must specify an available compute node size, the virtual machine image reference for the image to install on the nodes, and the Batch node agent SKU. The node agent is a program that runs on each node and provides an interface between the node and the Batch service.

Virtual machine image reference

The Batch service uses virtual machine scale sets to provide compute nodes in the Virtual Machine Configuration. You can specify an image from the Azure Marketplace, or use the Shared Image Gallery to prepare a custom image.

When you create a virtual machine image reference, you must specify the following properties:

Image reference property    Example
Publisher                   Canonical
Offer                       UbuntuServer
SKU                         18.04-LTS
Version                     latest

Tip

You can learn more about these properties and how to specify Marketplace images in "Find Linux VM images in the Azure Marketplace with the Azure CLI". Note that not all Marketplace images are currently compatible with Batch.
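
The four properties above are often written together as a single colon-separated string of the form Publisher:Offer:Sku:Version, which is the URN format the Azure CLI shows for Marketplace images. The following is a minimal sketch of splitting such a string back into its parts. The `ImageRefParts` type and the `parse_image_urn` function are hypothetical helpers introduced here for illustration; they are not part of the Batch SDK.

```python
from typing import NamedTuple


class ImageRefParts(NamedTuple):
    """Hypothetical container for the four image reference properties."""
    publisher: str
    offer: str
    sku: str
    version: str


def parse_image_urn(urn: str) -> ImageRefParts:
    """Split a 'Publisher:Offer:Sku:Version' string into its four parts."""
    parts = urn.split(":")
    # Reject strings with the wrong number of fields or with empty fields
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected Publisher:Offer:Sku:Version, got {urn!r}")
    return ImageRefParts(*parts)


parts = parse_image_urn("Canonical:UbuntuServer:18.04-LTS:latest")
print(parts.publisher, parts.offer, parts.sku, parts.version)
```

Because the field names match the keyword arguments of ImageReference used later in this article, the result could be passed along as `batchmodels.ImageReference(**parts._asdict())`.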

Node agent SKU

The Batch node agent is a program that runs on each node in the pool and provides the command-and-control interface between the node and the Batch service. There are different implementations of the node agent, known as SKUs, for different operating systems. Essentially, when you create a Virtual Machine Configuration, you first specify the virtual machine image reference, and then you specify the node agent to install on the image. Typically, each node agent SKU is compatible with multiple virtual machine images. Here are a few examples of node agent SKUs:

  • batch.node.ubuntu 18.04
  • batch.node.centos 7
  • batch.node.windows amd64

List of virtual machine images

Not all Marketplace images are compatible with the currently available Batch node agents. To list all supported Marketplace virtual machine images for the Batch service and their corresponding node agent SKUs, use list_supported_images (Python), ListSupportedImages (Batch .NET), or the corresponding API in another language SDK.
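
Since each node agent SKU is typically compatible with several images, it can help to group the listed images by node agent SKU before choosing one. The following is a minimal sketch of that grouping. It assumes each listed item exposes `image_reference` and `node_agent_sku_id`, as the Python snippets later in this article do. The `group_by_node_agent` and `_fake` helpers are hypothetical, and the sample entries are illustrative stand-ins rather than real service output.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_node_agent(images):
    """Map each node agent SKU ID to the sorted images listed for it."""
    groups = defaultdict(list)
    for img in images:
        ref = img.image_reference
        groups[img.node_agent_sku_id].append(
            f"{ref.publisher}:{ref.offer}:{ref.sku}")
    return {agent: sorted(refs) for agent, refs in groups.items()}


def _fake(publisher, offer, sku, agent):
    # Stand-in for one item returned by list_supported_images
    return SimpleNamespace(
        image_reference=SimpleNamespace(
            publisher=publisher, offer=offer, sku=sku),
        node_agent_sku_id=agent)


# Illustrative data only; the "Example..." entries are made up
sample = [
    _fake("Canonical", "UbuntuServer", "18.04-LTS", "batch.node.ubuntu 18.04"),
    _fake("ExamplePublisher", "ExampleUbuntuOffer", "example-sku",
          "batch.node.ubuntu 18.04"),
    _fake("ExamplePublisher", "ExampleCentOSOffer", "example-sku",
          "batch.node.centos 7"),
]

for agent, refs in sorted(group_by_node_agent(sample).items()):
    print(agent, "->", ", ".join(refs))
```

With a real client, the same function could be given the result of `client.account.list_supported_images()` in place of `sample`.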

Create a Linux pool: Batch Python

The following code snippet shows an example of how to use the Azure Batch Client Library for Python to create a pool of Ubuntu Server compute nodes. For more details about the Batch Python module, view the reference documentation.

This snippet creates an ImageReference explicitly and specifies each of its properties (publisher, offer, SKU, version). In production code, however, we recommend that you use the list_supported_images method to select from the available image and node agent SKU combinations at runtime.

# Import the required modules from the
# Azure Batch Client Library for Python
import azure.batch.batch_service_client as batch
import azure.batch.batch_auth as batchauth
import azure.batch.models as batchmodels

# Specify Batch account credentials
account = "<batch-account-name>"
key = "<batch-account-key>"
batch_url = "<batch-account-url>"

# Pool settings
pool_id = "LinuxNodesSamplePoolPython"
vm_size = "STANDARD_D2_V3"
node_count = 1

# Initialize the Batch client
creds = batchauth.SharedKeyCredentials(account, key)
client = batch.BatchServiceClient(creds, batch_url)

# Create the unbound pool
new_pool = batchmodels.PoolAddParameter(id=pool_id, vm_size=vm_size)
new_pool.target_dedicated_nodes = node_count

# Configure the start task for the pool and run it with
# elevated (admin) permissions
start_task = batchmodels.StartTask(
    command_line="printenv AZ_BATCH_NODE_STARTUP_DIR",
    user_identity=batchmodels.UserIdentity(
        auto_user=batchmodels.AutoUserSpecification(
            scope=batchmodels.AutoUserScope.pool,
            elevation_level=batchmodels.ElevationLevel.admin)))
new_pool.start_task = start_task

# Create an ImageReference which specifies the Marketplace
# virtual machine image to install on the nodes
ir = batchmodels.ImageReference(
    publisher="Canonical",
    offer="UbuntuServer",
    sku="18.04-LTS",
    version="latest")

# Create the VirtualMachineConfiguration, specifying
# the VM image reference and the Batch node agent
# to install on the node
vmc = batchmodels.VirtualMachineConfiguration(
    image_reference=ir,
    node_agent_sku_id="batch.node.ubuntu 18.04")

# Assign the virtual machine configuration to the pool
new_pool.virtual_machine_configuration = vmc

# Create pool in the Batch service
client.pool.add(new_pool)

As mentioned earlier, we recommend using the list_supported_images method to dynamically select from the currently supported node agent/Marketplace image combinations (rather than creating an ImageReference explicitly). The following Python snippet shows how to use this method.

# Get the list of supported images from the Batch service
images = client.account.list_supported_images()

# Obtain the desired image reference
image = None
for img in images:
    if (img.image_reference.publisher.lower() == "canonical" and
            img.image_reference.offer.lower() == "ubuntuserver" and
            img.image_reference.sku.lower() == "18.04-lts"):
        image = img
        break

if image is None:
    raise RuntimeError('invalid image reference for desired configuration')

# Create the VirtualMachineConfiguration, specifying the VM image
# reference and the Batch node agent to be installed on the node
vmc = batchmodels.VirtualMachineConfiguration(
    image_reference=image.image_reference,
    node_agent_sku_id=image.node_agent_sku_id)
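
The lookup loop above can be wrapped in a small reusable function. The following is a minimal sketch, not part of the Batch SDK: `select_image` and `_fake` are hypothetical helpers, and the sample list stands in for the result of list_supported_images so the example runs without a Batch account. When no match is found, the error lists the SKUs that were seen for the requested publisher and offer, which can make a mistyped or retired SKU easier to spot.

```python
from types import SimpleNamespace


def select_image(images, publisher, offer, sku):
    """Return the first listed image matching all three fields, ignoring case."""
    wanted = (publisher.lower(), offer.lower(), sku.lower())
    seen_skus = []
    for img in images:
        ref = img.image_reference
        if (ref.publisher.lower(), ref.offer.lower()) == wanted[:2]:
            if ref.sku.lower() == wanted[2]:
                return img
            # Remember near misses so the error message can list them
            seen_skus.append(ref.sku)
    raise LookupError(
        f"no supported image {publisher}:{offer}:{sku}; "
        f"SKUs listed for this offer: {sorted(seen_skus) or 'none'}")


def _fake(publisher, offer, sku, agent):
    # Stand-in for one item returned by list_supported_images
    return SimpleNamespace(
        image_reference=SimpleNamespace(
            publisher=publisher, offer=offer, sku=sku),
        node_agent_sku_id=agent)


# Illustrative data only, not real service output
supported = [
    _fake("Canonical", "UbuntuServer", "16.04-LTS", "batch.node.ubuntu 16.04"),
    _fake("Canonical", "UbuntuServer", "18.04-LTS", "batch.node.ubuntu 18.04"),
]

image = select_image(supported, "canonical", "ubuntuserver", "18.04-lts")
print(image.node_agent_sku_id)
```

The returned object carries both values needed for the VirtualMachineConfiguration shown above: `image.image_reference` and `image.node_agent_sku_id`.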

Create a Linux pool: Batch .NET

The following code snippet shows an example of how to use the Batch .NET client library to create a pool of Ubuntu Server compute nodes. For more details about Batch .NET, view the reference documentation.

The following code snippet uses the PoolOperations.ListSupportedImages method to select from the list of currently supported Marketplace image and node agent SKU combinations. This technique is recommended, because the list of supported combinations may change from time to time. Most commonly, supported combinations are added.

// Pool settings
const string poolId = "LinuxNodesSamplePoolDotNet";
const string vmSize = "STANDARD_D2_V3";
const int nodeCount = 1;

// Obtain a collection of all supported images and their
// node agent SKUs. This allows us to select from a list of
// supported VM image/node agent combinations.
List<ImageInformation> images =
    batchClient.PoolOperations.ListSupportedImages().ToList();

// Find the appropriate image information
ImageInformation image = null;
foreach (var img in images)
{
    if (img.ImageReference.Publisher == "Canonical" &&
        img.ImageReference.Offer == "UbuntuServer" &&
        img.ImageReference.Sku == "18.04-LTS")
    {
        image = img;
        break;
    }
}

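// Fail early if the desired image isn't currently supported
if (image == null)
{
    throw new InvalidOperationException(
        "Invalid image reference for desired configuration");
}

// Create the VirtualMachineConfiguration for use when actually
// creating the pool
VirtualMachineConfiguration virtualMachineConfiguration =
    new VirtualMachineConfiguration(image.ImageReference, image.NodeAgentSkuId);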

// Create the unbound pool object using the VirtualMachineConfiguration
// created above
CloudPool pool = batchClient.PoolOperations.CreatePool(
    poolId: poolId,
    virtualMachineSize: vmSize,
    virtualMachineConfiguration: virtualMachineConfiguration,
    targetDedicatedComputeNodes: nodeCount);

// Commit the pool to the Batch service
await pool.CommitAsync();

Although the previous snippet uses the PoolOperations.ListSupportedImages method to dynamically list and select from supported image and node agent SKU combinations (recommended), you can also configure an ImageReference explicitly:

ImageReference imageReference = new ImageReference(
    publisher: "Canonical",
    offer: "UbuntuServer",
    sku: "18.04-LTS",
    version: "latest");

Connect to Linux nodes using SSH

During development or while troubleshooting, you may find it necessary to sign in to the nodes in your pool. Unlike Windows compute nodes, you can't use Remote Desktop Protocol (RDP) to connect to Linux nodes. Instead, the Batch service enables SSH access on each node for remote connection.

The following Python code snippet creates a user on each node in a pool, which is required for remote connection. It then prints the secure shell (SSH) connection information for each node.

import datetime
import getpass
import azure.batch.batch_service_client as batch
import azure.batch.batch_auth as batchauth
import azure.batch.models as batchmodels

# Specify your own account credentials
batch_account_name = ''
batch_account_key = ''
batch_account_url = ''

# Specify the ID of an existing pool containing Linux nodes
# currently in the 'idle' state
pool_id = ''

# Specify the username and prompt for a password
username = 'linuxuser'
password = getpass.getpass()

# Create a BatchClient
credentials = batchauth.SharedKeyCredentials(
    batch_account_name,
    batch_account_key
)
batch_client = batch.BatchServiceClient(
    credentials,
    batch_url=batch_account_url
)

# Create the user that will be added to each node in the pool
user = batchmodels.ComputeNodeUser(name=username)
user.password = password
user.is_admin = True
user.expiry_time = \
    (datetime.datetime.today() + datetime.timedelta(days=30)).isoformat()

# Get the list of nodes in the pool
nodes = batch_client.compute_node.list(pool_id)

# Add the user to each node in the pool and print
# the connection information for the node
for node in nodes:
    # Add the user to the node
    batch_client.compute_node.add_user(pool_id, node.id, user)

    # Obtain SSH login information for the node
    login = batch_client.compute_node.get_remote_login_settings(pool_id,
                                                                node.id)

    # Print the connection info for the node
    print("{0} | {1} | {2} | {3}".format(node.id,
                                         node.state,
                                         login.remote_login_ip_address,
                                         login.remote_login_port))

This code produces output similar to the following example. In this case, the pool contains four Linux nodes.

Password:
tvm-1219235766_1-20160414t192511z | ComputeNodeState.idle | 13.91.7.57 | 50000
tvm-1219235766_2-20160414t192511z | ComputeNodeState.idle | 13.91.7.57 | 50003
tvm-1219235766_3-20160414t192511z | ComputeNodeState.idle | 13.91.7.57 | 50002
tvm-1219235766_4-20160414t192511z | ComputeNodeState.idle | 13.91.7.57 | 50001
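
Each line of that output contains what a standard OpenSSH client needs to reach a node: the IP address and the port. As a minimal sketch, the values can be turned into a ready-to-paste command. The `ssh_command` helper is hypothetical and simply formats a string; it assumes an OpenSSH-style `ssh` client on the local machine and does not open a connection itself.

```python
import shlex


def ssh_command(username, ip_address, port):
    """Format an OpenSSH client command for one node's remote login settings."""
    target = f"{username}@{ip_address}"
    # shlex.quote leaves simple values untouched and quotes anything unusual
    return f"ssh -p {int(port)} {shlex.quote(target)}"


print(ssh_command("linuxuser", "13.91.7.57", 50000))
# ssh -p 50000 linuxuser@13.91.7.57
```

Inside the loop shown earlier, the same helper could be called with `login.remote_login_ip_address` and `login.remote_login_port`.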

Instead of a password, you can specify an SSH public key when you create a user on a node. In the Python SDK, use the ssh_public_key parameter on ComputeNodeUser. In .NET, use the ComputeNodeUser.SshPublicKey property.
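
As a minimal sketch of the key-based variant, the following hypothetical helper assembles the keyword arguments for ComputeNodeUser from either a password or a public key. The `build_node_user_kwargs` function is an illustration and not part of the SDK, the key text is a placeholder, and the rule that at least one credential must be supplied is a choice made by this sketch rather than a documented service requirement.

```python
import datetime


def build_node_user_kwargs(name, password=None, ssh_public_key=None,
                           days_valid=30):
    """Assemble keyword arguments for a node user with a password or SSH key."""
    if password is None and ssh_public_key is None:
        # Guard chosen by this sketch so the account is usable for sign-in
        raise ValueError("specify a password or an SSH public key")
    kwargs = {
        "name": name,
        "is_admin": True,
        "expiry_time": (datetime.datetime.now(datetime.timezone.utc)
                        + datetime.timedelta(days=days_valid)),
    }
    if password is not None:
        kwargs["password"] = password
    if ssh_public_key is not None:
        # Drop the trailing newline that key files usually end with
        kwargs["ssh_public_key"] = ssh_public_key.strip()
    return kwargs


# Placeholder key text; in practice this would be read from a .pub file
kwargs = build_node_user_kwargs(
    "linuxuser", ssh_public_key="ssh-rsa AAAA-placeholder user@host\n")
print(sorted(kwargs))
```

The resulting dictionary could then be passed as `batchmodels.ComputeNodeUser(**kwargs)` in place of the password-based user created earlier.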

Pricing

Azure Batch is built on Azure Cloud Services and Azure Virtual Machines technology. The Batch service itself is offered at no cost, which means you are charged only for the compute resources, and the associated costs they entail, that your Batch solutions consume. When you choose Virtual Machine Configuration, you are charged based on the Virtual Machines pricing structure.

If you deploy applications to your Batch nodes using application packages, you are also charged for the Azure Storage resources that your application packages consume.

Next steps