Provision Linux compute nodes in Batch pools

You can use Azure Batch to run parallel compute workloads on both Linux and Windows virtual machines. This article details how to create pools of Linux compute nodes in the Batch service by using both the Batch Python and Batch .NET client libraries.

Virtual Machine Configuration

When you create a pool of compute nodes in Batch, you have two options from which to select the node size and operating system: Cloud Services Configuration and Virtual Machine Configuration. Virtual Machine Configuration pools are composed of Azure VMs, which may be created from either Linux or Windows images. When you create a pool with Virtual Machine Configuration, you specify an available compute node size, the virtual machine image reference to be installed on the nodes,and the Batch node agent SKU (a program that runs on each node and provides an interface between the node and the Batch service).

Virtual machine image reference

The Batch service uses virtual machine scale sets to provide compute nodes in the Virtual Machine Configuration. You can specify an image from the Azure Marketplace, or use the Azure Compute Gallery to prepare a custom image.

When you create a virtual machine image reference, you must specify the following properties:

Image reference property Example
Publisher canonical
Offer 0001-com-ubuntu-server-focal
SKU 20_04-lts
Version latest

Tip

You can learn more about these properties and how to specify Marketplace images in Find Linux VM images in the Azure Marketplace with the Azure CLI. Note that some Marketplace images are not currently compatible with Batch.

List of virtual machine images

Not all Marketplace images are compatible with the currently available Batch node agents. To list all supported Marketplace virtual machine images for the Batch service and their corresponding node agent SKUs, use list_supported_images (Python), ListSupportedImages (Batch .NET), or the corresponding API in another language SDK.

Node agent SKU

The Batch node agent is a program that runs on each node in the pool and provides the command-and-control interface between the node and the Batch service. There are different implementations of the node agent, known as SKUs, for different operating systems. Essentially, when you create a Virtual Machine Configuration, you first specify the virtual machine image reference, and then you specify the node agent to install on the image. Typically, each node agent SKU is compatible with multiple virtual machine images. Here are a few examples of node agent SKUs:

  • batch.node.ubuntu 18.04
  • batch.node.centos 7
  • batch.node.windows amd64

Create a Linux pool: Batch Python

The following code snippet shows an example of how to use the Azure Batch Client Library for Python to create a pool of Ubuntu Server compute nodes. For more details about the Batch Python module, view the reference documentation.

This snippet creates an ImageReference explicitly and specifies each of its properties (publisher, offer, SKU, version). In production code, however, we recommend that you use the list_supported_images method to select from the available image and node agent SKU combinations at runtime.

# Import the required modules from the
# Azure Batch Client Library for Python
import azure.batch.batch_service_client as batch
import azure.batch.batch_auth as batchauth
import azure.batch.models as batchmodels

# Specify Batch account credentials
account = "<batch-account-name>"
key = "<batch-account-key>"
batch_url = "<batch-account-url>"

# Pool settings
pool_id = "LinuxNodesSamplePoolPython"
vm_size = "STANDARD_D2_V3"
node_count = 1

# Initialize the Batch client
creds = batchauth.SharedKeyCredentials(account, key)
config = batch.BatchServiceClientConfiguration(creds, batch_url)
client = batch.BatchServiceClient(creds, batch_url)

# Create the unbound pool
new_pool = batchmodels.PoolAddParameter(id=pool_id, vm_size=vm_size)
new_pool.target_dedicated = node_count

# Configure the start task for the pool
start_task = batchmodels.StartTask()
start_task.run_elevated = True
start_task.command_line = "printenv AZ_BATCH_NODE_STARTUP_DIR"
new_pool.start_task = start_task

# Create an ImageReference which specifies the Marketplace
# virtual machine image to install on the nodes
ir = batchmodels.ImageReference(
    publisher="canonical",
    offer="0001-com-ubuntu-server-focal",
    sku="20_04-lts",
    version="latest")

# Create the VirtualMachineConfiguration, specifying
# the VM image reference and the Batch node agent
# to install on the node
vmc = batchmodels.VirtualMachineConfiguration(
    image_reference=ir,
    node_agent_sku_id="batch.node.ubuntu 20.04")

# Assign the virtual machine configuration to the pool
new_pool.virtual_machine_configuration = vmc

# Create pool in the Batch service
client.pool.add(new_pool)

As mentioned earlier, we recommend using the list_supported_images method to dynamically select from the currently supported node agent/Marketplace image combinations (rather than creating an ImageReference explicitly). The following Python snippet shows how to use this method.

# Get the list of supported images from the Batch service
images = client.account.list_supported_images()

# Obtain the desired image reference
image = None
for img in images:
  if (img.image_reference.publisher.lower() == "canonical" and
        img.image_reference.offer.lower() == "0001-com-ubuntu-server-focal" and
        img.image_reference.sku.lower() == "20_04-lts"):
    image = img
    break

if image is None:
  raise RuntimeError('invalid image reference for desired configuration')

# Create the VirtualMachineConfiguration, specifying the VM image
# reference and the Batch node agent to be installed on the node
vmc = batchmodels.VirtualMachineConfiguration(
    image_reference=image.image_reference,
    node_agent_sku_id=image.node_agent_sku_id)

Create a Linux pool: Batch .NET

The following code snippet shows an example of how to use the Batch .NET client library to create a pool of Ubuntu Server compute nodes. For more details about Batch .NET, view the reference documentation.

The following code snippet uses the PoolOperations.ListSupportedImages method to select from the list of currently supported Marketplace image and node agent SKU combinations. This technique is recommended, because the list of supported combinations may change from time to time. Most commonly, supported combinations are added.

// Pool settings
const string poolId = "LinuxNodesSamplePoolDotNet";
const string vmSize = "STANDARD_D2_V3";
const int nodeCount = 1;

// Obtain a collection of all available node agent SKUs.
// This allows us to select from a list of supported
// VM image/node agent combinations.
List<ImageInformation> images =
    batchClient.PoolOperations.ListSupportedImages().ToList();

// Find the appropriate image information
ImageInformation image = null;
foreach (var img in images)
{
    if (img.ImageReference.Publisher == "canonical" &&
        img.ImageReference.Offer == "0001-com-ubuntu-server-focal" &&
        img.ImageReference.Sku == "20_04-lts")
    {
        image = img;
        break;
    }
}

// Create the VirtualMachineConfiguration for use when actually
// creating the pool
VirtualMachineConfiguration virtualMachineConfiguration =
    new VirtualMachineConfiguration(image.ImageReference, image.NodeAgentSkuId);

// Create the unbound pool object using the VirtualMachineConfiguration
// created above
CloudPool pool = batchClient.PoolOperations.CreatePool(
    poolId: poolId,
    virtualMachineSize: vmSize,
    virtualMachineConfiguration: virtualMachineConfiguration,
    targetDedicatedComputeNodes: nodeCount);

// Commit the pool to the Batch service
await pool.CommitAsync();

Although the previous snippet uses the PoolOperations.istSupportedImages method to dynamically list and select from supported image and node agent SKU combinations (recommended), you can also configure an ImageReference explicitly:

ImageReference imageReference = new ImageReference(
    publisher: "canonical",
    offer: "0001-com-ubuntu-server-focal",
    sku: "20_04-lts",
    version: "latest");

Connect to Linux nodes using SSH

During development or while troubleshooting, you may find it necessary to sign in to the nodes in your pool. Unlike Windows compute nodes, you can't use Remote Desktop Protocol (RDP) to connect to Linux nodes. Instead, the Batch service enables SSH access on each node for remote connection.

The following Python code snippet creates a user on each node in a pool, which is required for remote connection. It then prints the secure shell (SSH) connection information for each node.

import datetime
import getpass
import azure.batch.batch_service_client as batch
import azure.batch.batch_auth as batchauth
import azure.batch.models as batchmodels

# Specify your own account credentials
batch_account_name = ''
batch_account_key = ''
batch_account_url = ''

# Specify the ID of an existing pool containing Linux nodes
# currently in the 'idle' state
pool_id = ''

# Specify the username and prompt for a password
username = 'linuxuser'
password = getpass.getpass()

# Create a BatchClient
credentials = batchauth.SharedKeyCredentials(
    batch_account_name,
    batch_account_key
)
batch_client = batch.BatchServiceClient(
    credentials,
    base_url=batch_account_url
)

# Create the user that will be added to each node in the pool
user = batchmodels.ComputeNodeUser(username)
user.password = password
user.is_admin = True
user.expiry_time = \
    (datetime.datetime.today() + datetime.timedelta(days=30)).isoformat()

# Get the list of nodes in the pool
nodes = batch_client.compute_node.list(pool_id)

# Add the user to each node in the pool and print
# the connection information for the node
for node in nodes:
    # Add the user to the node
    batch_client.compute_node.add_user(pool_id, node.id, user)

    # Obtain SSH login information for the node
    login = batch_client.compute_node.get_remote_login_settings(pool_id,
                                                                node.id)

    # Print the connection info for the node
    print("{0} | {1} | {2} | {3}".format(node.id,
                                         node.state,
                                         login.remote_login_ip_address,
                                         login.remote_login_port))

This code will have output similar to the following example. In this case, the pool contains four Linux nodes.

Password:
tvm-1219235766_1-20160414t192511z | ComputeNodeState.idle | 13.91.7.57 | 50000
tvm-1219235766_2-20160414t192511z | ComputeNodeState.idle | 13.91.7.57 | 50003
tvm-1219235766_3-20160414t192511z | ComputeNodeState.idle | 13.91.7.57 | 50002
tvm-1219235766_4-20160414t192511z | ComputeNodeState.idle | 13.91.7.57 | 50001

Instead of a password, you can specify an SSH public key when you create a user on a node.

In the Python SDK, use the ssh_public_key parameter on ComputeNodeUser.

In .NET, use the ComputeNodeUser.SshPublicKey property.

Pricing

Azure Batch is built on Azure Cloud Services and Azure Virtual Machines technology. The Batch service itself is offered at no cost, which means you are charged only for the compute resources (and associated costs that entails) that your Batch solutions consume. When you choose Virtual Machine Configuration, you are charged based on the Virtual Machines pricing structure.

If you deploy applications to your Batch nodes using application packages, you are also charged for the Azure Storage resources that your application packages consume.

Next steps