Manage node pools for a cluster in Azure Kubernetes Service (AKS)

In Azure Kubernetes Service (AKS), nodes of the same configuration are grouped together into node pools. These node pools contain the underlying VMs that run your applications. When you create an AKS cluster, you define the initial number of nodes and their size (SKU). As application demands change, you may need to change the settings on your node pools. For example, you may need to scale the number of nodes in a node pool or upgrade the Kubernetes version of a node pool.

This article shows you how to manage one or more node pools in an AKS cluster.

Before you begin

Limitations

The following limitations apply when you create and manage AKS clusters that support multiple node pools:

  • See Quotas, virtual machine size restrictions, and region availability in Azure Kubernetes Service (AKS).
  • System pools must contain at least one node, and user node pools may contain zero or more nodes.
  • You can't change the VM size of a node pool after you create it.
  • When you create multiple node pools at cluster creation time, all Kubernetes versions used by node pools must match the version set for the control plane. You can make updates after provisioning the cluster using per node pool operations.
  • You can't simultaneously run upgrade and scale operations on a cluster or node pool. If you attempt to run them at the same time, you receive an error. Each operation type must complete on the target resource prior to the next request on that same resource. For more information, see the troubleshooting guide.
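
Because each operation must finish before the next one starts on the same resource, a script that chains node pool operations needs to wait in between. The following is a minimal sketch rather than an official tool. The function name wait_for_nodepool and the 30-second default polling interval are assumptions, and the sketch treats Succeeded as done and Failed or Canceled as an error.

```shell
# Hypothetical helper: poll a node pool until its provisioningState settles.
# Usage: wait_for_nodepool <resource-group> <cluster> <node-pool> [interval-seconds]
wait_for_nodepool() {
    local rg="$1" cluster="$2" pool="$3" interval="${4:-30}" state
    while true; do
        state=$(az aks nodepool show \
            --resource-group "$rg" \
            --cluster-name "$cluster" \
            --name "$pool" \
            --query provisioningState \
            --output tsv) || return 1
        case "$state" in
            Succeeded) return 0 ;;
            Failed|Canceled)
                echo "Node pool $pool ended in state $state" >&2
                return 1 ;;
        esac
        sleep "$interval"   # Still Upgrading, Scaling, Creating, and so on.
    done
}
```

For example, you might run wait_for_nodepool myResourceGroup myAKSCluster mynodepool after an upgrade submitted with --no-wait and before you submit a scale request.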

Upgrade a single node pool

Note

The node pool OS image version is tied to the Kubernetes version of the cluster. You get OS image upgrades only after a cluster upgrade.

In this example, we upgrade the mynodepool node pool. Since there are two node pools, we must use the az aks nodepool upgrade command to upgrade.

  1. Check for any available upgrades using the az aks get-upgrades command.

    az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster
    
  2. Upgrade the mynodepool node pool using the az aks nodepool upgrade command.

    az aks nodepool upgrade \
        --resource-group myResourceGroup \
        --cluster-name myAKSCluster \
        --name mynodepool \
        --kubernetes-version KUBERNETES_VERSION \
        --no-wait
    
  3. List the status of your node pools using the az aks nodepool list command.

    az aks nodepool list --resource-group myResourceGroup --cluster-name myAKSCluster
    

    The following example output shows mynodepool is in the Upgrading state:

    [
      {
        ...
        "count": 3,
        ...
        "name": "mynodepool",
        "orchestratorVersion": "KUBERNETES_VERSION",
        ...
        "provisioningState": "Upgrading",
        ...
        "vmSize": "Standard_DS2_v2",
        ...
      },
      {
        ...
        "count": 2,
        ...
        "name": "nodepool1",
        "orchestratorVersion": "1.15.7",
        ...
        "provisioningState": "Succeeded",
        ...
        "vmSize": "Standard_DS2_v2",
        ...
      }
    ]
    

    It takes a few minutes to upgrade the nodes to the specified version.

As a best practice, you should upgrade all node pools in an AKS cluster to the same Kubernetes version. The default behavior of az aks upgrade is to upgrade all node pools together with the control plane to achieve this alignment. The ability to upgrade individual node pools lets you perform a rolling upgrade and schedule pods between node pools to maintain application uptime within the constraints mentioned above.

Upgrade a cluster control plane with multiple node pools

Note

Kubernetes uses the standard Semantic Versioning scheme. The version number is expressed as x.y.z, where x is the major version, y is the minor version, and z is the patch version. For example, in version 1.12.6, 1 is the major version, 12 is the minor version, and 6 is the patch version. The Kubernetes version of the control plane and the initial node pool are set during cluster creation. Other node pools have their Kubernetes version set when they are added to the cluster. The Kubernetes versions may differ between node pools and between a node pool and the control plane.

An AKS cluster has two cluster resource objects with Kubernetes versions associated with them:

  1. The cluster control plane Kubernetes version, and
  2. A node pool with a Kubernetes version.

The control plane maps to one or many node pools. The behavior of an upgrade operation depends on which Azure CLI command you use.

  • az aks upgrade upgrades the control plane and all node pools in the cluster to the same Kubernetes version.
  • az aks upgrade with the --control-plane-only flag upgrades only the cluster control plane and leaves all node pools unchanged.
  • az aks nodepool upgrade upgrades only the target node pool with the specified Kubernetes version.
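
As a sketch of how these commands combine, the following hypothetical function upgrades the control plane first and then a single node pool. The function name and argument order are assumptions. The --yes flag skips the confirmation prompt of az aks upgrade, and because --no-wait is omitted, each command blocks until its operation finishes.

```shell
# Hypothetical helper: upgrade the control plane, then one node pool.
# Usage: upgrade_control_plane_then_pool <resource-group> <cluster> <node-pool> <version>
upgrade_control_plane_then_pool() {
    local rg="$1" cluster="$2" pool="$3" version="$4"

    # Control plane only; node pools keep their current version.
    az aks upgrade \
        --resource-group "$rg" \
        --name "$cluster" \
        --kubernetes-version "$version" \
        --control-plane-only \
        --yes || return 1

    # Then bring the chosen node pool up to the same version.
    az aks nodepool upgrade \
        --resource-group "$rg" \
        --cluster-name "$cluster" \
        --name "$pool" \
        --kubernetes-version "$version"
}
```

Upgrading pool by pool this way lets you move workloads between node pools while each one is upgraded.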

Validation rules for upgrades

Kubernetes upgrades for a cluster control plane and node pools are validated using the following sets of rules:

  • Rules for valid versions to upgrade node pools:

    • The node pool version must have the same major version as the control plane.
    • The node pool minor version must be within two minor versions of the control plane version.
    • The node pool version can't be greater than the control plane's major.minor.patch version.
  • Rules for submitting an upgrade operation:

    • You can't downgrade the control plane or a node pool Kubernetes version.
    • If a node pool Kubernetes version isn't specified, the behavior depends on the client. In Resource Manager templates, the declaration falls back to the existing version defined for the node pool. If no version is set, it falls back to the control plane version.
    • You can't simultaneously submit multiple operations on a single control plane or node pool resource. You can either upgrade or scale a control plane or a node pool at a given time.
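
The version rules above can be checked locally before you submit an upgrade. The following is a minimal sketch under the assumption that both versions are plain x.y.z strings. The function name nodepool_version_allowed is hypothetical, and AKS itself remains the authority on what it accepts.

```shell
# Hypothetical check: is a node pool version valid for a control plane version?
# Usage: nodepool_version_allowed <control-plane x.y.z> <node-pool x.y.z>
nodepool_version_allowed() {
    local cp="$1" np="$2"
    local cp_major cp_minor cp_patch np_major np_minor np_patch rest

    # Split x.y.z into its parts with parameter expansion.
    cp_major=${cp%%.*}; rest=${cp#*.}; cp_minor=${rest%%.*}; cp_patch=${rest#*.}
    np_major=${np%%.*}; rest=${np#*.}; np_minor=${rest%%.*}; np_patch=${rest#*.}

    # Same major version as the control plane.
    [ "$np_major" -eq "$cp_major" ] || return 1

    # Not greater than the control plane version.
    [ "$np_minor" -le "$cp_minor" ] || return 1
    if [ "$np_minor" -eq "$cp_minor" ] && [ "$np_patch" -gt "$cp_patch" ]; then
        return 1
    fi

    # Within two minor versions of the control plane.
    [ $((cp_minor - np_minor)) -le 2 ]
}
```

For example, with a control plane at 1.29.4, the function accepts 1.27.9 and 1.29.2 but rejects 1.26.9 (more than two minor versions behind) and 1.29.5 (newer than the control plane).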

Scale a node pool manually

As your application workload demands change, you may need to scale the number of nodes in a node pool. The number of nodes can be scaled up or down.

  1. Scale the number of nodes in a node pool using the az aks nodepool scale command.

    az aks nodepool scale \
        --resource-group myResourceGroup \
        --cluster-name myAKSCluster \
        --name mynodepool \
        --node-count 5 \
        --no-wait
    
  2. List the status of your node pools using the az aks nodepool list command.

    az aks nodepool list --resource-group myResourceGroup --cluster-name myAKSCluster
    

    The following example output shows mynodepool is in the Scaling state with a new count of five nodes:

    [
      {
        ...
        "count": 5,
        ...
        "name": "mynodepool",
        "orchestratorVersion": "1.15.7",
        ...
        "provisioningState": "Scaling",
        ...
        "vmSize": "Standard_DS2_v2",
        ...
      },
      {
        ...
        "count": 2,
        ...
        "name": "nodepool1",
        "orchestratorVersion": "1.15.7",
        ...
        "provisioningState": "Succeeded",
        ...
        "vmSize": "Standard_DS2_v2",
        ...
      }
    ]
    

    It takes a few minutes for the scale operation to complete.

Scale a specific node pool automatically using the cluster autoscaler

AKS offers a separate feature, the cluster autoscaler, to automatically scale node pools. You can enable this feature with unique minimum and maximum scale counts per node pool.

For more information, see Use the cluster autoscaler.
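
As a rough illustration of per node pool limits, the following sketch enables the cluster autoscaler on one existing node pool. The function name and the local check that the minimum doesn't exceed the maximum are assumptions added for this example. The flags belong to the az aks nodepool update command.

```shell
# Hypothetical helper: enable the cluster autoscaler on one node pool.
# Usage: enable_pool_autoscaler <resource-group> <cluster> <node-pool> <min> <max>
enable_pool_autoscaler() {
    local rg="$1" cluster="$2" pool="$3" min="$4" max="$5"
    # Catch an obviously wrong range before calling Azure.
    if [ "$min" -gt "$max" ]; then
        echo "min-count ($min) can't exceed max-count ($max)" >&2
        return 1
    fi
    az aks nodepool update \
        --resource-group "$rg" \
        --cluster-name "$cluster" \
        --name "$pool" \
        --enable-cluster-autoscaler \
        --min-count "$min" \
        --max-count "$max"
}
```

For example, enable_pool_autoscaler myResourceGroup myAKSCluster mynodepool 1 5 would let mynodepool scale between one and five nodes while other node pools keep their own settings.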

Remove specific VMs in the existing node pool

Note

When you delete a VM with this command, AKS doesn't perform cordon and drain. To minimize the disruption of rescheduling pods currently running on the VM you plan to delete, perform a cordon and drain on the VM before deleting. You can learn more about how to cordon and drain using the example scenario provided in the resizing node pools tutorial.
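
The cordon and drain step can be sketched as follows. The function name is hypothetical, and the drain flags are assumptions you may need to adjust: --ignore-daemonsets lets the drain proceed past DaemonSet-managed pods, and --delete-emptydir-data evicts pods that use emptyDir volumes, which discards that data. The kubectl drain command also cordons the node itself, so the explicit cordon only mirrors the two steps named above.

```shell
# Hypothetical helper: cordon and drain nodes before deleting their VMs.
# Usage: cordon_and_drain <node-name>...
cordon_and_drain() {
    local node
    for node in "$@"; do
        # Stop new pods from being scheduled on the node.
        kubectl cordon "$node" || return 1
        # Evict the pods that are already running there.
        kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
    done
}
```

For example, you might run cordon_and_drain aks-mynodepool-20823458-vmss000002 before you pass that VM name to az aks nodepool delete-machines.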

  1. List the existing nodes using the kubectl get nodes command.

    kubectl get nodes
    

    Your output should look similar to the following example output:

    NAME                                 STATUS   ROLES   AGE   VERSION
    aks-mynodepool-20823458-vmss000000   Ready    agent   63m   v1.21.9
    aks-mynodepool-20823458-vmss000001   Ready    agent   63m   v1.21.9
    aks-mynodepool-20823458-vmss000002   Ready    agent   63m   v1.21.9
    
  2. Delete the specified VMs using the az aks nodepool delete-machines command. Make sure to replace the placeholders with your own values.

    az aks nodepool delete-machines \
        --resource-group <resource-group-name> \
        --cluster-name <cluster-name> \
        --name <node-pool-name> \
        --machine-names <vm-name-1> <vm-name-2>
    
  3. Verify the VMs were successfully deleted using the kubectl get nodes command.

    kubectl get nodes
    

    Your output should no longer include the VMs that you specified in the az aks nodepool delete-machines command.

Associate capacity reservation groups to node pools

As your workload demands change, you can associate existing capacity reservation groups to node pools to guarantee allocated capacity for your node pools.

Prerequisites to use capacity reservation groups with AKS

  • Use Azure CLI version 2.56 or later and API version 2023-10-01 or later.

  • The capacity reservation group must already exist and must contain at least one capacity reservation. Otherwise, the node pool is added to the cluster with a warning and no capacity reservation group is associated. For more information, see capacity reservation groups.

  • You need to create a user-assigned managed identity for the resource group that contains the capacity reservation group (CRG). System-assigned managed identities won't work for this feature. In the following example, replace the environment variables with your own values.

    IDENTITY_NAME=myID
    RG_NAME=myResourceGroup
    CLUSTER_NAME=myAKSCluster
    VM_SKU=Standard_D4s_v3
    NODE_COUNT=2
    LOCATION=westus2
    az identity create --name $IDENTITY_NAME --resource-group $RG_NAME  
    IDENTITY_ID=$(az identity show --name $IDENTITY_NAME --resource-group $RG_NAME --query id -o tsv)
    
  • You need to assign the Contributor role to the user-assigned identity created above. For more details, see Steps to assign an Azure role.

  • Create a new cluster and assign the newly created identity.

    az aks create \
        --resource-group $RG_NAME \
        --name $CLUSTER_NAME \
        --location $LOCATION \
        --node-vm-size $VM_SKU \
        --node-count $NODE_COUNT \
        --assign-identity $IDENTITY_ID \
        --generate-ssh-keys
    
  • You can also assign the user-assigned managed identity to an existing managed cluster using the az aks update command.

    az aks update \
        --resource-group $RG_NAME \
        --name $CLUSTER_NAME \
        --enable-managed-identity \
        --assign-identity $IDENTITY_ID
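
The Contributor role assignment listed in the prerequisites can be sketched with the Azure CLI as follows. The function name is hypothetical, and scoping the role to the whole resource group that contains the capacity reservation group is an assumption based on the prerequisite above. Adjust the scope to match your own access policy.

```shell
# Hypothetical helper: give a user-assigned identity the Contributor role
# on the resource group that holds the capacity reservation group.
# Usage: grant_crg_contributor <identity-name> <identity-rg> <crg-rg>
grant_crg_contributor() {
    local identity="$1" identity_rg="$2" crg_rg="$3" principal_id scope

    # Object ID of the identity's service principal.
    principal_id=$(az identity show \
        --name "$identity" \
        --resource-group "$identity_rg" \
        --query principalId \
        --output tsv) || return 1

    # Resource ID of the resource group, used as the role scope.
    scope=$(az group show --name "$crg_rg" --query id --output tsv) || return 1

    az role assignment create \
        --assignee-object-id "$principal_id" \
        --assignee-principal-type ServicePrincipal \
        --role Contributor \
        --scope "$scope"
}
```

For example, grant_crg_contributor myID myResourceGroup myResourceGroup covers the case where the identity and the capacity reservation group share one resource group.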
    

Associate an existing capacity reservation group with a node pool

Associate an existing capacity reservation group with a node pool using the az aks nodepool add command and specify a capacity reservation group with the --crg-id flag. The following example assumes you have a CRG named "myCRG".

    RG_NAME=myResourceGroup
    CLUSTER_NAME=myAKSCluster
    NODEPOOL_NAME=myNodepool
    CRG_NAME=myCRG
    CRG_ID=$(az capacity reservation group show --capacity-reservation-group $CRG_NAME --resource-group $RG_NAME --query id -o tsv)
    az aks nodepool add \
        --resource-group $RG_NAME \
        --cluster-name $CLUSTER_NAME \
        --name $NODEPOOL_NAME \
        --crg-id $CRG_ID

Associate an existing capacity reservation group with a system node pool

To associate an existing capacity reservation group with a system node pool, specify both the capacity reservation group and a user-assigned identity that has the Contributor role on it when you create the cluster. Use the az aks create command with the --assign-identity and --crg-id flags.

    IDENTITY_NAME=myID
    RG_NAME=myResourceGroup
    CLUSTER_NAME=myAKSCluster
    NODEPOOL_NAME=myNodepool
    CRG_NAME=myCRG
    CRG_ID=$(az capacity reservation group show --capacity-reservation-group $CRG_NAME --resource-group $RG_NAME --query id -o tsv)
    IDENTITY_ID=$(az identity show --name $IDENTITY_NAME --resource-group $RG_NAME --query id -o tsv)

    az aks create \
        --resource-group $RG_NAME \
        --name $CLUSTER_NAME \
        --crg-id $CRG_ID \
        --assign-identity $IDENTITY_ID \
        --generate-ssh-keys

Note

Deleting a node pool implicitly dissociates that node pool from any associated capacity reservation group before the node pool is deleted. Deleting a cluster implicitly dissociates all node pools in that cluster from their associated capacity reservation groups.

Note

You can't update an existing node pool with a capacity reservation group. The recommended approach is to associate a capacity reservation group when you create the node pool.

Specify a VM size for a node pool

You may need to create node pools with different VM sizes and capabilities. For example, you may create a node pool that contains nodes with large amounts of CPU or memory or a node pool that provides GPU support. In the next section, you use taints and tolerations to tell the Kubernetes scheduler which pods can run on these nodes.

In the following example, we create a GPU-based node pool that uses the Standard_NC6s_v3 VM size. These VMs are powered by the NVIDIA Tesla V100 card. For more information, see Available sizes for Linux virtual machines in Azure.

  1. Create a node pool using the az aks nodepool add command. Specify the name gpunodepool and use the --node-vm-size parameter to specify the Standard_NC6s_v3 size.

    az aks nodepool add \
        --resource-group myResourceGroup \
        --cluster-name myAKSCluster \
        --name gpunodepool \
        --node-count 1 \
        --node-vm-size Standard_NC6s_v3 \
        --no-wait
    
  2. Check the status of the node pool using the az aks nodepool list command.

    az aks nodepool list --resource-group myResourceGroup --cluster-name myAKSCluster
    

    The following example output shows the gpunodepool node pool is in the Creating state with the specified vmSize:

    [
      {
        ...
        "count": 1,
        ...
        "name": "gpunodepool",
        "orchestratorVersion": "1.15.7",
        ...
        "provisioningState": "Creating",
        ...
        "vmSize": "Standard_NC6s_v3",
        ...
      },
      {
        ...
        "count": 2,
        ...
        "name": "nodepool1",
        "orchestratorVersion": "1.15.7",
        ...
        "provisioningState": "Succeeded",
        ...
        "vmSize": "Standard_DS2_v2",
        ...
      }
    ]
    

    It takes a few minutes for the gpunodepool to be successfully created.

Specify a taint, label, or tag for a node pool

When creating a node pool, you can add taints, labels, or tags to it. When you add a taint, label, or tag, all nodes within that node pool also get that taint, label, or tag.

Important

Adding taints, labels, or tags to nodes should be done for the entire node pool using az aks nodepool. We don't recommend using kubectl to apply taints, labels, or tags to individual nodes in a node pool.

Set node pool taints

AKS supports two kinds of node taints: node taints and node initialization taints (preview). For more information, see Use node taints in an Azure Kubernetes Service (AKS) cluster.

For more information on how to use advanced Kubernetes scheduler features, see Best practices for advanced scheduler features in AKS.
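
As a minimal sketch of setting a taint when you create a node pool, the following hypothetical function passes a single taint to az aks nodepool add through the --node-taints flag. The function name, the fixed node count of one, and the deliberately narrow key=value:effect format check are assumptions made for this example.

```shell
# Hypothetical helper: add a node pool whose nodes all carry one taint.
# Usage: add_tainted_pool <resource-group> <cluster> <node-pool> <taint>
add_tainted_pool() {
    local rg="$1" cluster="$2" pool="$3" taint="$4"
    # Expect key=value:effect with one of the Kubernetes taint effects.
    case "$taint" in
        ?*=?*:NoSchedule|?*=?*:PreferNoSchedule|?*=?*:NoExecute) ;;
        *)
            echo "Unexpected taint format: $taint" >&2
            return 1 ;;
    esac
    az aks nodepool add \
        --resource-group "$rg" \
        --cluster-name "$cluster" \
        --name "$pool" \
        --node-count 1 \
        --node-taints "$taint"
}
```

For example, add_tainted_pool myResourceGroup myAKSCluster taintnp sku=gpu:NoSchedule would create a node pool that only accepts pods with a matching toleration.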

Set node pool tolerations

The following example assumes you applied the sku=gpu:NoSchedule taint when you created your node pool. The example YAML manifest uses a toleration to allow the Kubernetes scheduler to run an NGINX pod on a node in that node pool.

  1. Create a file named nginx-toleration.yaml and copy in the following example YAML.

    apiVersion: v1
    kind: Pod
    metadata:
      name: mypod
    spec:
      containers:
      - image: mcr.azk8s.cn/oss/nginx/nginx:1.15.9-alpine
        name: mypod
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 1
            memory: 2G
      tolerations:
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
    
  2. Schedule the pod using the kubectl apply command.

    kubectl apply -f nginx-toleration.yaml
    

    It takes a few seconds to schedule the pod and pull the NGINX image.

  3. Check the status using the kubectl describe pod command.

    kubectl describe pod mypod
    

    The following condensed example output shows the sku=gpu:NoSchedule toleration is applied. In the events section, the scheduler assigned the pod to the aks-taintnp-28993262-vmss000000 node:

    [...]
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
                     sku=gpu:NoSchedule
    Events:
      Type    Reason     Age    From                Message
      ----    ------     ----   ----                -------
      Normal  Scheduled  4m48s  default-scheduler   Successfully assigned default/mypod to aks-taintnp-28993262-vmss000000
      Normal  Pulling    4m47s  kubelet             pulling image "mcr.azk8s.cn/oss/nginx/nginx:1.15.9-alpine"
      Normal  Pulled     4m43s  kubelet             Successfully pulled image "mcr.azk8s.cn/oss/nginx/nginx:1.15.9-alpine"
      Normal  Created    4m40s  kubelet             Created container
      Normal  Started    4m40s  kubelet             Started container
    

    Only pods that have this toleration applied can be scheduled on nodes in taintnp. Any other pods are scheduled in the nodepool1 node pool. If you create more node pools, you can use taints and tolerations to limit what pods can be scheduled on those node resources.

Set node pool labels

For more information, see Use labels in an Azure Kubernetes Service (AKS) cluster.

Set node pool Azure tags

For more information, see Use Azure tags in Azure Kubernetes Service (AKS).
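
As a rough sketch, labels and Azure tags can both be supplied when you create a node pool by using the --labels and --tags flags of az aks nodepool add. The function name is hypothetical, and applying the same key=value pairs as both Kubernetes labels and Azure tags is an assumption made only to keep the example short.

```shell
# Hypothetical helper: add a node pool with the same key=value pairs
# applied as Kubernetes labels and as Azure tags.
# Usage: add_pool_with_metadata <resource-group> <cluster> <node-pool> <key=value>...
add_pool_with_metadata() {
    local rg="$1" cluster="$2" pool="$3"
    shift 3
    if [ "$#" -eq 0 ]; then
        echo "Provide at least one key=value pair" >&2
        return 1
    fi
    az aks nodepool add \
        --resource-group "$rg" \
        --cluster-name "$cluster" \
        --name "$pool" \
        --node-count 1 \
        --labels "$@" \
        --tags "$@"
}
```

For example, add_pool_with_metadata myResourceGroup myAKSCluster labelnp dept=IT costcenter=9000 creates a node pool whose nodes carry both labels, which you can then check with kubectl get nodes --show-labels.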

Manage node pools using a Resource Manager template

When you use an Azure Resource Manager template to create and manage resources, you can change settings in your template and redeploy it to update resources. With AKS node pools, you can't update the initial node pool profile once the AKS cluster has been created. This behavior means you can't update an existing Resource Manager template, make a change to the node pools, and then redeploy the template. Instead, you must create a separate Resource Manager template that updates the node pools for the existing AKS cluster.

  1. Create a template, such as aks-agentpools.json, and paste in the following example manifest. Make sure to edit the values as needed. This example template configures the following settings:

    • Updates the Linux node pool named myagentpool to run three nodes.
    • Sets the nodes in the node pool to run Kubernetes version 1.15.7.
    • Defines the node size as Standard_DS2_v2.

    {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "clusterName": {
                "type": "string",
                "metadata": {
                    "description": "The name of your existing AKS cluster."
                }
            },
            "location": {
                "type": "string",
                "metadata": {
                    "description": "The location of your existing AKS cluster."
                }
            },
            "agentPoolName": {
                "type": "string",
                "defaultValue": "myagentpool",
                "metadata": {
                    "description": "The name of the agent pool to create or update."
                }
            },
            "vnetSubnetId": {
                "type": "string",
                "defaultValue": "",
                "metadata": {
                    "description": "The Vnet subnet resource ID for your existing AKS cluster."
                }
            }
        },
        "variables": {
            "apiVersion": {
                "aks": "2020-01-01"
            },
            "agentPoolProfiles": {
                "maxPods": 30,
                "osDiskSizeGB": 0,
                "agentCount": 3,
                "agentVmSize": "Standard_DS2_v2",
                "osType": "Linux",
                "vnetSubnetId": "[parameters('vnetSubnetId')]"
            }
        },
        "resources": [
            {
                "apiVersion": "2020-01-01",
                "type": "Microsoft.ContainerService/managedClusters/agentPools",
                "name": "[concat(parameters('clusterName'),'/', parameters('agentPoolName'))]",
                "location": "[parameters('location')]",
                "properties": {
                    "maxPods": "[variables('agentPoolProfiles').maxPods]",
                    "osDiskSizeGB": "[variables('agentPoolProfiles').osDiskSizeGB]",
                    "count": "[variables('agentPoolProfiles').agentCount]",
                    "vmSize": "[variables('agentPoolProfiles').agentVmSize]",
                    "osType": "[variables('agentPoolProfiles').osType]",
                    "type": "VirtualMachineScaleSets",
                    "vnetSubnetID": "[variables('agentPoolProfiles').vnetSubnetId]",
                    "orchestratorVersion": "1.15.7"
                }
            }
        ]
    }
    
  2. Deploy the template using the az deployment group create command.

    az deployment group create \
        --resource-group myResourceGroup \
        --template-file aks-agentpools.json
    

    Tip

    You can add a tag to your node pool by adding the tags property in the template, as shown in the following example:

    ...
    "resources": [
    {
      ...
      "properties": {
        ...
        "tags": {
          "name1": "val1"
        },
        ...
      }
    }
    ...
    

    It may take a few minutes to update your AKS cluster depending on the node pool settings and operations you define in your Resource Manager template.
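
The example template declares clusterName and location parameters without default values, so a deployment has to supply them. The following sketch passes both through the --parameters flag. The function name is hypothetical, and reading the location from the existing cluster with az aks show is an assumption about how you might want to fill it in.

```shell
# Hypothetical helper: deploy aks-agentpools.json for an existing cluster.
# Usage: deploy_agentpool_template <resource-group> <cluster> [template-file]
deploy_agentpool_template() {
    local rg="$1" cluster="$2" template="${3:-aks-agentpools.json}" location

    # Reuse the cluster's own region for the template's location parameter.
    location=$(az aks show \
        --resource-group "$rg" \
        --name "$cluster" \
        --query location \
        --output tsv) || return 1

    az deployment group create \
        --resource-group "$rg" \
        --template-file "$template" \
        --parameters clusterName="$cluster" location="$location"
}
```

For example, deploy_agentpool_template myResourceGroup myAKSCluster deploys the template with the default agentPoolName value of myagentpool.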

Next steps