You might want to change the size of your virtual machines (VMs) to accommodate an increasing number of deployments or to run a larger workload. Resizing AKS instances directly isn't supported when using Virtual Machine Scale Sets in AKS, as outlined in the support policies for AKS:
> AKS agent nodes appear in the Azure portal as regular Azure IaaS resources. But these virtual machines are deployed into a custom Azure resource group (usually prefixed with MC_*). You can't make direct customizations to these nodes using the IaaS APIs or resources. Any custom changes that aren't done via the AKS API won't persist through an upgrade, scale, update, or reboot.
In this article, you learn the recommended method to resize a node pool by creating a new node pool with the desired SKU size, cordoning and draining the existing nodes, and then removing the existing node pool.
Important
This method is specific to Virtual Machine Scale Sets-based AKS clusters. When using Virtual Machines-based node pools, you can easily update the VM sizes in an existing node pool using a single Azure CLI command and have multiple VM sizes in the same node pool. For more information, see the Virtual Machines node pools documentation.
Create a new node pool with the desired SKU
Note
Every AKS cluster must contain at least one system node pool with at least one node. In this example, we use a `--mode` of `System` to add a system node pool that replaces the system node pool we want to resize. You can update the mode of a node pool at any time. You can also add a user node pool by setting `--mode` to `User`.

When resizing, make sure you consider all workload requirements, such as availability zones, and configure your VMSS node pool accordingly. You might need to modify the following command to best fit your needs. For a full list of the configuration options, see the `az aks nodepool add` reference page.
Create a new node pool using the `az aks nodepool add` command. In this example, we create a new node pool, `mynodepool`, with three nodes and the `Standard_DS3_v2` VM SKU to replace an existing node pool, `nodepool1`, that has the `Standard_DS2_v2` VM SKU.

```azurecli
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --node-count 3 \
    --node-vm-size Standard_DS3_v2 \
    --mode System \
    --no-wait
```
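If your workloads have placement requirements, you can extend the command with extra options. The following is a minimal sketch only: the `--zones` values assume your region supports availability zones, and the `department=finance` label is a hypothetical example.

```azurecli
# Sketch: spread the replacement system node pool across availability zones
# and attach an example label. Adjust or drop these options to match your needs.
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --node-count 3 \
    --node-vm-size Standard_DS3_v2 \
    --mode System \
    --zones 1 2 3 \
    --labels department=finance \
    --no-wait
```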
It takes a few minutes for the new node pool to be created.
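Because the command runs with `--no-wait`, you can poll the provisioning state to know when the pool is ready. A minimal sketch using `az aks nodepool show`; the `--query` expression is just one way to extract the field:

```azurecli
# Check whether the new node pool has finished provisioning
az aks nodepool show \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --query provisioningState \
    --output tsv
```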
Get the status of the new node pool using the `kubectl get nodes` command.

```bash
kubectl get nodes
```

Your output should resemble the following example output, showing both the new node pool `mynodepool` and the existing node pool `nodepool1`:

```output
NAME                                 STATUS   ROLES   AGE   VERSION
aks-mynodepool-98765432-vmss000000   Ready    agent   23m   v1.21.9
aks-mynodepool-98765432-vmss000001   Ready    agent   23m   v1.21.9
aks-mynodepool-98765432-vmss000002   Ready    agent   23m   v1.21.9
aks-nodepool1-12345678-vmss000000    Ready    agent   10d   v1.21.9
aks-nodepool1-12345678-vmss000001    Ready    agent   10d   v1.21.9
aks-nodepool1-12345678-vmss000002    Ready    agent   10d   v1.21.9
```
Cordon the existing nodes
Cordoning marks specified nodes as unschedulable and prevents any more pods from being added to the nodes.
Get the names of the nodes you want to cordon using the `kubectl get nodes` command.

```bash
kubectl get nodes
```

Your output should resemble the following example output, showing the nodes in the existing node pool `nodepool1` that you want to cordon:

```output
NAME                                STATUS   ROLES   AGE     VERSION
aks-nodepool1-12345678-vmss000000   Ready    agent   7d21h   v1.21.9
aks-nodepool1-12345678-vmss000001   Ready    agent   7d21h   v1.21.9
aks-nodepool1-12345678-vmss000002   Ready    agent   7d21h   v1.21.9
```
Cordon the existing nodes using the `kubectl cordon` command, specifying the desired nodes in a space-separated list. For example:

```bash
kubectl cordon aks-nodepool1-12345678-vmss000000 aks-nodepool1-12345678-vmss000001 aks-nodepool1-12345678-vmss000002
```

Your output should resemble the following example output, showing that the nodes are cordoned:

```output
node/aks-nodepool1-12345678-vmss000000 cordoned
node/aks-nodepool1-12345678-vmss000001 cordoned
node/aks-nodepool1-12345678-vmss000002 cordoned
```
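Instead of listing every node by name, you can cordon the whole pool with a label selector. A minimal sketch, assuming the `agentpool` node label and a `kubectl` client version whose `cordon` command supports `--selector`:

```bash
# Cordon every node in the old node pool in one command
kubectl cordon --selector agentpool=nodepool1
```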
Drain the existing nodes
Important
To successfully drain nodes and evict running pods, ensure that any PodDisruptionBudgets (PDBs) allow at least one pod replica to be moved at a time. Otherwise, the drain/evict operation fails. To check this, you can run `kubectl get pdb -A` and verify that `ALLOWED DISRUPTIONS` is `1` or higher.
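The following is a minimal sketch for spotting budgets that would block a drain; the jsonpath filter on `status.disruptionsAllowed` is just one possible way to express the check:

```bash
# Print the namespace and name of any PDB that currently allows zero disruptions
kubectl get pdb -A -o jsonpath='{range .items[?(@.status.disruptionsAllowed==0)]}{.metadata.namespace}{"\t"}{.metadata.name}{"\n"}{end}'
```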
When you drain nodes, the pods running on them are evicted and recreated on the other schedulable nodes.
Drain the existing nodes using the `kubectl drain` command with the `--ignore-daemonsets` and `--delete-emptydir-data` flags, specifying the desired nodes in a space-separated list.

Important

Using `--delete-emptydir-data` is required to evict the AKS-created `coredns` and `metrics-server` pods. If you don't use this flag, you get an error. For more information, see the documentation on emptydir.

For example:

```bash
kubectl drain aks-nodepool1-12345678-vmss000000 aks-nodepool1-12345678-vmss000001 aks-nodepool1-12345678-vmss000002 --ignore-daemonsets --delete-emptydir-data
```
After the drain operation finishes, all pods (excluding the pods controlled by daemon sets) should be running on the new node pool. You can verify this using the `kubectl get pods` command.

```bash
kubectl get pods -o wide -A
```
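To confirm that a specific old node no longer runs anything except daemon set pods, you can filter pods by node name. A minimal sketch using one of the example node names from the earlier output:

```bash
# List all pods still scheduled on one of the drained nodes
kubectl get pods -A -o wide --field-selector spec.nodeName=aks-nodepool1-12345678-vmss000000
```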
Troubleshoot pod eviction issues
You might encounter the following error when draining nodes:
```output
Error when evicting pods/[podname] -n [namespace] (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
```
By default, your cluster has AKS-managed pod disruption budgets (such as `coredns-pdb` or `konnectivity-agent`) with a `MinAvailable` of `1`. For example, if there are two `coredns` pods running, only one can be disrupted at a time. While one of them is getting recreated and is unavailable, the other `coredns` pod can't be evicted due to the pod disruption budget. This issue resolves itself after the initial `coredns` pod is scheduled and running, allowing the second pod to be properly evicted and recreated.
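If a drain stalls on one of these budgets, you can watch it while the replacement pod starts. A minimal sketch, assuming the `coredns-pdb` budget lives in the `kube-system` namespace, as it does on a default AKS cluster:

```bash
# Watch the AKS-managed CoreDNS disruption budget until ALLOWED DISRUPTIONS returns to 1
kubectl get pdb coredns-pdb -n kube-system --watch
```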
Tip
Consider draining nodes one by one for a smoother eviction experience and to avoid throttling, as shown in the sketch below.
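The following is a minimal sketch of a one-node-at-a-time drain loop; the node names are the example names used earlier, and the `--timeout` value is an arbitrary choice:

```bash
# Drain the old nodes sequentially, waiting for each drain to finish before starting the next
for node in \
    aks-nodepool1-12345678-vmss000000 \
    aks-nodepool1-12345678-vmss000001 \
    aks-nodepool1-12345678-vmss000002
do
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=10m
done
```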
Remove the existing node pool
Important
When you delete a node pool, AKS doesn't perform cordon and drain. To minimize the disruption of rescheduling pods currently running on the node pool you plan to delete, perform a cordon and drain on all nodes in the node pool before deleting.
Delete the original node pool using the `az aks nodepool delete` command.

```azurecli
az aks nodepool delete \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name nodepool1
```
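You can also confirm from the Azure side that only the new pool remains. A minimal sketch using `az aks nodepool list`:

```azurecli
# List the node pools that remain on the cluster
az aks nodepool list \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --output table
```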
Verify that your AKS cluster has only the new node pool with the applications and pods properly running using the `kubectl get nodes` command.

```bash
kubectl get nodes
```

Your output should resemble the following example output, showing only the new node pool `mynodepool`:

```output
NAME                                 STATUS   ROLES   AGE   VERSION
aks-mynodepool-98765432-vmss000000   Ready    agent   63m   v1.21.9
aks-mynodepool-98765432-vmss000001   Ready    agent   63m   v1.21.9
aks-mynodepool-98765432-vmss000002   Ready    agent   63m   v1.21.9
```
Next steps
After resizing a node pool by cordoning and draining, learn more about using multiple node pools.