Manage node pools for a cluster in Azure Kubernetes Service (AKS)
In Azure Kubernetes Service (AKS), nodes of the same configuration are grouped together into node pools. These node pools contain the underlying VMs that run your applications. When you create an AKS cluster, you define the initial number of nodes and their size (SKU). As application demands change, you may need to change the settings on your node pools. For example, you may need to scale the number of nodes in a node pool or upgrade the Kubernetes version of a node pool.
This article shows you how to manage one or more node pools in an AKS cluster.
Before you begin
- Review Create node pools for a cluster in Azure Kubernetes Service (AKS) to learn how to create node pools for your AKS clusters.
- You need the Azure CLI version 2.2.0 or later installed and configured. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.
- Review Storage options for applications in Azure Kubernetes Service to plan your storage configuration.
Limitations
The following limitations apply when you create and manage AKS clusters that support multiple node pools:
- See Quotas, virtual machine size restrictions, and region availability in Azure Kubernetes Service (AKS).
- System pools must contain at least one node, and user node pools may contain zero or more nodes.
- You can't change the VM size of a node pool after you create it.
- When you create multiple node pools at cluster creation time, all Kubernetes versions used by node pools must match the version set for the control plane. You can make updates after provisioning the cluster using per node pool operations.
- You can't simultaneously run upgrade and scale operations on a cluster or node pool. If you attempt to run them at the same time, you receive an error. Each operation type must complete on the target resource prior to the next request on that same resource. For more information, see the troubleshooting guide.
Upgrade a single node pool
Note
The node pool OS image version is tied to the Kubernetes version of the cluster. You only get OS image upgrades following a cluster upgrade.
In this example, we upgrade the mynodepool node pool. Since there are two node pools, we must use the az aks nodepool upgrade command to upgrade.
Check for any available upgrades using the az aks get-upgrades command.
az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster
Upgrade the mynodepool node pool using the az aks nodepool upgrade command.
az aks nodepool upgrade \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --kubernetes-version KUBERNETES_VERSION \
    --no-wait
List the status of your node pools using the az aks nodepool list command.
az aks nodepool list --resource-group myResourceGroup --cluster-name myAKSCluster
The following example output shows mynodepool is in the Upgrading state:
[
  {
    ...
    "count": 3,
    ...
    "name": "mynodepool",
    "orchestratorVersion": "KUBERNETES_VERSION",
    ...
    "provisioningState": "Upgrading",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  },
  {
    ...
    "count": 2,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.15.7",
    ...
    "provisioningState": "Succeeded",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]
It takes a few minutes to upgrade the nodes to the specified version.
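Because the upgrade command uses --no-wait, it returns before the upgrade finishes. The following is a minimal sketch of a polling helper, not an official tool. The function name wait_for_nodepool, the 30-second default interval, and the decision to treat Failed and Canceled as terminal failures are assumptions made for illustration. The helper relies only on az aks nodepool show and the provisioningState field shown in the output above.

```shell
# Hypothetical helper: poll a node pool until it reaches a terminal state.
# Usage: wait_for_nodepool <resource-group> <cluster-name> <node-pool-name> [interval-seconds]
wait_for_nodepool() (
  interval="${4:-30}"
  while :; do
    # provisioningState is the same field shown in the az aks nodepool list output.
    state="$(az aks nodepool show \
      --resource-group "$1" \
      --cluster-name "$2" \
      --name "$3" \
      --query provisioningState \
      --output tsv)" || return 1
    echo "Node pool $3: $state"
    case "$state" in
      Succeeded) return 0 ;;
      Failed|Canceled) return 1 ;;   # assumed terminal failure states
    esac
    sleep "$interval"
  done
)
```

For example, wait_for_nodepool myResourceGroup myAKSCluster mynodepool prints the state every 30 seconds and returns once the node pool leaves the Upgrading state.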
As a best practice, you should upgrade all node pools in an AKS cluster to the same Kubernetes version. The default behavior of az aks upgrade is to upgrade all node pools together with the control plane to achieve this alignment. The ability to upgrade individual node pools lets you perform a rolling upgrade and schedule pods between node pools to maintain application uptime within the constraints mentioned above.
Upgrade a cluster control plane with multiple node pools
Note
Kubernetes uses the standard Semantic Versioning scheme. The version number is expressed as x.y.z, where x is the major version, y is the minor version, and z is the patch version. For example, in version 1.12.6, 1 is the major version, 12 is the minor version, and 6 is the patch version. The Kubernetes version of the control plane and the initial node pool are set during cluster creation. Other node pools have their Kubernetes version set when they are added to the cluster. The Kubernetes versions may differ between node pools and between a node pool and the control plane.
An AKS cluster has two cluster resource objects with Kubernetes versions associated with them:
- The cluster control plane Kubernetes version, and
- A node pool with a Kubernetes version.
The control plane maps to one or many node pools. The behavior of an upgrade operation depends on which Azure CLI command you use.
- az aks upgrade upgrades the control plane and all node pools in the cluster to the same Kubernetes version.
- az aks upgrade with the --control-plane-only flag upgrades only the cluster control plane and leaves all node pools unchanged.
- az aks nodepool upgrade upgrades only the target node pool with the specified Kubernetes version.
Validation rules for upgrades
Kubernetes upgrades for a cluster control plane and node pools are validated using the following sets of rules:
Rules for valid versions to upgrade node pools:
- The node pool version must have the same major version as the control plane.
- The node pool minor version must be within two minor versions of the control plane version.
- The node pool version can't be greater than the control plane major.minor.patch version.
Rules for submitting an upgrade operation:
- You can't downgrade the control plane or a node pool Kubernetes version.
- If a node pool Kubernetes version isn't specified, the behavior depends on the client. In Resource Manager templates, the declaration falls back to the existing version defined for the node pool. If no version is set, it falls back to the control plane version.
- You can't simultaneously submit multiple operations on a single control plane or node pool resource. You can either upgrade or scale a control plane or a node pool at a given time.
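The version rules above can be sketched as a local pre-check before you submit an upgrade. This is an illustration only: the function names are hypothetical, the check assumes plain x.y.z version strings, and it reads "within two minor versions" as a difference of at most two. AKS performs its own validation, which remains authoritative.

```shell
# Extract one numeric field (1 = major, 2 = minor, 3 = patch) from an x.y.z string.
version_part() { echo "$1" | cut -d. -f"$2"; }

# Return 0 when version $1 is strictly lower than version $2.
version_lt() (
  for i in 1 2 3; do
    a="$(version_part "$1" "$i")"
    b="$(version_part "$2" "$i")"
    if [ "$a" -lt "$b" ]; then return 0; fi
    if [ "$a" -gt "$b" ]; then return 1; fi
  done
  return 1
)

# Hypothetical pre-check for a node pool upgrade.
# Usage: validate_nodepool_upgrade <control-plane-version> <current-pool-version> <target-pool-version>
validate_nodepool_upgrade() (
  control_plane="$1"
  current="$2"
  target="$3"

  if [ "$(version_part "$target" 1)" -ne "$(version_part "$control_plane" 1)" ]; then
    echo "major version must match the control plane"; return 1
  fi
  if version_lt "$control_plane" "$target"; then
    echo "node pool version can't be greater than the control plane version"; return 1
  fi
  skew=$(( $(version_part "$control_plane" 2) - $(version_part "$target" 2) ))
  if [ "$skew" -gt 2 ]; then
    echo "node pool minor version must be within two minor versions of the control plane"; return 1
  fi
  if version_lt "$target" "$current"; then
    echo "downgrades aren't allowed"; return 1
  fi
  echo "ok"
)
```

For example, with illustrative versions, validate_nodepool_upgrade 1.29.4 1.27.3 1.28.5 passes every check, while a target of 1.30.0 is rejected because it exceeds the control plane version.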
Scale a node pool manually
As your application workload demands change, you may need to scale the number of nodes in a node pool. The number of nodes can be scaled up or down.
Scale the number of nodes in a node pool using the az aks nodepool scale command.
az aks nodepool scale \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --node-count 5 \
    --no-wait
List the status of your node pools using the az aks nodepool list command.
az aks nodepool list --resource-group myResourceGroup --cluster-name myAKSCluster
The following example output shows mynodepool is in the Scaling state with a new count of five nodes:
[
  {
    ...
    "count": 5,
    ...
    "name": "mynodepool",
    "orchestratorVersion": "1.15.7",
    ...
    "provisioningState": "Scaling",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  },
  {
    ...
    "count": 2,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.15.7",
    ...
    "provisioningState": "Succeeded",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]
It takes a few minutes for the scale operation to complete.
Scale a specific node pool automatically using the cluster autoscaler
AKS offers a separate feature, the cluster autoscaler, to automatically scale node pools. You can enable this feature with unique minimum and maximum scale counts per node pool.
For more information, see Use the cluster autoscaler.
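As a rough sketch of setting per-node-pool scale counts, the following wraps az aks nodepool update with its --enable-cluster-autoscaler, --min-count, and --max-count parameters. The wrapper name and the local minimum/maximum sanity check are assumptions added for illustration, and the counts in the usage note are arbitrary.

```shell
# Hypothetical wrapper: enable the cluster autoscaler on one node pool.
# Usage: enable_nodepool_autoscaler <resource-group> <cluster-name> <node-pool-name> <min-count> <max-count>
enable_nodepool_autoscaler() {
  # Catch an inverted range locally before calling Azure.
  if [ "$4" -gt "$5" ]; then
    echo "min-count ($4) can't exceed max-count ($5)" >&2
    return 1
  fi
  az aks nodepool update \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --enable-cluster-autoscaler \
    --min-count "$4" \
    --max-count "$5"
}
```

For example, enable_nodepool_autoscaler myResourceGroup myAKSCluster mynodepool 1 5 lets the autoscaler keep mynodepool between one and five nodes, independently of any other node pool.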
Remove specific VMs in the existing node pool
Note
When you delete a VM with this command, AKS doesn't perform cordon and drain. To minimize the disruption of rescheduling pods currently running on the VM you plan to delete, perform a cordon and drain on the VM before deleting. You can learn more about how to cordon and drain using the example scenario provided in the resizing node pools tutorial.
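The cordon and drain recommended in the note can be sketched as follows. The helper name is hypothetical, and the --ignore-daemonsets and --delete-emptydir-data options are an assumption about what your workloads tolerate: the second discards emptyDir data and requires kubectl 1.20 or later. Review both before using them on a production cluster.

```shell
# Hypothetical helper: cordon and drain each named node before its VM is deleted.
# Usage: cordon_and_drain <node-name> [<node-name> ...]
cordon_and_drain() (
  for node in "$@"; do
    # Mark the node unschedulable so no new pods land on it.
    kubectl cordon "$node" || return 1
    # Evict running pods; DaemonSet pods are skipped and emptyDir data is discarded.
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  done
)
```

Pass the node names reported by kubectl get nodes, for example cordon_and_drain aks-mynodepool-20823458-vmss000001, and delete the VMs only after the drain completes without errors.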
List the existing nodes using the kubectl get nodes command.
kubectl get nodes
Your output should look similar to the following example output:
NAME                                 STATUS   ROLES   AGE   VERSION
aks-mynodepool-20823458-vmss000000   Ready    agent   63m   v1.21.9
aks-mynodepool-20823458-vmss000001   Ready    agent   63m   v1.21.9
aks-mynodepool-20823458-vmss000002   Ready    agent   63m   v1.21.9
Delete the specified VMs using the az aks nodepool delete-machines command. Make sure to replace the placeholders with your own values.
az aks nodepool delete-machines \
    --resource-group <resource-group-name> \
    --cluster-name <cluster-name> \
    --name <node-pool-name> \
    --machine-names <vm-name-1> <vm-name-2>
Verify the VMs were successfully deleted using the kubectl get nodes command.
kubectl get nodes
Your output should no longer include the VMs that you specified in the az aks nodepool delete-machines command.
Associate capacity reservation groups to node pools
As your workload demands change, you can associate existing capacity reservation groups to node pools to guarantee allocated capacity for your node pools.
Prerequisites to use capacity reservation groups with AKS
Use CLI version 2.56 or above and API version 2023-10-01 or higher.
The capacity reservation group should already exist and contain at least one capacity reservation. Otherwise, the node pool is added to the cluster with a warning and no capacity reservation group is associated. For more information, see Capacity reservation groups.
You need to create a user-assigned managed identity for the resource group that contains the capacity reservation group (CRG). System-assigned managed identities won't work for this feature. In the following example, replace the environment variables with your own values.
IDENTITY_NAME=myID
RG_NAME=myResourceGroup
CLUSTER_NAME=myAKSCluster
VM_SKU=Standard_D4s_v3
NODE_COUNT=2
LOCATION=westus2
az identity create --name $IDENTITY_NAME --resource-group $RG_NAME
IDENTITY_ID=$(az identity show --name $IDENTITY_NAME --resource-group $RG_NAME --query id -o tsv)
You need to assign the Contributor role to the user-assigned identity created above. For more details, see Steps to assign an Azure role.
Create a new cluster and assign the newly created identity.
az aks create \
    --resource-group $RG_NAME \
    --name $CLUSTER_NAME \
    --location $LOCATION \
    --node-vm-size $VM_SKU \
    --node-count $NODE_COUNT \
    --assign-identity $IDENTITY_ID \
    --generate-ssh-keys
You can also assign the user-assigned managed identity to an existing managed cluster using the az aks update command.
az aks update \
    --resource-group $RG_NAME \
    --name $CLUSTER_NAME \
    --enable-managed-identity \
    --assign-identity $IDENTITY_ID
Associate an existing capacity reservation group with a node pool
Associate an existing capacity reservation group with a node pool using the az aks nodepool add command and specify a capacity reservation group with the --crg-id flag. The following example assumes you have a CRG named "myCRG".
RG_NAME=myResourceGroup
CLUSTER_NAME=myAKSCluster
NODEPOOL_NAME=myNodepool
CRG_NAME=myCRG
CRG_ID=$(az capacity reservation group show --capacity-reservation-group $CRG_NAME --resource-group $RG_NAME --query id -o tsv)
az aks nodepool add --resource-group $RG_NAME --cluster-name $CLUSTER_NAME --name $NODEPOOL_NAME --crg-id $CRG_ID
Associate an existing capacity reservation group with a system node pool
To associate an existing capacity reservation group with a system node pool, specify both the CRG and a user-assigned identity that has the Contributor role on the CRG during cluster creation. Use the az aks create command with the --assign-identity and --crg-id flags.
IDENTITY_NAME=myID
RG_NAME=myResourceGroup
CLUSTER_NAME=myAKSCluster
NODEPOOL_NAME=myNodepool
CRG_NAME=myCRG
CRG_ID=$(az capacity reservation group show --capacity-reservation-group $CRG_NAME --resource-group $RG_NAME --query id -o tsv)
IDENTITY_ID=$(az identity show --name $IDENTITY_NAME --resource-group $RG_NAME --query id -o tsv)
az aks create \
    --resource-group $RG_NAME \
    --name $CLUSTER_NAME \
    --crg-id $CRG_ID \
    --assign-identity $IDENTITY_ID \
    --generate-ssh-keys
Note
Deleting a node pool implicitly dissociates that node pool from any associated capacity reservation group before the node pool is deleted. Deleting a cluster implicitly dissociates all node pools in that cluster from their associated capacity reservation groups.
Note
You cannot update an existing node pool with a capacity reservation group. The recommended approach is to associate a capacity reservation group during the node pool creation.
Specify a VM size for a node pool
You may need to create node pools with different VM sizes and capabilities. For example, you may create a node pool that contains nodes with large amounts of CPU or memory or a node pool that provides GPU support. In the next section, you use taints and tolerations to tell the Kubernetes scheduler how to limit access to pods that can run on these nodes.
In the following example, we create a GPU-based node pool that uses the Standard_NC6s_v3 VM size. These VMs are powered by the NVIDIA Tesla V100 card. For more information, see Available sizes for Linux virtual machines in Azure.
Create a node pool using the az aks nodepool add command. Specify the name gpunodepool and use the --node-vm-size parameter to specify the Standard_NC6s_v3 size.
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpunodepool \
    --node-count 1 \
    --node-vm-size Standard_NC6s_v3 \
    --no-wait
Check the status of the node pool using the az aks nodepool list command.
az aks nodepool list --resource-group myResourceGroup --cluster-name myAKSCluster
The following example output shows the gpunodepool node pool is in the Creating state with the specified vmSize:
[
  {
    ...
    "count": 1,
    ...
    "name": "gpunodepool",
    "orchestratorVersion": "1.15.7",
    ...
    "provisioningState": "Creating",
    ...
    "vmSize": "Standard_NC6s_v3",
    ...
  },
  {
    ...
    "count": 2,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.15.7",
    ...
    "provisioningState": "Succeeded",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]
It takes a few minutes for the gpunodepool to be successfully created.
Specify a taint, label, or tag for a node pool
When creating a node pool, you can add taints, labels, or tags to it. When you add a taint, label, or tag, all nodes within that node pool also get that taint, label, or tag.
Important
Adding taints, labels, or tags to nodes should be done for the entire node pool using az aks nodepool. We don't recommend using kubectl to apply taints, labels, or tags to individual nodes in a node pool.
Set node pool taints
AKS supports two kinds of node taints: node taints and node initialization taints (preview). For more information, see Use node taints in an Azure Kubernetes Service (AKS) cluster.
For more information on how to use advanced Kubernetes scheduler features, see Best practices for advanced scheduler features in AKS.
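As a sketch of how a taint is applied when a node pool is created, the following wraps az aks nodepool add with its --node-taints parameter. The wrapper name, the fixed node count of 1, and the local key=value:effect format check are assumptions made for illustration. The names taintnp and sku=gpu:NoSchedule in the usage note match the toleration example in the next section.

```shell
# Hypothetical wrapper: create a node pool whose nodes all carry one taint.
# Usage: add_tainted_nodepool <resource-group> <cluster-name> <node-pool-name> <key=value:effect>
add_tainted_nodepool() {
  # Reject taints that don't use one of the three Kubernetes taint effects.
  case "$4" in
    *=*:NoSchedule|*=*:PreferNoSchedule|*=*:NoExecute) ;;
    *)
      echo "taint must look like key=value:effect, got: $4" >&2
      return 1
      ;;
  esac
  az aks nodepool add \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --node-count 1 \
    --node-taints "$4" \
    --no-wait
}
```

For example, add_tainted_nodepool myResourceGroup myAKSCluster taintnp sku=gpu:NoSchedule creates a node pool that only accepts pods with a matching toleration.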
Set node pool tolerations
This example assumes you applied the sku=gpu:NoSchedule taint when creating your node pool, named taintnp in the output shown later in this section. The following example YAML manifest uses a toleration to allow the Kubernetes scheduler to run an NGINX pod on a node in that node pool.
Create a file named nginx-toleration.yaml and copy in the following example YAML.
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - image: mcr.azk8s.cn/oss/nginx/nginx:1.15.9-alpine
    name: mypod
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 1
        memory: 2G
  tolerations:
  - key: "sku"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
Schedule the pod using the kubectl apply command.
kubectl apply -f nginx-toleration.yaml
It takes a few seconds to schedule the pod and pull the NGINX image.
Check the status using the kubectl describe pod command.
kubectl describe pod mypod
The following condensed example output shows the sku=gpu:NoSchedule toleration is applied. In the events section, the scheduler assigned the pod to the aks-taintnp-28993262-vmss000000 node:
[...]
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
                 sku=gpu:NoSchedule
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  4m48s  default-scheduler  Successfully assigned default/mypod to aks-taintnp-28993262-vmss000000
  Normal  Pulling    4m47s  kubelet            pulling image "mcr.azk8s.cn/oss/nginx/nginx:1.15.9-alpine"
  Normal  Pulled     4m43s  kubelet            Successfully pulled image "mcr.azk8s.cn/oss/nginx/nginx:1.15.9-alpine"
  Normal  Created    4m40s  kubelet            Created container
  Normal  Started    4m40s  kubelet            Started container
Only pods that have this toleration applied can be scheduled on nodes in taintnp. Any other pods are scheduled in the nodepool1 node pool. If you create more node pools, you can use taints and tolerations to limit what pods can be scheduled on those node resources.
Setting node pool labels
For more information, see Use labels in an Azure Kubernetes Service (AKS) cluster.
Setting node pool Azure tags
For more information, see Use Azure tags in Azure Kubernetes Service (AKS).
Manage node pools using a Resource Manager template
When you use an Azure Resource Manager template to create and manage resources, you can change settings in your template and redeploy it to update resources. With AKS node pools, you can't update the initial node pool profile once the AKS cluster has been created. This behavior means you can't update an existing Resource Manager template, make a change to the node pools, and then redeploy the template. Instead, you must create a separate Resource Manager template that updates the node pools for the existing AKS cluster.
Create a template, such as aks-agentpools.json, and paste in the following example manifest. Make sure to edit the values as needed. This example template configures the following settings:
- Updates the Linux node pool named myagentpool to run three nodes.
- Sets the nodes in the node pool to run Kubernetes version 1.15.7.
- Defines the node size as Standard_DS2_v2.
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "clusterName": {
            "type": "string",
            "metadata": {
                "description": "The name of your existing AKS cluster."
            }
        },
        "location": {
            "type": "string",
            "metadata": {
                "description": "The location of your existing AKS cluster."
            }
        },
        "agentPoolName": {
            "type": "string",
            "defaultValue": "myagentpool",
            "metadata": {
                "description": "The name of the agent pool to create or update."
            }
        },
        "vnetSubnetId": {
            "type": "string",
            "defaultValue": "",
            "metadata": {
                "description": "The Vnet subnet resource ID for your existing AKS cluster."
            }
        }
    },
    "variables": {
        "apiVersion": {
            "aks": "2020-01-01"
        },
        "agentPoolProfiles": {
            "maxPods": 30,
            "osDiskSizeGB": 0,
            "agentCount": 3,
            "agentVmSize": "Standard_DS2_v2",
            "osType": "Linux",
            "vnetSubnetId": "[parameters('vnetSubnetId')]"
        }
    },
    "resources": [
        {
            "apiVersion": "2020-01-01",
            "type": "Microsoft.ContainerService/managedClusters/agentPools",
            "name": "[concat(parameters('clusterName'),'/', parameters('agentPoolName'))]",
            "location": "[parameters('location')]",
            "properties": {
                "maxPods": "[variables('agentPoolProfiles').maxPods]",
                "osDiskSizeGB": "[variables('agentPoolProfiles').osDiskSizeGB]",
                "count": "[variables('agentPoolProfiles').agentCount]",
                "vmSize": "[variables('agentPoolProfiles').agentVmSize]",
                "osType": "[variables('agentPoolProfiles').osType]",
                "type": "VirtualMachineScaleSets",
                "vnetSubnetID": "[variables('agentPoolProfiles').vnetSubnetId]",
                "orchestratorVersion": "1.15.7"
            }
        }
    ]
}
Deploy the template using the az deployment group create command.
az deployment group create \
    --resource-group myResourceGroup \
    --template-file aks-agentpools.json
Tip
You can add a tag to your node pool by adding the tags property in the template, as shown in the following example:
...
"resources": [
{
  ...
  "properties": {
    ...
    "tags": {
      "name1": "val1"
    },
    ...
  }
}
...
It may take a few minutes to update your AKS cluster depending on the node pool settings and operations you define in your Resource Manager template.
Next steps
- For more information about how to control pods across node pools, see Best practices for advanced scheduler features in AKS.
- Use proximity placement groups to reduce latency for your AKS applications.
- Use instance-level public IP addresses to enable your nodes to directly serve traffic.