Manage system node pools in Azure Kubernetes Service (AKS)

In Azure Kubernetes Service (AKS), nodes of the same configuration are grouped together into node pools. Node pools contain the underlying VMs that run your applications. System node pools and user node pools are two different node pool modes for your AKS clusters. System node pools serve the primary purpose of hosting critical system pods such as CoreDNS and metrics-server. User node pools serve the primary purpose of hosting your application pods. However, application pods can be scheduled on system node pools if you wish to only have one pool in your AKS cluster. Every AKS cluster must contain at least one system node pool with at least one node.

Important

If you run a single system node pool for your AKS cluster in a production environment, we recommend you use at least three nodes for the node pool.

This article explains how to manage system node pools in AKS. For information about how to use multiple node pools, see Use multiple node pools.

Before you begin

You need the Azure CLI version 2.3.1 or later installed and configured. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.
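If you prefer to check the version requirement from a script, the following is a minimal sketch. The helper name version_at_least is illustrative only, and the sketch assumes a sort that supports the -V (version sort) option and that az version reports the installed CLI version under the "azure-cli" key.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as it is in GNU coreutils.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query the CLI if it is installed on this machine.
if command -v az >/dev/null 2>&1; then
  installed="$(az version --query '"azure-cli"' -o tsv 2>/dev/null)"
  if version_at_least "$installed" "2.3.1"; then
    echo "Azure CLI $installed meets the 2.3.1 minimum."
  else
    echo "Azure CLI is older than 2.3.1 (or its version could not be read); upgrade it."
  fi
fi
```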

Limitations

The following limitations apply when you create and manage AKS clusters that support system node pools.

  • See Quotas, VM size restrictions, and region availability in AKS.
  • An API version of 2020-03-01 or greater must be used to set a node pool mode. Clusters created on API versions older than 2020-03-01 contain only user node pools, but can be migrated to contain system node pools by following update pool mode steps.
  • The name of a node pool may only contain lowercase alphanumeric characters and must begin with a lowercase letter. For Linux node pools, the length must be between 1 and 12 characters. For Windows node pools, the length must be between 1 and 6 characters.
  • The mode of a node pool is a required property and must be explicitly set when using ARM templates or direct API calls.
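The naming rule above can be checked locally before calling the CLI. The following is a sketch only: is_valid_nodepool_name is a hypothetical helper, not part of the Azure CLI, and it encodes just the character and length rules stated in this list.

```shell
# Hypothetical helper: validate a node pool name against the rules listed above.
# Arguments: name, OS type (Linux or Windows; defaults to Linux).
# The body runs in a subshell, so its variables and locale setting don't leak out.
is_valid_nodepool_name() (
  name=$1
  os_type=${2:-Linux}
  LC_ALL=C            # keep the [a-z] ranges strictly ASCII

  max_len=12          # Linux node pools: 1 to 12 characters
  if [ "$os_type" = "Windows" ]; then
    max_len=6         # Windows node pools: 1 to 6 characters
  fi

  case $name in
    [a-z]*) ;;        # must begin with a lowercase letter (also rejects empty names)
    *) exit 1 ;;
  esac
  case $name in
    *[!a-z0-9]*) exit 1 ;;   # only lowercase letters and digits are allowed
  esac

  [ "${#name}" -le "$max_len" ]
)

# Example usage:
# is_valid_nodepool_name systempool Linux   && echo "ok for Linux"
# is_valid_nodepool_name systempool Windows || echo "too long for Windows"
```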

System and user node pools

For a system node pool, AKS automatically assigns the label kubernetes.azure.com/mode: system to its nodes. This causes AKS to prefer scheduling system pods on node pools that contain this label. This label doesn't prevent you from scheduling application pods on system node pools. However, we recommend you isolate critical system pods from your application pods to prevent misconfigured or rogue application pods from accidentally deleting system pods.

You can enforce this behavior by creating a dedicated system node pool. Use the CriticalAddonsOnly=true:NoSchedule taint to prevent application pods from being scheduled on system node pools.
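To see which nodes carry the system label and whether the taint is in place, something like the following may help. This is a sketch that assumes kubectl is already connected to the cluster (for example, after running az aks get-credentials). The function name show_system_nodes is illustrative.

```shell
# Hypothetical helper: list the nodes that AKS has labeled as system node pool nodes,
# then print each node's Name line and the first line of its Taints field.
show_system_nodes() {
  kubectl get nodes -l kubernetes.azure.com/mode=system
  kubectl describe nodes -l kubernetes.azure.com/mode=system | grep -E '^(Name|Taints):'
}

# Example usage:
# show_system_nodes
```

On a dedicated system node pool, the Taints line would be expected to show CriticalAddonsOnly=true:NoSchedule.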

System node pools have the following restrictions:

  • System node pools must support at least 30 pods as described by the minimum and maximum value formula for pods.
  • The osType of a system node pool must be Linux.
  • User node pools osType may be Linux or Windows.
  • System pools must contain at least one node, and user node pools may contain zero or more nodes.
  • System node pools require a VM SKU with at least 2 vCPUs and 4 GB of memory. Burstable VMs (B series) aren't recommended.
  • A minimum of two nodes with 4 vCPUs each is recommended (for example, Standard_DS4_v2), especially for large clusters (multiple CoreDNS pod replicas, 3-4+ add-ons, and so on).
  • Spot node pools require user node pools.
  • Adding another system node pool or changing which node pool is a system node pool does not automatically move system pods. System pods can continue to run on the same node pool, even if you change it to a user node pool. If you delete or scale down a node pool that is running system pods and was previously a system node pool, those system pods are redeployed with preferred scheduling to the new system node pool.
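Some of the restrictions above can be checked mechanically. The following sketch covers only the three that can be read from a pool's osType, count, and maxPods values (the same fields that az aks nodepool show returns). The helper check_system_pool is hypothetical and not part of any Azure tooling.

```shell
# Hypothetical helper: check a pool's settings against the system node pool
# restrictions listed above. Arguments: osType, node count, maxPods.
# Prints one line per violated restriction and returns non-zero if any is violated.
# The body runs in a subshell, so its variables don't leak into the caller.
check_system_pool() (
  os_type=$1
  count=$2
  max_pods=$3
  status=0

  if [ "$os_type" != "Linux" ]; then
    echo "osType must be Linux (got $os_type)"
    status=1
  fi
  if [ "$count" -lt 1 ]; then
    echo "system node pools need at least one node (got $count)"
    status=1
  fi
  if [ "$max_pods" -lt 30 ]; then
    echo "system node pools must support at least 30 pods (got $max_pods)"
    status=1
  fi

  exit "$status"
)

# Example usage:
# check_system_pool Linux 3 110 && echo "settings look valid for a system node pool"
```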

You can do the following operations with node pools:

  • Create a dedicated system node pool (prefer scheduling of system pods to node pools of mode:system)
  • Change a system node pool to be a user node pool, provided you have another system node pool to take its place in the AKS cluster.
  • Change a user node pool to be a system node pool.
  • Delete user node pools.
  • Delete system node pools, provided you have another system node pool to take its place in the AKS cluster.
  • An AKS cluster may have multiple system node pools and requires at least one system node pool.
  • If you want to change various immutable settings on existing node pools, you can create new node pools to replace them. One example is to add a new node pool with a new maxPods setting and delete the old node pool.
  • Use node affinity to require or prefer that pods are scheduled on particular nodes based on node labels. In your YAML, set key to kubernetes.azure.com/mode, operator to In, and values to either user or system, then apply the definition with kubectl apply -f yourYAML.yaml.
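As an illustration of the node affinity item, the following sketch writes a pod manifest that requires nodes labeled kubernetes.azure.com/mode with the value user. The file name, pod name, and container image are placeholders chosen for this example. Switch to preferredDuringSchedulingIgnoredDuringExecution if you only want a preference rather than a requirement.

```shell
# Write a hypothetical pod manifest that must be scheduled on user node pool nodes.
# The pod name and image below are placeholders, not values from this article.
cat > user-pool-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: sample-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.azure.com/mode
            operator: In
            values:
            - user
  containers:
  - name: sample-app
    image: nginx
EOF

# Apply the manifest once kubectl is connected to the cluster:
# kubectl apply -f user-pool-pod.yaml
```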

Create a new AKS cluster with a system node pool

When you create a new AKS cluster, you automatically create a system node pool with a single node. The initial node pool defaults to a mode of type system. When you create new node pools with az aks nodepool add, those node pools are user node pools unless you explicitly specify the mode parameter.

The following example creates a resource group named myResourceGroup in the chinaeast2 region.

az group create --name myResourceGroup --location chinaeast2

Use the az aks create command to create an AKS cluster. The following example creates a cluster named myAKSCluster with one dedicated system pool containing one node. For your production workloads, ensure you're using system node pools with at least three nodes. This operation may take several minutes to complete.

# Create a new AKS cluster with a single system pool
az aks create -g myResourceGroup --name myAKSCluster --node-count 1 --generate-ssh-keys

Add a dedicated system node pool to an existing AKS cluster

You can add one or more system node pools to existing AKS clusters. It's recommended to schedule your application pods on user node pools, and dedicate system node pools to only critical system pods. This prevents rogue application pods from accidentally deleting system pods. Enforce this behavior with the CriticalAddonsOnly=true:NoSchedule taint for your system node pools.

The following command adds a dedicated node pool of mode type system with three nodes.

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name systempool \
    --node-count 3 \
    --node-taints CriticalAddonsOnly=true:NoSchedule \
    --mode System

Show details for your node pool

You can check the details of your node pool with the following command.

az aks nodepool show -g myResourceGroup --cluster-name myAKSCluster -n systempool

A mode of type System is defined for system node pools, and a mode of type User is defined for user node pools. For a system pool, verify the taint is set to CriticalAddonsOnly=true:NoSchedule, which prevents application pods from being scheduled on this node pool.

{
  "agentPoolType": "VirtualMachineScaleSets",
  "availabilityZones": null,
  "count": 3,
  "enableAutoScaling": null,
  "enableNodePublicIp": false,
  "id": "/subscriptions/yourSubscriptionId/resourcegroups/myResourceGroup/providers/Microsoft.ContainerService/managedClusters/myAKSCluster/agentPools/systempool",
  "maxCount": null,
  "maxPods": 110,
  "minCount": null,
  "mode": "System",
  "name": "systempool",
  "nodeImageVersion": "AKSUbuntu-1604-2020.06.30",
  "nodeLabels": {},
  "nodeTaints": [
    "CriticalAddonsOnly=true:NoSchedule"
  ],
  "orchestratorVersion": "1.16.10",
  "osDiskSizeGb": 128,
  "osType": "Linux",
  "provisioningState": "Succeeded",
  "proximityPlacementGroupId": null,
  "resourceGroup": "myResourceGroup",
  "scaleSetEvictionPolicy": null,
  "scaleSetPriority": null,
  "spotMaxPrice": null,
  "tags": null,
  "type": "Microsoft.ContainerService/managedClusters/agentPools",
  "upgradeSettings": {
    "maxSurge": null
  },
  "vmSize": "Standard_DS2_v2",
  "vnetSubnetId": null
}
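When only the mode and taints matter, the output can be narrowed with the CLI's global --query option. The following is a sketch: the wrapper function show_pool_mode_and_taints is illustrative, and the JMESPath expression simply selects the mode and nodeTaints fields shown in the output above.

```shell
# Hypothetical helper: print only the mode and taints of one node pool.
# Arguments: resource group, cluster name, node pool name.
show_pool_mode_and_taints() {
  az aks nodepool show \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --query "{mode: mode, nodeTaints: nodeTaints}" \
    --output json
}

# Example usage:
# show_pool_mode_and_taints myResourceGroup myAKSCluster systempool
```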

Update existing cluster system and user node pools

Note

An API version of 2020-03-01 or greater must be used to set a system node pool mode. Clusters created on API versions older than 2020-03-01 contain only user node pools as a result. To receive system node pool functionality and benefits on older clusters, update the mode of existing node pools with the following commands on the latest Azure CLI version.

You can change modes for both system and user node pools. You can change a system node pool to a user pool only if another system node pool already exists on the AKS cluster.

This command changes a system node pool to a user node pool.

az aks nodepool update -g myResourceGroup --cluster-name myAKSCluster -n mynodepool --mode user

This command changes a user node pool to a system node pool.

az aks nodepool update -g myResourceGroup --cluster-name myAKSCluster -n mynodepool --mode system
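Because a system node pool can become a user node pool only when another system node pool exists, a script may want to check first. The following is a sketch under that assumption. Both function names are hypothetical, and the check assumes the pool being changed is itself currently a system node pool.

```shell
# Hypothetical helper: count the node pools whose mode is System.
# Arguments: resource group, cluster name.
count_system_pools() {
  az aks nodepool list \
    --resource-group "$1" \
    --cluster-name "$2" \
    --query "length([?mode=='System'])" \
    --output tsv
}

# Hypothetical helper: switch a pool to user mode only if at least two system
# node pools exist, so that one remains afterwards.
# Arguments: resource group, cluster name, node pool name.
demote_to_user_pool() {
  if [ "$(count_system_pools "$1" "$2")" -ge 2 ]; then
    az aks nodepool update -g "$1" --cluster-name "$2" -n "$3" --mode user
  else
    echo "Refusing to change $3: it may be the only system node pool." >&2
    return 1
  fi
}

# Example usage:
# demote_to_user_pool myResourceGroup myAKSCluster mynodepool
```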

Delete a system node pool

Note

To use system node pools on AKS clusters created before API version 2020-03-01, add a new system node pool, then delete the original default node pool.

You must have at least two system node pools on your AKS cluster before you can delete one of them.

az aks nodepool delete -g myResourceGroup --cluster-name myAKSCluster -n mynodepool

Clean up resources

To delete the cluster, use the az group delete command to delete the AKS resource group:

az group delete --name myResourceGroup --yes --no-wait

Next steps

In this article, you learned how to create and manage system node pools in an AKS cluster. For information about how to start and stop AKS node pools, see start and stop AKS node pools.