Migrate your existing Azure Kubernetes Service (AKS) cluster from cluster autoscaler to node auto-provisioning using the steps in this guide.
Node auto-provisioning (NAP) uses pending pod resource requirements to decide the optimal virtual machine (VM) configuration to run those workloads in the most efficient and cost-effective manner.
Node auto-provisioning is based on the open-source Karpenter project and the AKS Karpenter provider. Node auto-provisioning automatically deploys, configures, and manages Karpenter on your AKS clusters.
Cluster autoscaler vs. node auto-provisioning
Why migrate from cluster autoscaler to node auto-provisioning
Node auto-provisioning improves bin-packing, automates node lifecycle management, and reduces operational overhead compared to cluster autoscaler.
| Reason to Migrate | Cluster Autoscaler (CAS) | Node Auto-Provisioning (NAP) |
|---|---|---|
| VM Size Flexibility | Preexisting node pools with single VM size per pool | Dynamic provisioning of mixed VM sizes for cost/performance balance |
| Cost Optimization | Adds/removes nodes in pools; risk of underutilization | Intelligent bin-packing reduces fragmentation and lowers costs |
| Management Overhead | Requires manual tuning of CAS profiles | Fully managed experience integrated with AKS |
| Lifecycle Management | Basic scale-up/scale-down only | Advanced node lifecycle optimization; manage node updates, disruption + more |
| Future Feature Development | Cluster autoscaler is maintained, with minimal feature enhancements | Continuous active development and new feature enhancements |
Cluster autoscaler profile settings vs. node auto-provisioning configuration settings
The following table maps cluster autoscaler profile settings to the equivalent node auto-provisioning configuration settings on the NodePool CRD. It also shows each cluster autoscaler Azure CLI command alongside its NAP YAML equivalent.
| Cluster Autoscaler Profile Setting | Description | CAS CLI Example | NAP Disruption Setting | Description | NAP YAML Example |
|---|---|---|---|---|---|
| `balance-similar-node-groups` | Balances node pools across zones | `az aks update --resource-group <rg> --name <cluster> --cluster-autoscaler-profile balance-similar-node-groups=true` | N/A | NAP uses Karpenter's provisioning logic; no direct equivalent | `# Not applicable in NAP` |
| `expander` | Strategy for selecting the node pool for scale-up | `az aks update --resource-group <rg> --name <cluster> --cluster-autoscaler-profile expander=least-waste` | N/A | NAP dynamically provisions optimal VM sizes; no expander concept | `# Not applicable in NAP` |
| `scale-down-unneeded-time` | Time a node must be unneeded before it's eligible for scale down (default: 10m) | `az aks update --resource-group <rg> --name <cluster> --cluster-autoscaler-profile scale-down-unneeded-time=10m` | `consolidateAfter` | Time NAP waits after discovering a consolidation opportunity before disrupting the node | `disruption: consolidateAfter: 10m` |
| `scale-down-unready-time` | Time an unready node must be unneeded before it's eligible for scale down (default: 20m) | `az aks update --resource-group <rg> --name <cluster> --cluster-autoscaler-profile scale-down-unready-time=20m` | `terminationGracePeriod` | Grace period for pod termination before node removal | `disruption: terminationGracePeriod: 20m` |
| `scale-down-utilization-threshold` | Node utilization threshold for scale down (default: 0.5) | `az aks update --resource-group <rg> --name <cluster> --cluster-autoscaler-profile scale-down-utilization-threshold=0.5` | `consolidationPolicy` | Policy for consolidation: `WhenEmpty` or `WhenEmptyOrUnderutilized` | `disruption: consolidationPolicy: WhenEmptyOrUnderutilized` |
| `scan-interval` | How often the autoscaler reevaluates the cluster (default: 10s) | `az aks update --resource-group <rg> --name <cluster> --cluster-autoscaler-profile scan-interval=10s` | N/A | NAP doesn't use periodic scans; decisions are event-driven | `# Not applicable in NAP` |
| `skip-nodes-with-local-storage` | Prevents deleting nodes with local storage | `az aks update --resource-group <rg> --name <cluster> --cluster-autoscaler-profile skip-nodes-with-local-storage=true` | `karpenter.sh/do-not-disrupt` annotation | Blocks disruption for specific nodes or pods | `metadata: annotations: karpenter.sh/do-not-disrupt: "true"` |
| `skip-nodes-with-system-pods` | Prevents deleting nodes with system pods | `az aks update --resource-group <rg> --name <cluster> --cluster-autoscaler-profile skip-nodes-with-system-pods=true` | `karpenter.sh/do-not-disrupt` annotation | Same behavior for NAP | `metadata: annotations: karpenter.sh/do-not-disrupt: "true"` |
| `max-empty-bulk-delete` | Maximum number of empty nodes deleted at once (default: 10) | `az aks update --resource-group <rg> --name <cluster> --cluster-autoscaler-profile max-empty-bulk-delete=10` | `budgets` | Rate limits voluntary disruptions (percentage or absolute node count) | `disruption: budgets: - nodes: "10"` |
| `max-graceful-termination-sec` | Maximum seconds to wait for pod termination during scale down (default: 600) | `az aks update --resource-group <rg> --name <cluster> --cluster-autoscaler-profile max-graceful-termination-sec=600` | `terminationGracePeriod` | Explicitly sets the termination grace period for NAP nodes | `disruption: terminationGracePeriod: 600s` |
| `max-node-provision-time` | Maximum time to wait for node provisioning (default: 15m) | `az aks update --resource-group <rg> --name <cluster> --cluster-autoscaler-profile max-node-provision-time=15m` | N/A | NAP provisions nodes immediately based on pending pods | `# Not applicable in NAP` |
| `ok-total-unready-count` / `max-total-unready-percentage` | Limits unready nodes during autoscaling | `az aks update --resource-group <rg> --name <cluster> --cluster-autoscaler-profile ok-total-unready-count=3` | `budgets` | Can enforce disruption limits during maintenance windows | `disruption: budgets: - nodes: "20%"` |
Note
Unlike cluster autoscaler, NAP doesn't use Azure CLI commands to manage node behavior, so all decision-making for NAP-managed nodes is determined by the CRDs. For more on configuring your cluster specifications for NAP, see the NodePool documentation and AKSNodeClass documentation.
Before you begin
| Prerequisite | Notes |
|---|---|
| Azure Subscription | If you don't have an Azure subscription, you can create a free trial account. |
| Azure CLI | 2.76.0 or later. To find the version, run az --version. For more information about installing or upgrading the Azure CLI, see Install Azure CLI. |
Limitations
See NAP limitations and unsupported features.
Disable cluster autoscaler
Pre-migration checklist
- Confirm cluster eligibility for node auto-provisioning. For more on NAP requirements, see Overview of NAP documentation.
- Right-size workloads for consolidation.
- Set proper resource requests/limits, replicas, and pod disruption budgets (PDBs) to allow for a gradual migration. This migration method requires properly set PDBs to ensure well-managed disruption of your workloads; a minimal PDB example follows this list.
- Verify your system node pool is active.
- AKS requires a system node pool for system components (such as CoreDNS and Karpenter). When NAP is enabled, AKS is responsible for autoscaling the system pool.
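For example, here's a minimal PodDisruptionBudget that keeps at least two replicas available during voluntary disruptions such as NAP consolidation. The PDB name and the app: my-app label are placeholders for your own workload:

# pdb-example.yaml - minimal PDB for a hypothetical workload labeled app: my-app
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # always keep at least two replicas running
  selector:
    matchLabels:
      app: my-app          # placeholder; match your Deployment's pod labels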
Important
If your workloads depend on custom subnets or network policies, configure custom subnets or network policies in the AKSNodeClass before migrating workloads to avoid scheduling failures. See the AKSNodeClass documentation for details.
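As a sketch, an AKSNodeClass pinned to a custom subnet might look like the following. This assumes the vnetSubnetID field supported by the Azure Karpenter provider's AKSNodeClass; the subnet resource ID is a placeholder:

# aksnodeclass-custom-subnet.yaml - sketch: provision nodes into a custom subnet
apiVersion: karpenter.azure.com/v1beta1
kind: AKSNodeClass
metadata:
  name: custom-subnet
spec:
  imageFamily: Ubuntu
  # Placeholder subnet resource ID; replace with your own
  vnetSubnetID: /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>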
Disable cluster autoscaler safely
If cluster autoscaler is enabled cluster-wide, disable it at the cluster level using the --disable-cluster-autoscaler flag. Nodes aren’t removed when you disable cluster autoscaler, so your capacity stays steady.
az aks update --resource-group myResourceGroup --name myAKSCluster --disable-cluster-autoscaler
If cluster autoscaler is only enabled on select node pools, disable cluster autoscaler for specific node pools using the --disable-cluster-autoscaler flag.
# Disable CAS on a specific pool
az aks nodepool update \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name mypool1 \
--disable-cluster-autoscaler
You can also pin the node count of your node pool to a fixed value as you begin the migration to node auto-provisioning. The following az aks nodepool scale command pins the node count of node pool mypool1 in cluster myAKSCluster at five nodes.
# (Optional) Pin to a safe desired count before the switch
az aks nodepool scale \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name mypool1 \
--node-count 5
Enable node auto-provisioning
Enable node auto-provisioning on an existing cluster
Enable node auto-provisioning on an existing cluster using the az aks update command and set --node-provisioning-mode to Auto.
az aks update --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP_NAME --node-provisioning-mode Auto
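To confirm the change took effect, you can query the cluster's provisioning mode; this assumes the nodeProvisioningProfile property returned by recent AKS API versions:

# Verify the node provisioning mode is Auto
az aks show --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP_NAME --query "nodeProvisioningProfile.mode" --output tsv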
Define your first NodePool and AKSNodeClass
After enabling node auto-provisioning on your cluster, create a basic NodePool and AKSNodeClass to start provisioning nodes. These custom resource definition (CRD) files are used by NAP to define the types of nodes provisioned for your workloads.
This example creates a basic NodePool that:
- Supports on-demand instances
- Uses D series VMs
- Sets a CPU limit of 100
- Enables consolidation when nodes are empty or underutilized
# nodepool-default.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        intent: apps
    spec:
      nodeClassRef:
        name: default
        group: karpenter.azure.com
        kind: AKSNodeClass
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: [on-demand]
        - key: karpenter.azure.com/sku-family
          operator: In
          values: [D]
  limits:
    cpu: 100
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 0s
    expireAfter: Never
---
apiVersion: karpenter.azure.com/v1beta1
kind: AKSNodeClass
metadata:
  name: default
  annotations:
    kubernetes.io/description: "General purpose AKSNodeClass for running Ubuntu nodes"
spec:
  imageFamily: Ubuntu
You can now deploy the custom resources to your cluster with the following kubectl command:
kubectl apply -f nodepool-default.yaml
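To confirm the resources were created, list them by kind; the plural resource names below are those registered by the Karpenter CRDs:

# Confirm the NodePool and AKSNodeClass resources exist
kubectl get nodepools,aksnodeclasses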
Migrate workloads from fixed pools to node auto-provisioning managed nodes
Note
Consider setting node affinity that matches the specifications in NAP's NodePool and AKSNodeClass CRDs. This ensures your workloads tolerate the types of nodes you configured NAP to provision and are scheduled onto NAP-managed nodes when desired. See the AKS node selector and affinity documentation for best practices.
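For example, the following Deployment sketch targets the intent: apps label applied by the NodePool template defined earlier; the Deployment name and image are placeholders:

# deployment-affinity-example.yaml - sketch: require nodes carrying the
# intent: apps label set by the default NodePool above
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app            # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: intent
                    operator: In
                    values: [apps]
      containers:
        - name: sample-app
          image: mcr.microsoft.com/oss/kubernetes/pause:3.6   # placeholder image
          resources:
            requests:           # set real requests so NAP can size nodes correctly
              cpu: 100m
              memory: 128Mi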
Now scale down user pools gradually (keep the system pool):
# For each user pool, step down to 0 (this command should respect properly set PDBs)
az aks nodepool scale \
--resource-group <RG> \
--cluster-name <CLUSTER> \
--name <USER_POOL> \
--node-count 0
As pods are evicted, node auto-provisioning provisions replacement nodes per your NodePool and AKSNodeClass rules. Remember that only user pools (not the system pool) can be scaled to zero, and only with cluster autoscaler disabled, which you did in an earlier step.
Note
We recommend scaling down gradually in waves, watching replicas and PDBs to avoid dips in availability.
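As an illustration, one wave might look like the following; the intermediate count is arbitrary, and the middle step is where you confirm pods reschedule before continuing:

# Illustrative wave: step down, confirm pods reschedule onto NAP nodes, then continue
az aks nodepool scale --resource-group <RG> --cluster-name <CLUSTER> --name <USER_POOL> --node-count 3
kubectl get pods -A --field-selector status.phase=Pending
az aks nodepool scale --resource-group <RG> --cluster-name <CLUSTER> --name <USER_POOL> --node-count 0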
To confirm that the scale down is working and workloads are being scheduled to NAP-managed nodes safely, check:
- Custom resource definition files are active
- Karpenter events detailing NAP decisions
- NodeClaims are created in response to pending pod pressure
Verify node auto-provisioning
Check CRDs and understand NAP fields
Check CRDs to confirm they are in use:
# Verify CRDs
kubectl get crd | grep karpenter
View field descriptions with the kubectl explain command:
# Describe the fields of the NodePool spec
kubectl explain nodepool.spec
Confirm new NAP-managed nodes are being created
To ensure that NAP is properly provisioning new nodes in response to pending pod pressure, verify that the new nodes are being created. Node auto-provisioning produces cluster events that you can use to monitor deployment and scheduling decisions. View events through the Kubernetes events stream.
kubectl get events -A --field-selector source=karpenter -w
Alternatively, view the NodeClaims that represent the nodes being created:
kubectl get nodeclaims
A populated list confirms NAP is responding to pending pod pressure.
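You can also list the provisioned nodes directly; this assumes the karpenter.sh/nodepool label that Karpenter applies to the nodes it creates:

# List only NAP-managed nodes by filtering on the Karpenter NodePool label
kubectl get nodes -l karpenter.sh/nodepool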
Clean up old autoscaling
- If you're using only the managed AKS cluster autoscaler, it's already disabled by the preceding steps.
- If you're running a self-hosted cluster autoscaler in kube-system, scale its deployment to zero and remove it:
kubectl -n kube-system scale deploy/cluster-autoscaler --replicas=0
kubectl -n kube-system delete deploy/cluster-autoscaler
Fine-tune node auto-provisioning post-migration
After you complete your migration, you can fine-tune your cluster with these capabilities.
- Manage disruption behavior - Tune the disruption `consolidationPolicy` and `consolidateAfter` windows to balance cost against virtual machine churn; a sketch follows this list. See the NAP disruption documentation.
- Multiple NodePools - Split by workload class (for example, Spot vs. On-Demand, GPU vs. CPU) and use requirements, weights, and taints to control placement. See the NAP NodePool documentation.
- Networking - For more information on managing networking with custom virtual networks, see the NAP networking documentation.
- Observability - Stream Karpenter events and expose NAP control-plane metrics via Azure Monitor managed Prometheus. See the NAP observability documentation.
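As a starting point, here's a sketch of a tuned NodePool disruption block; the values are illustrative, not recommendations:

# Sketch: NodePool disruption settings tuned for lower churn
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 5m        # wait five minutes before consolidating underutilized nodes
  budgets:
    - nodes: "20%"            # disrupt at most 20% of this NodePool's nodes at a time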
Next steps
For more information on node auto-provisioning in AKS, see the following articles:
- Use node auto-provisioning in a custom virtual network
- Configure networking for node auto-provisioning on AKS
- Configure node pools for node auto-provisioning on AKS
- Configure disruption policies for node auto-provisioning on AKS
- Upgrade node images for node auto-provisioning on AKS
- Enable/Disable node auto-provisioning on AKS