Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
When planning an upgrade for your Azure Kubernetes Service (AKS) cluster, it's essential to consider the capacity and cost implications. Proper planning ensures successful upgrades while minimizing unexpected costs and resource constraints. This article outlines key considerations for capacity and cost planning during AKS upgrades, including surge node requirements, quota management, and IP address planning.
AKS upgrade requirements
AKS upgrades require extra compute capacity for surge nodes, which are temporary nodes created during the upgrade process. The number of surge nodes needed depends on the size of your cluster and the upgrade strategy you choose. Understanding these requirements is crucial for effective capacity planning and helps you:
- Avoid upgrade failures due to capacity constraints.
- Plan for quota and subnet IP requirements.
- Optimize upgrade settings to balance cost and performance.
- Manage costs associated with temporary resources.
Surge node capacity planning
During an upgrade, AKS creates surge nodes to maintain workload availability. The number of surge nodes depends on your maxSurge configuration, which defines how many extra nodes can be created during the upgrade. The following table outlines the surge node behavior based on different maxSurge settings:
maxSurge setting |
Surge nodes created | Capacity impact |
|---|---|---|
| 1 (default) | One extra node | • Minimal impact • Suitable for small clusters |
| 33% | ~1/3 of node pool size | • Moderate impact • Balances cost and performance • Recommended for production clusters |
| 50% | ~1/2 of node pool size | • Higher impact • Faster upgrades • Increased cost due to more temporary nodes |
Quota considerations for surge nodes
Before initiating an upgrade, ensure that your Azure subscription has sufficient quota to accommodate the surge nodes. Key quotas to monitor include:
- Virtual machine (VM) cores: Current nodes + surge nodes must not exceed your VM core quota.
- Public IP addresses: Each surge node might require a public IP address, depending on your network configuration.
- Load balancer resources: Ensure enough backend pool capacity for surge nodes.
Check your current quotas
You can check your current quotas using the az vm list-usage Azure CLI command:
az vm list-usage --location <your-region> --output table
Calculate IP address requirements
Surge nodes require extra IP addresses in your subnet. To calculate the total IP address requirement during an upgrade, use the following formula:
Total IPs needed = (Current nodes + `maxSurge` nodes) × (1 + `maxPods` per node)
For AKS clusters using Azure CNI, ensure your subnet has capacity for all nodes, surge nodes, and their associated pods.
Common capacity planning scenarios and solutions
The following scenarios illustrate common capacity planning challenges during AKS upgrades and their solutions:
Scenario 1: Capacity constraints due to insufficient VM core quota
If your cluster is limited by VM SKU availability, regional capacity, or quota limits, upgrades might fail when surge nodes can't be provisioned. Common errors include:
SKUNotAvailableAllocationFailedOverconstrainedAllocationRequestQuotaExceeded
Solutions include:
Use
maxUnavailableinstead ofmaxSurge: Upgrade using existing node capacity without provisioning new nodes. You can update this setting using theaz aks nodepool updatecommand. For example:az aks nodepool update \ --resource-group <resource-group-name> \ --cluster-name <cluster-name> \ --name <node-pool-name> \ --max-surge 0 \ --max-unavailable 1Lower
maxSurgevalue: Reduce extra capacity requirements by setting a lowermaxSurgepercentage. You can update this setting using theaz aks nodepool updatecommand. For example:az aks nodepool update \ --resource-group <resource-group-name> \ --cluster-name <cluster-name> \ --name <node-pool-name> \ --max-surge 10%
Scenario 2: IP exhaustion in subnet
When your subnet lacks sufficient IP addresses for surge nodes and pods, node provisioning fails with errors like SubnetIsFull. Solutions include:
Calculate IP needs: Use the IP calculation formula to determine required subnet size.
Reclaim unused IPs: Clean up unused resources in the subnet.
Expand subnet size: Increase the subnet's address space to accommodate extra IPs. For example, from
/24to/22.Lower
maxSurgevalue: Reduce the number of surge nodes to decrease IP requirements.Reduce
maxPodsper node: Free up IP addresses by lowering the maximum number of pods per node. You can update this setting using theaz aks nodepool updatecommand. For example:az aks nodepool update \ --resource-group <resource-group-name> \ --cluster-name <cluster-name> \ --name <node-pool-name> \ --max-pods 50
Cost optimization strategies for AKS upgrades
The following sections outline strategies to optimize costs during AKS upgrades:
Minimize surge costs
- Use lower
maxSurgevalues: Reduces temporary node costs but might increase upgrade duration. - Schedule upgrades during off-peak hours: Take advantage of lower compute costs during off-peak times. Faster upgrades mean fewer hours of surge node billing.
- Use spot instances for non-production workloads: If applicable, configure node pools with spot instances to reduce costs during upgrades.
Balance upgrade speed and cost
The following table outlines priorities for different upgrade strategies, recommended settings, and trade-offs:
| Priority | Recommended settings | Trade-offs |
|---|---|---|
| Minimize cost | maxSurge=1, longer upgrade window |
Slower upgrades |
| Balance cost and speed | maxSurge=33%, Planned Maintenance |
Moderate costs and speed |
| Maximize speed | maxSurge=50% or higher |
Higher temporary costs due to more surge nodes |
Monitor upgrade costs
Track the cost associated with AKS upgrades using the following methods:
- Tag surge resources: AKS automatically tags surge nodes. Use these tags to filter and analyze costs in Azure Cost Management.
- Use Azure Cost Management: Monitor and analyze costs related to AKS upgrades, focusing on surge node expenses.
- Set up budgets and alerts: Create budgets in Azure Cost Management to monitor upgrade-related expenses and receive alerts when approaching budget limits.
Planning checklist for AKS upgrades
Before initiating an AKS upgrade, use the following checklist to ensure proper capacity and cost planning:
- [ ] Quota: Sufficient VM cores for current + surge nodes.
- [ ] Subnet IPs: Enough addresses for all nodes and pods.
- [ ] SKU availability: Target VM size available in region.
- [ ] Budget: Allocated for temporary surge resources.
- [ ] Maintenance window: Scheduled for optimal capacity availability.