AKS upgrades frequently asked questions

Answers to frequently asked questions about Azure Kubernetes Service (AKS) upgrades.

Upgrade process and requirements

Can I upgrade my AKS cluster directly to a newer Kubernetes version, or do I need to upgrade sequentially?

AKS cluster upgrades must be performed sequentially through each minor Kubernetes version. Direct upgrades that skip minor versions aren't supported, except when upgrading from an unsupported version to a supported one. AKS ensures that the requested upgrade path is valid before allowing the upgrade operation to proceed.
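To make the sequential rule concrete, the following Python sketch lists the minor versions a cluster would pass through between its current version and a target version. The function names `parse_minor` and `plan_upgrade_path` are hypothetical helpers for illustration only and aren't part of any AKS tooling. The sketch doesn't model the exception for unsupported versions, and it doesn't check which versions are actually available in your region.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version such as '1.28' or '1.28.5'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def plan_upgrade_path(current: str, target: str) -> list[str]:
    """List each minor version to pass through, ending at the target minor.

    Hypothetical helper: it only illustrates the "one minor version at a
    time" rule and knows nothing about real AKS version availability.
    """
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not modeled")
    # One hop per minor version; an empty list means a patch-only upgrade.
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


print(plan_upgrade_path("1.27.9", "1.30.0"))  # ['1.28', '1.29', '1.30']
```

Each entry in the returned list corresponds to one separate upgrade operation.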

See the following articles for more recommendations on selecting an upgrade path for your cluster and guidance on AKS upgrade strategies:

What happens if I skip minor versions during an AKS upgrade?

Skipping minor versions during an upgrade is allowed only when upgrading from an unsupported version to a supported one. See Kubernetes version upgrades for the considerations that apply when an upgrade from an unsupported version skips two or more minor versions.

How do I handle deprecated Kubernetes APIs used by workloads before upgrading AKS?

The Kubernetes API changes with each new version release. As the Kubernetes API evolves, APIs are periodically reorganized or upgraded. When APIs evolve and graduate to GA versions, pre-release API versions are deprecated and eventually removed. By default, to prevent upgrade or workload failures, AKS checks for deprecated API usage during the 12-hour period before the upgrade operation is triggered and blocks the operation if such usage is found.
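As a rough illustration of what identifying removed API versions can look like, the Python sketch below compares already-parsed manifests (plain dictionaries) against a small lookup table. This is not how AKS performs its own validation, which looks at actual API usage in the cluster. The table `REMOVED_IN` is a deliberately small, non-exhaustive sample taken from the upstream Kubernetes deprecation guide, and `find_removed_apis` is a hypothetical helper name.

```python
# Illustrative, non-exhaustive sample: (apiVersion, kind) -> Kubernetes
# (major, minor) release in which that API version stopped being served.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): (1, 26),
}


def find_removed_apis(manifests, target):
    """Return (name, apiVersion, kind) for manifests not served at `target`.

    `manifests` is a list of dicts as produced by any YAML/JSON parser;
    `target` is a (major, minor) tuple such as (1, 25).
    """
    findings = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        removed = REMOVED_IN.get(key)
        if removed is not None and target >= removed:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append((name, key[0], key[1]))
    return findings


manifests = [
    {"apiVersion": "policy/v1beta1", "kind": "PodDisruptionBudget",
     "metadata": {"name": "web-pdb"}},
    {"apiVersion": "batch/v1", "kind": "CronJob",
     "metadata": {"name": "nightly"}},
    {"apiVersion": "autoscaling/v2beta2", "kind": "HorizontalPodAutoscaler",
     "metadata": {"name": "web-hpa"}},
]

for name, api_version, kind in find_removed_apis(manifests, (1, 26)):
    print(f"{kind}/{name} uses {api_version}, which is not served in the target version")
```

A real check would need a complete removal table and would also have to cover Helm charts, operators, and clients that call the API directly.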

Before upgrading your AKS cluster, you should identify deprecated API usage in your cluster and migrate your workloads to use the new supported API versions. See the following articles for detailed guidance on how to identify deprecated API usage in your cluster and migrate to new API versions:

Can I upgrade an AKS cluster control plane and only some node pools?

Yes, you can upgrade the control plane of an AKS cluster on its own or together with specific node pools. However, compatibility between the control plane and all node pools is crucial for cluster stability, so the version skew between the control plane and node pools is subject to specific rules and restrictions. See the following resources for more information on these rules and guidance on control plane/node pool version skew management:

Impact and mitigation

Will upgrading my AKS cluster cause downtime or disruption to running applications?

The AKS cluster upgrade process involves the rolling update of worker nodes, which is based on node draining and pod eviction, and can therefore cause brief periods of unavailability in deployed workloads. You can minimize downtime during an AKS cluster upgrade through proper workload configuration. By leveraging several AKS/Kubernetes native features, you can guarantee a sufficient number of adequately distributed pod replicas exist at any given moment during the upgrade.

For more information and best practices around workload configuration for minimizing impact during upgrade operations, see the following articles:

How are StatefulSet pods and persistent volumes affected during an AKS cluster upgrade?

During an AKS cluster upgrade, StatefulSet pods (like all other pods) are evicted and rescheduled as part of the node upgrade process. The CSI driver running on the nodes is responsible for reattaching persistent volumes to the new nodes, which preserves data integrity and availability and minimizes disruption to stateful applications. See Stateful workloads in AKS for more information on using stateful applications and persistent volumes with AKS.

How do I plan an AKS upgrade to minimize workload disruption?

The AKS cluster upgrade process is designed to minimize disruption to the workloads running on the cluster. However, the node draining and pod evictions that occur during the upgrade might still cause disruptions. Along with following best practices around workload configuration, we recommend that you:

  • Upgrade during off-business hours or periods of low activity.
  • Use AKS Planned Maintenance to schedule and control cluster auto-upgrade behavior.
  • Scale workloads appropriately to ensure sufficient replicas are available to handle traffic.
  • Use Pod Disruption Budgets to control the number of concurrent pod evictions for critical workloads.
  • Test the upgrade in lower environments (e.g., development or test) before upgrading the production environment.
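The Pod Disruption Budget recommendation above can be made concrete with a little arithmetic. The Python sketch below estimates how many pods of a workload can be evicted at once for a given `minAvailable` setting, following the upstream Kubernetes convention that percentage values are rounded up. The helper names `desired_healthy` and `allowed_disruptions` are hypothetical, and the sketch covers only `minAvailable`, not `maxUnavailable`.

```python
def desired_healthy(replicas: int, min_available) -> int:
    """Resolve minAvailable (an int or a string like '50%') to a pod count."""
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available[:-1])
        # Integer ceiling division: percentages are rounded up.
        return (replicas * percent + 99) // 100
    return int(min_available)


def allowed_disruptions(replicas: int, healthy: int, min_available) -> int:
    """Number of pods that can be evicted right now without violating the budget."""
    return max(0, healthy - desired_healthy(replicas, min_available))


# Three replicas, all healthy, at least two must stay up: one eviction at a time.
print(allowed_disruptions(replicas=3, healthy=3, min_available=2))      # 1
# A single replica with minAvailable=1 leaves no room for eviction.
print(allowed_disruptions(replicas=1, healthy=1, min_available=1))      # 0
# Seven replicas with minAvailable=50% keeps four pods, so three may go.
print(allowed_disruptions(replicas=7, healthy=7, min_available="50%"))  # 3
```

A result of zero is worth checking before an upgrade, because a budget that never allows an eviction can prevent a node from draining.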

See the following articles for guidance on minimizing impact during upgrade operations:

Does upgrading AKS from the Free SKU to Standard SKU cause downtime?

You shouldn't experience any downtime while updating your cluster tier from Free to Standard. The process is designed to transition smoothly without interrupting running applications. See Update the tier of an existing AKS cluster for more guidance on updating your AKS cluster tier.

Permissions

What Azure RBAC permissions or roles are required to perform an AKS cluster upgrade?

To trigger an AKS cluster upgrade, a user must have the Azure Kubernetes Service Contributor role assigned at the cluster resource level. See the following articles for a detailed description of this role and further guidance on providing users with granular access to AKS resources:

Backup and rollback

Should I back up my AKS cluster before upgrading?

While backing up your AKS cluster before every upgrade isn't mandatory, having appropriate backup and recovery capabilities in place is an essential part of any organization's operational and disaster recovery strategy. You can use tools like Velero and AKS Backup to back up cluster resources and data so that service can resume as quickly as possible. See the following articles for detailed guidance on setting up backup and restore for your AKS cluster:

Is rollback supported if the AKS upgrade fails or causes issues?

Rollback of AKS cluster upgrade operations isn't supported. If an upgrade operation fails or doesn't complete successfully, the path forward for the cluster is to:

  • Address whatever issue prevented the upgrade from succeeding. See the AKS troubleshooting documentation for guidance on resolving specific issues which might occur during upgrades.
  • Retry the cluster upgrade operation.

If an upgrade completes successfully but introduces workload issues, such as API compatibility issues, the recommended solution is to recreate the cluster with the previous version and restore the workloads from previously created backups. AKS recommends:

  • Thoroughly verifying workload compatibility with the new AKS version in lower environments (e.g. development, test) before proceeding with the upgrade in production environments.
  • Backing up your workloads and configurations before initiating an upgrade. See AKS Backup for detailed guidance on backup/restore strategies for AKS using Azure Backup.

AKS also recommends implementing a blue-green deployment strategy, which allows phased introduction of new AKS versions while maintaining service availability. See Blue-green deployment of AKS clusters for more information on using this approach.

Auto-upgrade and maintenance

Will the AKS auto-upgrade channel upgrade my cluster to the latest patch version automatically?

Yes, setting the auto-upgrade channel to Patch automatically upgrades the cluster to the latest available patch version, typically during previously defined maintenance windows. This ensures that your cluster remains up to date with the latest security and stability patches without manual intervention. See the following articles for guidance on configuring auto-upgrade channels and using planned maintenance windows with your AKS cluster:

Does the AKS cluster upgrade process restart or reimage nodes?

During an AKS cluster upgrade, nodes are reimaged rather than just restarted. Reimaging ensures that nodes run the updated Kubernetes version and configurations. This process helps maintain cluster consistency and health. See Upgrade an AKS cluster for a detailed description of the AKS node upgrade process.

AKS also provides the option to use the OS Security Patch Channel, which guarantees worker nodes get the latest operating system security updates with minimum disruption. See Enhancing Your Operating System’s Security with OS Security Patches in AKS for more information on using OS Security Patch Channel and its benefits.

What is AKS Planned Maintenance, and how can I use it to control AKS auto-upgrade behavior?

To minimize impact on running workloads, AKS Planned Maintenance allows defining specific schedules for upgrade operations triggered by cluster and node OS image auto-upgrade channels. Clusters can have one or more planned maintenance configurations pertaining to different upgrade scenarios, using specific recurrence and duration settings.
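To picture how a recurrence and duration setting bounds upgrade activity, the Python sketch below models a simple weekly window and checks whether a given moment falls inside it. `WeeklyWindow` is a hypothetical class for illustration only. It doesn't mirror the actual AKS Planned Maintenance schema, and it assumes all timestamps are in the same time zone as the window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class WeeklyWindow:
    """Hypothetical weekly maintenance window (not the AKS configuration schema)."""
    weekday: int         # 0 = Monday ... 6 = Sunday, as in datetime.weekday()
    start_hour: int      # hour of day at which the window opens
    duration_hours: int  # how long the window stays open

    def contains(self, moment: datetime) -> bool:
        # Find the window opening on the configured weekday at or before `moment`.
        days_back = (moment.weekday() - self.weekday) % 7
        start = (moment - timedelta(days=days_back)).replace(
            hour=self.start_hour, minute=0, second=0, microsecond=0
        )
        if start > moment:
            # Same weekday but earlier than the opening hour: use last week's window.
            start -= timedelta(days=7)
        return start <= moment < start + timedelta(hours=self.duration_hours)


# Window opens every Saturday at 01:00 and lasts four hours.
window = WeeklyWindow(weekday=5, start_hour=1, duration_hours=4)

print(window.contains(datetime(2024, 6, 1, 2, 30)))  # Saturday 02:30 -> True
print(window.contains(datetime(2024, 6, 1, 0, 30)))  # Saturday 00:30 -> False
print(window.contains(datetime(2024, 6, 2, 2, 0)))   # Sunday 02:00   -> False
```

The same check also works for windows that cross midnight, because the window end is computed from the start time rather than from the calendar day.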

See the following articles for more information on using AKS Planned Maintenance and auto-upgrade channels with your cluster:

How can I monitor maintenance activity in my cluster?

AKS Communication Manager allows you to receive alerts before and during maintenance events, including auto-upgrade operation results and guidance in case of failure. See AKS Communication Manager for more information on how to set up maintenance related notifications for your AKS cluster.

Support and deprecation

How long can I continue running unsupported AKS Kubernetes versions, and what's the grace period for upgrades?

AKS provides a 30-day grace period after every version removal, during which you still receive support for your cluster. We highly recommend that you monitor AKS deprecation notices to ensure your clusters remain within the supported version range. Once the grace period ends, running an unsupported version might leave you without support and exposed to security vulnerabilities. See the following resources for detailed information on AKS version deprecations, release calendars, and release notes:

AKS also provides a long-term support (LTS) option, which extends the support window for a Kubernetes version to give you more time to plan and test upgrades to newer Kubernetes versions. See Long-term support for AKS versions for more information on using AKS LTS.
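For planning purposes, the 30-day grace period described above translates into simple date arithmetic. The Python sketch below computes when the grace period ends and how many days remain. The removal date used in the example is made up, so take real dates from the AKS release calendar. The helper names `grace_period_end` and `days_left` are hypothetical.

```python
from datetime import date, timedelta

# The 30-day figure comes from the answer above; everything else is illustrative.
GRACE_PERIOD = timedelta(days=30)


def grace_period_end(removal_date: date) -> date:
    """Date on which the grace period following a version removal ends."""
    return removal_date + GRACE_PERIOD


def days_left(removal_date: date, today: date) -> int:
    """Days remaining in the grace period, never negative."""
    return max(0, (grace_period_end(removal_date) - today).days)


removal = date(2025, 3, 1)  # hypothetical removal date, not a real AKS date
print(grace_period_end(removal))             # 2025-03-31
print(days_left(removal, date(2025, 3, 20))) # 11
print(days_left(removal, date(2025, 4, 15))) # 0
```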

What are the best practices for upgrading AKS clusters running unsupported or significantly outdated versions?

For AKS clusters running unsupported or significantly outdated Kubernetes versions, the recommended best practices include:

  1. Create a new cluster with a supported Kubernetes version.
  2. Migrate workloads to the new cluster to ensure they run on supported versions.
  3. Avoid upgrading across multiple versions. Instead of upgrading through several minor versions, moving to a new cluster minimizes complexity and potential issues.
  4. Back up and validate data before migration.
  5. Perform thorough testing in a staging environment to identify and resolve any compatibility issues.

Is standby or proactive Microsoft support available during AKS cluster upgrades?

No, Microsoft provides only reactive support for AKS cluster upgrades. If you require proactive or standby support during upgrades, you need to engage Azure Event Management services, which are separate and might involve additional costs. See Azure support plans for more information on Azure support options.

Are Helm versions compatible with newer AKS Kubernetes versions?

Yes, Helm version 3 is compatible with all supported AKS Kubernetes versions. We recommend using Helm v3 for deploying and managing applications on AKS clusters to ensure compatibility and access to the latest features. See the following resources for guidance on using Helm with AKS:

Can I use Terraform to upgrade the AKS cluster or change SKU tiers?

While support is guaranteed in Azure CLI for all generally available (GA) AKS features, support in Terraform for some GA features might lag behind and is subject to Terraform Provider development constraints. We recommend checking Terraform Azure provider documentation for the support status of specific features.

Are Beta Kubernetes APIs supported in AKS clusters after upgrading to version 1.30 or above?

Beta APIs are disabled by default in AKS clusters starting with Kubernetes versions 1.30 and 1.27 LTS, consistent with AKS support policies. Furthermore, as of November 30th, 2024, Beta APIs will also be disabled in AKS clusters created with Kubernetes versions 1.28 and 1.29. For more information, see Upgrade an AKS cluster.

Networking and security

Who creates inbound NSG rules for AKS clusters, and why are they required?

AKS automatically manages Network Security Group (NSG) configuration according to the services deployed in the cluster to allow internet access to the public IPs associated with these workloads. The inbound rules AKS configures are essential for enabling external access to services deployed on the cluster. Manually modifying the managed NSG configuration might impact service availability. See Networking concepts for applications in AKS for more on AKS networking concepts.