Upgrade Istio-based service mesh add-on for Azure Kubernetes Service
This article describes how to upgrade the Istio-based service mesh add-on for Azure Kubernetes Service (AKS).
Announcements about the releases of new minor revisions or patches to the Istio-based service mesh add-on are published in the AKS release notes. To learn more about the release schedule and support for service mesh add-on revisions, read the support policy.
Minor revision upgrade
The Istio add-on supports upgrading the minor revision through a canary upgrade process. When an upgrade is initiated, the control plane of the new (canary) revision is deployed alongside the control plane of the initial (stable) revision. You can then manually roll over data plane workloads while using monitoring tools to track the health of workloads during this process. If you don't observe any issues with the health of your workloads, you can complete the upgrade so that only the new revision remains on the cluster. Otherwise, you can roll back to the previous revision of Istio.
If the cluster is currently using a supported minor revision of Istio, upgrades are only allowed one minor revision at a time. If the cluster is using an unsupported revision of Istio, you must upgrade to the lowest supported minor revision of Istio for that Kubernetes version. After that, upgrades can again be done one minor revision at a time.
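As a small illustration of the one-minor-revision-at-a-time rule, the following sketch computes the next minor revision name from the current one. It assumes the asm-<major>-<minor> naming used in this article. The function name next_minor_revision is hypothetical, and the function doesn't check whether that revision is actually offered for your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Azure CLI): print the revision that is
# exactly one minor version above the given asm-<major>-<minor> revision.
next_minor_revision() {
  local rev="$1"
  if [[ ! "$rev" =~ ^asm-([0-9]+)-([0-9]+)$ ]]; then
    echo "unexpected revision format: $rev" >&2
    return 1
  fi
  local major="${BASH_REMATCH[1]}"
  local minor="${BASH_REMATCH[2]}"
  # 10# forces base 10 so a value such as "08" isn't parsed as octal.
  echo "asm-${major}-$((10#$minor + 1))"
}
```

For example, next_minor_revision asm-1-22 prints asm-1-23. You can then compare that value against the revisions returned by az aks mesh get-upgrades.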
The following example illustrates how to upgrade from revision asm-1-22 to asm-1-23 with all workloads in the default namespace. The steps are the same for all minor upgrades and may be used for any number of namespaces.
Use the az aks mesh get-upgrades command to check which revisions are available for the cluster as upgrade targets:
az aks mesh get-upgrades --resource-group $RESOURCE_GROUP --name $CLUSTER
If a newer revision you expect isn't returned by this command, you may need to upgrade your AKS cluster first so that it's compatible with the newest revision.
If you set up mesh configuration for the existing mesh revision on your cluster, you need to create a separate ConfigMap corresponding to the new revision in the aks-istio-system namespace before initiating the canary upgrade in the next step. This configuration takes effect as soon as the new revision's control plane is deployed on the cluster. For more details, see the mesh configuration article for the Istio add-on.
Initiate a canary upgrade from revision asm-1-22 to asm-1-23 using az aks mesh upgrade start:
az aks mesh upgrade start --resource-group $RESOURCE_GROUP --name $CLUSTER --revision asm-1-23
A canary upgrade means the 1.23 control plane is deployed alongside the 1.22 control plane. They continue to coexist until you either complete or roll back the upgrade.
Optionally, revision tags may be used to roll over the data plane to the new revision without needing to manually relabel each namespace. Manually relabeling namespaces when moving them to a new revision can be tedious and error-prone. Revision tags solve this problem by serving as stable identifiers that point to revisions.
Rather than relabeling each namespace, a cluster operator can change the tag to point to a new revision. All namespaces labeled with that tag are updated at the same time. However, you still need to restart the workloads to make sure the correct version of the istio-proxy sidecar is injected.
To use revision tags during an upgrade:
Create a revision tag for the initial revision. In this example, we name it prod-stable:
istioctl tag set prod-stable --revision asm-1-22 --istioNamespace aks-istio-system
Create a revision tag for the revision installed during the upgrade. In this example, we name it prod-canary:
istioctl tag set prod-canary --revision asm-1-23 --istioNamespace aks-istio-system
Label application namespaces to map to revision tags:
# Label the default namespace to map to asm-1-22 through the prod-stable tag
kubectl label ns default istio.io/rev=prod-stable --overwrite
You may also label namespaces with istio.io/rev=prod-canary for the newer revision. However, the workloads in those namespaces aren't updated to the new sidecar until they're restarted.
If a new application is created in a namespace after the namespace is labeled, a sidecar corresponding to the revision tag on that namespace is injected.
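When several namespaces are involved, the labeling step can be wrapped in a loop. This is a minimal sketch: label_namespaces_with_rev is a hypothetical helper, and the KUBECTL variable is an assumption added only so the commands can be dry-run (for example, with KUBECTL=echo) before anything touches the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set istio.io/rev=<value> on each namespace given.
# <value> can be a revision tag (prod-stable, prod-canary) or a revision name
# (asm-1-23). Set KUBECTL=echo to print the commands instead of running them.
label_namespaces_with_rev() {
  local rev_value="$1"
  shift
  local ns
  for ns in "$@"; do
    ${KUBECTL:-kubectl} label namespace "$ns" "istio.io/rev=${rev_value}" --overwrite || return 1
  done
}
```

For example, label_namespaces_with_rev prod-stable default bookinfo labels both namespaces (bookinfo is a placeholder namespace name). As with manual labeling, existing workloads keep their current sidecar until they're restarted.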
Verify that the control plane pods corresponding to both asm-1-22 and asm-1-23 exist:
Verify istiod pods:
kubectl get pods -n aks-istio-system
Example output:
NAME                               READY   STATUS    RESTARTS   AGE
istiod-asm-1-22-55fccf84c8-dbzlt   1/1     Running   0          58m
istiod-asm-1-22-55fccf84c8-fg8zh   1/1     Running   0          58m
istiod-asm-1-23-f85f46bf5-7rwg4    1/1     Running   0          51m
istiod-asm-1-23-f85f46bf5-8p9qx    1/1     Running   0          51m
If ingress is enabled, verify ingress pods:
kubectl get pods -n aks-istio-ingress
Example output:
NAME                                                          READY   STATUS    RESTARTS   AGE
aks-istio-ingressgateway-external-asm-1-22-58f889f99d-qkvq2   1/1     Running   0          59m
aks-istio-ingressgateway-external-asm-1-22-58f889f99d-vhtd5   1/1     Running   0          58m
aks-istio-ingressgateway-external-asm-1-23-7466f77bb9-ft9c8   1/1     Running   0          51m
aks-istio-ingressgateway-external-asm-1-23-7466f77bb9-wcb6s   1/1     Running   0          51m
aks-istio-ingressgateway-internal-asm-1-22-579c5d8d4b-4cc2l   1/1     Running   0          58m
aks-istio-ingressgateway-internal-asm-1-22-579c5d8d4b-jjc7m   1/1     Running   0          59m
aks-istio-ingressgateway-internal-asm-1-23-757d9b5545-g89s4   1/1     Running   0          51m
aks-istio-ingressgateway-internal-asm-1-23-757d9b5545-krq9w   1/1     Running   0          51m
Observe that the ingress gateway pods of both revisions are deployed side by side. However, the ingress gateway service and its IP address remain unchanged.
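Rather than reading these listings by eye, you can summarize them per revision. The sketch below reads kubectl get pods output, including its header row, on standard input and counts Running pods whose names contain an asm-<major>-<minor> revision. The function name is hypothetical, and it relies on pod names embedding the revision, as in the example output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize `kubectl get pods` output by Istio revision.
# Skips the header row, keeps only pods in the Running state, and prints
# "<revision> <count>" lines sorted by revision.
count_running_by_revision() {
  awk 'NR > 1 && $3 == "Running" {
         if (match($1, /asm-[0-9]+-[0-9]+/)) {
           count[substr($1, RSTART, RLENGTH)]++
         }
       }
       END { for (rev in count) print rev, count[rev] }' | sort
}
```

For example: kubectl get pods -n aks-istio-system | count_running_by_revision. For the istiod example output above, this prints asm-1-22 2 and asm-1-23 2. For the ingress example, each revision reports 4 pods (2 external plus 2 internal gateway pods). Don't pass --no-headers, because the function skips the first line.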
Relabel the namespace so that any new pods are mapped to the Istio sidecar associated with the new revision and its control plane:
If using revision tags, overwrite the prod-stable tag itself to change its mapping:
istioctl tag set prod-stable --revision asm-1-23 --istioNamespace aks-istio-system --overwrite
Verify the tag-to-revision mappings:
istioctl tag list
Both tags should point to the newly installed revision:
TAG           REVISION   NAMESPACES
prod-canary   asm-1-23   default
prod-stable   asm-1-23   ...
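If you script the upgrade, you may want it to stop when a tag still points to the old revision. This hypothetical check reads istioctl tag list output on standard input and succeeds only when every listed tag maps to the expected revision. It assumes the TAG, REVISION, NAMESPACES column layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical check: exit 0 only if every tag in `istioctl tag list` output
# (read from stdin) points to the revision given as $1.
tags_point_to() {
  local expected="$1"
  awk -v want="$expected" '
    NR == 1 { next }   # skip the TAG REVISION NAMESPACES header
    NF >= 2 {
      seen++
      if ($2 != want) { print "tag " $1 " points to " $2; bad = 1 }
    }
    END { if (seen == 0 || bad) exit 1; exit 0 }'
}
```

For example: istioctl tag list | tags_point_to asm-1-23 && echo "all tags updated". An empty tag list is treated as a failure, because it usually means the tags weren't created.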
In this case, you don't need to relabel each namespace individually.
If not using revision tags, data plane namespaces must be relabeled to point to the new revision:
kubectl label namespace default istio.io/rev=asm-1-23 --overwrite
Relabeling doesn't affect your workloads until they're restarted.
Individually roll over each of your application workloads by restarting them. For example:
kubectl rollout restart deployment <deployment name> -n <deployment namespace>
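To restart several deployments and wait until each rollout finishes before you check your dashboards, a loop like the following can help. It's a sketch: restart_and_wait is a hypothetical name, the 300-second timeout is an arbitrary assumption, and KUBECTL can be set to echo for a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart each named deployment in a namespace so its pods
# are reinjected, then block until that rollout completes (or times out).
# Usage: restart_and_wait <namespace> <deployment>...
restart_and_wait() {
  local ns="$1"
  shift
  local deploy
  for deploy in "$@"; do
    ${KUBECTL:-kubectl} rollout restart "deployment/${deploy}" -n "$ns" || return 1
    ${KUBECTL:-kubectl} rollout status "deployment/${deploy}" -n "$ns" --timeout=300s || return 1
  done
}
```

For example, restart_and_wait default productpage-v1 restarts that deployment and returns a non-zero status if its rollout doesn't finish within the timeout.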
Check your monitoring tools and dashboards to determine whether your workloads are all running in a healthy state after the restart. Based on the outcome, you have two options:
Complete the canary upgrade: If you're satisfied that the workloads are all running in a healthy state as expected, you can complete the canary upgrade. Completion of the upgrade removes the previous revision's control plane and leaves behind the new revision's control plane on the cluster. Run the following command to complete the canary upgrade:
az aks mesh upgrade complete --resource-group $RESOURCE_GROUP --name $CLUSTER
Roll back the canary upgrade: If you observe any issues with the health of your workloads, you can roll back to the previous revision of Istio:
Relabel the namespace to point to the previous revision. If using revision tags:
istioctl tag set prod-stable --revision asm-1-22 --istioNamespace aks-istio-system --overwrite
Or, if not using revision tags:
kubectl label namespace default istio.io/rev=asm-1-22 --overwrite
Roll back the workloads to use the sidecar corresponding to the previous Istio revision by restarting these workloads again:
kubectl rollout restart deployment <deployment name> -n <deployment namespace>
Roll back the control plane to the previous revision:
az aks mesh upgrade rollback --resource-group $RESOURCE_GROUP --name $CLUSTER
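The rollback steps above run in this order: namespaces point back to the stable revision, workloads restart onto the previous sidecar, and only then is the canary control plane removed. The sketch below strings these steps together for the revision-tag approach. The function name and argument order are hypothetical, and the AZ, ISTIOCTL, and KUBECTL variables are assumptions that let you print the commands (for example, by setting them to echo) before running them for real.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the revision-tag rollback path.
# Usage: rollback_canary <resource group> <cluster> <stable revision> <namespace> <deployment>...
rollback_canary() {
  local rg="$1" cluster="$2" stable_rev="$3" ns="$4"
  shift 4
  local deploy

  # 1. Point the prod-stable tag back at the previous revision.
  ${ISTIOCTL:-istioctl} tag set prod-stable --revision "$stable_rev" \
    --istioNamespace aks-istio-system --overwrite || return 1

  # 2. Restart workloads so they're reinjected with the previous sidecar,
  #    waiting for each rollout (300s timeout is an arbitrary choice).
  for deploy in "$@"; do
    ${KUBECTL:-kubectl} rollout restart "deployment/${deploy}" -n "$ns" || return 1
    ${KUBECTL:-kubectl} rollout status "deployment/${deploy}" -n "$ns" --timeout=300s || return 1
  done

  # 3. Only after the workloads moved back, remove the canary control plane.
  ${AZ:-az} aks mesh upgrade rollback --resource-group "$rg" --name "$cluster"
}
```

For example: rollback_canary $RESOURCE_GROUP $CLUSTER asm-1-22 default productpage-v1. If you don't use revision tags, relabel the namespaces with kubectl label instead of running step 1.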
If you used revision tags, you can remove the prod-canary revision tag:
istioctl tag remove prod-canary --istioNamespace aks-istio-system
If mesh configuration was previously set up for the revisions, you can now delete the ConfigMap for the revision that was removed from the cluster during complete/rollback.
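For example, after completing the upgrade to asm-1-23, the ConfigMap for asm-1-22 is no longer used. This sketch assumes the add-on's mesh configuration ConfigMaps are named istio-shared-configmap-<revision>; check the actual name with kubectl get configmap -n aks-istio-system before deleting anything. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the mesh configuration ConfigMap of a revision that
# was removed from the cluster.
# ASSUMPTION: ConfigMaps are named istio-shared-configmap-<revision>.
delete_revision_meshconfig() {
  local rev="$1"
  ${KUBECTL:-kubectl} delete configmap "istio-shared-configmap-${rev}" -n aks-istio-system
}
```

For example, delete_revision_meshconfig asm-1-22 after a completed upgrade, or delete_revision_meshconfig asm-1-23 after a rollback.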
Minor revision upgrades with the ingress gateway
If you're currently using Istio ingress gateways and are performing a minor revision upgrade, keep in mind that Istio ingress gateway pods and deployments are deployed per revision. However, we provide a single LoadBalancer service across all ingress gateway pods of multiple revisions, so the external/internal IP address of the ingress gateways remains unchanged throughout the course of an upgrade.
Thus, during the canary upgrade, when two revisions exist simultaneously on the cluster, the ingress gateway pods of both revisions serve incoming traffic.
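To confirm that the gateway address stays the same, you can record it before the upgrade and compare it afterward. This sketch assumes the external gateway service is named aks-istio-ingressgateway-external in the aks-istio-ingress namespace, which matches the pod names shown earlier, and that the load balancer exposes an IP address. ingress_lb_ip is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the load balancer IP of an Istio ingress gateway service.
# ASSUMPTION: services are named aks-istio-ingressgateway-<external|internal>.
ingress_lb_ip() {
  local svc="${1:-aks-istio-ingressgateway-external}"
  ${KUBECTL:-kubectl} get svc "$svc" -n aks-istio-ingress \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

For example, run before=$(ingress_lb_ip) before starting the upgrade and after=$(ingress_lb_ip) while both revisions are running. The two values are expected to be equal. Pass aks-istio-ingressgateway-internal as the argument to check the internal gateway.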
Minor revision upgrades with horizontal pod autoscaling customizations
If you have customized horizontal pod autoscaling (HPA) settings for Istiod or the ingress gateways, note the following behavior for how HPA settings are applied across both revisions to maintain consistency during a canary upgrade:
- If you update the HPA spec before initiating an upgrade, the settings from the existing (stable) revision will be applied to the HPAs of the canary revision when the new control plane is installed.
- If you update the HPA spec while a canary upgrade is in progress, the HPA spec of the stable revision will take precedence and be applied to the HPA of the canary revision.
- If you update the HPA of the stable revision during an upgrade, the HPA spec of the canary revision will be updated to reflect the new settings applied to the stable revision.
- If you update the HPA of the canary revision during an upgrade, the HPA spec of the canary revision will be reverted to the HPA spec of the stable revision.
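During a canary upgrade, you can check that both revisions ended up with the same scaling bounds. The sketch below compares minReplicas and maxReplicas across revisions of the same component. It assumes HPA names end in the revision (for example, istiod-asm-1-22). That naming is an assumption, not documented behavior, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: compare HPA bounds between revisions of the same component.
# Input (stdin): "<name> <minReplicas> <maxReplicas>" per line, for example from
#   kubectl get hpa -n aks-istio-system --no-headers \
#     -o custom-columns=NAME:.metadata.name,MIN:.spec.minReplicas,MAX:.spec.maxReplicas
# ASSUMPTION: HPA names end in -asm-<major>-<minor>.
hpa_settings_match() {
  awk '{
         base = $1
         sub(/-asm-[0-9]+-[0-9]+$/, "", base)   # strip the revision suffix
         bounds = $2 " " $3
         if ((base in seen) && seen[base] != bounds) {
           print "mismatch for " base ": " seen[base] " vs " bounds
           bad = 1
         }
         seen[base] = bounds
       }
       END { exit bad }'
}
```

A mismatch that persists after the add-on reconciles may indicate that you edited the canary HPA directly, which, per the behavior described above, is reverted to the stable revision's spec.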
Patch version upgrade
- Istio add-on patch version availability information is published in the AKS release notes.
- Patches are rolled out automatically for istiod and ingress pods as part of these AKS releases, which respect the default planned maintenance window set up for the cluster.
- You need to initiate patches to the Istio proxy in your workloads by restarting the pods for reinjection:
Check the version of the Istio proxy intended for new or restarted pods. This version is the same as the version of the istiod and Istio ingress pods after they were patched:
kubectl get cm -n aks-istio-system -o yaml | grep "mcr.azk8s.cn/oss/istio/proxyv2"
Example output:
"image": "mcr.azk8s.cn/oss/istio/proxyv2:1.22.0-distroless",
"image": "mcr.azk8s.cn/oss/istio/proxyv2:1.22.0-distroless"
Check the Istio proxy image version for all pods in all namespaces:
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{":\t"}{range .spec.containers[*]}{.image}{", "}{end}{end}' |\
sort |\
grep "mcr.azk8s.cn/oss/istio/proxyv2"
Example output:
productpage-v1-979d4d9fc-p4764: docker.io/istio/examples-bookinfo-productpage-v1:1.22.0, mcr.azk8s.cn/oss/istio/proxyv2:1.22.0-distroless
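With many pods, it can help to list only those whose sidecar doesn't yet match the target proxy version. This hypothetical filter reads the output of the command above and prints the names of pods whose proxyv2 image tag differs from the tag you pass in, such as the version reported by the ConfigMap check. It assumes the "<pod>:<TAB><image>, <image>, ..." layout produced by that jsonpath expression.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read "<pod>:<TAB><image>, <image>, ..." lines on stdin and
# print the pods whose proxyv2 image tag is not equal to $1.
pods_needing_restart() {
  local want="$1"
  awk -v want="$want" '
    match($0, /proxyv2:[^, \t]+/) {
      tag = substr($0, RSTART + 8, RLENGTH - 8)   # drop the "proxyv2:" prefix
      if (tag != want) {
        pod = $1
        sub(/:$/, "", pod)                         # drop the trailing colon
        print pod
      }
    }'
}
```

For example, pipe the kubectl get pods command above into pods_needing_restart <tag>, where <tag> is the full image tag (including the -distroless suffix) shown in the ConfigMap output. Pods without an Istio sidecar are ignored.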
To trigger reinjection, restart the workloads. For example:
kubectl rollout restart deployments/productpage-v1 -n default
To verify that the pods now use the newer version, check the Istio proxy image version again for all pods in all namespaces:
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{":\t"}{range .spec.containers[*]}{.image}{", "}{end}{end}' |\
sort |\
grep "mcr.azk8s.cn/oss/istio/proxyv2"
Example output:
productpage-v1-979d4d9fc-p4764: docker.io/istio/examples-bookinfo-productpage-v1:1.22.0, mcr.azk8s.cn/oss/istio/proxyv2:1.22.0-distroless
Note
If you encounter any issues during upgrades, see the article on troubleshooting mesh revision upgrades.