Upgrade an Azure Kubernetes Service (AKS) cluster

Part of the AKS cluster lifecycle involves performing periodic upgrades to the latest Kubernetes version. It is important that you apply the latest security releases, or upgrade to get the latest features. This article shows you how to upgrade the master components or a single, default node pool in an AKS cluster.

For AKS clusters that use multiple node pools or Windows Server nodes, see Upgrade a node pool in AKS.

Before you begin

This article requires Azure CLI version 2.0.65 or later. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.

Warning

An AKS cluster upgrade triggers a cordon and drain of your nodes. If you have a low compute quota available, the upgrade may fail. For more information, see increase quotas.

Check for available AKS cluster upgrades

To check which Kubernetes releases are available for your cluster, use the az aks get-upgrades command. The following example checks for available upgrades to myAKSCluster in myResourceGroup:

az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster --output table

Note

When you upgrade a supported AKS cluster, Kubernetes minor versions cannot be skipped. All upgrades must be performed sequentially by minor version number. For example, upgrades from 1.14.x -> 1.15.x or 1.15.x -> 1.16.x are allowed, but 1.14.x -> 1.16.x is not allowed.

Skipping multiple versions can only be done when upgrading from an unsupported version back to a supported version. For example, an upgrade from an unsupported 1.10.x -> a supported 1.15.x can be completed.

The following example output shows that the cluster can be upgraded to versions 1.19.1 and 1.19.3:

Name     ResourceGroup    MasterVersion    Upgrades
-------  ---------------  ---------------  --------------
default  myResourceGroup  1.18.10          1.19.1, 1.19.3

If no upgrade is available, you will get the following message:

ERROR: Table output unavailable. Use the --query option to specify an appropriate query. Use --debug for more info.
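
The minor-version rule described in the note above can be expressed as a small check. The following is a minimal sketch in POSIX shell, not part of the Azure CLI: the function names minor_of and can_upgrade are hypothetical, patch numbers are ignored, and the exception for upgrading from an unsupported version is not modeled.

```shell
# Hypothetical helpers for illustration only; they are not Azure CLI commands.

# Extract the minor component of a version string, e.g. "1.18.10" -> "18".
minor_of() {
    rest=${1#*.}
    echo "${rest%%.*}"
}

# Succeed (exit status 0) when the target stays within the same major version
# and either keeps the same minor version or moves to the next one.
can_upgrade() {
    from_major=${1%%.*}
    to_major=${2%%.*}
    [ "$from_major" -eq "$to_major" ] || return 1
    step=$(( $(minor_of "$2") - $(minor_of "$1") ))
    [ "$step" -eq 0 ] || [ "$step" -eq 1 ]
}

if can_upgrade 1.14.x 1.15.x; then echo "1.14.x -> 1.15.x: allowed"; fi
if ! can_upgrade 1.14.x 1.16.x; then echo "1.14.x -> 1.16.x: not allowed"; fi
```

In practice, the Upgrades column returned by az aks get-upgrades remains the authoritative list of valid targets; a check like this is only useful as a guard in automation scripts.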

Customize node surge upgrade

Important

Node surges require subscription quota for the requested max surge count for each upgrade operation. For example, a cluster that has 5 node pools, each with a count of 4 nodes, has a total of 20 nodes. If each node pool has a max surge value of 50%, additional compute and IP quota of 10 nodes (2 nodes * 5 pools) is required to complete the upgrade.

If you use Azure CNI, also validate that the subnet has enough available IPs to satisfy the IP requirements of Azure CNI.
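
The quota arithmetic in the example above can be reproduced with a few lines of shell. This is a sketch under the assumption that every pool has the same node count and the same percentage max surge; the variable and function names are illustrative only.

```shell
# Illustrative names; the values mirror the example above.
pools=5
nodes_per_pool=4
max_surge_percent=50

# ceil(nodes * percent / 100) with integer arithmetic.
surge_nodes_for_percent() {
    echo $(( ($1 * $2 + 99) / 100 ))
}

per_pool=$(surge_nodes_for_percent "$nodes_per_pool" "$max_surge_percent")
extra_nodes=$(( per_pool * pools ))
echo "Extra nodes of quota needed: $extra_nodes"
```

With the values shown, each pool surges by 2 nodes, so the script reports 10 extra nodes.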

By default, AKS configures upgrades to surge with one additional node. A default value of one for the max surge setting enables AKS to minimize workload disruption by creating an additional node before the cordon/drain of existing applications to replace an older versioned node. The max surge value may be customized per node pool to enable a trade-off between upgrade speed and upgrade disruption. Increasing the max surge value makes the upgrade process complete faster, but a large max surge value may cause disruptions during the upgrade process.

For example, a max surge value of 100% provides the fastest possible upgrade process (doubling the node count) but also causes all nodes in the node pool to be drained simultaneously. You may wish to use a higher value such as this for testing environments. For production node pools, we recommend a max-surge setting of 33%.

AKS accepts both integer values and percentage values for max surge. An integer such as "5" indicates five additional nodes to surge. A value of "50%" indicates a surge value of half the current node count in the pool. Max surge percent values can be a minimum of 1% and a maximum of 100%. A percent value is rounded up to the nearest node count. If the max surge value is higher than the current node count at the time of upgrade, the current node count is used for the max surge value.

During an upgrade, the max surge value can be a minimum of 1 and a maximum value equal to the number of nodes in your node pool. You can set larger values, but the maximum number of nodes used for max surge won't be higher than the number of nodes in the pool at the time of upgrade.
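
The rules above (integer or percentage input, rounding up, and the bounds of 1 and the pool's node count) can be combined into one calculation. The following sketch shows how an effective surge count could be derived from a max surge setting; effective_surge is a hypothetical helper and not how AKS itself is implemented.

```shell
# Hypothetical helper: effective_surge <max-surge setting> <current node count>
effective_surge() {
    setting=$1
    node_count=$2
    case "$setting" in
        *%)
            # Percentage of the current node count, rounded up.
            pct=${setting%"%"}
            surge=$(( (node_count * pct + 99) / 100 ))
            ;;
        *)
            # Plain integer: that many additional nodes.
            surge=$setting
            ;;
    esac
    # Never fewer than 1 node, never more than the nodes in the pool.
    if [ "$surge" -lt 1 ]; then surge=1; fi
    if [ "$surge" -gt "$node_count" ]; then surge=$node_count; fi
    echo "$surge"
}

effective_surge 33% 10   # percentage, rounded up
effective_surge 5 3      # integer larger than the pool, capped at the pool size
```

Under these assumptions, a 33% setting on a 10-node pool yields 4 surge nodes, and an integer setting of 5 on a 3-node pool is capped at 3.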

Important

The max surge setting on a node pool is permanent. Subsequent Kubernetes upgrades or node version upgrades will use this setting. You may change the max surge value for your node pools at any time. For production node pools, we recommend a max-surge setting of 33%.

Use the following commands to set max surge values for new or existing node pools.

# Set max surge for a new node pool
az aks nodepool add -n mynodepool -g MyResourceGroup --cluster-name MyManagedCluster --max-surge 33%
# Update max surge for an existing node pool 
az aks nodepool update -n mynodepool -g MyResourceGroup --cluster-name MyManagedCluster --max-surge 5

Upgrade an AKS cluster

With a list of available versions for your AKS cluster, use the az aks upgrade command to upgrade. During the upgrade process, AKS will:

  • Add a new buffer node (or as many nodes as configured in max surge) to the cluster that runs the specified Kubernetes version.
  • Cordon and drain one of the old nodes to minimize disruption to running applications (if you're using max surge, it cordons and drains as many nodes at the same time as the number of buffer nodes specified).
  • When the old node is fully drained, it is reimaged to receive the new version and becomes the buffer node for the next node to be upgraded.
  • This process repeats until all nodes in the cluster have been upgraded.
  • At the end of the process, the last buffer node is deleted, maintaining the existing agent node count and zone balance.

az aks upgrade \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --kubernetes-version KUBERNETES_VERSION

It takes a few minutes to upgrade the cluster, depending on how many nodes you have.
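
To see why the node count and max surge value affect upgrade time, the rolling process described above can be traced with a simplified simulation. This sketch only prints the order of steps for a given node count and surge value; simulate_upgrade is a hypothetical function, and it does not call Azure or model real timing, failures, or zone balancing.

```shell
# Hypothetical trace of the rolling upgrade: simulate_upgrade <node count> <surge>
simulate_upgrade() {
    remaining=$1
    surge=$2
    round=0
    echo "add $surge buffer node(s) running the new version"
    while [ "$remaining" -gt 0 ]; do
        # Each round handles at most as many old nodes as there are buffer nodes.
        batch=$surge
        if [ "$batch" -gt "$remaining" ]; then batch=$remaining; fi
        round=$(( round + 1 ))
        echo "round $round: cordon, drain, and reimage $batch old node(s)"
        remaining=$(( remaining - batch ))
    done
    echo "delete the last $surge buffer node(s)"
}

simulate_upgrade 5 2
```

For a 5-node pool with a surge of 2, the trace shows three rounds; with the default surge of 1 it would take five, which is the speed side of the trade-off discussed earlier.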

Important

Ensure that any PodDisruptionBudgets (PDBs) allow for at least 1 pod replica to be moved at a time; otherwise the drain/evict operation will fail. If the drain operation fails, the upgrade operation will fail by design to ensure that the applications are not disrupted. Correct what caused the operation to stop (incorrect PDBs, lack of quota, and so on) and retry the operation.
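
As a rough illustration of the PDB requirement, the sketch below checks whether a budget leaves room to move at least one replica. It assumes a PDB expressed with minAvailable as an absolute number and that all replicas are healthy; pdb_allows_drain is a hypothetical helper, not a kubectl or Azure CLI command.

```shell
# Hypothetical helper: pdb_allows_drain <replicas> <minAvailable>
# Succeeds when at least one pod can be evicted without violating the budget.
pdb_allows_drain() {
    replicas=$1
    min_available=$2
    [ $(( replicas - min_available )) -ge 1 ]
}

if pdb_allows_drain 3 2; then echo "3 replicas, minAvailable 2: drain can proceed"; fi
if ! pdb_allows_drain 2 2; then echo "2 replicas, minAvailable 2: drain would be blocked"; fi
```

On a live cluster, kubectl get poddisruptionbudgets --all-namespaces reports an ALLOWED DISRUPTIONS column that reflects the actual state; a value of 0 indicates a budget that would block the drain.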

To confirm that the upgrade was successful, use the az aks show command:

az aks show --resource-group myResourceGroup --name myAKSCluster --output table

The following example output shows that the cluster now runs 1.18.10:

Name          Location    ResourceGroup    KubernetesVersion    ProvisioningState    Fqdn
------------  ----------  ---------------  -------------------  -------------------  -----------------------------------------------------------------
myAKSCluster  chinaeast2  myResourceGroup  1.18.10              Succeeded            myakscluster-dns-379cbbb9.hcp.chinaeast2.cx.prod.service.azk8s.cn

Next steps

This article showed you how to upgrade an existing AKS cluster. To learn more about deploying and managing AKS clusters, see the set of tutorials.