Automatically scale a cluster to meet application demands on Azure Kubernetes Service (AKS)

To keep up with application demands in Azure Kubernetes Service (AKS), you may need to adjust the number of nodes that run your workloads. The cluster autoscaler component watches for pods in your cluster that can't be scheduled because of resource constraints. When it detects such pods, it increases the number of nodes in a node pool to meet the application demand. It also regularly checks nodes for a lack of running pods and decreases the number of nodes as needed. This ability to automatically scale the number of nodes in your AKS cluster up or down lets you run an efficient, cost-effective cluster.

This article shows you how to enable and manage the cluster autoscaler in an AKS cluster.

Before you begin

This article requires Azure CLI version 2.0.76 or later. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.
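
The version requirement above can be checked with a small comparison helper. This is a minimal sketch, not part of the Azure CLI: the version_ge function and the installed value are hypothetical, and the comparison assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Return success (0) when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="2.0.76"
installed="2.0.80"   # hypothetical value; copy yours from `az --version`

if version_ge "$installed" "$required"; then
  echo "Azure CLI $installed meets the $required minimum"
else
  echo "Azure CLI $installed is older than $required; upgrade first"
fi
```

Replace the installed value with the azure-cli version that az --version reports on your machine.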

Limitations

The following limitations apply when you create and manage AKS clusters that use the cluster autoscaler:

  • The HTTP application routing add-on can't be used.

About the cluster autoscaler

To adjust to changing application demands, such as between the workday and evening or on a weekend, clusters often need a way to automatically scale. AKS clusters can scale in one of two ways:

  • The cluster autoscaler watches for pods that can't be scheduled on nodes because of resource constraints. The cluster then automatically increases the number of nodes.
  • The horizontal pod autoscaler uses the Metrics Server in a Kubernetes cluster to monitor the resource demand of pods. If an application needs more resources, the number of pods is automatically increased to meet the demand.

[Figure: The cluster autoscaler and horizontal pod autoscaler often work together to support the required application demands]

Both the horizontal pod autoscaler and cluster autoscaler can also decrease the number of pods and nodes as needed. The cluster autoscaler decreases the number of nodes when there has been unused capacity for a period of time. Pods on a node to be removed by the cluster autoscaler are safely scheduled elsewhere in the cluster. The cluster autoscaler may be unable to scale down if pods can't move, such as in the following situations:

  • A pod is directly created and isn't backed by a controller object, such as a deployment or replica set.
  • A pod disruption budget (PDB) is too restrictive and doesn't allow the number of pods to fall below a certain threshold.
  • A pod uses node selectors or anti-affinity that can't be honored if scheduled on a different node.

For more information about how the cluster autoscaler may be unable to scale down, see What types of pods can prevent the cluster autoscaler from removing a node?

The cluster autoscaler uses startup parameters for things like time intervals between scale events and resource thresholds. For more information on what parameters the cluster autoscaler uses, see What are the cluster autoscaler parameters?

The cluster and horizontal pod autoscalers can work together, and are often both deployed in a cluster. When combined, the horizontal pod autoscaler focuses on running the number of pods required to meet application demand. The cluster autoscaler focuses on running the number of nodes required to support the scheduled pods.

Note

Manual scaling is disabled when you use the cluster autoscaler. Let the cluster autoscaler determine the required number of nodes. If you want to manually scale your cluster, disable the cluster autoscaler.

Create an AKS cluster and enable the cluster autoscaler

If you need to create an AKS cluster, use the az aks create command. To enable and configure the cluster autoscaler on the node pool for the cluster, use the --enable-cluster-autoscaler parameter, and specify a node --min-count and --max-count.

Important

The cluster autoscaler is a Kubernetes component. Although the AKS cluster uses a virtual machine scale set for the nodes, don't manually enable or edit settings for scale set autoscale in the Azure portal or using the Azure CLI. Let the Kubernetes cluster autoscaler manage the required scale settings. For more information, see Can I modify the AKS resources in the node resource group?

The following example creates an AKS cluster with a single node pool backed by a virtual machine scale set. It also enables the cluster autoscaler on the node pool for the cluster and sets a minimum of 1 and maximum of 3 nodes:

# First create a resource group
az group create --name myResourceGroup --location chinaeast2

# Now create the AKS cluster and enable the cluster autoscaler
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --node-count 1 \
  --vm-set-type VirtualMachineScaleSets \
  --load-balancer-sku standard \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3

It takes a few minutes to create the cluster and configure the cluster autoscaler settings.

Update an existing AKS cluster to enable the cluster autoscaler

Use the az aks update command to enable and configure the cluster autoscaler on the node pool for the existing cluster. Use the --enable-cluster-autoscaler parameter, and specify a node --min-count and --max-count.

Important

The cluster autoscaler is a Kubernetes component. Although the AKS cluster uses a virtual machine scale set for the nodes, don't manually enable or edit settings for scale set autoscale in the Azure portal or using the Azure CLI. Let the Kubernetes cluster autoscaler manage the required scale settings. For more information, see Can I modify the AKS resources in the node resource group?

The following example updates an existing AKS cluster to enable the cluster autoscaler on the node pool for the cluster and sets a minimum of 1 and maximum of 3 nodes:

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3

It takes a few minutes to update the cluster and configure the cluster autoscaler settings.

Change the cluster autoscaler settings

Important

If you have multiple node pools in your AKS cluster, skip to the autoscale with multiple agent pools section. Clusters with multiple agent pools require the az aks nodepool command set, instead of az aks, to change node pool specific properties.

In the previous step to create an AKS cluster or update an existing node pool, the cluster autoscaler minimum node count was set to 1, and the maximum node count was set to 3. As your application demands change, you may need to adjust the cluster autoscaler node count.

To change the node count, use the az aks update command.

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --update-cluster-autoscaler \
  --min-count 1 \
  --max-count 5

The above example updates the cluster autoscaler on the single node pool in myAKSCluster to a minimum of 1 and maximum of 5 nodes.

Note

The cluster autoscaler makes scaling decisions based on the minimum and maximum counts set on each node pool, but it does not enforce them immediately after you update them. For example, setting a minimum count of 5 when the current node count is 3 does not immediately scale the pool up to 5. If the new minimum count is higher than the current number of nodes, the new minimum or maximum settings take effect only once an autoscaler event is triggered, in this example when enough unschedulable pods are present to require 2 additional nodes. After the scale event, the new count limits are respected.

Monitor the performance of your applications and services, and adjust the cluster autoscaler node counts to match the required performance.
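
Before adjusting the counts, it can help to confirm what is currently configured. The following sketch lists each node pool's current node count and autoscaler limits with az aks show. The field names in the query (count, minCount, maxCount, enableAutoScaling) are assumptions about the shape of the command's JSON output and may differ between CLI versions; the resource names are the placeholders used throughout this article.

```shell
# Placeholder names from this article; replace them with your own values.
RESOURCE_GROUP="myResourceGroup"
CLUSTER_NAME="myAKSCluster"

# Assumed field names; inspect `az aks show --output json` to confirm them.
QUERY='agentPoolProfiles[].{pool:name, nodes:count, min:minCount, max:maxCount, autoscale:enableAutoScaling}'

# Run the query only when the Azure CLI is installed.
if command -v az >/dev/null 2>&1; then
  az aks show \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --query "$QUERY" \
    --output table
else
  echo "Azure CLI not found; install it to run this example" >&2
fi
```

Comparing the reported node count with the minimum and maximum shows whether a pool is still outside newly set limits, as described in the note above.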

Using the autoscaler profile

You can also configure more granular details of the cluster autoscaler by changing the default values in the cluster-wide autoscaler profile. For example, a scale down event happens after nodes have been underutilized for 10 minutes. If you have workloads that run every 15 minutes, you may want to change the autoscaler profile to scale down underutilized nodes after 15 or 20 minutes instead. When you enable the cluster autoscaler, a default profile is used unless you specify different settings. The cluster autoscaler profile has the following settings that you can update:

| Setting | Description | Default value |
|---|---|---|
| scan-interval | How often the cluster is reevaluated for scale up or down | 10 seconds |
| scale-down-delay-after-add | How long after scale up that scale down evaluation resumes | 10 minutes |
| scale-down-delay-after-delete | How long after node deletion that scale down evaluation resumes | scan-interval |
| scale-down-delay-after-failure | How long after scale down failure that scale down evaluation resumes | 3 minutes |
| scale-down-unneeded-time | How long a node should be unneeded before it is eligible for scale down | 10 minutes |
| scale-down-unready-time | How long an unready node should be unneeded before it is eligible for scale down | 20 minutes |
| scale-down-utilization-threshold | Node utilization level, defined as sum of requested resources divided by capacity, below which a node can be considered for scale down | 0.5 |
| max-graceful-termination-sec | Maximum number of seconds the cluster autoscaler waits for pod termination when trying to scale down a node | 600 seconds |
| balance-similar-node-groups | Detect similar node pools and balance the number of nodes between them | false |
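
To make the scale-down-utilization-threshold definition concrete, the following sketch works through the arithmetic for one hypothetical node. The capacity and request figures are invented for illustration, and the calculation is simplified to a single resource (CPU); it is not the autoscaler's actual implementation.

```shell
# Hypothetical node: 4000 millicores of CPU capacity, with pods requesting
# 1500 millicores in total. The threshold is the default from the table.
capacity_millicores=4000
requested_millicores=1500
threshold=0.5

# Utilization = sum of requested resources / capacity.
utilization=$(awk -v r="$requested_millicores" -v c="$capacity_millicores" \
  'BEGIN { printf "%.3f", r / c }')

# awk exits with status 0 when utilization is below the threshold.
if awk -v u="$utilization" -v t="$threshold" 'BEGIN { exit !(u < t) }'; then
  echo "utilization $utilization is below $threshold: scale-down candidate"
else
  echo "utilization $utilization is at or above $threshold: node is kept"
fi
```

With these figures the utilization is 0.375, which is below the default threshold of 0.5, so the node could be considered for scale down once it has also been unneeded for the scale-down-unneeded-time period.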

Important

The cluster autoscaler profile affects all node pools that use the cluster autoscaler. You can't set an autoscaler profile per node pool.

Install aks-preview CLI extension

To set the cluster autoscaler settings profile, you need the aks-preview CLI extension version 0.4.30 or higher. Install the aks-preview Azure CLI extension using the az extension add command, then check for any available updates using the az extension update command:

# Install the aks-preview extension
az extension add --name aks-preview

# Update the extension to make sure you have the latest version installed
az extension update --name aks-preview

Set the cluster autoscaler profile on an existing AKS cluster

Use the az aks update command with the --cluster-autoscaler-profile parameter to set the cluster autoscaler profile on your cluster. The following example configures the scan interval setting as 30s in the profile.

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --cluster-autoscaler-profile scan-interval=30s
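
The same parameter can carry other settings from the profile table. As a hedged sketch of the 15-minute workload scenario described earlier, the following command raises scale-down-unneeded-time. The 15m value format is an assumption made by analogy with the 30s value above; confirm the accepted format with az aks update --help before relying on it.

```shell
# Placeholder names from this article; replace them with your own values.
RESOURCE_GROUP="myResourceGroup"
CLUSTER_NAME="myAKSCluster"

# Assumed duration format ("15m"), by analogy with "30s" for scan-interval.
UNNEEDED_TIME="15m"

# Run the update only when the Azure CLI is installed.
if command -v az >/dev/null 2>&1; then
  az aks update \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --cluster-autoscaler-profile "scale-down-unneeded-time=$UNNEEDED_TIME"
else
  echo "Azure CLI not found; install it to run this example" >&2
fi
```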

When you enable the cluster autoscaler on node pools in the cluster, those node pools also use the cluster autoscaler profile. For example:

az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name mynodepool \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3

Important

When you set the cluster autoscaler profile, any existing node pools with the cluster autoscaler enabled will start using the profile immediately.

Set the cluster autoscaler profile when creating an AKS cluster

You can also use the --cluster-autoscaler-profile parameter when you create your cluster. For example:

az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --node-count 1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3 \
  --cluster-autoscaler-profile scan-interval=30s

The above command creates an AKS cluster and defines the scan interval as 30 seconds for the cluster-wide autoscaler profile. The command also enables the cluster autoscaler on the initial node pool and sets the minimum node count to 1 and the maximum node count to 3.

Reset cluster autoscaler profile to default values

Use the az aks update command to reset the cluster autoscaler profile on your cluster.

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --cluster-autoscaler-profile ""

Disable the cluster autoscaler

If you no longer wish to use the cluster autoscaler, you can disable it using the az aks update command, specifying the --disable-cluster-autoscaler parameter. Nodes aren't removed when the cluster autoscaler is disabled.

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --disable-cluster-autoscaler

You can manually scale your cluster after disabling the cluster autoscaler by using the az aks scale command. If you use the horizontal pod autoscaler, that feature continues to run with the cluster autoscaler disabled, but pods may end up unable to be scheduled if all node resources are in use.
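
As a minimal sketch of manual scaling once the autoscaler is disabled, the following command sets the node count with az aks scale. The target of 3 nodes is an arbitrary illustrative value, and the resource names are the placeholders used throughout this article.

```shell
# Placeholder names from this article; replace them with your own values.
RESOURCE_GROUP="myResourceGroup"
CLUSTER_NAME="myAKSCluster"
NODE_COUNT=3   # illustrative target; choose a count that fits your workload

# Run the scale operation only when the Azure CLI is installed.
if command -v az >/dev/null 2>&1; then
  az aks scale \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --node-count "$NODE_COUNT"
else
  echo "Azure CLI not found; install it to run this example" >&2
fi
```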

Re-enable a disabled cluster autoscaler

To re-enable the cluster autoscaler on an existing cluster, use the az aks update command and specify the --enable-cluster-autoscaler, --min-count, and --max-count parameters.
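
A sketch of that command, reusing the placeholder names and the 1 to 3 node range from the earlier examples in this article (the counts are illustrative, not recommendations):

```shell
# Placeholder names from this article; replace them with your own values.
RESOURCE_GROUP="myResourceGroup"
CLUSTER_NAME="myAKSCluster"
MIN_COUNT=1
MAX_COUNT=3

# Run the update only when the Azure CLI is installed.
if command -v az >/dev/null 2>&1; then
  az aks update \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --enable-cluster-autoscaler \
    --min-count "$MIN_COUNT" \
    --max-count "$MAX_COUNT"
else
  echo "Azure CLI not found; install it to run this example" >&2
fi
```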

Retrieve cluster autoscaler logs and status

To diagnose and debug autoscaler events, you can retrieve logs and status from the autoscaler add-on.

AKS manages the cluster autoscaler on your behalf and runs it in the managed control plane. As a result, you must configure master node logs before you can view them.

To configure logs to be pushed from the cluster autoscaler into Log Analytics, follow these steps.

  1. Set up a rule for resource logs to push cluster-autoscaler logs to Log Analytics. When you follow the setup instructions, make sure you check the box for cluster-autoscaler when selecting options for "Logs".

  2. Click on the "Logs" section on your cluster via the Azure portal.

  3. Input the following example query into Log Analytics:

    AzureDiagnostics
    | where Category == "cluster-autoscaler"
    

You should see logs similar to the following example as long as there are logs to retrieve.

[Figure: Log Analytics logs]

The cluster autoscaler also writes its health status to a configmap named cluster-autoscaler-status. To retrieve this status, execute the following kubectl command. A health status is reported for each node pool configured with the cluster autoscaler.

kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml
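
To print only the status text rather than the whole YAML document, a JSONPath output expression can be used. This sketch assumes the status is stored under a key named status in the configmap's data section; if your cluster uses a different key, the command prints nothing.

```shell
# Name and namespace of the configmap, as given in this article.
CONFIGMAP="cluster-autoscaler-status"
NAMESPACE="kube-system"

# Run the query only when kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
  # Assumed key: .data.status
  kubectl get configmap "$CONFIGMAP" \
    --namespace "$NAMESPACE" \
    --output jsonpath='{.data.status}'
else
  echo "kubectl not found; install it to run this example" >&2
fi
```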

To learn more about what is logged from the autoscaler, read the FAQ on the Kubernetes/autoscaler GitHub project.

Use the cluster autoscaler with multiple node pools enabled

The cluster autoscaler can be used together with multiple node pools enabled. Follow the multiple node pools documentation to learn how to enable multiple node pools and add additional node pools to an existing cluster. When using both features together, you enable the cluster autoscaler on each individual node pool in the cluster and can pass unique autoscaling rules to each.

The following command assumes you followed the initial instructions earlier in this document and you want to update an existing node pool's max-count from 3 to 5. Use the az aks nodepool update command to update an existing node pool's settings.

az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --update-cluster-autoscaler \
  --min-count 1 \
  --max-count 5

To disable the cluster autoscaler on a node pool, use az aks nodepool update and pass the --disable-cluster-autoscaler parameter.

az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --disable-cluster-autoscaler

To re-enable the cluster autoscaler on a node pool in an existing cluster, use the az aks nodepool update command and specify the --enable-cluster-autoscaler, --min-count, and --max-count parameters.
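
A sketch of that command for the nodepool1 pool used in the examples above, with an illustrative range of 1 to 5 nodes:

```shell
# Placeholder names from this article; replace them with your own values.
RESOURCE_GROUP="myResourceGroup"
CLUSTER_NAME="myAKSCluster"
NODEPOOL_NAME="nodepool1"
MIN_COUNT=1
MAX_COUNT=5

# Run the update only when the Azure CLI is installed.
if command -v az >/dev/null 2>&1; then
  az aks nodepool update \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name "$NODEPOOL_NAME" \
    --enable-cluster-autoscaler \
    --min-count "$MIN_COUNT" \
    --max-count "$MAX_COUNT"
else
  echo "Azure CLI not found; install it to run this example" >&2
fi
```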

Next steps

This article showed you how to automatically scale the number of AKS nodes. You can also use the horizontal pod autoscaler to automatically adjust the number of pods that run your application. For steps on using the horizontal pod autoscaler, see Scale applications in AKS.