Best practices for basic scheduler features in Azure Kubernetes Service (AKS)

As you manage clusters in Azure Kubernetes Service (AKS), you often need to isolate teams and workloads. The Kubernetes scheduler provides features that let you control the distribution of compute resources, or limit the impact of maintenance events.

This best practices article focuses on basic Kubernetes scheduling features for cluster operators. In this article, you learn how to:

  • Use resource quotas to provide a fixed amount of resources to teams or workloads
  • Limit the impact of scheduled maintenance using pod disruption budgets
  • Check for missing pod resource requests and limits using the kube-advisor tool

Enforce resource quotas

Best practice guidance - Plan and apply resource quotas at the namespace level. If pods don't define resource requests and limits, reject the deployment. Monitor resource usage and adjust quotas as needed.

Resource requests and limits are placed in the pod specification. At deployment time, the Kubernetes scheduler uses the requests to find a node in the cluster with enough available capacity, and the limits cap what a running container may consume. Both apply at the individual pod level. For more information about how to define these values, see Define pod resource requests and limits.
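As a minimal sketch of where these values sit in a pod specification, the manifest below sets a request and a limit for a single container. The pod name, image, and the CPU and memory figures are illustrative assumptions, not recommendations:

```yaml
# Hypothetical pod: the name, image, and values are for illustration only.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: example
    image: nginx
    resources:
      requests:        # capacity the scheduler looks for when placing the pod
        cpu: 100m
        memory: 128Mi
      limits:          # upper bound the container may consume at runtime
        cpu: 250m
        memory: 256Mi
```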

To reserve and limit resources across a development team or project, use resource quotas. These quotas are defined on a namespace, and can be set on the following basis:

  • Compute resources, such as CPU and memory, or GPUs.
  • Storage resources, including the total number of volumes or amount of disk space for a given storage class.
  • Object count, such as the maximum number of secrets, services, or jobs that can be created.

Kubernetes doesn't overcommit resources. Once the cumulative total of resource requests or limits in the namespace would exceed the assigned quota, further deployments fail.

When you define resource quotas, all pods created in the namespace must provide limits or requests in their pod specifications. If they don't provide these values, you can reject the deployment. Alternatively, you can configure default requests and limits for a namespace.
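One way to configure namespace defaults is a LimitRange object. The following is a sketch only: the object name and every value are assumptions chosen for illustration and should be sized to the workloads in question.

```yaml
# Hypothetical LimitRange: the name and values are illustrative assumptions.
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-apps-defaults
spec:
  limits:
  - type: Container
    defaultRequest:    # used when a container omits resource requests
      cpu: 100m
      memory: 128Mi
    default:           # used when a container omits resource limits
      cpu: 250m
      memory: 256Mi
```

Like a resource quota, a LimitRange is namespaced, so it would be applied with the same `--namespace` argument to `kubectl apply` as the quota manifest.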

The following example YAML manifest named dev-app-team-quotas.yaml sets a hard limit of a total of 10 CPUs, 20Gi of memory, and 10 pods:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-app-team
spec:
  hard:
    cpu: "10"
    memory: 20Gi
    pods: "10"

This resource quota can be applied by specifying the namespace, such as dev-apps:

kubectl apply -f dev-app-team-quotas.yaml --namespace dev-apps

Work with your application developers and owners to understand their needs and apply the appropriate resource quotas.

For more information about available resource objects, scopes, and priorities, see Resource quotas in Kubernetes.

Plan for availability using pod disruption budgets

Best practice guidance - To maintain the availability of applications, define Pod Disruption Budgets (PDBs) to make sure that a minimum number of pods are available in the cluster.

Two kinds of disruptive events cause pods to be removed:

  • Involuntary disruptions are events beyond the typical control of the cluster operator or application owner.
    • These involuntary disruptions include a hardware failure on the physical machine, a kernel panic, or the deletion of a node VM.
  • Voluntary disruptions are events requested by the cluster operator or application owner.
    • These voluntary disruptions include cluster upgrades, an updated deployment template, or accidentally deleting a pod.

Involuntary disruptions can be mitigated by using multiple replicas of your pods in a deployment. Running multiple nodes in the AKS cluster also helps with these involuntary disruptions. For voluntary disruptions, Kubernetes provides pod disruption budgets that let the cluster operator define a minimum available or maximum unavailable resource count. These pod disruption budgets let you plan for how deployments or replica sets respond when a voluntary disruption event occurs.

If a cluster is to be upgraded or a deployment template updated, the Kubernetes scheduler makes sure additional pods are scheduled on other nodes before the voluntary disruption events can continue. The scheduler waits before a node is rebooted until the defined number of pods are successfully scheduled on other nodes in the cluster.

Let's look at an example of a replica set with five pods that run NGINX. The pods in the replica set are assigned the label app: nginx-frontend. During a voluntary disruption event, such as a cluster upgrade, you want to make sure at least three pods continue to run. The following YAML manifest for a PodDisruptionBudget object defines these requirements:

# policy/v1 is available from Kubernetes 1.21; policy/v1beta1 was removed in 1.25
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
   name: nginx-pdb
spec:
   minAvailable: 3
   selector:
      matchLabels:
         app: nginx-frontend

You can also define a percentage, such as 60%, which allows you to automatically compensate for the replica set scaling up the number of pods.
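A percentage-based variant of the same budget might look like the sketch below. It assumes the policy/v1 API (Kubernetes 1.21 or later) and reuses the nginx-pdb name and app: nginx-frontend label from the example above:

```yaml
# Sketch of a percentage-based budget; assumes the policy/v1 API is available.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: "60%"    # 3 of 5 replicas; scales as the replica count changes
  selector:
    matchLabels:
      app: nginx-frontend
```

With five replicas, 60% works out to three pods. When the percentage doesn't map to a whole number of pods, Kubernetes rounds up to the nearest integer.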

You can define a maximum number of unavailable instances in a replica set. Again, a percentage for the maximum unavailable pods can also be defined. The following pod disruption budget YAML manifest defines that no more than two pods in the replica set be unavailable:

# policy/v1 is available from Kubernetes 1.21; policy/v1beta1 was removed in 1.25
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
   name: nginx-pdb
spec:
   maxUnavailable: 2
   selector:
      matchLabels:
         app: nginx-frontend

Once your pod disruption budget is defined, you create it in your AKS cluster as with any other Kubernetes object:

kubectl apply -f nginx-pdb.yaml

Work with your application developers and owners to understand their needs and apply the appropriate pod disruption budgets.

For more information about using pod disruption budgets, see Specify a disruption budget for your application.

Regularly check for cluster issues with kube-advisor

Best practice guidance - Regularly run the latest version of the kube-advisor open source tool to detect issues in your cluster. If you apply resource quotas on an existing AKS cluster, run kube-advisor first to find pods that don't have resource requests and limits defined.

The kube-advisor tool is an associated AKS open source project that scans a Kubernetes cluster and reports on issues that it finds. One useful check is to identify pods that don't have resource requests and limits in place.

The kube-advisor tool can report on resource requests and limits missing in PodSpecs for Windows applications as well as Linux applications, but the kube-advisor tool itself must be scheduled on a Linux node. You can schedule a pod to run on a node pool with a specific OS using a node selector in the pod's configuration.
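As an illustration of such a node selector, the pod below is pinned to Linux nodes through the well-known kubernetes.io/os node label. This is a generic sketch, not the kube-advisor manifest: the pod name and image are hypothetical.

```yaml
# Hypothetical pod pinned to Linux nodes; the name and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: linux-only-pod
spec:
  nodeSelector:
    kubernetes.io/os: linux    # well-known label set by the kubelet on each node
  containers:
  - name: example
    image: nginx
```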

In an AKS cluster that hosts multiple development teams and applications, it can be hard to track pods without these resource requests and limits set. As a best practice, regularly run kube-advisor on your AKS clusters, especially if you don't assign resource quotas to namespaces.

Next steps

This article focused on basic Kubernetes scheduler features. For more information about cluster operations in AKS, see the following best practices: