Best practices for basic scheduler features in Azure Kubernetes Service (AKS)

As you manage clusters in Azure Kubernetes Service (AKS), you often need to isolate teams and workloads. The Kubernetes scheduler lets you control the distribution of compute resources, or limit the impact of maintenance events.

This best practices article focuses on basic Kubernetes scheduling features for cluster operators. In this article, you learn how to:

  • Use resource quotas to provide a fixed amount of resources to teams or workloads
  • Limit the impact of scheduled maintenance using pod disruption budgets
  • Check for missing pod resource requests and limits using the kube-advisor tool

Enforce resource quotas

Best practice guidance

Plan and apply resource quotas at the namespace level. If pods don't define resource requests and limits, reject the deployment. Monitor resource usage and adjust quotas as needed.

Resource requests and limits are placed in the pod specification. The Kubernetes scheduler uses the requests at deployment time to find an available node in the cluster, and the limits cap what a running container can consume. Limits and requests work at the individual pod level. For more information about how to define these values, see Define pod resource requests and limits.
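As a minimal sketch of where these values sit in a pod specification, the manifest below sets requests and limits on a single container. The pod name, image, and the specific CPU and memory values are illustrative assumptions, not recommendations from this article.

```yaml
# Illustrative pod; the name, image, and values are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
  - name: example-app
    image: nginx
    resources:
      requests:          # amount of CPU and memory the container asks for
        cpu: 100m
        memory: 128Mi
      limits:            # maximum CPU and memory the container may use
        cpu: 250m
        memory: 256Mi
```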

To provide a way to reserve and limit resources across a development team or project, you should use resource quotas. These quotas are defined on a namespace, and can be set on the following basis:

  • Compute resources, such as CPU and memory, or GPUs.
  • Storage resources, including the total number of volumes or amount of disk space for a given storage class.
  • Object count, such as the maximum number of secrets, services, or jobs that can be created.
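The three categories above could be combined in a single ResourceQuota along the following lines. This is a sketch only: the quota name and every value are hypothetical, the GPU entry assumes the NVIDIA device plugin exposes `nvidia.com/gpu` on the cluster, and the per-class storage entry assumes a storage class named `managed-premium` exists.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-team-quota          # hypothetical name
spec:
  hard:
    # Compute resources
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    requests.nvidia.com/gpu: "1"    # assumes the nvidia.com/gpu extended resource
    # Storage resources
    persistentvolumeclaims: "10"    # total number of volume claims
    requests.storage: 100Gi         # total requested disk space, all classes
    managed-premium.storageclass.storage.k8s.io/requests.storage: 50Gi  # assumed class name
    # Object counts
    secrets: "20"
    services: "10"
    count/jobs.batch: "5"
```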

Kubernetes doesn't overcommit resources. Once the cumulative total of your resource requests passes the assigned quota, all further deployments fail.

When you define resource quotas, all pods created in the namespace must provide limits or requests in their pod specifications. If they don't provide these values, you can reject the deployment. Alternatively, you can configure default requests and limits for a namespace.
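One way to configure namespace defaults is a LimitRange object, sketched below. The object name and the values are assumptions chosen for illustration. Containers created in the namespace without their own requests or limits would receive these defaults.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-app-team-defaults   # hypothetical name
spec:
  limits:
  - type: Container
    defaultRequest:             # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi
    default:                    # applied when a container sets no limits
      cpu: 250m
      memory: 256Mi
```

Like a resource quota, a LimitRange is namespaced, so it takes effect only in the namespace where it is created.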

The following example YAML manifest named dev-app-team-quotas.yaml sets a hard limit of a total of 10 CPUs, 20Gi of memory, and 10 pods:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-app-team
spec:
  hard:
    cpu: "10"
    memory: 20Gi
    pods: "10"

This resource quota can be applied by specifying the namespace, such as dev-apps:

kubectl apply -f dev-app-team-quotas.yaml --namespace dev-apps

Work with your application developers and owners to understand their needs and apply the appropriate resource quotas.

For more information about available resource objects, scopes, and priorities, see Resource quotas in Kubernetes.

Plan for availability using pod disruption budgets

Best practice guidance

To maintain the availability of applications, define Pod Disruption Budgets (PDBs) to make sure that a minimum number of pods are available in the cluster.

There are two kinds of disruptive events that cause pods to be removed:

Involuntary disruptions

Involuntary disruptions are events beyond the typical control of the cluster operator or application owner. They include:

  • Hardware failure on the physical machine
  • Kernel panic
  • Deletion of a node VM

Involuntary disruptions can be mitigated by:

  • Using multiple replicas of your pods in a deployment.
  • Running multiple nodes in the AKS cluster.
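As a rough illustration of the first mitigation, a Deployment can request several replicas so that the loss of one pod or node leaves others serving. The names, label, image, and replica count below are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-frontend        # hypothetical name
spec:
  replicas: 3                   # more than one pod, so a single failure is survivable
  selector:
    matchLabels:
      app: example-frontend
  template:
    metadata:
      labels:
        app: example-frontend
    spec:
      containers:
      - name: web
        image: nginx            # placeholder image
```

Replicas only help against node failure if the cluster has more than one node for the scheduler to spread them across.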

Voluntary disruptions

Voluntary disruptions are events requested by the cluster operator or application owner. They include:

  • Cluster upgrades
  • An updated deployment template
  • Accidental deletion of a pod

Kubernetes provides pod disruption budgets for voluntary disruptions, letting you plan for how deployments or replica sets respond when a voluntary disruption event occurs. Using pod disruption budgets, cluster operators can define a minimum available or maximum unavailable resource count.

If you upgrade a cluster or update a deployment template, the Kubernetes scheduler schedules extra pods on other nodes before allowing voluntary disruption events to continue. The scheduler waits to reboot a node until the defined number of pods are successfully scheduled on other nodes in the cluster.

Let's look at an example of a replica set with five pods that run NGINX. The pods in the replica set are assigned the label app: nginx-frontend. During a voluntary disruption event, such as a cluster upgrade, you want to make sure at least three pods continue to run. The following YAML manifest for a PodDisruptionBudget object defines these requirements:

apiVersion: policy/v1  # policy/v1beta1 was removed in Kubernetes 1.25
kind: PodDisruptionBudget
metadata:
   name: nginx-pdb
spec:
   minAvailable: 3
   selector:
     matchLabels:
       app: nginx-frontend

You can also define a percentage, such as 60%, which automatically compensates when the replica set scales up the number of pods.
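A percentage-based variant of the budget might look like the sketch below. It reuses the nginx-pdb name and app: nginx-frontend label from the example, and assumes a cluster running Kubernetes 1.21 or later, where the policy/v1 API is available.

```yaml
apiVersion: policy/v1           # assumes Kubernetes 1.21 or later
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: "60%"           # percentage of the selected pods, quoted as a string
  selector:
    matchLabels:
      app: nginx-frontend
```

With the five replicas in the example, 60% works out to three pods. If the replica set later grows, the minimum grows with it.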

You can define a maximum number of unavailable instances in a replica set. Again, a percentage for the maximum unavailable pods can also be defined. The following pod disruption budget YAML manifest specifies that no more than two pods in the replica set can be unavailable:

apiVersion: policy/v1  # policy/v1beta1 was removed in Kubernetes 1.25
kind: PodDisruptionBudget
metadata:
   name: nginx-pdb
spec:
   maxUnavailable: 2
   selector:
     matchLabels:
       app: nginx-frontend

Once your pod disruption budget is defined, you create it in your AKS cluster as with any other Kubernetes object:

kubectl apply -f nginx-pdb.yaml

Work with your application developers and owners to understand their needs and apply the appropriate pod disruption budgets.

For more information about using pod disruption budgets, see Specify a disruption budget for your application.

Regularly check for cluster issues with kube-advisor

Best practice guidance

Regularly run the latest version of the kube-advisor open source tool to detect issues in your cluster. If you apply resource quotas on an existing AKS cluster, run kube-advisor first to find pods that don't have resource requests and limits defined.

The kube-advisor tool is an associated AKS open source project that scans a Kubernetes cluster and reports identified issues. It is useful for identifying pods without resource requests and limits in place.

While the kube-advisor tool can report on resource requests and limits missing in PodSpecs for Windows and Linux applications, the tool itself must be scheduled on a Linux pod. Schedule a pod to run on a node pool with a specific OS using a node selector in the pod's configuration.
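A node selector that pins a pod to Linux nodes can be sketched as follows. The pod name and image are placeholders rather than the actual kube-advisor deployment. The kubernetes.io/os label is the well-known label that Kubernetes sets on each node.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-linux-pod       # hypothetical name
spec:
  nodeSelector:
    kubernetes.io/os: linux     # schedule only onto Linux nodes
  containers:
  - name: example
    image: nginx                # placeholder image
```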

Tracking pods without set resource requests and limits in an AKS cluster hosting multiple development teams and applications can be difficult. As a best practice, regularly run kube-advisor on your AKS clusters, especially if you don't assign resource quotas to namespaces.

Next steps

This article focused on basic Kubernetes scheduler features. For more information about cluster operations in AKS, see the following best practices: