Best practices for cluster isolation in Azure Kubernetes Service (AKS)

As you manage clusters in Azure Kubernetes Service (AKS), you often need to isolate teams and workloads. AKS provides flexibility in how you run multi-tenant clusters and isolate resources. To maximize your investment in Kubernetes, understand and implement these multi-tenancy and isolation features.

This best practices article focuses on isolation for cluster operators. In this article, you learn how to:

  • Plan for multi-tenant clusters and separation of resources
  • Use logical or physical isolation in your AKS clusters

Design clusters for multi-tenancy

Kubernetes provides features that let you logically isolate teams and workloads in the same cluster. The goal should be to grant the fewest privileges possible, scoped to the resources each team needs. A Namespace in Kubernetes creates a logical isolation boundary. Additional Kubernetes features and considerations for isolation and multi-tenancy include the following areas:

  • Scheduling includes the use of basic features such as resource quotas and pod disruption budgets. For more information about these features, see Best practices for basic scheduler features in AKS.
  • Networking includes the use of network policies to control the flow of traffic in and out of pods.
  • Authentication and authorization include the use of role-based access control (RBAC) and Azure Active Directory (AD) integration, pod identities, and secrets in Azure Key Vault. For more information about these features, see Best practices for authentication and authorization in AKS.
  • Containers include pod security policies, pod security contexts, and scanning images and runtimes for vulnerabilities. This area also involves using AppArmor or Seccomp (Secure Computing) to restrict container access to the underlying node.
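As an illustration of the networking and authorization items above, the following is a minimal sketch that builds two namespace-scoped manifests as Python dictionaries: a default-deny ingress network policy and a role binding that grants a group the built-in edit role in one namespace only. The function name, the namespace name team-a, and the placeholder group ID are assumptions made for this example and do not come from the article. The namespace is assumed to exist already, and the network policy only takes effect if the cluster runs a network policy engine.

```python
import json


def namespace_isolation_manifests(team: str, group_id: str) -> list[dict]:
    """Build a default-deny ingress NetworkPolicy and a RoleBinding for one namespace.

    `team` is a hypothetical namespace name; `group_id` is a placeholder for the
    identity of the group that should get access (for example, a directory group ID).
    """
    # An empty podSelector selects every pod in the namespace. Listing "Ingress"
    # in policyTypes with no ingress rules denies all inbound traffic to them.
    deny_ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": team},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }
    # A RoleBinding that references the built-in "edit" ClusterRole grants those
    # permissions only inside this namespace, not across the cluster.
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{team}-edit", "namespace": team},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "edit",
        },
        "subjects": [
            {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "Group",
                "name": group_id,
            }
        ],
    }
    return [deny_ingress, role_binding]


if __name__ == "__main__":
    # Placeholder values; replace them with a real namespace and group identity.
    manifests = namespace_isolation_manifests(
        "team-a", "00000000-0000-0000-0000-000000000000"
    )
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

The printed JSON can be saved to a file and reviewed before it is applied to a cluster. Additional policies that allow the specific traffic a team needs would normally be layered on top of the default-deny rule.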

Logically isolate clusters

Best practice guidance - Use logical isolation to separate teams and projects. Try to minimize the number of physical AKS clusters you deploy to isolate teams or applications.

With logical isolation, a single AKS cluster can be used for multiple workloads, teams, or environments. Kubernetes Namespaces form the logical isolation boundary for workloads and resources.
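A minimal sketch of this model, assuming one namespace per team with a resource quota attached so that no single team can consume the whole shared cluster. The function name, team names, and quota values are hypothetical and chosen only for illustration.

```python
import json


def team_namespace_manifests(team: str, cpu: str, memory: str, max_pods: int) -> list[dict]:
    """Build a Namespace and a ResourceQuota for one team sharing a cluster."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team, "labels": {"team": team}},
    }
    # The quota caps the total CPU, memory, and pod count that all workloads
    # in this namespace may request together.
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{team}-quota", "namespace": team},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "limits.cpu": cpu,
                "limits.memory": memory,
                "pods": str(max_pods),
            }
        },
    }
    return [namespace, quota]


if __name__ == "__main__":
    # Hypothetical teams and sizes, one namespace and quota per team.
    items = []
    for team in ("team-a", "team-b"):
        items.extend(team_namespace_manifests(team, cpu="10", memory="20Gi", max_pods=50))
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Each team or environment gets its own namespace inside the same cluster, and the quota values would be sized to what that team actually needs.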

[Figure: Logical isolation of a Kubernetes cluster in AKS]

Logical separation of clusters usually provides a higher pod density than physically isolated clusters. There's less excess compute capacity that sits idle in the cluster. When combined with the Kubernetes cluster autoscaler, you can scale the number of nodes up or down to meet demand. This best practice approach to autoscaling lets you run only the number of nodes required and minimizes costs.

Kubernetes environments, in AKS or elsewhere, aren't completely safe for hostile multi-tenant usage. In a multi-tenant environment, multiple tenants work on a common, shared infrastructure. As a result, if not all tenants can be trusted, you need to do additional planning to keep one tenant from affecting the security and service of another. Additional security features such as Pod Security Policy and more fine-grained role-based access controls (RBAC) for nodes make exploits more difficult. However, for true security when running hostile multi-tenant workloads, a hypervisor is the only level of security that you should trust. The security domain for Kubernetes becomes the entire cluster, not an individual node. For these types of hostile multi-tenant workloads, you should use physically isolated clusters.

Physically isolate clusters

Best practice guidance - Minimize the use of physical isolation for each separate team or application deployment. Instead, use logical isolation, as discussed in the previous section.

A common approach to cluster isolation is to use physically separate AKS clusters. In this isolation model, teams or workloads are assigned their own AKS cluster. This approach often looks like the easiest way to isolate workloads or teams, but it adds management and financial overhead. You now have to maintain multiple clusters, and you have to provide access and assign permissions for each one individually. You're also billed for all the individual nodes.

[Figure: Physical isolation of individual Kubernetes clusters in AKS]

Physically separate clusters usually have a low pod density. Because each team or workload has its own AKS cluster, the cluster is often over-provisioned with compute resources. Often, only a small number of pods are scheduled on those nodes. Unused capacity on the nodes can't be used by other teams for applications or services in development. These excess resources contribute to the additional costs of physically separate clusters.
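The density effect can be shown with a rough back-of-the-envelope calculation. The sketch below is a deliberately simplified model, not a sizing tool: it counts only pods per node and a minimum node count per cluster, and it ignores CPU, memory, and system pods. The team names, pod counts, and limits are hypothetical.

```python
import math


def nodes_needed(pods: int, pods_per_node: int, min_nodes: int = 1) -> int:
    """Nodes required to hold `pods`, never fewer than the cluster minimum."""
    return max(min_nodes, math.ceil(pods / pods_per_node))


def compare_isolation(team_pods: dict[str, int], pods_per_node: int, min_nodes: int) -> tuple[int, int]:
    """Return (nodes with one cluster per team, nodes with one shared cluster)."""
    # Physical isolation: every team pays for at least the minimum node count.
    physical = sum(nodes_needed(p, pods_per_node, min_nodes) for p in team_pods.values())
    # Logical isolation: all pods are packed onto one shared pool of nodes.
    logical = nodes_needed(sum(team_pods.values()), pods_per_node, min_nodes)
    return physical, logical


if __name__ == "__main__":
    # Hypothetical workload: three teams, 30 pods per node, 3 nodes minimum per cluster.
    teams = {"team-a": 12, "team-b": 7, "team-c": 25}
    print(compare_isolation(teams, pods_per_node=30, min_nodes=3))  # (9, 3)
```

Under these assumed numbers, three separate clusters run nine nodes for 44 pods, while a single shared cluster runs three, which is where the idle capacity and extra cost come from.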

Next steps

This article focused on cluster isolation. For more information about cluster operations in AKS, see the following best practices: