Best practices for cluster security and upgrades in Azure Kubernetes Service (AKS)

As you manage clusters in Azure Kubernetes Service (AKS), the security of your workloads and data is a key consideration. Especially when you run multi-tenant clusters using logical isolation, you need to secure access to resources and workloads. To minimize the risk of attack, you also need to make sure you apply the latest Kubernetes and node OS security updates.

This article focuses on how to secure your AKS cluster. You learn how to:

  • Use Azure Active Directory and role-based access control (RBAC) to secure API server access
  • Secure container access to node resources
  • Upgrade an AKS cluster to the latest Kubernetes version
  • Keep nodes up to date and automatically apply security patches

You can also read the best practices for container image management and for pod security.

You can also use the Azure Kubernetes Service integration with Security Center to help detect threats and view recommendations for securing your AKS clusters.

Secure access to the API server and cluster nodes

Best practice guidance - Securing access to the Kubernetes API server is one of the most important things you can do to secure your cluster. Integrate Kubernetes role-based access control (RBAC) with Azure Active Directory to control access to the API server. These controls let you secure AKS the same way that you secure access to your Azure subscriptions.

The Kubernetes API server provides a single connection point for requests to perform actions within a cluster. To secure and audit access to the API server, limit access and grant only the least-privileged permissions required. This approach isn't unique to Kubernetes, but it is especially important when the AKS cluster is logically isolated for multi-tenant use.

Azure Active Directory (AD) provides an enterprise-ready identity management solution that integrates with AKS clusters. Because Kubernetes doesn't provide an identity management solution, it can otherwise be hard to restrict access to the API server in a granular way. With Azure AD-integrated clusters in AKS, you use your existing user and group accounts to authenticate users to the API server.

Azure Active Directory integration for AKS clusters

Use Kubernetes RBAC and Azure AD integration to secure the API server and provide the least number of permissions required to a scoped set of resources, such as a single namespace. Different users or groups in Azure AD can be granted different RBAC roles. These granular permissions let you restrict access to the API server, and provide a clear audit trail of actions performed.

Just as the recommended practice is to grant access to files and folders through groups rather than individual identities, use Azure AD group membership to bind users to RBAC roles rather than binding individual users. As a user's group membership changes, their access permissions on the AKS cluster change accordingly. If you instead bind a user directly to a role and their job function later changes, their Azure AD group memberships update, but their permissions on the AKS cluster don't reflect that. In this scenario, the user ends up with more permissions than they require.
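As an illustration of binding a group rather than an individual user, the following manifest is a minimal sketch. The namespace dev, the role name dev-read-only, and the group object ID are hypothetical placeholders, not values from this article. It assumes an Azure AD-integrated cluster, where the subject name is the object ID of the Azure AD group:

```yaml
# Hypothetical example: the namespace, names, and group ID are placeholders.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: dev-read-only
  namespace: dev
rules:
# Read-only access to pods and deployments in the dev namespace only.
- apiGroups: ["", "apps"]
  resources: ["pods", "deployments"]
  verbs: ["get", "list", "watch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: dev-read-only-binding
  namespace: dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dev-read-only
subjects:
# Bind the Azure AD group, not individual users.
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: "<azure-ad-group-object-id>"
```

Because the binding targets the group, adding a user to or removing a user from the group in Azure AD changes their access without any change to the cluster's RBAC objects.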

For more information about Azure AD integration and RBAC, see Best practices for authentication and authorization in AKS.

Secure container access to resources

Best practice guidance - Limit access to actions that containers can perform. Provide the least number of permissions, and avoid the use of root access or privilege escalation.

In the same way that you should grant users or groups the least number of privileges required, containers should also be limited to only the actions and processes that they need. To minimize the risk of attack, don't configure applications and containers that require escalated privileges or root access. For example, set allowPrivilegeEscalation: false in the pod manifest. These pod security contexts are built in to Kubernetes and let you define additional permissions such as the user or group to run as, or what Linux capabilities to expose. For more best practices, see Secure pod access to resources.
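A pod security context along these lines might look like the following sketch. The pod name, the user and group IDs, and the choice to drop all capabilities are illustrative assumptions, not settings prescribed by this article:

```yaml
# Hypothetical example: name, UID/GID values, and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "id && sleep 1h"]
    securityContext:
      runAsUser: 1000                  # run as a non-root user
      runAsGroup: 3000
      allowPrivilegeEscalation: false  # block gaining more privileges than the parent process
      capabilities:
        drop: ["ALL"]                  # expose no extra Linux capabilities
```

Some applications need specific capabilities to run, in which case you would add back only those under capabilities.add rather than leaving the defaults in place.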

For more granular control of container actions, you can also use built-in Linux security features such as AppArmor and seccomp. These features are defined at the node level, and then implemented through a pod manifest. Built-in Linux security features are only available on Linux nodes and pods.

Note

Kubernetes environments, in AKS or elsewhere, aren't completely safe for hostile multi-tenant usage. Additional security features such as AppArmor, seccomp, Pod Security Policies, or more fine-grained role-based access control (RBAC) for nodes make exploits more difficult. However, for true security when running hostile multi-tenant workloads, a hypervisor is the only level of security that you should trust. The security domain for Kubernetes becomes the entire cluster, not an individual node. For these types of hostile multi-tenant workloads, you should use physically isolated clusters.

AppArmor

To limit the actions that containers can perform, you can use the AppArmor Linux kernel security module. AppArmor is available as part of the underlying AKS node OS, and is enabled by default. You create AppArmor profiles that restrict actions such as read, write, or execute, or system functions such as mounting filesystems. Default AppArmor profiles restrict access to various /proc and /sys locations, and provide a means to logically isolate containers from the underlying node. AppArmor works for any application that runs on Linux, not just Kubernetes pods.

AppArmor profiles in use in an AKS cluster to limit container actions

To see AppArmor in action, the following example creates a profile that prevents writing to files. SSH to an AKS node, then create a file named deny-write.profile and paste the following content:

#include <tunables/global>
profile k8s-apparmor-example-deny-write flags=(attach_disconnected) {
  #include <abstractions/base>

  file,
  # Deny all file writes.
  deny /** w,
}

AppArmor profiles are added using the apparmor_parser command. Add the profile to AppArmor by passing the name of the profile file created in the previous step:

sudo apparmor_parser deny-write.profile

There's no output returned if the profile is correctly parsed and applied to AppArmor. You're returned to the command prompt.

From your local machine, now create a pod manifest named aks-apparmor.yaml and paste the following content. This manifest defines an annotation for container.apparmor.security.beta.kubernetes.io and references the deny-write profile created in the previous steps:

apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor
  annotations:
    container.apparmor.security.beta.kubernetes.io/hello: localhost/k8s-apparmor-example-deny-write
spec:
  containers:
  - name: hello
    image: busybox
    command: [ "sh", "-c", "echo 'Hello AppArmor!' && sleep 1h" ]

Deploy the sample pod using the kubectl apply command:

kubectl apply -f aks-apparmor.yaml

With the pod deployed, use the kubectl exec command to try to write to a file. The write is denied, as shown in the following example output:

$ kubectl exec hello-apparmor -- touch /tmp/test

touch: /tmp/test: Permission denied
command terminated with exit code 1

For more information about AppArmor, see AppArmor profiles in Kubernetes.
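Newer Kubernetes releases (1.30 and later, to the best of our knowledge) also accept the AppArmor profile as a securityContext field instead of the beta annotation. The following is a sketch of the same pod under that assumption. Check the Kubernetes version of your cluster before relying on it:

```yaml
# Assumes a Kubernetes version that supports securityContext.appArmorProfile.
apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor
spec:
  containers:
  - name: hello
    image: busybox
    command: [ "sh", "-c", "echo 'Hello AppArmor!' && sleep 1h" ]
    securityContext:
      appArmorProfile:
        type: Localhost
        # Name of the profile loaded on the node with apparmor_parser.
        localhostProfile: k8s-apparmor-example-deny-write
```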

Secure computing

While AppArmor works for any Linux application, seccomp (secure computing) works at the process level. Seccomp is also a Linux kernel security module, and is natively supported by the Docker runtime used by AKS nodes. Seccomp limits the process calls (system calls) that containers can perform. You create filters that define what actions to allow or deny, and then use annotations within a pod YAML manifest to associate the pod with the seccomp filter. This aligns with the best practice of granting the container only the minimal permissions that it needs to run, and no more.

To see seccomp in action, create a filter that prevents changing permissions on a file. SSH to an AKS node, then create a seccomp filter named /var/lib/kubelet/seccomp/prevent-chmod and paste the following content:

{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "name": "chmod",
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}

From your local machine, now create a pod manifest named aks-seccomp.yaml and paste the following content. This manifest defines an annotation for seccomp.security.alpha.kubernetes.io and references the prevent-chmod filter created in the previous step:

apiVersion: v1
kind: Pod
metadata:
  name: chmod-prevented
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: localhost/prevent-chmod
spec:
  containers:
  - name: chmod
    image: busybox
    command:
      - "chmod"
    args:
      - "777"
      - /etc/hostname
  restartPolicy: Never

Deploy the sample pod using the kubectl apply command:

kubectl apply -f ./aks-seccomp.yaml

View the status of the pods using the kubectl get pods command. The pod reports an error. The chmod command is prevented from running by the seccomp filter, as shown in the following example output:

$ kubectl get pods

NAME                      READY     STATUS    RESTARTS   AGE
chmod-prevented           0/1       Error     0          7s

For more information about available filters, see Seccomp security profiles for Docker.
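On Kubernetes 1.19 and later, the seccomp annotation is deprecated in favor of the securityContext.seccompProfile field. The following sketch shows the same pod using that field, assuming the prevent-chmod filter sits in the kubelet's default seccomp directory (/var/lib/kubelet/seccomp) on the node:

```yaml
# Assumes Kubernetes 1.19+ and the prevent-chmod filter created earlier on the node.
apiVersion: v1
kind: Pod
metadata:
  name: chmod-prevented
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      # Path is relative to the kubelet's seccomp profile directory.
      localhostProfile: prevent-chmod
  containers:
  - name: chmod
    image: busybox
    command:
      - "chmod"
    args:
      - "777"
      - /etc/hostname
  restartPolicy: Never
```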

Regularly update to the latest version of Kubernetes

Best practice guidance - To stay current on new features and bug fixes, regularly upgrade the Kubernetes version in your AKS cluster.

Kubernetes releases new features at a quicker pace than more traditional infrastructure platforms. Kubernetes updates include new features, and bug or security fixes. New features typically move through an alpha and then a beta status before they become stable, at which point they are generally available and recommended for production use. This release cycle should allow you to update Kubernetes without regularly encountering breaking changes or adjusting your deployments and templates.

AKS supports three minor versions of Kubernetes. This means that when a new minor version is introduced, the oldest supported minor version and its patch releases are retired. Minor updates to Kubernetes happen on a periodic basis. Make sure that you have a governance process to check and upgrade as needed so you don't fall out of support. For more information, see Supported Kubernetes versions in AKS.

To check the versions that are available for your cluster, use the az aks get-upgrades command as shown in the following example:

az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster

You can then upgrade your AKS cluster using the az aks upgrade command. The upgrade process safely cordons and drains one node at a time, schedules pods on remaining nodes, and then deploys a new node running the latest OS and Kubernetes versions.

It is highly recommended to test new minor versions in a dev test environment so you can validate that your workload continues to operate healthily with the new Kubernetes version. Kubernetes may deprecate APIs that your workloads rely on, as happened in version 1.16. When bringing new versions into production, consider using multiple node pools on separate versions and upgrading individual pools one at a time to progressively roll the update across a cluster. If running multiple clusters, upgrade one cluster at a time to progressively monitor for impact or changes.

az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version KUBERNETES_VERSION

For more information about upgrades in AKS, see Supported Kubernetes versions in AKS and Upgrade an AKS cluster.

Process Linux node updates and reboots using kured

Best practice guidance - AKS automatically downloads and installs security fixes on each Linux node, but does not automatically reboot the node if a reboot is required. Use kured to watch for pending reboots, then safely cordon and drain the node so that it can reboot, apply the updates, and be as secure as possible with respect to the OS. For Windows Server nodes, regularly perform an AKS upgrade operation to safely cordon and drain pods and deploy updated nodes.

Each evening, Linux nodes in AKS get security patches available through their distro update channel. This behavior is configured automatically as the nodes are deployed in an AKS cluster. To minimize disruption and potential impact to running workloads, nodes are not automatically rebooted if a security patch or kernel update requires it.

The open-source kured (KUbernetes REboot Daemon) project by Weaveworks watches for pending node reboots. When a Linux node applies updates that require a reboot, the node is safely cordoned and drained so that its pods are moved to and scheduled on other nodes in the cluster. Once the node is rebooted, it is added back into the cluster and Kubernetes resumes scheduling pods on it. To minimize disruption, kured permits only one node at a time to be rebooted.

AKS node reboot process using kured

If you want finer-grained control over when reboots happen, kured can integrate with Prometheus to prevent reboots if there are other maintenance events or cluster issues in progress. This integration avoids the additional complications that rebooting nodes would cause while you are actively troubleshooting other issues.

For more information about how to handle node reboots, see Apply security and kernel updates to nodes in AKS.

Next steps

This article focused on how to secure your AKS cluster. To implement some of these areas, see the following articles: