Apply security and kernel updates to Linux nodes in Azure Kubernetes Service (AKS)

To protect your clusters, security updates are automatically applied to Linux nodes in AKS. These updates include OS security fixes and kernel updates. Some of these updates require a node reboot to complete the process. AKS doesn't automatically reboot these Linux nodes to complete the update process.

The process for keeping Windows Server nodes up to date is a little different. Windows Server nodes don't receive daily updates. Instead, you perform an AKS upgrade that deploys new nodes with the latest base Windows Server image and patches. For AKS clusters that use Windows Server nodes, see Upgrade a node pool in AKS.

This article shows you how to use the open-source kured (KUbernetes REboot Daemon) to watch for Linux nodes that require a reboot, and then automatically handle the rescheduling of running pods and the node reboot process.

Note

Kured is an open-source project by Weaveworks. Support for this project in AKS is provided on a best-effort basis. Additional support can be found in the #weave-community Slack channel.

Before you begin

This article assumes that you have an existing AKS cluster. If you need an AKS cluster, see the AKS quickstart using the Azure CLI or using the Azure portal.

You also need the Azure CLI version 2.0.59 or later installed and configured. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.
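
If you script this prerequisite check, a small helper along the following lines can compare the installed version against 2.0.59. This is only a sketch: the function name version_at_least is made up for illustration, and it assumes a sort command that supports the -V option (as GNU coreutils does).

```shell
# Succeed when version $1 is greater than or equal to version $2.
# version_at_least is a hypothetical helper name, not part of the Azure CLI.
version_at_least() {
  have="$1"
  need="$2"
  # sort -V orders version strings component by component; if the required
  # version sorts first (or the two are equal), the installed one is new enough.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Example usage (assumes the first line of `az --version` carries the version):
#   installed="$(az --version 2>/dev/null | head -n 1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+')"
#   version_at_least "$installed" "2.0.59" || echo "Please upgrade the Azure CLI"
```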

Understand the AKS node update experience

In an AKS cluster, your Kubernetes nodes run as Azure virtual machines (VMs). These Linux-based VMs use an Ubuntu image, with the OS configured to automatically check for updates every night. If security or kernel updates are available, they are automatically downloaded and installed.

AKS node update and reboot process with kured

Some security updates, such as kernel updates, require a node reboot to finalize the process. A Linux node that requires a reboot creates a file named /var/run/reboot-required. This reboot process doesn't happen automatically.
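
As a minimal sketch, you can check for that file yourself from a shell on the node. The function name needs_reboot is hypothetical, and the optional path argument exists only so that the check can be tried against a scratch file.

```shell
# Succeed when the reboot sentinel file exists.
# needs_reboot is an illustrative name; the default path is the one named above.
needs_reboot() {
  sentinel="${1:-/var/run/reboot-required}"
  [ -f "$sentinel" ]
}

if needs_reboot; then
  echo "Reboot required"
else
  echo "No reboot required"
fi
```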

You can use your own workflows and processes to handle node reboots, or use kured to orchestrate the process. With kured, a DaemonSet is deployed that runs a pod on each Linux node in the cluster. These pods watch for the existence of the /var/run/reboot-required file, and then initiate a process to reboot the node.

Node upgrades

AKS also provides a separate process that lets you upgrade a cluster. An upgrade typically moves the cluster to a newer version of Kubernetes rather than only applying node security updates. An AKS upgrade performs the following actions:

  • A new node is deployed with the latest security updates and Kubernetes version applied.
  • An old node is cordoned and drained.
  • Pods are scheduled on the new node.
  • The old node is deleted.

You can't remain on the same Kubernetes version during an upgrade event. You must specify a newer version of Kubernetes. To upgrade to the latest version of Kubernetes, you can upgrade your AKS cluster.
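
The upgrade itself can be sketched with the Azure CLI as follows. The resource group and cluster names are placeholders, and the run helper with its DRY_RUN switch is an illustrative convenience (not an Azure CLI feature) that lets you print the commands before executing them.

```shell
# Placeholder names; replace them with your own resource group and cluster.
RESOURCE_GROUP="${RESOURCE_GROUP:-myResourceGroup}"
CLUSTER_NAME="${CLUSTER_NAME:-myAKSCluster}"

# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List the Kubernetes versions that the cluster can be upgraded to.
list_upgrades() {
  run az aks get-upgrades --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" --output table
}

# Upgrade the cluster to the Kubernetes version given as the first argument.
upgrade_cluster() {
  run az aks upgrade --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" --kubernetes-version "$1"
}
```

Call list_upgrades first to pick a version, then pass that version to upgrade_cluster.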

Deploy kured in an AKS cluster

To deploy the kured DaemonSet, install the following official Kured Helm chart. This creates a role and cluster role, bindings, and a service account, then deploys the DaemonSet using kured.

# Add the Kured Helm repository
helm repo add kured https://weaveworks.github.io/kured

# Update your local Helm chart repository cache
helm repo update

# Create a dedicated namespace for kured
kubectl create namespace kured

# Install kured in that namespace with Helm 3 (Linux nodes only; kured does not run on Windows nodes)
helm install kured kured/kured --namespace kured --set nodeSelector."beta\.kubernetes\.io/os"=linux
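
After the install, you may want to confirm that a kured pod is running on every Linux node. The following sketch assumes that the DaemonSet is named kured (which is what a Helm release named kured would normally produce, but verify it with kubectl get daemonset --namespace kured). The function names are hypothetical.

```shell
# Print "<desired> <ready>" pod counts for the kured DaemonSet.
# Assumes the DaemonSet is named "kured" in the "kured" namespace.
kured_counts() {
  kubectl get daemonset kured --namespace kured \
    --output jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}'
}

# Read "<desired> <ready>" on stdin; succeed when every desired pod is ready.
all_ready() {
  read -r desired ready || true
  [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "$desired" = "$ready" ]
}

# Usage (requires access to the cluster):
#   kured_counts | all_ready && echo "kured is running on every Linux node"
```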

You can also configure additional parameters for kured, such as integration with Prometheus or Slack. For more information about additional configuration parameters, see the kured Helm chart.
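
As a hedged illustration, the following writes a small values file that shortens the check period and restricts reboots to a nightly window. The key names under configuration are assumptions based on the chart's documented values and can differ between chart versions, so confirm them against the chart before use. The file name kured-values.yaml is arbitrary.

```shell
# Write an example values file. Key names are assumptions; verify them
# against the values documented for the chart version you install.
cat > kured-values.yaml <<'EOF'
configuration:
  period: "30m"        # how often kured checks for the sentinel file
  startTime: "02:00"   # only reboot after this time of day
  endTime: "05:00"     # only reboot before this time of day
  timeZone: "UTC"      # time zone used for the window above
EOF

# Apply it to the existing release (requires Helm and cluster access):
#   helm upgrade kured kured/kured --namespace kured --reuse-values --values kured-values.yaml
```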

Update cluster nodes

By default, Linux nodes in AKS check for updates every night. If you don't want to wait, you can manually perform an update to check that kured runs correctly. First, follow the steps to SSH to one of your AKS nodes. Once you have an SSH connection to the Linux node, check for updates and apply them as follows:

sudo apt-get update && sudo apt-get upgrade -y

If updates were applied that require a node reboot, a file is written to /var/run/reboot-required. By default, kured checks for nodes that require a reboot every 60 minutes.

Monitor and review the reboot process

When one of the replicas in the DaemonSet detects that a node reboot is required, a lock is placed on the node through the Kubernetes API. This lock prevents additional pods from being scheduled on the node. The lock also indicates that only one node should be rebooted at a time. With the node cordoned off, running pods are drained from the node, and the node is rebooted.
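
For comparison, the cordon and drain steps can be performed by hand with kubectl. The following is a sketch of that manual equivalent, not kured's own implementation. The function names and the DRY_RUN switch are illustrative.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop new pods from landing on the node, then evict the running ones.
# DaemonSet-managed pods are left in place, which drain requires you to acknowledge.
prepare_node_for_reboot() {
  run kubectl cordon "$1"
  run kubectl drain "$1" --ignore-daemonsets
}

# Make the node schedulable again once it has rebooted.
release_node() {
  run kubectl uncordon "$1"
}
```

Setting DRY_RUN=1 prints the kubectl commands instead of running them, which is a cheap way to review the sequence before touching a production node.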

You can monitor the status of the nodes using the kubectl get nodes command. The following example output shows a node with a status of SchedulingDisabled as the node prepares for the reboot process:

NAME                       STATUS                     ROLES     AGE       VERSION
aks-nodepool1-28993262-0   Ready,SchedulingDisabled   agent     1h        v1.11.7
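
On a larger cluster it can help to filter that output down to the cordoned nodes. A minimal sketch, assuming the default column layout shown above (the function name cordoned_nodes is made up for illustration):

```shell
# Read `kubectl get nodes` output on stdin and print the names of nodes
# whose STATUS column contains SchedulingDisabled. The header row is skipped.
cordoned_nodes() {
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}

# Usage (requires access to the cluster):
#   kubectl get nodes | cordoned_nodes
```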

Once the update process is complete, you can view the status of the nodes using the kubectl get nodes command with the --output wide parameter. This additional output lets you see a difference in the KERNEL-VERSION of the underlying nodes, as shown in the following example output. The aks-nodepool1-28993262-0 node was updated in a previous step and shows kernel version 4.15.0-1039-azure. The aks-nodepool1-28993262-1 node, which hasn't been updated, shows kernel version 4.15.0-1037-azure.

NAME                       STATUS    ROLES     AGE       VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-nodepool1-28993262-0   Ready     agent     1h        v1.11.7   10.240.0.4    <none>        Ubuntu 16.04.6 LTS   4.15.0-1039-azure   docker://3.0.4
aks-nodepool1-28993262-1   Ready     agent     1h        v1.11.7   10.240.0.5    <none>        Ubuntu 16.04.6 LTS   4.15.0-1037-azure   docker://3.0.4
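
To spot nodes that are still waiting for a reboot, you can narrow the output to node name and kernel version and compare the values. This is a sketch under two assumptions: the function names are hypothetical, and sort supports the -V option (as GNU coreutils does).

```shell
# Print "NAME KERNEL" for every node (requires access to the cluster).
node_kernels() {
  kubectl get nodes --no-headers \
    --output custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion
}

# Read "NAME KERNEL" lines on stdin and print the nodes that are not
# running the newest kernel version found in the input.
nodes_behind() {
  input="$(cat)"
  newest="$(printf '%s\n' "$input" | awk '{ print $2 }' | sort -V | tail -n 1)"
  printf '%s\n' "$input" | awk -v newest="$newest" '$2 != newest { print $1 }'
}

# Usage:
#   node_kernels | nodes_behind
```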

Next steps

This article detailed how to use kured to reboot Linux nodes automatically as part of the security update process. To upgrade to the latest version of Kubernetes, you can upgrade your AKS cluster.

For AKS clusters that use Windows Server nodes, see Upgrade a node pool in AKS.