Support policies for Azure Kubernetes Service

This article provides details about technical support policies and limitations for Azure Kubernetes Service (AKS). It also covers agent node management, managed control plane components, third-party open-source components, and security or patch management.

Service updates and releases

Managed features in AKS

Base infrastructure as a service (IaaS) cloud components, such as compute or networking components, give you access to low-level controls and customization options. By contrast, AKS provides a turnkey Kubernetes deployment that gives you the common set of configurations and capabilities you need for your cluster. As an AKS user, you have limited customization and deployment options. In exchange, you don't need to worry about or manage Kubernetes clusters directly.

With AKS, you get a fully managed control plane. The control plane contains all of the components and services you need to operate and provide Kubernetes clusters to end users. All Kubernetes components are maintained and operated by Azure.

Azure manages and monitors the following components through the control plane:

  • Kubelet or Kubernetes API servers
  • Etcd or a compatible key-value store, providing Quality of Service (QoS), scalability, and runtime
  • DNS services (for example, kube-dns or CoreDNS)
  • Kubernetes proxy or networking
  • Any additional add-on or system component running in the kube-system namespace

AKS isn't a Platform-as-a-Service (PaaS) solution. Some components, such as agent nodes, have shared responsibility, where users must help maintain the AKS cluster. For example, user input is required to apply an agent node operating system (OS) security patch.

The services are managed in the sense that Azure and the AKS team deploy, operate, and are responsible for service availability and functionality. Customers can't alter these managed components. Azure limits customization to ensure a consistent and scalable user experience. For a fully customizable solution, see AKS Engine.

Shared responsibility

When a cluster is created, you define the Kubernetes agent nodes that AKS creates. Your workloads are executed on these nodes.

Because your agent nodes execute private code and store sensitive data, Azure Support can access them only in a very limited way. Azure Support can't sign in to, execute commands in, or view logs for these nodes without your express permission or assistance.

Any modification made directly to the agent nodes using any of the IaaS APIs renders the cluster unsupportable. Any modification to the agent nodes must be made using Kubernetes-native mechanisms such as daemon sets.

Similarly, while you may add any metadata to the cluster and nodes, such as tags and labels, changing any of the system-created metadata renders the cluster unsupported.

AKS support coverage

Azure provides technical support for the following examples:

  • Connectivity to all Kubernetes components that the Kubernetes service provides and supports, such as the API server.
  • Management, uptime, QoS, and operations of Kubernetes control plane services (Kubernetes control plane, API server, etcd, and CoreDNS, for example).
  • Etcd data store. Support includes automated, transparent backups of all etcd data every 30 minutes for disaster planning and cluster state restoration. These backups aren't directly available to you or any users. They ensure data reliability and consistency. On-demand etcd rollback or restore isn't supported as a feature.
  • Any integration points in the Azure cloud provider driver for Kubernetes. These include integrations into other Azure services such as load balancers, persistent volumes, or networking (Kubernetes and Azure CNI).
  • Questions or issues about customization of control plane components such as the Kubernetes API server, etcd, and CoreDNS.
  • Issues about networking, such as Azure CNI, kubenet, or other network access and functionality issues. Issues could include DNS resolution, packet loss, routing, and so on. Azure supports various networking scenarios:
    • Kubenet and Azure CNI using managed VNETs or custom (bring your own) subnets
    • Connectivity to other Azure services and applications
    • Ingress controllers and ingress or load balancer configurations
    • Network performance and latency
    • Network policies

Note

Any cluster actions taken by Microsoft/AKS are made with user consent under the built-in Kubernetes role aks-service and the built-in role binding aks-service-rolebinding. This role enables AKS to troubleshoot and diagnose cluster issues, but it can't modify permissions, create roles or role bindings, or perform other high-privilege actions. Role access is enabled only under active support tickets with just-in-time (JIT) access.

Azure doesn't provide technical support for the following examples:

  • Questions about how to use Kubernetes. For example, Azure Support doesn't provide advice on how to create custom ingress controllers, use application workloads, or apply third-party or open-source software packages or tools.

    Note

    Azure Support can advise on AKS cluster functionality, customization, and tuning (for example, Kubernetes operations issues and procedures).

  • Third-party open-source projects that aren't provided as part of the Kubernetes control plane or deployed with AKS clusters. These projects might include Istio, Helm, Envoy, or others.

    Note

    Azure can provide best-effort support for third-party open-source projects such as Helm. Where a third-party open-source tool integrates with the Kubernetes Azure cloud provider, or where the problem involves other AKS-specific bugs, Azure supports examples and applications from Azure documentation.

  • Third-party closed-source software. This software can include security scanning tools and networking devices or software.
  • Network customizations other than the ones listed in the AKS documentation.

AKS support coverage for agent nodes

Azure responsibilities for AKS agent nodes

Azure and users share responsibility for Kubernetes agent nodes where:

  • The base OS image has required additions (such as monitoring and networking agents).
  • The agent nodes receive OS patches automatically.
  • Issues with the Kubernetes control plane components that run on the agent nodes are automatically remediated. These components include:
    • Kube-proxy
    • Networking tunnels that provide communication paths to the Kubernetes master components
    • Kubelet
    • Moby or containerd

Note

If an agent node isn't operational, AKS might restart individual components or the entire agent node. These restart operations are automated and provide auto-remediation for common issues. To learn more about the auto-remediation mechanisms, see Node Auto-Repair.

Customer responsibilities for AKS agent nodes

Azure provides patches and new images for your image nodes weekly, but doesn't automatically patch them by default. To keep your agent node OS and runtime components patched, you should follow a regular node image upgrade schedule or automate the upgrades.
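
As an illustration of automating that schedule, the sketch below builds the Azure CLI invocation for a node image upgrade of a single node pool. It only assembles and prints the command and doesn't run it. The resource group, cluster, and node pool names are hypothetical placeholders, and the helper function is an assumption made for this example, not part of any AKS tooling.

```python
import shlex


def node_image_upgrade_command(resource_group, cluster_name, node_pool):
    """Build the az CLI argument list for a node-image-only upgrade of one node pool."""
    return [
        "az", "aks", "nodepool", "upgrade",
        "--resource-group", resource_group,
        "--cluster-name", cluster_name,
        "--name", node_pool,
        "--node-image-only",  # upgrade the node image, not the Kubernetes version
    ]


if __name__ == "__main__":
    # Hypothetical names; replace them with your own resources.
    cmd = node_image_upgrade_command("myResourceGroup", "myAKSCluster", "nodepool1")
    print(shlex.join(cmd))
```

A scheduler of your choice (for example, a weekly pipeline job) could pass the resulting list to subprocess.run once credentials are in place.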

Similarly, AKS regularly releases new Kubernetes patches and minor versions. These updates can contain security or functionality improvements to Kubernetes. You're responsible for keeping your clusters' Kubernetes version up to date, in line with the AKS Kubernetes Support Version Policy.

User customization of agent nodes

Note

AKS agent nodes appear in the Azure portal as regular Azure IaaS resources. However, these virtual machines are deployed into a custom Azure resource group (usually prefixed with MC_*). You can't change the base OS image or make any direct customizations to these nodes using the IaaS APIs or resources. Any custom changes that aren't made through the AKS API won't persist through an upgrade, scale, update, or reboot. Avoid making changes to the agent nodes unless Azure Support directs you to do so.

AKS manages the lifecycle and operations of agent nodes on your behalf; modifying the IaaS resources associated with the agent nodes is not supported. An example of an unsupported operation is customizing a node pool virtual machine scale set by manually changing configurations through the virtual machine scale set portal or API.

For workload-specific configurations or packages, AKS recommends using Kubernetes daemon sets.

Using Kubernetes privileged daemon sets and init containers enables you to tune, modify, or install third-party software on cluster agent nodes. Examples of such customizations include adding custom security scanning software or updating sysctl settings.

While this path is recommended if the above requirements apply, AKS engineering and support can't assist in troubleshooting or diagnosing modifications that render a node unavailable because of a custom-deployed daemon set.
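
To make the daemon set approach concrete, here is a minimal sketch that generates a manifest for a privileged init container that sets one sysctl value on every node. The daemon set name, namespace, container images, and the chosen setting (vm.max_map_count) are all assumptions picked for illustration; this is not an AKS-provided or AKS-recommended manifest, and whether a given sysctl can be changed this way depends on the setting.

```python
import json


def build_sysctl_daemonset(name, namespace, sysctl_key, sysctl_value):
    """Return a DaemonSet manifest (as a dict) that applies one sysctl per node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # The privileged init container runs once per pod start
                    # and writes the kernel setting.
                    "initContainers": [{
                        "name": "apply-sysctl",
                        "image": "busybox:1.36",  # assumed image
                        "command": ["sysctl", "-w", f"{sysctl_key}={sysctl_value}"],
                        "securityContext": {"privileged": True},
                    }],
                    # A do-nothing container keeps the pod running afterwards.
                    "containers": [{
                        "name": "pause",
                        "image": "registry.k8s.io/pause:3.9",  # assumed image
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = build_sysctl_daemonset(
        "node-sysctl", "default", "vm.max_map_count", 262144
    )
    # JSON output can be saved to a file and applied with kubectl.
    print(json.dumps(manifest, indent=2))
```

Keeping the change in a daemon set means it is reapplied to new or reimaged nodes, which a manual edit on a node would not be.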

Security issues and patching

If a security flaw is found in one or more of the managed components of AKS, the AKS team will patch all affected clusters to mitigate the issue. Alternatively, the team will give users upgrade guidance.

For agent nodes affected by a security flaw, Azure will notify you with details on the impact and the steps to fix or mitigate the security issue (normally a node image upgrade or a cluster patch upgrade).

Node maintenance and access

Although you can sign in to and change agent nodes, doing so is discouraged because changes can make a cluster unsupportable.

Network ports, access, and NSGs

You may customize NSGs only on custom subnets. You may not customize NSGs on managed subnets or at the NIC level of the agent nodes. AKS has egress requirements to specific endpoints. To control egress while ensuring the necessary connectivity, see limit egress traffic.

Stopped or de-allocated clusters

As stated earlier, manually de-allocating all cluster nodes via the IaaS APIs, CLI, or portal renders the cluster out of support.

Clusters that are de-allocated outside of the AKS APIs have no state preservation guarantees. The control planes for clusters in this state will be archived after 30 days and deleted after 12 months.

AKS reserves the right to archive control planes that have been configured outside of support guidelines for extended periods of 30 days or more. AKS maintains backups of cluster etcd metadata and can readily reallocate the cluster. This reallocation can be initiated by any PUT operation that brings the cluster back into support, such as an upgrade or a scale to active agent nodes.

If your subscription is suspended or deleted, your cluster's control plane and state will be deleted after 90 days.
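
The retention windows in this section (archive after 30 days, delete after 12 months, and delete 90 days after a subscription suspension) can be turned into rough calendar dates. The sketch below is only a planning aid: how AKS actually counts these periods is not specified here, so treating "12 months" as the same day one year later and counting whole calendar days are assumptions of this example.

```python
import calendar
from datetime import date, timedelta


def add_months(start, months):
    """Add calendar months, clamping the day to the end of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def deallocated_cluster_milestones(deallocated_on):
    """Estimated dates for a cluster de-allocated outside of the AKS APIs."""
    return {
        "archive_after": deallocated_on + timedelta(days=30),
        "delete_after": add_months(deallocated_on, 12),
    }


def suspended_subscription_deletion(suspended_on):
    """Estimated deletion date after a subscription is suspended or deleted."""
    return suspended_on + timedelta(days=90)


if __name__ == "__main__":
    milestones = deallocated_cluster_milestones(date(2024, 1, 15))
    print(milestones["archive_after"], milestones["delete_after"])
    print(suspended_subscription_deletion(date(2024, 1, 15)))
```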

Unsupported alpha and beta Kubernetes features

AKS only supports stable and beta features within the upstream Kubernetes project. Unless otherwise documented, AKS doesn't support any alpha feature that is available in the upstream Kubernetes project.
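
One rough way to spot alpha-level resources in your manifests is to look at the apiVersion field, since upstream Kubernetes marks API maturity in the version name (for example v1alpha1, v1beta1, v1). The sketch below classifies a version string on that basis. It is an approximation offered as an assumption of this example: feature maturity is also controlled by feature gates, which this check doesn't see, and documented AKS exceptions are not modeled.

```python
import re

# Matches v1, v2beta1, v1alpha2, and similar Kubernetes API version names.
_VERSION = re.compile(r"^v\d+(?:(alpha|beta)\d+)?$")


def stability_level(api_version):
    """Return 'alpha', 'beta', or 'stable' for an apiVersion such as 'apps/v1'."""
    version = api_version.rsplit("/", 1)[-1]  # drop the API group, if any
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized API version: {api_version!r}")
    return match.group(1) or "stable"


def supported_by_default(api_version):
    """Stable and beta levels are in scope; alpha is not unless documented."""
    return stability_level(api_version) != "alpha"


if __name__ == "__main__":
    for candidate in ("apps/v1", "batch/v1beta1", "example.com/v1alpha1"):
        print(candidate, stability_level(candidate), supported_by_default(candidate))
```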

Preview features or feature flags

For features and functionality that require extended testing and user feedback, Azure releases new preview features or features behind a feature flag. Consider these features as prerelease or beta features.

Preview features or feature-flag features aren't meant for production. Ongoing changes in APIs and behavior, bug fixes, and other changes can result in unstable clusters and downtime.

Features in public preview fall under 'best effort' support, because these features are in preview and not meant for production. They are supported by the AKS technical support teams during business hours only. For more information, see:

Upstream bugs and issues

Given the speed of development in the upstream Kubernetes project, bugs invariably arise. Some of these bugs can't be patched or worked around within the AKS system. Instead, bug fixes require larger patches to upstream projects (such as Kubernetes, node or agent operating systems, and kernel). For components that Azure owns (such as the Azure cloud provider), AKS and Azure personnel are committed to fixing issues upstream in the community.

When a technical support issue is root-caused by one or more upstream bugs, AKS support and engineering teams will:

  • Identify and link the upstream bugs with any supporting details to help explain why this issue affects your cluster or workload. Customers receive links to the required repositories so they can watch the issues and see when a new release will provide fixes.
  • Provide potential workarounds or mitigation. If the issue can be mitigated, a known issue will be filed in the AKS repository. The known-issue filing explains:
    • The issue, including links to upstream bugs.
    • The workaround, and details about an upgrade or another way to make the solution persistent.
    • Rough timelines for when the fix is expected to be included, based on the upstream release cadence.