Support policies for Azure Kubernetes Service

This article provides details about technical support policies and limitations for Azure Kubernetes Service (AKS). The article also details worker node management, managed control plane components, third-party open-source components, and security or patch management.

Service updates and releases

Managed features in AKS

Base infrastructure as a service (IaaS) cloud components, such as compute or networking components, give users access to low-level controls and customization options. By contrast, AKS provides a turnkey Kubernetes deployment that gives customers the common set of configurations and capabilities they need. AKS customers have limited customization, deployment, and other options. These customers don't need to worry about or manage Kubernetes clusters directly.

With AKS, the customer gets a fully managed control plane. The control plane contains all of the components and services the customer needs to operate and provide Kubernetes clusters to end users. All Kubernetes components are maintained and operated by Azure.

Azure manages and monitors the following components through the control plane:

  • Kubelet or Kubernetes API servers
  • Etcd or a compatible key-value store, providing Quality of Service (QoS), scalability, and runtime
  • DNS services (for example, kube-dns or CoreDNS)
  • Kubernetes proxy or networking

AKS isn't a completely managed cluster solution. Some components, such as worker nodes, have shared responsibility, where users must help maintain the AKS cluster. User input is required, for example, to apply a worker node operating system (OS) security patch.

The services are managed in the sense that Azure and the AKS team deploy, operate, and are responsible for service availability and functionality. Customers can't alter these managed components. Azure limits customization to ensure a consistent and scalable user experience. For a fully customizable solution, see AKS Engine.

Shared responsibility

When a cluster is created, the customer defines the Kubernetes worker nodes that AKS creates. Customer workloads are executed on these nodes. Customers own and can view or modify the worker nodes.

Because customer cluster nodes execute private code and store sensitive data, Azure Support can access them only in a limited way. Azure Support can't sign in to, execute commands in, or view logs for these nodes without express customer permission or assistance.

Because worker nodes are sensitive, Azure takes great care to limit their background management. In many cases, your workload will continue to run even if the Kubernetes master nodes, etcd, and other Azure-managed components fail. Carelessly modified worker nodes can cause losses of data and workloads and can render the cluster unsupportable.

AKS support coverage

Azure provides technical support for the following:

Note

Any cluster actions taken by Azure Support are made with user consent under the built-in Kubernetes "edit" role, granted through a role binding named aks-support-rolebinding. With this role, AKS support can edit cluster configuration and resources to troubleshoot and diagnose cluster issues, but the role can't modify permissions or create roles or role bindings. Role access is enabled only under active support tickets with just-in-time (JIT) access.

  • Connectivity to all Kubernetes components that the Kubernetes service provides and supports, such as the API server.
  • Management, uptime, QoS, and operations of Kubernetes control plane services (Kubernetes master nodes, API server, etcd, and kube-dns, for example).
  • Etcd. Support includes automated, transparent backups of all etcd data every 30 minutes for disaster planning and cluster state restoration. These backups aren't directly available to customers or users. They ensure data reliability and consistency. On-demand rollback or restore is not supported as a feature.
  • Any integration points in the Azure cloud provider driver for Kubernetes. These include integrations into other Azure services such as load balancers, persistent volumes, or networking (Kubernetes and Azure CNI).
  • Questions or issues about customization of control plane components such as the Kubernetes API server, etcd, and kube-dns.
  • Issues about networking, such as Azure CNI, kubenet, or other network access and functionality issues. Issues could include DNS resolution, packet loss, routing, and so on. Azure supports various networking scenarios:
    • Kubenet (basic) and advanced networking (Azure CNI) within the cluster and associated components
    • Connectivity to other Azure services and applications
    • Ingress controllers and ingress or load balancer configurations
    • Network performance and latency

Azure doesn't provide technical support for the following:

  • Questions about how to use Kubernetes. For example, Azure Support doesn't provide advice on how to create custom ingress controllers, use application workloads, or apply third-party or open-source software packages or tools.

    Note

    Azure Support can advise on AKS cluster functionality, customization, and tuning (for example, Kubernetes operations issues and procedures).

  • Third-party open-source projects that aren't provided as part of the Kubernetes control plane or deployed with AKS clusters. These projects might include Istio, Helm, Envoy, or others.

    Note

    Azure can provide best-effort support for third-party open-source projects such as Helm and Kured. Where a third-party open-source tool integrates with the Kubernetes Azure cloud provider, or where a bug is specific to AKS, Azure supports the examples and applications from the Azure documentation.

  • Third-party closed-source software. This software can include security scanning tools and networking devices or software.
  • Issues about multicloud or multivendor build-outs. For example, Azure doesn't support issues related to running a federated, multi-public-cloud vendor solution.
  • Network customizations other than those listed in the AKS documentation.

    Note

    Azure does support issues and bugs related to network security groups (NSGs). For example, Azure Support can answer questions about an NSG failing to update or about unexpected NSG or load balancer behavior.

AKS support coverage for worker nodes

Azure responsibilities for AKS worker nodes

Azure and customers share responsibility for Kubernetes worker nodes where:

  • The base OS image has required additions (such as monitoring and networking agents).
  • The worker nodes receive OS patches automatically.
  • Issues with the Kubernetes control plane components that run on the worker nodes are automatically remediated. Components include the following:
    • Kube-proxy
    • Networking tunnels that provide communication paths to the Kubernetes master components
    • Kubelet
    • Docker or Moby daemon

Note

On a worker node, if a control plane component is not operational, the AKS team might need to reboot individual components or the entire worker node. These reboot operations are automated and provide auto-remediation for common issues. These reboots occur only at the node level, not the cluster level, except during emergency maintenance or an outage.

Customer responsibilities for AKS worker nodes

Azure doesn't automatically reboot worker nodes to apply OS-level patches. Although OS patches are delivered to the worker nodes, the customer is responsible for rebooting the worker nodes to apply the changes. Shared libraries, daemons such as the SSH daemon (sshd), and other components at the level of the system or OS are automatically patched.
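One way to see whether a node is waiting on a reboot is sketched below. It assumes an Ubuntu-based node image, where package updates that only take effect after a reboot create the sentinel file `/var/run/reboot-required`. This is also the default file Kured watches. The script would need to run on the node or in a pod that mounts the host's `/var/run` read-only. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: Ubuntu-based node image. Updates that need a reboot create this
# sentinel file, which is also the default file Kured watches.
DEFAULT_SENTINEL = Path("/var/run/reboot-required")


def reboot_required(sentinel: Path = DEFAULT_SENTINEL) -> bool:
    """Return True if patches on this node only take effect after a reboot."""
    return sentinel.exists()


def pending_packages(sentinel: Path = DEFAULT_SENTINEL) -> list[str]:
    """List the packages that requested the reboot (from the companion .pkgs file)."""
    pkgs_file = sentinel.with_name(sentinel.name + ".pkgs")
    if not pkgs_file.exists():
        return []
    return [line.strip() for line in pkgs_file.read_text().splitlines() if line.strip()]


if __name__ == "__main__":
    if reboot_required():
        print("Reboot pending for:", ", ".join(pending_packages()) or "unknown packages")
    else:
        print("No reboot pending")
```

This check only reports state. The reboot itself remains the customer's responsibility, whether it is done manually or with a tool such as Kured.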

Customers are responsible for executing Kubernetes upgrades. They can execute upgrades through the Azure portal or the Azure CLI. This applies to updates that contain security or functionality improvements to Kubernetes.
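Below is a minimal sketch of the Azure CLI route. It builds the argument lists for `az aks get-upgrades`, which lists the available versions, and `az aks upgrade`, which upgrades the control plane and node pools. By default it only prints the commands. The resource group, cluster name, and version used in the usage note are hypothetical placeholders, and so are the helper function names.

```python
import shlex
import subprocess


def get_upgrades_cmd(resource_group: str, cluster: str) -> list[str]:
    """Command that lists the Kubernetes versions this cluster can move to."""
    return ["az", "aks", "get-upgrades",
            "--resource-group", resource_group,
            "--name", cluster,
            "--output", "table"]


def upgrade_cmd(resource_group: str, cluster: str, version: str) -> list[str]:
    """Command that upgrades the cluster to the given Kubernetes version."""
    return ["az", "aks", "upgrade",
            "--resource-group", resource_group,
            "--name", cluster,
            "--kubernetes-version", version]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command line; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```

Example usage: call `run(get_upgrades_cmd("myResourceGroup", "myAKSCluster"))` to pick a version. Then call `run(upgrade_cmd("myResourceGroup", "myAKSCluster", "<version>"), dry_run=False)`.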

User customization of worker nodes

Note

AKS worker nodes appear in the Azure portal as regular Azure IaaS resources. But these virtual machines are deployed into a custom Azure resource group (prefixed with MC_). It is possible to augment AKS worker nodes from their base configurations. For example, you can use Secure Shell (SSH) to change AKS worker nodes the way you change normal virtual machines. You cannot, however, change the base OS image. Any custom changes may not persist through an upgrade, scale, update, or reboot. Moreover, making changes out of band and outside the scope of the AKS API renders the AKS cluster unsupported. Avoid changing worker nodes unless Azure Support directs you to make changes.
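The helpers sketched here can flag resources that live in the AKS-managed group before you edit them. By default, AKS names the node resource group `MC_<resource group>_<cluster name>_<location>`. A different name can be chosen at cluster creation with `az aks create --node-resource-group`, so the prefix check is only a heuristic. The function names are hypothetical.

```python
DEFAULT_PREFIX = "MC_"


def default_node_resource_group(resource_group: str, cluster: str, location: str) -> str:
    """Default node resource group name: MC_<resource group>_<cluster>_<location>."""
    return f"{DEFAULT_PREFIX}{resource_group}_{cluster}_{location}"


def looks_aks_managed(resource_group_name: str) -> bool:
    """Heuristic: True if the group uses the default MC_ prefix.

    Azure resource group names are case-insensitive, so compare in upper case.
    A custom --node-resource-group name will not be detected.
    """
    return resource_group_name.upper().startswith(DEFAULT_PREFIX)
```

For example, `default_node_resource_group("myResourceGroup", "myAKSCluster", "eastus")` yields `MC_myResourceGroup_myAKSCluster_eastus`. Resources in that group should be changed only through the AKS API.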

Issuing unsupported operations as defined above, such as out-of-band deallocation of all agent nodes, renders the cluster unsupported. AKS reserves the right to archive control planes that have been configured outside support guidelines for extended periods of 30 days or more. AKS maintains backups of cluster etcd metadata and can readily reallocate the cluster. This reallocation can be initiated by any PUT operation that brings the cluster back into support, such as an upgrade or a scale to active agent nodes.
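The 30-day threshold above is simple date arithmetic. The sketch below shows it as a way to track how long a cluster has been out of support. It takes only the threshold from the policy text. AKS's internal archiving logic is not public, and the function names are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import Optional

# Threshold from the policy text above; not AKS's actual implementation.
ARCHIVE_THRESHOLD = timedelta(days=30)


def archive_eligible(out_of_support_since: datetime, now: Optional[datetime] = None) -> bool:
    """True once a cluster has been out of support for 30 days or more."""
    now = now or datetime.now(timezone.utc)
    return now - out_of_support_since >= ARCHIVE_THRESHOLD


def days_remaining(out_of_support_since: datetime, now: Optional[datetime] = None) -> int:
    """Whole days (rounded up) before the 30-day mark is reached; 0 if already reached."""
    now = now or datetime.now(timezone.utc)
    remaining = ARCHIVE_THRESHOLD - (now - out_of_support_since)
    return max(0, math.ceil(remaining.total_seconds() / 86400))
```

A cluster that went out of support on 1 January reaches the 30-day mark on 31 January. Any supported PUT operation, such as an upgrade, brings it back into support.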

AKS manages the lifecycle and operations of worker nodes on behalf of customers. Modifying the IaaS resources associated with the worker nodes is not supported. An example of an unsupported operation is customizing a node pool's virtual machine scale set (VMSS) by manually changing its configuration through the Azure portal or the VMSS API.

For workload-specific configurations or packages, AKS recommends using Kubernetes DaemonSets.

Kubernetes privileged DaemonSets and init containers enable customers to tune, modify, or install third-party software on cluster worker nodes. Examples of such customizations include adding custom security scanning software or updating sysctl settings.

Although this is the recommended path when the above requirements apply, AKS engineering and support cannot assist in troubleshooting or diagnosing broken or nonfunctional modifications, or modifications that make a node unavailable because of a customer-deployed DaemonSet.
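The sketch below builds a privileged DaemonSet manifest whose init container sets one sysctl on every node. A pause container keeps the pod running. The manifest is emitted as JSON, which `kubectl apply -f` accepts. The DaemonSet name, namespace, images, and the `vm.max_map_count` value are illustrative assumptions, not AKS recommendations. `vm.max_map_count` is not namespaced, so the privileged write applies to the host. As noted above, AKS support cannot troubleshoot problems caused by such a DaemonSet.

```python
import json


def sysctl_daemonset(name: str, sysctl: str, value: str, namespace: str = "default") -> dict:
    """Build a DaemonSet that runs `sysctl -w <key>=<value>` once per node at pod start."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},  # must match the pod template labels
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "initContainers": [{
                        "name": "set-sysctl",
                        "image": "busybox:1.36",  # assumed image; busybox ships a sysctl applet
                        "command": ["sysctl", "-w", f"{sysctl}={value}"],
                        "securityContext": {"privileged": True},  # needed to write host /proc/sys
                    }],
                    "containers": [{
                        "name": "pause",  # keeps the pod alive after the init container exits
                        "image": "registry.k8s.io/pause:3.9",
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(sysctl_daemonset("sysctl-tuner", "vm.max_map_count", "262144"), indent=2))
```

Saving the output to a file and applying it with `kubectl apply -f <file>` schedules one pod per node. Nodes added later by scaling get the same setting.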

Note

AKS as a managed service has end goals such as removing responsibility for patches, updates, and log collection to make the service management more complete and hands-off. As the service's capacity for end-to-end management increases, future releases might omit some functions (for example, node rebooting and automatic patching).

Security issues and patching

If a security flaw is found in one or more components of AKS, the AKS team will patch all affected clusters to mitigate the issue. Alternatively, the team will give users upgrade guidance.

For worker nodes that a security flaw affects, if a zero-downtime patch is available, the AKS team will apply that patch and notify users of the change.

When a security patch requires worker node reboots, Azure will notify customers of this requirement. The customer is responsible for rebooting or updating to get the cluster patch. If users don't apply the patches according to AKS guidance, their cluster will continue to be vulnerable to the security issue.
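When reboots are needed, rebooting one node at a time after moving its pods elsewhere limits disruption. The sketch below produces that ordered plan: cordon, drain, reboot, then uncordon. This is roughly the sequence Kured automates. The reboot step is deliberately left to a caller-supplied function, because how a node is rebooted is an assumption outside this document. The node names in the usage note are hypothetical.

```python
from typing import Callable


def rolling_reboot_plan(nodes: list[str],
                        reboot_cmd: Callable[[str], list[str]]) -> list[list[str]]:
    """Return commands that reboot nodes one at a time without stranding pods.

    reboot_cmd is a hypothetical, caller-supplied function. It is expected to
    block until the node is back and Ready before the uncordon step runs.
    """
    plan: list[list[str]] = []
    for node in nodes:
        plan.append(["kubectl", "cordon", node])  # stop new pods landing here
        plan.append(["kubectl", "drain", node,     # evict existing pods
                     "--ignore-daemonsets", "--delete-emptydir-data"])
        plan.append(reboot_cmd(node))              # apply the pending patch
        plan.append(["kubectl", "uncordon", node])  # allow scheduling again
    return plan
```

For example, `rolling_reboot_plan(["aks-nodepool1-0", "aks-nodepool1-1"], my_reboot)` yields eight commands, four per node, in order.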

Node maintenance and access

Worker nodes are a shared responsibility and are owned by customers. Because of this, customers can sign in to their worker nodes and make potentially harmful changes, such as kernel updates and installing or removing packages.

If customers make destructive changes or cause control plane components to go offline or become nonfunctional, AKS will detect this failure and automatically restore the worker node to the previous working state.

Although customers can sign in to and change worker nodes, doing this is discouraged because changes can make a cluster unsupportable.

Network ports, access, and NSGs

As a managed service, AKS has specific networking and connectivity requirements. These requirements are less flexible than requirements for normal IaaS components. In AKS, operations like customizing NSG rules, blocking a specific port (for example, using firewall rules that block outbound port 443), and allowlisting URLs can make your cluster unsupportable.

Note

Currently, AKS doesn't allow you to completely lock down egress traffic from your cluster. To control the list of URLs and ports your cluster can use for outbound traffic, see Limit egress traffic.
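Before tightening firewall or NSG rules, you can confirm from a pod inside the cluster that required outbound connections still work. Below is a minimal TCP reachability check. The API server hostname in the usage note is a placeholder. AKS API server FQDNs end in `.hcp.<location>.azmk8s.io`, but the exact name comes from your own cluster. The check proves only that a TCP handshake succeeds, not that TLS or HTTP works.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, DNS failures, and timeouts
        return False
```

For example, `can_connect("<your-cluster-fqdn>.hcp.eastus.azmk8s.io")` should return `True` both before and after a firewall change. If it returns `False` afterward, the change has likely broken required egress.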

Unsupported alpha and beta Kubernetes features

AKS supports only stable features within the upstream Kubernetes project. Unless otherwise documented, AKS doesn't support alpha and beta features that are available in the upstream Kubernetes project.

In two scenarios, alpha or beta features might be rolled out before they're generally available:

  • Customers have met with the AKS product, support, or engineering teams and have been asked to try these new features.
  • These features have been enabled by a feature flag. Customers must explicitly opt in to use these features.
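Opting in to a feature flag is an explicit, subscription-wide step. The sketch below builds the usual Azure CLI sequence for it. First it registers the feature. Next it checks its state until it reads `Registered`. Last it re-registers the `Microsoft.ContainerService` provider so the change propagates. The feature name is a hypothetical placeholder, and the helper name is illustrative.

```python
FEATURE_NAMESPACE = "Microsoft.ContainerService"


def register_feature_cmds(feature_name: str) -> list[list[str]]:
    """Azure CLI commands that opt the current subscription in to an AKS feature flag."""
    return [
        # 1. Request the feature for the subscription.
        ["az", "feature", "register", "--namespace", FEATURE_NAMESPACE, "--name", feature_name],
        # 2. Check until the state reads "Registered" (it may show "Registering" first).
        ["az", "feature", "show", "--namespace", FEATURE_NAMESPACE, "--name", feature_name,
         "--query", "properties.state", "--output", "tsv"],
        # 3. Refresh the resource provider registration so the flag takes effect.
        ["az", "provider", "register", "--namespace", FEATURE_NAMESPACE],
    ]
```

Because registration applies to the whole subscription, run these commands only against a non-production subscription.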

Preview features or feature flags

For features and functionality that require extended testing and user feedback, Azure releases new preview features or features behind a feature flag. Consider these features as prerelease or beta features.

Preview features or feature-flag features aren't meant for production. Ongoing changes in APIs and behavior, bug fixes, and other changes can result in unstable clusters and downtime.

Features in public preview fall under "best effort" support, because they are in preview and not meant for production. The AKS technical support teams support them during business hours only. For additional information, see:

Note

Preview features take effect at the Azure subscription level. Don't install preview features on a production subscription. On a production subscription, preview features can change default API behavior and affect regular operations.
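Preview features are registered per subscription, so it can help to audit which AKS feature flags a subscription already has turned on. The sketch below parses the JSON printed by `az feature list --output json`. It assumes each entry carries a `name` of the form `<namespace>/<feature>` and a `properties.state` field. Treat that output shape as an assumption and check it against your CLI version. The function name and the sample feature names in the test are hypothetical.

```python
import json


def registered_features(az_feature_list_json: str,
                        namespace: str = "Microsoft.ContainerService") -> list[str]:
    """Return the sorted names of features in `namespace` whose state is "Registered"."""
    features = json.loads(az_feature_list_json)
    prefix = namespace + "/"
    return sorted(
        f["name"][len(prefix):]
        for f in features
        if f.get("name", "").startswith(prefix)
        and f.get("properties", {}).get("state") == "Registered"
    )
```

An empty result for a production subscription is the expected outcome under the guidance above.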

Upstream bugs and issues

Given the speed of development in the upstream Kubernetes project, bugs invariably arise. Some of these bugs can't be patched or worked around within the AKS system. Instead, bug fixes require larger patches to upstream projects (such as Kubernetes, node or worker operating systems, and kernels). For components that Azure owns (such as the Azure cloud provider), AKS and Azure personnel are committed to fixing issues upstream in the community.

When a technical support issue is root-caused by one or more upstream bugs, AKS support and engineering teams will:

  • Identify and link the upstream bugs with any supporting details to help explain why this issue affects your cluster or workload. Customers receive links to the required repositories so they can watch the issues and see when a new release will provide fixes.
  • Provide potential workarounds or mitigations. If the issue can be mitigated, a known issue will be filed in the AKS repository. The known-issue filing explains:
    • The issue, including links to upstream bugs.
    • The workaround, and details about an upgrade or another way to persist the solution.
    • Rough timelines for the fix's inclusion, based on the upstream release cadence.