About service meshes

A service mesh provides capabilities such as traffic management, resiliency, policy, security, strong identity, and observability to your workloads. Your application is decoupled from these operational capabilities: the service mesh moves them out of the application layer and down to the infrastructure layer.


These are some of the scenarios that a service mesh can enable for your workloads:

  • Encrypt all traffic in the cluster - Enable mutual TLS (mTLS) between specified services in the cluster. This can be extended to ingress and egress at the network perimeter. It provides a secure-by-default option that requires no changes to application code or infrastructure.

  • Canary and phased rollouts - Specify conditions for a subset of traffic to be routed to a set of new services in the cluster. After a successful test of the canary release, remove the conditional routing and gradually increase, in phases, the percentage of all traffic routed to the new service. Eventually, all traffic is directed to the new service.

  • Traffic management and manipulation - Create a policy on a service that rate limits all traffic from a specific origin to a version of that service, or a policy that applies a retry strategy to classes of failures between specified services. Mirror live traffic to new versions of services during a migration or to debug issues. Inject faults between services in a test environment to test resiliency.

  • Observability - Gain insight into how your services are connected and into the traffic that flows between them. Obtain metrics, logs, and traces for all traffic in the cluster and for ingress/egress. Add distributed tracing abilities to your applications.
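
The canary and phased rollout scenario can be sketched in a few lines of Python. This is only an illustration of the routing decision, not how any particular mesh implements it: the function names, the `x-canary` header, and the version labels `stable` and `canary` are all hypothetical.

```python
import random


def choose_version(headers, canary_weight, canary_header=None, rng=random.random):
    """Pick "stable" or "canary" for a single request.

    headers: dict of request headers.
    canary_weight: fraction (0.0 to 1.0) of ordinary traffic sent to the canary.
    canary_header: optional header name (hypothetical) that opts a request in.
    """
    # Conditional routing: only requests carrying the opt-in header match.
    if canary_header is not None and headers.get(canary_header) == "true":
        return "canary"
    # Weighted routing: a random draw sends a share of all other traffic.
    return "canary" if rng() < canary_weight else "stable"


def rollout_plan(step_percent):
    """Return the phased weights, always ending with 100% on the new service."""
    steps = range(step_percent, 100 + step_percent, step_percent)
    return [min(percent, 100) / 100 for percent in steps]
```

During the test phase you would call `choose_version` with a weight of 0.0 and a `canary_header`; once the canary has passed, you drop the header condition and step through the weights from `rollout_plan`.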


A service mesh is typically composed of a control plane and a data plane.

The control plane has a number of components that support managing the service mesh. These typically include a management interface, which could be a UI or an API. There are also typically components that manage the rule and policy definitions that determine how the service mesh implements specific capabilities, and components that manage aspects of security such as strong identity and certificates for mTLS. Service meshes also typically have a metrics or observability component that collects and aggregates metrics and telemetry from the workloads.

The data plane typically consists of a proxy that is transparently injected as a sidecar into your workloads. This proxy is configured to control all network traffic in and out of the pod that contains your workload. As a result, the proxy can be configured to secure traffic via mTLS, dynamically route traffic, apply policies to traffic, and collect metrics and tracing information.
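
As a rough in-process analogy for the sidecar described above, the sketch below wraps a workload so that every request passes through a policy check and is counted before the workload sees it. A real sidecar proxy intercepts network traffic for the pod rather than Python calls; the `SidecarProxy` class, its fields, and the status codes are hypothetical and only illustrate the idea of interception.

```python
import time


class SidecarProxy:
    """Toy stand-in for a sidecar: all calls to the workload go through handle()."""

    def __init__(self, workload, allowed_callers):
        self.workload = workload                     # callable that does the real work
        self.allowed_callers = set(allowed_callers)  # simple access-control policy
        self.metrics = {"requests": 0, "denied": 0, "total_seconds": 0.0}

    def handle(self, caller, request):
        self.metrics["requests"] += 1
        # Apply policy before the request reaches the workload.
        if caller not in self.allowed_callers:
            self.metrics["denied"] += 1
            return 403, "denied by policy"
        # Time the workload so latency can be reported without changing its code.
        start = time.perf_counter()
        try:
            return 200, self.workload(request)
        finally:
            self.metrics["total_seconds"] += time.perf_counter() - start
```

The workload itself contains no policy or metrics code, which is the decoupling that the data plane is meant to provide.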



Each service mesh has a natural fit and a focus on supporting specific scenarios, but you'll typically find that most implement many, if not all, of the following capabilities.

Traffic management

  • Protocol - layer 7 (HTTP, gRPC)
  • Dynamic routing - conditional, weighting, mirroring
  • Resiliency - timeouts, retries, circuit breakers
  • Policy - access control, rate limits, quotas
  • Testing - fault injection
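
To make the resiliency items concrete, here is a minimal sketch of a retry helper and a circuit breaker. In a mesh this logic lives in the proxy and is driven by configuration; the class and function names, thresholds, and default values below are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the backend."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open")
            # Half-open: allow one trial call; a failure re-opens immediately.
            self.opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


def call_with_retries(func, attempts=3, delay=0.0,
                      retry_on=(ConnectionError, TimeoutError)):
    """Retry func on the given failure classes, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

Retries help with transient failures, while the breaker stops callers from piling more requests onto a backend that keeps failing.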


Security

  • Encryption - mTLS, certificate management, external CA
  • Strong identity - SPIFFE or similar
  • Auth - authentication, authorization
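
As an illustration of strong identity feeding authorization, the sketch below parses a SPIFFE ID (a URI of the form `spiffe://<trust domain>/<path>`) and checks it against an allow list. The validation is deliberately simplified and does not cover every rule in the SPIFFE specification; the function names and the example trust domain and paths are hypothetical.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(value):
    """Split a SPIFFE ID into (trust_domain, path), or raise ValueError."""
    parts = urlsplit(value)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {value!r}")
    if parts.query or parts.fragment:
        raise ValueError("a SPIFFE ID must not carry a query or fragment")
    return parts.netloc, parts.path


def is_authorized(caller_id, trust_domain, allowed_paths):
    """Allow the call only if the identity is well formed, in our domain, and listed."""
    try:
        domain, path = parse_spiffe_id(caller_id)
    except ValueError:
        return False
    return domain == trust_domain and path in allowed_paths
```

For example, `is_authorized("spiffe://example.org/ns/shop/sa/frontend", "example.org", {"/ns/shop/sa/frontend"})` accepts the caller, while an identity from another trust domain is rejected.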


Observability

  • Metrics - golden metrics, Prometheus, Grafana
  • Tracing - traces across workloads
  • Traffic - cluster, ingress/egress
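
The text above does not define "golden metrics", so the sketch below assumes they mean request rate, success rate, and latency. It shows how such values could be derived from per-request records that a proxy collects; the record shape, the choice of 5xx as failure, and the 95th percentile are all assumptions made for illustration.

```python
import math


def golden_metrics(requests, window_seconds):
    """Summarize a window of (status_code, latency_ms) records."""
    if not requests:
        return {"request_rate": 0.0, "success_rate": None, "p95_latency_ms": None}
    latencies = sorted(latency for _, latency in requests)
    # Assumption: only 5xx responses count as failures.
    successes = sum(1 for status, _ in requests if status < 500)
    # Nearest-rank method for the 95th percentile.
    rank = math.ceil(0.95 * len(latencies))
    return {
        "request_rate": len(requests) / window_seconds,
        "success_rate": successes / len(requests),
        "p95_latency_ms": latencies[rank - 1],
    }
```

In practice a mesh usually exports raw counters and histograms, and a metrics system computes aggregates like these at query time.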


Mesh

  • Supported compute - Kubernetes, virtual machines
  • Multi-cluster - gateways, federation

Selection criteria

Before you select a service mesh, make sure that you understand your requirements and your reasons for installing one. Try asking the following questions.

  • Is an ingress controller sufficient for my needs? - Sometimes a capability such as A/B testing or traffic splitting at the ingress is enough to support the required scenario. Don't add complexity to your environment with no upside.

  • Can my workloads and environment tolerate the additional overhead? - All the additional components needed to support the service mesh consume extra resources such as CPU and memory. In addition, the proxies and their associated policy checks add latency to your traffic. If you have workloads that are very sensitive to latency, or you cannot provide the additional resources that the service mesh components need, then reconsider.

  • Is this adding complexity unnecessarily? - If the reason for installing a service mesh is to gain a capability that is not necessarily critical to the business or operational teams, then consider whether the additional complexity of installation, maintenance, and configuration is worth it.

  • Can this be adopted incrementally? - Some of the service meshes that provide many capabilities can be adopted incrementally. Install just the components you need to ensure your success. Once you are more confident and need additional capabilities, explore those. Resist the urge to install everything from the start.

If, after careful consideration, you decide that you need a service mesh to provide the capabilities required, then your next decision is which service mesh to use.

Consider the following areas and which of them align most closely with your requirements. This will guide you towards the best fit for your environment and workloads. The Next steps section points to more detailed information about specific service meshes and how they map to these areas.

  • Technical - traffic management, policy, security, observability

  • Business - commercial support, foundation (CNCF), OSS license, governance

  • Operational - installation/upgrades, resource requirements, performance requirements, integrations (metrics, telemetry, dashboards, tools, SMI), mixed workloads (Linux and Windows node pools), compute (Kubernetes, virtual machines), multi-cluster

  • Security - auth, identity, certificate management and rotation, pluggable external CA

Next steps

The following documentation provides more information about service meshes that you can try out on Azure Kubernetes Service (AKS):

You may also want to explore Service Mesh Interface (SMI), a standard interface for service meshes on Kubernetes: