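Troubleshoot the Open Service Mesh (OSM) add-on for Azure Kubernetes Service (AKS)

When you deploy the Open Service Mesh (OSM) add-on for Azure Kubernetes Service (AKS), you may run into problems with the service mesh configuration. This article covers common errors and how to resolve them.

Verify and troubleshoot OSM components

Check the OSM controller deployment, pods, and services

  • Use the kubectl get deployment,pod,service command to check the health of the OSM controller deployment, pods, and services.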

    kubectl get deployment,pod,service -n kube-system --selector app=osm-controller
    

    A healthy OSM controller produces output similar to the following example:

    NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/osm-controller   2/2     2            2           3m4s
    
    NAME                                  READY   STATUS    RESTARTS   AGE
    pod/osm-controller-65bd8c445c-zszp4   1/1     Running   0          2m
    pod/osm-controller-65bd8c445c-xqhmk   1/1     Running   0          16s
    
    NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                       AGE
    service/osm-controller   ClusterIP   10.96.185.178   <none>        15128/TCP,9092/TCP,9091/TCP   3m4s
    service/osm-validator    ClusterIP   10.96.11.78     <none>        9093/TCP                      3m4s
    

    Note

    For the osm-controller services, the CLUSTER-IP will differ in your cluster. The service NAME and PORT(S) must match the example output.
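The READY column in the output above can also be checked mechanically. The following is a minimal sketch and not part of the OSM tooling: the helper name not_ready_rows is hypothetical, and it assumes the default table layout printed by kubectl, where READY has the form ready/desired.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get deployment,pod,...` table output on
# stdin and prints the NAME of every row whose READY column (for example "1/2")
# shows fewer ready replicas or containers than desired.
# Returns 0 when every such row is fully ready, 1 otherwise.
not_ready_rows() {
  awk '
    # Only rows whose second column looks like "<ready>/<desired>" are checked;
    # header rows, blank lines, and service rows are skipped.
    $2 ~ /^[0-9]+\/[0-9]+$/ {
      split($2, ready, "/")
      if (ready[1] + 0 != ready[2] + 0) {
        print $1
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

A possible invocation, reusing the selector shown above (the same pattern should apply to the osm-injector and osm-bootstrap selectors):

    kubectl get deployment,pod -n kube-system --selector app=osm-controller | not_ready_rows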

Check the OSM injector deployment, pods, and services

  • Use the kubectl get deployment,pod,service command to check the health of the OSM injector deployment, pods, and services.

    kubectl get deployment,pod,service -n kube-system --selector app=osm-injector
    

    A healthy OSM injector produces output similar to the following example:

    NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/osm-injector   2/2     2            2           4m37s
    
    NAME                                READY   STATUS    RESTARTS   AGE
    pod/osm-injector-5c49bd8d7c-b6cx6   1/1     Running   0          4m21s
    pod/osm-injector-5c49bd8d7c-dx587   1/1     Running   0          4m37s
    
    NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    service/osm-injector   ClusterIP   10.96.236.108   <none>        9090/TCP   4m37s
    

Check the OSM bootstrap deployment, pods, and services

  • Use the kubectl get deployment,pod,service command to check the health of the OSM bootstrap deployment, pods, and services.

    kubectl get deployment,pod,service -n kube-system --selector app=osm-bootstrap
    

    A healthy OSM bootstrap produces output similar to the following example:

    NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/osm-bootstrap   1/1     1            1           5m25s
    
    NAME                                 READY   STATUS    RESTARTS   AGE
    pod/osm-bootstrap-594ffc6cb7-jc7bs   1/1     Running   0          5m25s
    
    NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
    service/osm-bootstrap   ClusterIP   10.96.250.208   <none>        9443/TCP,9095/TCP   5m25s
    

Check the validating and mutating webhooks

  1. Use the kubectl get ValidatingWebhookConfiguration command to check the OSM validating webhook.

    kubectl get ValidatingWebhookConfiguration --selector app=osm-controller
    

    A healthy OSM validating webhook produces output similar to the following example:

    NAME                         WEBHOOKS   AGE
    aks-osm-validator-mesh-osm   1          81m
    
  2. Use the kubectl get MutatingWebhookConfiguration command to check the OSM mutating webhook.

    kubectl get MutatingWebhookConfiguration --selector app=osm-injector
    

    A healthy OSM mutating webhook produces output similar to the following example:

    NAME                  WEBHOOKS   AGE
    aks-osm-webhook-osm   1          102m
    

Check the service and CA bundle of the validating webhook

  • Use the kubectl get ValidatingWebhookConfiguration command for aks-osm-validator-mesh-osm together with jq '.webhooks[0].clientConfig.service' to check the service and CA bundle of the OSM validating webhook.

    kubectl get ValidatingWebhookConfiguration aks-osm-validator-mesh-osm -o json | jq '.webhooks[0].clientConfig.service'
    

    A correctly configured validating webhook configuration looks similar to the following example JSON output:

    {
      "name": "osm-config-validator",
      "namespace": "kube-system",
      "path": "/validate-webhook",
      "port": 9093
    }
    

Check the service and CA bundle of the mutating webhook

  • Use the kubectl get MutatingWebhookConfiguration command for aks-osm-webhook-osm together with jq '.webhooks[0].clientConfig.service' to check the service and CA bundle of the OSM mutating webhook.

    kubectl get MutatingWebhookConfiguration aks-osm-webhook-osm -o json | jq '.webhooks[0].clientConfig.service'
    

    A correctly configured mutating webhook configuration looks similar to the following example JSON output:

    {
      "name": "osm-injector",
      "namespace": "kube-system",
      "path": "/mutate-pod-creation",
      "port": 9090
    }
    

Check the osm-mesh-config resource

  1. Use the kubectl get meshconfig command to check that the OSM MeshConfig resource exists.

    kubectl get meshconfig osm-mesh-config -n kube-system
    
  2. Use the kubectl get meshconfig command with -o yaml to check the contents of the OSM MeshConfig resource.

    kubectl get meshconfig osm-mesh-config -n kube-system -o yaml
    
    apiVersion: config.openservicemesh.io/v1alpha1
    kind: MeshConfig
    metadata:
      creationTimestamp: "0000-00-00A00:00:00A"
      generation: 1
      name: osm-mesh-config
      namespace: kube-system
      resourceVersion: "2494"
      uid: 6c4d67f3-c241-4aeb-bf4f-b029b08faa31
    spec:
      certificate:
        serviceCertValidityDuration: 24h
      featureFlags:
        enableEgressPolicy: true
        enableMulticlusterMode: false
        enableWASMStats: true
      observability:
        enableDebugServer: true
        osmLogLevel: info
        tracing:
          address: jaeger.kube-system.svc.cluster.local
          enable: false
          endpoint: /api/v2/spans
          port: 9411
      sidecar:
        configResyncInterval: 0s
        enablePrivilegedInitContainer: false
        envoyImage: mcr.azk8s.cn/oss/envoyproxy/envoy:v1.18.3
        initContainerImage: mcr.azk8s.cn/oss/openservicemesh/init:v0.9.1
        logLevel: error
        maxDataPlaneConnections: 0
        resources: {}
      traffic:
        enableEgress: true
        enablePermissiveTrafficPolicyMode: true
        inboundExternalAuthorization:
          enable: false
          failureModeAllow: false
          statPrefix: inboundExtAuthz
          timeout: 1s
        useHTTPSIngress: false
    

osm-mesh-config resource values

| Key | Type | Default value | Kubectl patch command example |
|---|---|---|---|
| spec.traffic.enableEgress | bool | true | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"enableEgress":true}}}' --type=merge |
| spec.traffic.enablePermissiveTrafficPolicyMode | bool | true | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"enablePermissiveTrafficPolicyMode":true}}}' --type=merge |
| spec.traffic.useHTTPSIngress | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"useHTTPSIngress":true}}}' --type=merge |
| spec.traffic.outboundPortExclusionList | array | [] | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"outboundPortExclusionList":[6379,8080]}}}' --type=merge |
| spec.traffic.outboundIPRangeExclusionList | array | [] | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"outboundIPRangeExclusionList":["10.0.0.0/32","1.1.1.1/24"]}}}' --type=merge |
| spec.traffic.inboundPortExclusionList | array | [] | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"inboundPortExclusionList":[6379,8080]}}}' --type=merge |
| spec.certificate.serviceCertValidityDuration | string | "24h" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"certificate":{"serviceCertValidityDuration":"24h"}}}' --type=merge |
| spec.observability.enableDebugServer | bool | true | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"enableDebugServer":true}}}' --type=merge |
| spec.observability.tracing.enable | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"enable":true}}}}' --type=merge |
| spec.observability.tracing.address | string | "jaeger.kube-system.svc.cluster.local" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"address": "jaeger.kube-system.svc.cluster.local"}}}}' --type=merge |
| spec.observability.tracing.endpoint | string | "/api/v2/spans" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"endpoint":"/api/v2/spans"}}}}' --type=merge |
| spec.observability.tracing.port | int | 9411 | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"port":9411}}}}' --type=merge |
| spec.observability.osmLogLevel | string | "info" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"osmLogLevel":"info"}}}' --type=merge |
| spec.sidecar.enablePrivilegedInitContainer | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"enablePrivilegedInitContainer":true}}}' --type=merge |
| spec.sidecar.logLevel | string | "error" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"logLevel":"error"}}}' --type=merge |
| spec.sidecar.maxDataPlaneConnections | int | 0 | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"maxDataPlaneConnections":0}}}' --type=merge |
| spec.sidecar.envoyImage | string | "mcr.azk8s.cn/oss/envoyproxy/envoy:v1.19.1" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"envoyImage":"mcr.azk8s.cn/oss/envoyproxy/envoy:v1.19.1"}}}' --type=merge |
| spec.sidecar.initContainerImage | string | "mcr.azk8s.cn/oss/openservicemesh/init:v0.11.1" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"initContainerImage":"mcr.azk8s.cn/oss/openservicemesh/init:v0.11.1"}}}' --type=merge |
| spec.sidecar.configResyncInterval | string | "0s" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"configResyncInterval":"30s"}}}' --type=merge |
| spec.featureFlags.enableWASMStats | bool | true | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableWASMStats":true}}}' --type=merge |
| spec.featureFlags.enableEgressPolicy | bool | true | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableEgressPolicy":true}}}' --type=merge |
| spec.featureFlags.enableMulticlusterMode | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableMulticlusterMode":false}}}' --type=merge |
| spec.featureFlags.enableSnapshotCacheMode | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableSnapshotCacheMode":false}}}' --type=merge |
| spec.featureFlags.enableAsyncProxyServiceMapping | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableAsyncProxyServiceMapping":false}}}' --type=merge |
| spec.featureFlags.enableIngressBackendPolicy | bool | true | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableIngressBackendPolicy":true}}}' --type=merge |
| spec.featureFlags.enableEnvoyActiveHealthChecks | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableEnvoyActiveHealthChecks":false}}}' --type=merge |
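
Every patch command above follows the same pattern: the dotted key is expanded into nested JSON objects and passed to kubectl patch with --type=merge. As an illustration only, the sketch below builds that JSON body from a dotted key. The function name meshconfig_patch_body is hypothetical, and the sketch assumes the value is already valid JSON (so strings must include their double quotes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: expands a dotted MeshConfig key and a JSON value into
# the nested JSON body expected by `kubectl patch ... --type=merge`.
#   $1  dotted key, for example spec.traffic.enableEgress
#   $2  value written as JSON, for example true, 9411, or '"error"'
meshconfig_patch_body() {
  local key=$1 value=$2 body="" closing="" part
  local IFS=.                      # split the key on dots in the loop below
  for part in $key; do
    body="${body}{\"${part}\":"    # open one nested object per key segment
    closing="${closing}}"          # and remember one closing brace for it
  done
  printf '%s%s%s\n' "$body" "$value" "$closing"
}
```

Under these assumptions, a patch could be applied like this:

    kubectl patch meshconfig osm-mesh-config -n kube-system --type=merge -p "$(meshconfig_patch_body spec.traffic.enableEgress true)"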

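Check namespaces

Note

The kube-system namespace never participates in a service mesh and is never labeled or annotated with the following key/value pairs.

You can use the osm namespace add command to join namespaces to a given service mesh. For a Kubernetes namespace to be part of the mesh, it must have the following annotation and label.

  1. Use the kubectl get namespace command with jq '.metadata.annotations' to view the annotations.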

    kubectl get namespace bookbuyer -o json | jq '.metadata.annotations'
    

    The output must contain the following annotation:

    {
      "openservicemesh.io/sidecar-injection": "enabled"
    }
    
  2. Use the kubectl get namespace command with jq '.metadata.labels' to view the labels.

    kubectl get namespace bookbuyer -o json | jq '.metadata.labels'
    

    The output must contain the following label:

    {
      "openservicemesh.io/monitored-by": "osm"
    }
    

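If a namespace doesn't have the "openservicemesh.io/sidecar-injection": "enabled" annotation or the "openservicemesh.io/monitored-by": "osm" label, the OSM injector doesn't add Envoy sidecars.

Note

After osm namespace add is called, only new pods are injected with an Envoy sidecar. Existing pods must be restarted with the kubectl rollout restart deployment ... command.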
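
To see at a glance which namespaces carry the monitored-by label, the label column of kubectl get namespaces --show-labels can be filtered. The sketch below is illustrative only: namespaces_monitored_by is a hypothetical helper name, it assumes the default table layout with LABELS as the last column, and it checks only the label, not the sidecar-injection annotation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get namespaces --show-labels` output on
# stdin and prints the names of namespaces labeled
# openservicemesh.io/monitored-by=<mesh name>.
#   $1  mesh name, for example osm
namespaces_monitored_by() {
  awk -v want="openservicemesh.io/monitored-by=$1" '
    NR == 1 && $1 == "NAME" { next }        # skip the header row
    {
      # The last column holds comma-separated key=value labels.
      n = split($NF, labels, ",")
      for (i = 1; i <= n; i++) {
        if (labels[i] == want) { print $1; break }
      }
    }
  '
}
```

A possible invocation for the mesh named osm:

    kubectl get namespaces --show-labels | namespaces_monitored_by osm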

Verify OSM CRDs

  1. Use the kubectl get crds command to check whether the cluster has the required CRDs.

    kubectl get crds
    

    The following CRDs must be installed on the cluster:

    • egresses.policy.openservicemesh.io
    • httproutegroups.specs.smi-spec.io
    • ingressbackends.policy.openservicemesh.io
    • meshconfigs.config.openservicemesh.io
    • multiclusterservices.config.openservicemesh.io
    • tcproutes.specs.smi-spec.io
    • trafficsplits.split.smi-spec.io
    • traffictargets.access.smi-spec.io
  2. Use the osm mesh list command to get the versions of the installed SMI CRDs.

    osm mesh list
    

    The output should look similar to the following example:

    MESH NAME   MESH NAMESPACE   VERSION   ADDED NAMESPACES
    osm         kube-system      v0.11.1
    
    MESH NAME   MESH NAMESPACE   SMI SUPPORTED
    osm         kube-system      HTTPRouteGroup:v1alpha4,TCPRoute:v1alpha4,TrafficSplit:v1alpha2,TrafficTarget:v1alpha3
    
    To list the OSM controller pods for a mesh, please run the following command passing in the mesh's namespace
            kubectl get pods -n <osm-mesh-namespace> -l app=osm-controller
    

    OSM controller v0.11.1 requires the following versions:

    • traffictargets.access.smi-spec.io - v1alpha3
    • httproutegroups.specs.smi-spec.io - v1alpha4
    • tcproutes.specs.smi-spec.io - v1alpha4
    • udproutes.specs.smi-spec.io - not supported
    • trafficsplits.split.smi-spec.io - v1alpha2
    • *.metrics.smi-spec.io - v1alpha1
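
The CRD list from step 1 can be compared against the cluster without reading the full kubectl get crds output by eye. The following is a rough sketch, not an official check: missing_osm_crds is a hypothetical helper name, the required names are copied from the list in step 1, and the input is assumed to be either the default table output or the output of kubectl get crds -o name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads CRD names on stdin (default `kubectl get crds`
# table or `-o name` output) and prints every required OSM/SMI CRD that is
# absent. Returns 0 when none are missing, 1 otherwise.
missing_osm_crds() {
  local installed crd rc=0
  # Keep the first column and strip any "customresourcedefinition.../" prefix.
  installed=$(awk 'NF { sub(/^.*\//, "", $1); print $1 }')
  for crd in \
    egresses.policy.openservicemesh.io \
    httproutegroups.specs.smi-spec.io \
    ingressbackends.policy.openservicemesh.io \
    meshconfigs.config.openservicemesh.io \
    multiclusterservices.config.openservicemesh.io \
    tcproutes.specs.smi-spec.io \
    trafficsplits.split.smi-spec.io \
    traffictargets.access.smi-spec.io
  do
    # -x matches whole lines only, -F treats the name as a fixed string.
    if ! grep -qxF -- "$crd" <<<"$installed"; then
      printf '%s\n' "$crd"
      rc=1
    fi
  done
  return "$rc"
}
```

A possible invocation:

    kubectl get crds -o name | missing_osm_crds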

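Certificate management

For more information on how OSM issues and manages certificates for the Envoy proxies running on application pods, see the OSM certificate guide.

Upgrade Envoy

When a new pod is created in a namespace monitored by the add-on, OSM injects an Envoy proxy sidecar into that pod. For details on how to update the Envoy version, see the OSM upgrade guide.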