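Troubleshoot the Open Service Mesh (OSM) add-on for Azure Kubernetes Service (AKS)
When you deploy the Open Service Mesh (OSM) add-on for Azure Kubernetes Service (AKS), you might run into problems with the service mesh configuration. This article covers common errors and how to resolve them.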
Verify and troubleshoot OSM components
Check the OSM controller deployment, pod, and service
Use the kubectl get deployment,pod,service command to check the health of the OSM controller deployment, pod, and service.
kubectl get deployment,pod,service -n kube-system --selector app=osm-controller
A healthy OSM controller produces output similar to the following example:
NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/osm-controller   2/2     2            2           3m4s

NAME                                  READY   STATUS    RESTARTS   AGE
pod/osm-controller-65bd8c445c-zszp4   1/1     Running   0          2m
pod/osm-controller-65bd8c445c-xqhmk   1/1     Running   0          16s

NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                       AGE
service/osm-controller   ClusterIP   10.96.185.178   <none>        15128/TCP,9092/TCP,9091/TCP   3m4s
service/osm-validator    ClusterIP   10.96.11.78     <none>        9093/TCP                      3m4s
Note
The CLUSTER-IP of the osm-controller service will differ in your cluster. The service NAME and PORT(S) must match the example output.
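Reading the READY column by eye gets tedious when pods are restarting. The following is a minimal sketch, not part of the OSM tooling, of a filter that prints only the rows whose READY value (ready/desired) is not fully satisfied. The function name not_ready_rows is hypothetical.

```shell
# Print "<resource> <ready>/<desired>" for every row that is not fully ready.
# Reads the table printed by `kubectl get deployment,pod` on stdin.
# Header rows and service rows are skipped because their second column
# does not look like "n/m".
not_ready_rows() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ {
         split($2, r, "/")
         if (r[1] != r[2]) print $1, $2
       }'
}
```

For example, pipe the command from this section into it: kubectl get deployment,pod -n kube-system --selector app=osm-controller | not_ready_rows. Empty output means every listed deployment and pod reports all replicas or containers ready.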
Check the OSM injector deployment, pod, and service
Use the kubectl get deployment,pod,service command to check the health of the OSM injector deployment, pod, and service.
kubectl get deployment,pod,service -n kube-system --selector app=osm-injector
A healthy OSM injector produces output similar to the following example:
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/osm-injector   2/2     2            2           4m37s

NAME                                READY   STATUS    RESTARTS   AGE
pod/osm-injector-5c49bd8d7c-b6cx6   1/1     Running   0          4m21s
pod/osm-injector-5c49bd8d7c-dx587   1/1     Running   0          4m37s

NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/osm-injector   ClusterIP   10.96.236.108   <none>        9090/TCP   4m37s
Check the OSM bootstrap deployment, pod, and service
Use the kubectl get deployment,pod,service command to check the health of the OSM bootstrap deployment, pod, and service.
kubectl get deployment,pod,service -n kube-system --selector app=osm-bootstrap
A healthy OSM bootstrap produces output similar to the following example:
NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/osm-bootstrap   1/1     1            1           5m25s

NAME                                 READY   STATUS    RESTARTS   AGE
pod/osm-bootstrap-594ffc6cb7-jc7bs   1/1     Running   0          5m25s

NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/osm-bootstrap   ClusterIP   10.96.250.208   <none>        9443/TCP,9095/TCP   5m25s
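The three component checks above can also be run in one pass. The sketch below assumes the deployment names shown in the example outputs (osm-controller, osm-injector, osm-bootstrap) and uses kubectl rollout status, which waits until a deployment finishes rolling out. The function name check_osm_rollouts and the 60-second timeout are illustrative choices, not values taken from the add-on.

```shell
# Wait for each OSM control-plane deployment to finish rolling out.
# Returns non-zero if any deployment is not ready within the timeout.
check_osm_rollouts() {
  local failed=0 name
  for name in osm-controller osm-injector osm-bootstrap; do
    if ! kubectl rollout status "deployment/${name}" -n kube-system --timeout=60s; then
      echo "not ready: ${name}" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Calling check_osm_rollouts prints the rollout status of each deployment and names any that did not become ready on standard error.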
Check the validating and mutating webhooks
Use the kubectl get ValidatingWebhookConfiguration command to check the OSM validating webhook.
kubectl get ValidatingWebhookConfiguration --selector app=osm-controller
A healthy OSM validating webhook produces output similar to the following example:
NAME                         WEBHOOKS   AGE
aks-osm-validator-mesh-osm   1          81m
Use the kubectl get MutatingWebhookConfiguration command to check the OSM mutating webhook.
kubectl get MutatingWebhookConfiguration --selector app=osm-injector
A healthy OSM mutating webhook produces output similar to the following example:
NAME                  WEBHOOKS   AGE
aks-osm-webhook-osm   1          102m
Check the service and CA bundle of the validating webhook
Use the kubectl get ValidatingWebhookConfiguration command with aks-osm-validator-mesh-osm and jq '.webhooks[0].clientConfig.service' to check the service of the OSM validating webhook. The CA bundle is stored next to it, under .webhooks[0].clientConfig.caBundle.
kubectl get ValidatingWebhookConfiguration aks-osm-validator-mesh-osm -o json | jq '.webhooks[0].clientConfig.service'
A correctly configured validating webhook produces JSON output similar to the following example:
{
  "name": "osm-config-validator",
  "namespace": "kube-system",
  "path": "/validate-webhook",
  "port": 9093
}
Check the service and CA bundle of the mutating webhook
Use the kubectl get MutatingWebhookConfiguration command with aks-osm-webhook-osm and jq '.webhooks[0].clientConfig.service' to check the service of the OSM mutating webhook. The CA bundle is stored next to it, under .webhooks[0].clientConfig.caBundle.
kubectl get MutatingWebhookConfiguration aks-osm-webhook-osm -o json | jq '.webhooks[0].clientConfig.service'
A correctly configured mutating webhook produces JSON output similar to the following example:
{
  "name": "osm-injector",
  "namespace": "kube-system",
  "path": "/mutate-pod-creation",
  "port": 9090
}
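Both webhook checks follow the same pattern, so they can be wrapped in one helper. The sketch below is an illustration rather than an official tool: it uses kubectl's built-in -o jsonpath output instead of jq, compares the service name with the value you expect, and confirms that the caBundle field of the webhook's clientConfig is not empty. The function name check_webhook is hypothetical.

```shell
# Check that a webhook configuration points at the expected service
# and carries a non-empty CA bundle.
# Usage: check_webhook <kind> <config-name> <expected-service-name>
check_webhook() {
  local kind="$1" config="$2" expected="$3" svc ca
  svc="$(kubectl get "$kind" "$config" -o jsonpath='{.webhooks[0].clientConfig.service.name}')"
  ca="$(kubectl get "$kind" "$config" -o jsonpath='{.webhooks[0].clientConfig.caBundle}')"
  if [ "$svc" != "$expected" ]; then
    echo "$config: service is '$svc', expected '$expected'"
    return 1
  fi
  if [ -z "$ca" ]; then
    echo "$config: caBundle is empty"
    return 1
  fi
  echo "$config: ok"
}
```

With the names from the example outputs, the two calls would be check_webhook ValidatingWebhookConfiguration aks-osm-validator-mesh-osm osm-config-validator and check_webhook MutatingWebhookConfiguration aks-osm-webhook-osm osm-injector.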
Check the osm-mesh-config resource
Use the kubectl get meshconfig command to check that the OSM MeshConfig resource exists.
kubectl get meshconfig osm-mesh-config -n kube-system
Use the kubectl get meshconfig command with -o yaml to check the contents of the OSM MeshConfig resource.
kubectl get meshconfig osm-mesh-config -n kube-system -o yaml
apiVersion: config.openservicemesh.io/v1alpha1
kind: MeshConfig
metadata:
  creationTimestamp: "0000-00-00A00:00:00A"
  generation: 1
  name: osm-mesh-config
  namespace: kube-system
  resourceVersion: "2494"
  uid: 6c4d67f3-c241-4aeb-bf4f-b029b08faa31
spec:
  certificate:
    serviceCertValidityDuration: 24h
  featureFlags:
    enableEgressPolicy: true
    enableMulticlusterMode: false
    enableWASMStats: true
  observability:
    enableDebugServer: true
    osmLogLevel: info
    tracing:
      address: jaeger.kube-system.svc.cluster.local
      enable: false
      endpoint: /api/v2/spans
      port: 9411
  sidecar:
    configResyncInterval: 0s
    enablePrivilegedInitContainer: false
    envoyImage: mcr.azk8s.cn/oss/envoyproxy/envoy:v1.18.3
    initContainerImage: mcr.azk8s.cn/oss/openservicemesh/init:v0.9.1
    logLevel: error
    maxDataPlaneConnections: 0
    resources: {}
  traffic:
    enableEgress: true
    enablePermissiveTrafficPolicyMode: true
    inboundExternalAuthorization:
      enable: false
      failureModeAllow: false
      statPrefix: inboundExtAuthz
      timeout: 1s
    useHTTPSIngress: false
osm-mesh-config resource values
Key | Type | Default value | Kubectl patch command example |
---|---|---|---|
spec.traffic.enableEgress | bool | true | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"enableEgress":true}}}' --type=merge |
spec.traffic.enablePermissiveTrafficPolicyMode | bool | true | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"enablePermissiveTrafficPolicyMode":true}}}' --type=merge |
spec.traffic.useHTTPSIngress | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"useHTTPSIngress":true}}}' --type=merge |
spec.traffic.outboundPortExclusionList | array | [] | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"outboundPortExclusionList":[6379,8080]}}}' --type=merge |
spec.traffic.outboundIPRangeExclusionList | array | [] | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"outboundIPRangeExclusionList":["10.0.0.0/32","1.1.1.1/24"]}}}' --type=merge |
spec.traffic.inboundPortExclusionList | array | [] | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"inboundPortExclusionList":[6379,8080]}}}' --type=merge |
spec.certificate.serviceCertValidityDuration | string | "24h" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"certificate":{"serviceCertValidityDuration":"24h"}}}' --type=merge |
spec.observability.enableDebugServer | bool | true | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"enableDebugServer":true}}}' --type=merge |
spec.observability.tracing.enable | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"enable":true}}}}' --type=merge |
spec.observability.tracing.address | string | "jaeger.kube-system.svc.cluster.local" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"address":"jaeger.kube-system.svc.cluster.local"}}}}' --type=merge |
spec.observability.tracing.endpoint | string | "/api/v2/spans" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"endpoint":"/api/v2/spans"}}}}' --type=merge |
spec.observability.tracing.port | int | 9411 | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"port":9411}}}}' --type=merge |
spec.observability.osmLogLevel | string | "info" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"osmLogLevel":"info"}}}' --type=merge |
spec.sidecar.enablePrivilegedInitContainer | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"enablePrivilegedInitContainer":true}}}' --type=merge |
spec.sidecar.logLevel | string | "error" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"logLevel":"error"}}}' --type=merge |
spec.sidecar.maxDataPlaneConnections | int | 0 | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"maxDataPlaneConnections":0}}}' --type=merge |
spec.sidecar.envoyImage | string | "mcr.azk8s.cn/oss/envoyproxy/envoy:v1.19.1" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"envoyImage":"mcr.azk8s.cn/oss/envoyproxy/envoy:v1.19.1"}}}' --type=merge |
spec.sidecar.initContainerImage | string | "mcr.azk8s.cn/oss/openservicemesh/init:v0.11.1" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"initContainerImage":"mcr.azk8s.cn/oss/openservicemesh/init:v0.11.1"}}}' --type=merge |
spec.sidecar.configResyncInterval | string | "0s" | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"configResyncInterval":"30s"}}}' --type=merge |
spec.featureFlags.enableWASMStats | bool | true | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableWASMStats":true}}}' --type=merge |
spec.featureFlags.enableEgressPolicy | bool | true | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableEgressPolicy":true}}}' --type=merge |
spec.featureFlags.enableMulticlusterMode | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableMulticlusterMode":false}}}' --type=merge |
spec.featureFlags.enableSnapshotCacheMode | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableSnapshotCacheMode":false}}}' --type=merge |
spec.featureFlags.enableAsyncProxyServiceMapping | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableAsyncProxyServiceMapping":false}}}' --type=merge |
spec.featureFlags.enableIngressBackendPolicy | bool | true | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableIngressBackendPolicy":true}}}' --type=merge |
spec.featureFlags.enableEnvoyActiveHealthChecks | bool | false | kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableEnvoyActiveHealthChecks":false}}}' --type=merge |
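Every patch in the table nests one value under the dotted key path, so the patch body can be generated instead of typed by hand. The following sketch shows one way to do that in plain shell. The function name meshconfig_patch is hypothetical, and the value must already be valid JSON (quote strings yourself, leave booleans and numbers bare).

```shell
# Build a JSON merge-patch body from a dotted key and a JSON-encoded value.
# Example: meshconfig_patch spec.traffic.enableEgress true
meshconfig_patch() {
  local key="$1" value="$2" open="" close="" part
  local IFS='.'            # split the key on dots, only inside this function
  for part in $key; do
    open="${open}{\"${part}\":"
    close="${close}}"
  done
  printf '%s%s%s\n' "$open" "$value" "$close"
}
```

The result can be passed straight to the patch command used in the table, for example kubectl patch meshconfig osm-mesh-config -n kube-system --type=merge -p "$(meshconfig_patch spec.sidecar.logLevel '"error"')".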
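Check namespaces
Note
The kube-system namespace never participates in a service mesh and is never labeled or annotated with the following key/value pairs.
Use the osm namespace add command to join a namespace to a given service mesh. To be part of the mesh, a Kubernetes namespace must have the following annotation and label.
Use the kubectl get namespace command with jq '.metadata.annotations' to view the annotations.
kubectl get namespace bookbuyer -o json | jq '.metadata.annotations'
The output must include the following annotation:
{
  "openservicemesh.io/sidecar-injection": "enabled"
}
Use the kubectl get namespace command with jq '.metadata.labels' to view the labels.
kubectl get namespace bookbuyer -o json | jq '.metadata.labels'
The output must include the following label:
{
  "openservicemesh.io/monitored-by": "osm"
}
If the namespace lacks either the "openservicemesh.io/sidecar-injection": "enabled" annotation or the "openservicemesh.io/monitored-by": "osm" label, the OSM injector doesn't add Envoy sidecars.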
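The annotation and label checks can be combined into a single pass. This is a minimal sketch that reads both values with kubectl's -o jsonpath output, where the dots inside the key names are escaped with a backslash. The function name check_osm_namespace is hypothetical, and the mesh name defaults to osm as in the example output.

```shell
# Verify that a namespace has the annotation and label OSM needs.
# Usage: check_osm_namespace <namespace> [mesh-name]
check_osm_namespace() {
  local ns="$1" mesh="${2:-osm}" injection monitored
  injection="$(kubectl get namespace "$ns" -o jsonpath='{.metadata.annotations.openservicemesh\.io/sidecar-injection}')"
  monitored="$(kubectl get namespace "$ns" -o jsonpath='{.metadata.labels.openservicemesh\.io/monitored-by}')"
  if [ "$injection" != "enabled" ]; then
    echo "$ns: sidecar-injection annotation is '${injection}', expected 'enabled'"
    return 1
  fi
  if [ "$monitored" != "$mesh" ]; then
    echo "$ns: monitored-by label is '${monitored}', expected '${mesh}'"
    return 1
  fi
  echo "$ns: ok"
}
```

For the namespace used in this section, the call is check_osm_namespace bookbuyer.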
Note
After you run osm namespace add, only new pods are injected with an Envoy sidecar. Restart existing pods with the kubectl rollout restart deployment ... command.
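To find out which existing pods still need a restart, you can list the pods that have no sidecar container. The sketch below assumes the injected sidecar container is named envoy, which you should confirm in your own cluster. The function name pods_without_envoy is hypothetical.

```shell
# Print the names of pods that have no container named "envoy"
# (assumed sidecar container name; verify it in your cluster).
# Reads lines of "<pod-name><TAB><space-separated container names>" on stdin,
# for example from:
#   kubectl get pods -n bookbuyer -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}'
pods_without_envoy() {
  awk -F '\t' '{
    n = split($2, names, " ")
    found = 0
    for (i = 1; i <= n; i++) if (names[i] == "envoy") found = 1
    if (!found) print $1
  }'
}
```

Any pod printed by the filter was created before the namespace joined the mesh, or was otherwise skipped by the injector, and is a candidate for a rollout restart.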
Verify OSM CRDs
Use the kubectl get crds command to check that the cluster has the required CRDs.
kubectl get crds
The following CRDs must be installed on the cluster:
- egresses.policy.openservicemesh.io
- httproutegroups.specs.smi-spec.io
- ingressbackends.policy.openservicemesh.io
- meshconfigs.config.openservicemesh.io
- multiclusterservices.config.openservicemesh.io
- tcproutes.specs.smi-spec.io
- trafficsplits.split.smi-spec.io
- traffictargets.access.smi-spec.io
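Comparing the list above against the full kubectl get crds output by eye is error prone. As a sketch, the following function reads CRD names on stdin and prints every required CRD from the list that is absent. The function name missing_osm_crds is hypothetical.

```shell
# Print each required OSM/SMI CRD that is absent from the names on stdin.
# Accepts `kubectl get crds -o name` output; everything up to the last "/"
# on each line is stripped before comparing. Returns non-zero if any CRD
# is missing.
missing_osm_crds() {
  local installed crd missing=0
  installed="$(sed 's|^.*/||')"
  for crd in \
    egresses.policy.openservicemesh.io \
    httproutegroups.specs.smi-spec.io \
    ingressbackends.policy.openservicemesh.io \
    meshconfigs.config.openservicemesh.io \
    multiclusterservices.config.openservicemesh.io \
    tcproutes.specs.smi-spec.io \
    trafficsplits.split.smi-spec.io \
    traffictargets.access.smi-spec.io
  do
    if ! printf '%s\n' "$installed" | grep -qxF "$crd"; then
      echo "$crd"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as kubectl get crds -o name | missing_osm_crds. Empty output and a zero exit status mean all eight CRDs are installed.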
Use the osm mesh list command to get the versions of the installed SMI CRDs.
osm mesh list
The output should be similar to the following example:
MESH NAME   MESH NAMESPACE   VERSION   ADDED NAMESPACES
osm         kube-system      v0.11.1

MESH NAME   MESH NAMESPACE   SMI SUPPORTED
osm         kube-system      HTTPRouteGroup:v1alpha4,TCPRoute:v1alpha4,TrafficSplit:v1alpha2,TrafficTarget:v1alpha3

To list the OSM controller pods for a mesh, please run the following command passing in the mesh's namespace
        kubectl get pods -n <osm-mesh-namespace> -l app=osm-controller
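OSM controller v0.11.1 requires the following SMI versions, which match the SMI SUPPORTED column in the preceding example: HTTPRouteGroup v1alpha4, TCPRoute v1alpha4, TrafficSplit v1alpha2, and TrafficTarget v1alpha3.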
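Certificate management
For more information about how OSM issues and manages certificates for the Envoy proxies running on application pods, see the OSM certificate guide.
Upgrade Envoy
When a new pod is created in a namespace monitored by the add-on, OSM injects an Envoy proxy sidecar into that pod. For details on how to update the Envoy version, see the OSM upgrade guide.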