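Understand the status of ClusterResourcePlacement and ResourcePlacement

When you use Azure Kubernetes Fleet Manager, understanding the status of a ClusterResourcePlacement (CRP) resource is essential for monitoring deployment progress and troubleshooting issues. This article is a guide to interpreting the status fields and conditions that Fleet Manager reports for cluster-scoped and namespace-scoped placements.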

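Prerequisites

Overview of the placement status structure

A ClusterResourcePlacement object contains both the declarative spec of the placement and the status of the placement operation. The status section provides details about:

  • The overall placement status, expressed through conditions
  • The resources selected by the placement
  • The per-cluster placement status, expressed through conditions
  • Failed, drifted, and diffed placements in each cluster

To view the placement status, use the following command: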

kubectl describe clusterresourceplacement <placement-name>

Or get the raw YAML output:

kubectl get clusterresourceplacement <placement-name> -o yaml

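Top-level status fields

The status section contains the following top-level fields:

  • selectedResources: the list of resources selected by the placement
  • conditions: the array of overall placement conditions
  • observedResourceIndex: the index of the current resource snapshot
  • placementStatuses: the per-cluster placement status information

The following sections describe each field in detail.

Understand selected resources

The selectedResources field lists every resource that the placement selects. Use it to check whether the placement includes the resources you expect. Here is an example: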

selectedResources:
- kind: Namespace
  name: test
  version: v1
- kind: ConfigMap
  name: test-config
  namespace: test
  version: v1
  envelope:
    name: example-envelope
    namespace: test
    type: ResourceEnvelope
- kind: Deployment
  name: web-app
  namespace: test
  group: apps
  version: v1
- kind: Service
  name: web-service
  namespace: test
  version: v1
- kind: Secret
  name: app-secrets
  namespace: test
  version: v1

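Each resource entry includes:

  • group: the API group (empty for core resources)
  • version: the API version (for example, v1 or v1beta1)
  • kind: the resource type
  • name: the resource name
  • namespace: the namespace (for namespaced resources)
  • envelope: if the resource is wrapped in an envelope, the envelope metadata, which includes:
    • name: the name of the envelope
    • namespace: the namespace of the envelope
    • type: the type of the envelope (for example, ResourceEnvelope)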

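Understand placement conditions

The conditions array provides high-level status information about the placement as a whole. Each condition follows the common Kubernetes condition definition, which has these standard fields:

  • type: the condition type (described in the following sections)
  • status: True, False, or Unknown
  • reason: a short reason code for the condition
  • message: a human-readable description
  • lastTransitionTime: when the condition last changed
  • observedGeneration: the generation of the placement when the condition was set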
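
The fields above can be read programmatically. The following is a minimal Python sketch, not part of Fleet Manager: the helper names find_condition and is_condition_true are hypothetical, and the status is assumed to be already loaded into plain dictionaries (for example, by parsing the output of kubectl get ... -o json). Comparing observedGeneration with metadata.generation is a common Kubernetes convention for detecting stale conditions and is treated here as an assumption.

```python
def find_condition(conditions, cond_type):
    """Return the first condition dict whose type matches, or None."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def is_condition_true(conditions, cond_type, generation):
    """True only if the condition exists, is "True", and is not stale."""
    cond = find_condition(conditions, cond_type)
    if cond is None:
        return False
    # Assumption: a condition recorded for an older generation describes
    # a previous version of the placement spec and should not be trusted.
    if cond.get("observedGeneration") != generation:
        return False
    return cond.get("status") == "True"


# Sample data shaped like the conditions array in this article.
conditions = [
    {
        "type": "ClusterResourcePlacementScheduled",
        "status": "True",
        "reason": "SchedulingPolicyFulfilled",
        "observedGeneration": 5,
    },
]

print(is_condition_true(conditions, "ClusterResourcePlacementScheduled", 5))
print(is_condition_true(conditions, "ClusterResourcePlacementScheduled", 6))
```

The second call returns False because the sample condition was recorded for generation 5, not 6.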

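ClusterResourcePlacement condition types

The following condition types are available for ClusterResourcePlacement:

ClusterResourcePlacementScheduled

Indicates whether the placement was successfully scheduled to target clusters.

  • True: all required clusters were selected according to the placement policy
  • False: scheduling failed (for example, not enough clusters are available)
  • Unknown: the scheduling decision is pending

conditions: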
- type: ClusterResourcePlacementScheduled
  status: "True"
  reason: SchedulingPolicyFulfilled
  message: "found all the clusters needed as specified by the scheduling policy"
  lastTransitionTime: "2023-11-10T08:14:52Z"
  observedGeneration: 5

ClusterResourcePlacementRolloutStarted

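Indicates whether the rollout has started on the selected clusters.

  • True: resources have started rolling out to the scheduled clusters
  • False: the rollout has not started yet
  • Unknown: the rollout decision is pending

conditions: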
- type: ClusterResourcePlacementRolloutStarted
  status: "True"
  reason: RolloutStarted
  message: "All 3 cluster(s) start rolling out the latest resource"
  lastTransitionTime: "2023-11-10T08:15:30Z"
  observedGeneration: 5

ClusterResourcePlacementOverridden

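Indicates whether resource overrides were applied successfully.

  • True: all applicable overrides were processed
  • False: some overrides could not be applied
  • Unknown: override processing is pending

conditions: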
- type: ClusterResourcePlacementOverridden
  status: "True"
  reason: NoOverrideSpecified
  message: "No override rules are configured for the selected resources"
  lastTransitionTime: "2023-11-10T08:15:45Z"
  observedGeneration: 5

ClusterResourcePlacementWorkSynchronized

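Indicates whether Work objects were created in each member cluster's namespace on the hub cluster.

  • True: all Work objects are synchronized
  • False: Work synchronization failed or is incomplete
  • Unknown: Work synchronization is pending

conditions:
- type: ClusterResourcePlacementWorkSynchronized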
  status: "True"
  reason: SynchronizeSucceeded
  message: "All 2 cluster(s) are synchronized to the latest resources on the hub cluster"
  lastTransitionTime: "2023-11-10T08:23:43Z"
  observedGeneration: 5

ClusterResourcePlacementApplied

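Indicates whether all resources were successfully applied to the member clusters.

  • True: all resources were successfully applied to all target clusters
  • False: some resources could not be applied (check failedPlacements)
  • Unknown: the apply operation is pending

conditions: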
- type: ClusterResourcePlacementApplied
  status: "True"
  reason: ApplySucceeded
  message: "The selected resources are successfully applied to 3 clusters"
  lastTransitionTime: "2023-11-10T08:16:15Z"
  observedGeneration: 5

ClusterResourcePlacementAvailable

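Indicates whether all placed resources are available and ready on the member clusters.

  • True: all resources are available on all target clusters
  • False: some resources are not yet available
  • Unknown: the availability check is pending

conditions: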
- type: ClusterResourcePlacementAvailable
  status: "True"
  reason: ResourceAvailable
  message: "The selected resources in 3 clusters are available now"
  lastTransitionTime: "2023-11-10T08:16:30Z"
  observedGeneration: 5

ClusterResourcePlacementDiffReported

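Indicates whether configuration differences are reported (when you use the ReportDiff apply strategy).

  • True: a complete diff report is available
  • False: diff reporting failed or is incomplete
  • Unknown: diff reporting is pending

conditions: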
- type: ClusterResourcePlacementDiffReported
  status: "True"
  reason: DiffReportComplete
  message: "Configuration differences are reported for all target clusters"
  lastTransitionTime: "2023-11-10T08:16:45Z"
  observedGeneration: 5
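
Taken together, the condition types above can be scanned to find where a placement is stuck. The following Python sketch walks them in the order this article lists them and returns the first one that is not True. The ordering, the function name first_blocking_condition, and the sample reason ExampleReason are illustrative assumptions, not a documented contract. ClusterResourcePlacementDiffReported is left out because it applies only with the ReportDiff strategy.

```python
# Order in which this article lists the condition types. Treating it as a
# progression is an assumption made for this sketch.
CRP_CONDITION_ORDER = [
    "ClusterResourcePlacementScheduled",
    "ClusterResourcePlacementRolloutStarted",
    "ClusterResourcePlacementOverridden",
    "ClusterResourcePlacementWorkSynchronized",
    "ClusterResourcePlacementApplied",
    "ClusterResourcePlacementAvailable",
]


def first_blocking_condition(conditions):
    """Return (type, status, reason) for the first condition that is not
    "True", or None when every listed condition is "True"."""
    by_type = {c["type"]: c for c in conditions}
    for cond_type in CRP_CONDITION_ORDER:
        cond = by_type.get(cond_type)
        if cond is None:
            # The condition has not been reported yet.
            return (cond_type, "Missing", "")
        if cond["status"] != "True":
            return (cond_type, cond["status"], cond.get("reason", ""))
    return None


# Sample data: everything succeeded up to the apply step.
conditions = [
    {"type": "ClusterResourcePlacementScheduled", "status": "True"},
    {"type": "ClusterResourcePlacementRolloutStarted", "status": "True"},
    {"type": "ClusterResourcePlacementOverridden", "status": "True"},
    {"type": "ClusterResourcePlacementWorkSynchronized", "status": "True"},
    # "ExampleReason" is a hypothetical placeholder, not a real reason code.
    {"type": "ClusterResourcePlacementApplied", "status": "False",
     "reason": "ExampleReason"},
]

print(first_blocking_condition(conditions))
```

With the sample data, the function points at ClusterResourcePlacementApplied, which suggests looking at failedPlacements next.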

Understand resource snapshots

The observedResourceIndex field indicates which resource snapshot is currently being deployed:

observedResourceIndex: "1"

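Fleet Manager creates a resource snapshot when:

  • The resource selectors change
  • The selected resources are modified

Each snapshot has a unique index. You can view a snapshot with the following command: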

kubectl get clusterresourcesnapshot --selector=kubernetes-fleet.io/resource-index=1

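Understand per-cluster placement status

The placementStatuses array contains the detailed status for each cluster where resources were placed or where placement was attempted: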

placementStatuses:
- clusterName: aks-member-1
  observedResourceIndex: "1"
  conditions:
    - type: ResourceScheduled
      status: "True"
      reason: ScheduleSucceeded
      message: "Successfully scheduled resources for placement in aks-member-1"
      lastTransitionTime: "2023-11-10T08:14:52Z"
      observedGeneration: 5
    - type: RolloutStarted
      status: "True"
      reason: RolloutStarted
      message: "Detected the new changes on the resources and started the rollout process"
      lastTransitionTime: "2023-11-10T08:15:30Z"
      observedGeneration: 5
    - type: Overridden
      status: "True"
      reason: NoOverrideSpecified
      message: "No override rules are configured for the selected resources"
      lastTransitionTime: "2023-11-10T08:15:45Z"
      observedGeneration: 5
    - type: WorkSynchronized
      status: "True"
      reason: AllWorkSynced
      message: "All of the works are synchronized to the latest"
      lastTransitionTime: "2023-11-10T08:16:00Z"
      observedGeneration: 5
    - type: Applied
      status: "True"
      reason: AllWorkHaveBeenApplied
      message: "All corresponding work objects are applied"
      lastTransitionTime: "2023-11-10T08:16:15Z"
      observedGeneration: 5
    - type: Available
      status: "True"
      reason: ResourceAvailable
      message: "All resources are available on the target cluster"
      lastTransitionTime: "2023-11-10T08:16:30Z"
      observedGeneration: 5
  failedPlacements: []
  driftedPlacements: []
  diffedPlacements: []
- clusterName: aks-member-2
  observedResourceIndex: "1"
  conditions:
    - type: ResourceScheduled
      status: "True"
      reason: ScheduleSucceeded
      message: "Successfully scheduled resources for placement in aks-member-2"
      lastTransitionTime: "2023-11-10T08:14:52Z"
      observedGeneration: 5
    - type: Applied
      status: "False"
      reason: AppliedManifestFailedReason
      message: "Failed to apply some manifests"
      lastTransitionTime: "2023-11-10T08:16:15Z"
      observedGeneration: 5
  failedPlacements:
    - kind: Deployment
      name: web-app
      namespace: test
      group: apps
      version: v1
      condition:
        type: Applied
        status: "False"
        reason: AppliedManifestFailedReason
        message: "Failed to apply manifest: insufficient resources"
        lastTransitionTime: "2023-11-10T08:16:15Z"
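
A per-cluster status like the one above can be reduced to a short list of clusters that need attention. The following Python sketch assumes the placementStatuses array is already loaded as a list of dictionaries. The function name clusters_needing_attention is hypothetical, and the sample data is a trimmed copy of the example above.

```python
def clusters_needing_attention(placement_statuses):
    """Map cluster name -> condition types whose status is not "True".

    A cluster is included when it has at least one such condition or at
    least one entry in failedPlacements.
    """
    report = {}
    for entry in placement_statuses:
        not_true = [
            cond["type"]
            for cond in entry.get("conditions", [])
            if cond.get("status") != "True"
        ]
        if not_true or entry.get("failedPlacements"):
            report[entry["clusterName"]] = not_true
    return report


# Trimmed copy of the per-cluster example in this article.
placement_statuses = [
    {
        "clusterName": "aks-member-1",
        "conditions": [
            {"type": "ResourceScheduled", "status": "True"},
            {"type": "Applied", "status": "True"},
            {"type": "Available", "status": "True"},
        ],
        "failedPlacements": [],
    },
    {
        "clusterName": "aks-member-2",
        "conditions": [
            {"type": "ResourceScheduled", "status": "True"},
            {"type": "Applied", "status": "False"},
        ],
        "failedPlacements": [
            {"kind": "Deployment", "name": "web-app", "namespace": "test"},
        ],
    },
]

print(clusters_needing_attention(placement_statuses))
```

Only aks-member-2 appears in the result, with Applied as the condition to investigate.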

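Per-cluster condition types

Each cluster's status includes conditions that track the deployment lifecycle:

ResourceScheduled

Indicates whether the cluster was successfully selected for placement.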

- type: ResourceScheduled
  status: "True"
  reason: ScheduleSucceeded
  message: "Successfully scheduled resources for placement in aks-member-1 (affinity score: 0, topology spread score: 0): picked by scheduling policy"
  lastTransitionTime: "2023-11-10T08:14:52Z"
  observedGeneration: 5

RolloutStarted

Indicates whether the rollout has started on this specific cluster.

- type: RolloutStarted
  status: "True"
  reason: RolloutStarted
  message: "Detected new changes on the resources and started the rollout process"
  lastTransitionTime: "2023-11-10T08:15:30Z"
  observedGeneration: 5

Overridden

Indicates whether resource overrides were applied for this cluster.

- type: Overridden
  status: "True"
  reason: NoOverrideSpecified
  message: "No override rules are configured for the selected resources"
  lastTransitionTime: "2023-11-10T08:15:45Z"
  observedGeneration: 5

WorkSynchronized

Indicates whether Work objects were created for this cluster.

- type: WorkSynchronized
  status: "True"
  reason: AllWorkSynced
  message: "All of the works are synchronized to the latest"
  lastTransitionTime: "2023-11-10T08:16:00Z"
  observedGeneration: 5

Applied

Indicates whether all resources were successfully applied to this cluster.

- type: Applied
  status: "True"
  reason: AllWorkHaveBeenApplied
  message: "All corresponding work objects are applied"
  lastTransitionTime: "2023-11-10T08:16:15Z"
  observedGeneration: 5

Available

Indicates whether all resources are available and ready on this cluster.

- type: Available
  status: "True"
  reason: ResourceAvailable
  message: "All resources are available on the target cluster"
  lastTransitionTime: "2023-11-10T08:16:30Z"
  observedGeneration: 5

DiffReported

Indicates whether configuration differences are reported for this cluster.

- type: DiffReported
  status: "True"
  reason: DiffReportComplete
  message: "Configuration differences are reported for this cluster"
  lastTransitionTime: "2023-11-10T08:16:45Z"
  observedGeneration: 5

Understand failed placements

When a resource can't be applied to a cluster, the details are recorded in the failedPlacements array:

failedPlacements:
- kind: Deployment
  name: my-app
  namespace: default
  group: apps
  version: v1
  condition:
    type: Applied
    status: "False"
    reason: AppliedManifestFailedReason
    message: "Failed to apply manifest: namespaces 'app' not found"
    lastTransitionTime: "2023-12-06T00:09:53Z"
  envelope:
    name: example
    namespace: app
    type: ResourceEnvelope
- kind: Service
  name: my-service
  namespace: default
  version: v1
  condition:
    type: Applied
    status: "False"
    reason: AppliedManifestFailedReason
    message: "Failed to apply manifest: Service 'my-service' is forbidden: User 'system:serviceaccount:fleet-system:fleet-agent' cannot create resource 'services' in API group '' in the namespace 'default'"
    lastTransitionTime: "2023-12-06T00:10:15Z"
- kind: ConfigMap
  name: app-config
  namespace: production
  version: v1
  condition:
    type: Applied
    status: "False"
    reason: AppliedManifestFailedReason
    message: "Failed to apply manifest: configmaps 'app-config' already exists"
    lastTransitionTime: "2023-12-06T00:10:30Z"

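Each failed placement includes:

  • Resource identity: group, version, kind, name, and namespace
  • condition: the specific failure condition
  • envelope: the envelope information (if applicable)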
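
When many resources fail, one line per failure is easier to scan than the raw YAML. The following Python sketch formats a failedPlacements entry from the three parts listed above. The function name describe_failure and the output layout are illustrative choices, and the entry is assumed to be already loaded as a dictionary.

```python
def describe_failure(failed):
    """Format one failedPlacements entry as a single readable line."""
    group = failed.get("group", "")
    # Core resources have an empty group, so only the version is shown.
    api = f"{group}/{failed['version']}" if group else failed["version"]
    namespace = failed.get("namespace")
    name = f"{namespace}/{failed['name']}" if namespace else failed["name"]
    line = f"{failed['kind']} {name} ({api}): {failed['condition']['message']}"
    envelope = failed.get("envelope")
    if envelope:
        parts = [envelope.get("namespace"), envelope.get("name")]
        line += " [envelope " + "/".join(p for p in parts if p) + "]"
    return line


# Sample entries based on the failedPlacements example in this article.
failed_placements = [
    {
        "group": "apps",
        "version": "v1",
        "kind": "Deployment",
        "name": "my-app",
        "namespace": "default",
        "condition": {
            "type": "Applied",
            "status": "False",
            "message": "Failed to apply manifest: namespaces 'app' not found",
        },
        "envelope": {"name": "example", "namespace": "app",
                     "type": "ResourceEnvelope"},
    },
    {
        "version": "v1",
        "kind": "ConfigMap",
        "name": "app-config",
        "namespace": "production",
        "condition": {
            "type": "Applied",
            "status": "False",
            "message": "Failed to apply manifest: configmaps 'app-config' already exists",
        },
    },
]

for entry in failed_placements:
    print(describe_failure(entry))
```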

Understand drifted placements

Fleet Manager reports resources that have drifted from their desired state:

driftedPlacements:
- kind: Namespace
  name: web
  version: v1
  observationTime: "2025-03-19T06:50:25Z"
  firstDriftedObservedTime: "2025-03-19T06:49:54Z"
  targetClusterObservedGeneration: 12
  observedDrifts:
  - path: "/metadata/labels/owner"
    valueInHub: "simon"
    valueInMember: "chen"
  - path: "/metadata/annotations/purpose"
    valueInHub: "production"
    valueInMember: "testing"
- kind: Deployment
  name: web-app
  namespace: web
  group: apps
  version: v1
  observationTime: "2025-03-19T06:50:25Z"
  firstDriftedObservedTime: "2025-03-19T06:49:54Z"
  targetClusterObservedGeneration: 8
  observedDrifts:
  - path: "/spec/replicas"
    valueInHub: "3"
    valueInMember: "5"
  - path: "/spec/template/spec/containers/0/image"
    valueInHub: "nginx:1.20"
    valueInMember: "nginx:1.21"
- kind: ConfigMap
  name: app-config
  namespace: web
  version: v1
  observationTime: "2025-03-19T06:50:25Z"
  firstDriftedObservedTime: "2025-03-19T06:49:54Z"
  targetClusterObservedGeneration: 5
  observedDrifts:
  - path: "/data/environment"
    valueInHub: "production"
    valueInMember: "staging"

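Each drifted placement includes:

  • Resource identity: group, version, kind, name, and namespace
  • observationTime: when the drift was last observed
  • firstDriftedObservedTime: when the drift was first detected
  • targetClusterObservedGeneration: the generation of the resource on the member cluster
  • observedDrifts: a detailed list of the configuration differences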
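
Because observedDrifts is nested under each resource, flattening it into one row per drifted field can make reviews easier. The following Python sketch does that for a driftedPlacements list that is already loaded as dictionaries. The function name flatten_drifts and the tuple layout are hypothetical.

```python
def flatten_drifts(drifted_placements):
    """Return one (kind, name, path, valueInHub, valueInMember) tuple
    for every drifted field of every drifted resource."""
    rows = []
    for placement in drifted_placements:
        for drift in placement.get("observedDrifts", []):
            rows.append((
                placement["kind"],
                placement["name"],
                drift["path"],
                drift.get("valueInHub", ""),
                drift.get("valueInMember", ""),
            ))
    return rows


# Sample entry copied from the driftedPlacements example in this article.
drifted_placements = [
    {
        "kind": "Deployment",
        "name": "web-app",
        "namespace": "web",
        "group": "apps",
        "version": "v1",
        "observedDrifts": [
            {"path": "/spec/replicas", "valueInHub": "3", "valueInMember": "5"},
            {"path": "/spec/template/spec/containers/0/image",
             "valueInHub": "nginx:1.20", "valueInMember": "nginx:1.21"},
        ],
    },
]

for row in flatten_drifts(drifted_placements):
    print(row)
```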

Understand diffed placements

When you use the ReportDiff apply strategy, Fleet Manager reports configuration differences:

diffedPlacements:
- kind: Service
  name: my-service
  namespace: default
  version: v1
  observationTime: "2025-03-19T06:50:25Z"
  firstDiffedObservedTime: "2025-03-19T06:49:54Z"
  targetClusterObservedGeneration: 8
  observedDiffs:
  - path: "/spec/ports/0/nodePort"
    valueInHub: ""
    valueInMember: "30080"
  - path: "/spec/clusterIP"
    valueInHub: ""
    valueInMember: "10.96.100.200"
- kind: Deployment
  name: web-app
  namespace: default
  group: apps
  version: v1
  observationTime: "2025-03-19T06:50:25Z"
  firstDiffedObservedTime: "2025-03-19T06:49:54Z"
  targetClusterObservedGeneration: 12
  observedDiffs:
  - path: "/status/replicas"
    valueInHub: ""
    valueInMember: "3"
  - path: "/status/readyReplicas"
    valueInHub: ""
    valueInMember: "3"
  - path: "/metadata/generation"
    valueInHub: "1"
    valueInMember: "2"

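Diffed placements have a structure similar to drifted placements, but they serve different scenarios:

  • Drifted placements: used when a resource was applied and was changed afterward
  • Diffed placements: used with the ReportDiff strategy, or when takeover conditions are not met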

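Monitor placement progress

To monitor placement progress effectively, check these key indicators:

  1. Placed resources: verify that ClusterResourcePlacementWorkSynchronized is True
  2. Overall health: review the ClusterResourcePlacementApplied condition
  3. Per-cluster status: review the conditions for each target cluster
  4. Failed placements: check the failedPlacements array for any entries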
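
The four checks above can be combined into a single summary. The following Python sketch assumes the status section is already loaded as a dictionary, for example by parsing the output of kubectl get clusterresourceplacement <placement-name> -o json. The function name placement_progress and the keys of the returned dictionary are hypothetical.

```python
def placement_progress(status):
    """Summarize the four monitoring checks for one placement status."""
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    per_cluster = status.get("placementStatuses", [])
    return {
        # 1. Placed resources
        "work_synchronized":
            conditions.get("ClusterResourcePlacementWorkSynchronized") == "True",
        # 2. Overall health
        "applied": conditions.get("ClusterResourcePlacementApplied") == "True",
        # 3. Per-cluster status: clusters with any condition that is not True
        "clusters_not_ready": [
            entry["clusterName"]
            for entry in per_cluster
            if any(c.get("status") != "True" for c in entry.get("conditions", []))
        ],
        # 4. Failed placements, counted across all clusters
        "failed_placements": sum(
            len(entry.get("failedPlacements") or []) for entry in per_cluster
        ),
    }


# Sample status with one healthy cluster and one failing cluster.
status = {
    "conditions": [
        {"type": "ClusterResourcePlacementWorkSynchronized", "status": "True"},
        {"type": "ClusterResourcePlacementApplied", "status": "False"},
    ],
    "placementStatuses": [
        {
            "clusterName": "aks-member-1",
            "conditions": [{"type": "Applied", "status": "True"}],
            "failedPlacements": [],
        },
        {
            "clusterName": "aks-member-2",
            "conditions": [{"type": "Applied", "status": "False"}],
            "failedPlacements": [{"kind": "Deployment", "name": "web-app"}],
        },
    ],
}

print(placement_progress(status))
```

For the sample status, the summary reports synchronized work, a failed apply, one cluster that is not ready (aks-member-2), and one failed placement.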

Complete status example

The following example shows the complete status of a ClusterResourcePlacement:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: web-app-placement
  generation: 5
spec:
  resourceSelectors:
  - group: ""
    kind: Namespace
    name: web-app
    version: v1
  - group: apps
    kind: Deployment
    name: web-server
    namespace: web-app
    version: v1
  - group: ""
    kind: Service
    name: web-service
    namespace: web-app
    version: v1
  policy:
    placementType: PickN
    numberOfClusters: 2
    affinity:
      clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          clusterSelectorTerms:
          - matchLabels:
              region: chinanorth3
status:
  conditions:
  - type: ClusterResourcePlacementScheduled
    status: "True"
    reason: SchedulingPolicyFulfilled
    message: "found all the clusters needed as specified by the scheduling policy"
    lastTransitionTime: "2023-11-10T08:14:52Z"
    observedGeneration: 5
  - type: ClusterResourcePlacementRolloutStarted
    status: "True"
    reason: RolloutStarted
    message: "All 2 cluster(s) start rolling out the latest resource"
    lastTransitionTime: "2023-11-10T08:15:30Z"
    observedGeneration: 5
  - type: ClusterResourcePlacementOverridden
    status: "True"
    reason: NoOverrideSpecified
    message: "No override rules are configured for the selected resources"
    lastTransitionTime: "2023-11-10T08:15:45Z"
    observedGeneration: 5
  - type: ClusterResourcePlacementWorkSynchronized
    status: "True"
    reason: SynchronizeSucceeded
    message: "All 2 cluster(s) are synchronized to the latest resources on the hub cluster"
    lastTransitionTime: "2023-11-10T08:16:00Z"
    observedGeneration: 5
  - type: ClusterResourcePlacementApplied
    status: "True"
    reason: ApplySucceeded
    message: "The selected resources are successfully applied to 2 clusters"
    lastTransitionTime: "2023-11-10T08:16:15Z"
    observedGeneration: 5
  - type: ClusterResourcePlacementAvailable
    status: "True"
    reason: ResourceAvailable
    message: "The selected resources in 2 cluster are available now"
    lastTransitionTime: "2023-11-10T08:16:30Z"
    observedGeneration: 5
  observedResourceIndex: "1"
  selectedResources:
  - group: ""
    kind: Namespace
    name: web-app
    version: v1
  - group: apps
    kind: Deployment
    name: web-server
    namespace: web-app
    version: v1
  - group: ""
    kind: Service
    name: web-service
    namespace: web-app
    version: v1
  placementStatuses:
  - clusterName: aks-chinanorth3-1
    observedResourceIndex: "1"
    conditions:
    - type: ResourceScheduled
      status: "True"
      reason: ScheduleSucceeded
      message: "Successfully scheduled resources for placement in aks-chinanorth3-1 (affinity score: 0, topology spread score: 0): picked by scheduling policy"
      lastTransitionTime: "2023-11-10T08:14:52Z"
      observedGeneration: 5
    - type: RolloutStarted
      status: "True"
      reason: RolloutStarted
      message: "Detected the new changes on the resources and started the rollout process"
      lastTransitionTime: "2023-11-10T08:15:30Z"
      observedGeneration: 5
    - type: Overridden
      status: "True"
      reason: NoOverrideSpecified
      message: "No override rules are configured for the selected resources"
      lastTransitionTime: "2023-11-10T08:15:45Z"
      observedGeneration: 5
    - type: WorkSynchronized
      status: "True"
      reason: AllWorkSynced
      message: "All of the works are synchronized to the latest"
      lastTransitionTime: "2023-11-10T08:16:00Z"
      observedGeneration: 5
    - type: Applied
      status: "True"
      reason: AllWorkHaveBeenApplied
      message: "All corresponding work objects are applied"
      lastTransitionTime: "2023-11-10T08:16:15Z"
      observedGeneration: 5
    - type: Available
      status: "True"
      reason: ResourceAvailable
      message: "All resources are available on the target cluster"
      lastTransitionTime: "2023-11-10T08:16:30Z"
      observedGeneration: 5
    failedPlacements: []
    driftedPlacements: []
    diffedPlacements: []
  - clusterName: aks-chinanorth3-2
    observedResourceIndex: "1"
    conditions:
    - type: ResourceScheduled
      status: "True"
      reason: ScheduleSucceeded
      message: "Successfully scheduled resources for placement in aks-chinanorth3-2 (affinity score: 0, topology spread score: 0): picked by scheduling policy"
      lastTransitionTime: "2023-11-10T08:14:52Z"
      observedGeneration: 5
    - type: RolloutStarted
      status: "True"
      reason: RolloutStarted
      message: "Detected new changes on the resources and started the rollout process"
      lastTransitionTime: "2023-11-10T08:15:30Z"
      observedGeneration: 5
    - type: Overridden
      status: "True"
      reason: NoOverrideSpecified
      message: "No override rules are configured for the selected resources"
      lastTransitionTime: "2023-11-10T08:15:45Z"
      observedGeneration: 5
    - type: WorkSynchronized
      status: "True"
      reason: AllWorkSynced
      message: "All of the works are synchronized to the latest"
      lastTransitionTime: "2023-11-10T08:16:00Z"
      observedGeneration: 5
    - type: Applied
      status: "True"
      reason: AllWorkHaveBeenApplied
      message: "All corresponding work objects are applied"
      lastTransitionTime: "2023-11-10T08:16:15Z"
      observedGeneration: 5
    - type: Available
      status: "True"
      reason: ResourceAvailable
      message: "All resources are available on the target cluster"
      lastTransitionTime: "2023-11-10T08:16:30Z"
      observedGeneration: 5
    failedPlacements: []
    driftedPlacements: []
    diffedPlacements: []

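This example shows:

  • Successful scheduling: the placement found two clusters that match the placement policy
  • Successful rollout: all resources were deployed to both target clusters
  • No overrides: no resource overrides were configured or needed
  • Synchronized work: Work objects were created and synchronized
  • Applied resources: all resources were applied successfully
  • Available resources: all resources are running and available
  • Clean state: no failed, drifted, or diffed placements

To learn more about resource propagation, see the following resources: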