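When you use Azure Kubernetes Fleet Manager, understanding the status of ClusterResourcePlacement (CRP) resources is essential for monitoring deployment progress and troubleshooting problems. This article explains how to interpret the status fields and conditions that Fleet Manager reports for cluster-scoped and namespace-scoped placements.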
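Prerequisites
- You have a Fleet Manager with a hub cluster and one or more member clusters. If you don't have one, see Create an Azure Kubernetes Fleet Manager resource and join member clusters.
- You have access to the Fleet Manager hub cluster. For more information, see Access the Kubernetes API of the Azure Kubernetes Fleet Manager hub cluster.
- You have deployed at least one ClusterResourcePlacement API object to place resources in the fleet. If you haven't, see Use cluster resource placement to deploy workloads across multiple clusters.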
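Overview of the placement status structure
The ClusterResourcePlacement object contains not only the specification that describes the placement, but also the status of the placement operation.
The status section provides details about:
- The overall placement status, expressed through conditions
- The resources selected by the placement
- The per-cluster placement status, expressed through conditions
- Failed, drifted, and diffed placements in each cluster
To view the placement status, use the following command: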
kubectl describe clusterresourceplacement <placement-name>
Or get the raw YAML output:
kubectl get clusterresourceplacement <placement-name> -o yaml
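Top-level status fields
The status section contains the following top-level fields:
- selectedResources: the list of resources selected by the placement
- conditions: an array of overall placement conditions
- observedResourceIndex: the index of the current resource snapshot
- placementStatuses: per-cluster placement status information
The following sections describe each field in detail.
Understanding selected resources
The selectedResources field lists every resource that the placement selected. Use this field to check that the placement includes the resources you expect. Here's an example: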
selectedResources:
  - kind: Namespace
    name: test
    version: v1
  - kind: ConfigMap
    name: test-config
    namespace: test
    version: v1
    envelope:
      name: example-envelope
      namespace: test
      type: ResourceEnvelope
  - kind: Deployment
    name: web-app
    namespace: test
    group: apps
    version: v1
  - kind: Service
    name: web-service
    namespace: test
    version: v1
  - kind: Secret
    name: app-secrets
    namespace: test
    version: v1
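Each resource entry includes:
- group: the API group (empty for core resources)
- version: the API version (for example, v1 or v1beta1)
- kind: the resource type
- name: the resource name
- namespace: the namespace (for namespaced resources)
- envelope: if the resource is wrapped in an envelope, the envelope metadata, which includes:
  - name: the name of the envelope
  - namespace: the namespace of the envelope
  - type: the type of the envelope (for example, ResourceEnvelope)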
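The check that the placement selected the expected resources can be scripted. The following is a minimal sketch, assuming the selectedResources list has already been parsed into Python dictionaries (for example, from `kubectl get clusterresourceplacement <placement-name> -o json`). The helper names `resource_key` and `missing_resources`, and the key format they use, are hypothetical and not part of Fleet Manager.

```python
from typing import Iterable


def resource_key(entry: dict) -> str:
    """Build a readable key such as 'apps/v1 Deployment test/web-app'."""
    group = entry.get("group", "")  # core resources have an empty or missing group
    group_version = f"{group}/{entry['version']}" if group else entry["version"]
    namespace = entry.get("namespace")  # absent for cluster-scoped resources
    name = f"{namespace}/{entry['name']}" if namespace else entry["name"]
    return f"{group_version} {entry['kind']} {name}"


def missing_resources(selected: Iterable[dict], expected: Iterable[str]) -> list[str]:
    """Return the expected keys that do not appear in selectedResources."""
    present = {resource_key(entry) for entry in selected}
    return [key for key in expected if key not in present]


selected = [
    {"kind": "Namespace", "name": "test", "version": "v1"},
    {"kind": "Deployment", "name": "web-app", "namespace": "test",
     "group": "apps", "version": "v1"},
]
expected = [
    "v1 Namespace test",
    "apps/v1 Deployment test/web-app",
    "v1 Service test/web-service",
]
print(missing_resources(selected, expected))
```

In this sample, only the Service key is reported as missing because the sample selectedResources list omits it.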
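Understanding placement conditions
The conditions array provides high-level status information about the placement as a whole. Each condition follows the common Kubernetes condition definition, which has the following standard fields:
- type: the condition type (described in the following sections)
- status: True, False, or Unknown
- reason: a short reason code for the condition
- message: a human-readable description
- lastTransitionTime: when the condition last changed
- observedGeneration: the generation of the placement when the condition was set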
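When reading conditions programmatically, it helps to look at status and observedGeneration together: by Kubernetes convention, a condition whose observedGeneration is older than the object's metadata.generation describes an earlier version of the spec. The following is a minimal sketch of that check; `find_condition` and `is_true_and_current` are hypothetical helper names, and the sample data is illustrative.

```python
from typing import Optional


def find_condition(conditions: list, cond_type: str) -> Optional[dict]:
    """Return the condition with the given type, or None if it is absent."""
    for condition in conditions:
        if condition.get("type") == cond_type:
            return condition
    return None


def is_true_and_current(condition: Optional[dict], generation: int) -> bool:
    """True only if status is "True" and the condition was set for this generation."""
    if condition is None:
        return False
    return (condition.get("status") == "True"
            and condition.get("observedGeneration") == generation)


conditions = [
    {"type": "ClusterResourcePlacementScheduled", "status": "True",
     "reason": "SchedulingPolicyFulfilled", "observedGeneration": 5},
]
scheduled = find_condition(conditions, "ClusterResourcePlacementScheduled")
print(is_true_and_current(scheduled, 5))  # condition matches generation 5
print(is_true_and_current(scheduled, 6))  # spec changed after the condition was set
```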
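ClusterResourcePlacement condition types
The following condition types are available for ClusterResourcePlacement:
ClusterResourcePlacementScheduled
Indicates whether the placement was successfully scheduled to target clusters.
- True: all required clusters were selected according to the placement policy
- False: scheduling failed (for example, not enough clusters are available)
- Unknown: the scheduling decision is pending
conditions:
  - type: ClusterResourcePlacementScheduled
    status: "True"
    reason: SchedulingPolicyFulfilled
    message: "found all the clusters needed as specified by the scheduling policy"
    lastTransitionTime: "2023-11-10T08:14:52Z"
    observedGeneration: 5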
ClusterResourcePlacementRolloutStarted
Indicates whether the rollout has started on the selected clusters.
- True: resources have started rolling out to the scheduled clusters
- False: the rollout hasn't started yet
- Unknown: the rollout decision is pending
conditions:
  - type: ClusterResourcePlacementRolloutStarted
    status: "True"
    reason: RolloutStarted
    message: "All 3 cluster(s) start rolling out the latest resource"
    lastTransitionTime: "2023-11-10T08:15:30Z"
    observedGeneration: 5
ClusterResourcePlacementOverridden
Indicates whether resource overrides were applied successfully.
- True: all applicable overrides were processed
- False: some overrides couldn't be applied
- Unknown: override processing is pending
conditions:
  - type: ClusterResourcePlacementOverridden
    status: "True"
    reason: NoOverrideSpecified
    message: "No override rules are configured for the selected resources"
    lastTransitionTime: "2023-11-10T08:15:45Z"
    observedGeneration: 5
ClusterResourcePlacementWorkSynchronized
Indicates whether the Work objects were created in each per-cluster namespace on the hub cluster.
- True: all Work objects are synchronized
- False: Work synchronization failed or is incomplete
- Unknown: Work synchronization is pending
conditions:
  - type: ClusterResourcePlacementWorkSynchronized
    status: "True"
    reason: SynchronizeSucceeded
    message: "All 2 cluster(s) are synchronized to the latest resources on the hub cluster"
    lastTransitionTime: "2023-11-10T08:23:43Z"
    observedGeneration: 5
ClusterResourcePlacementApplied
Indicates whether all resources were successfully applied to the member clusters.
- True: all resources were successfully applied on all target clusters
- False: some resources couldn't be applied (check failedPlacements)
- Unknown: the apply operation is pending
conditions:
  - type: ClusterResourcePlacementApplied
    status: "True"
    reason: ApplySucceeded
    message: "The selected resources are successfully applied to 3 clusters"
    lastTransitionTime: "2023-11-10T08:16:15Z"
    observedGeneration: 5
ClusterResourcePlacementAvailable
Indicates whether all placed resources are available and ready on the member clusters.
- True: all resources are available on all target clusters
- False: some resources aren't available yet
- Unknown: the availability check is pending
conditions:
  - type: ClusterResourcePlacementAvailable
    status: "True"
    reason: ResourceAvailable
    message: "The selected resources in 3 clusters are available now"
    lastTransitionTime: "2023-11-10T08:16:30Z"
    observedGeneration: 5
ClusterResourcePlacementDiffReported
Indicates whether configuration differences were reported (when the ReportDiff strategy is used).
- True: a complete diff report is available
- False: diff reporting failed or is incomplete
- Unknown: diff reporting is pending
conditions:
  - type: ClusterResourcePlacementDiffReported
    status: "True"
    reason: DiffReportComplete
    message: "Configuration differences are reported for all target clusters"
    lastTransitionTime: "2023-11-10T08:16:45Z"
    observedGeneration: 5
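One way to use these condition types is to walk them in sequence and stop at the first one that isn't True. The following is a minimal sketch under two assumptions: the stages progress in the order this article lists them, and a type that is missing from the array has not been reported yet. The list omits ClusterResourcePlacementDiffReported, which applies only with the ReportDiff strategy. The name `first_blocking_condition`, the `NotReported` marker, and the reason code in the sample data are hypothetical.

```python
from typing import Optional, Tuple

# Assumed stage order, taken from the order in which the article lists the types.
STAGE_ORDER = [
    "ClusterResourcePlacementScheduled",
    "ClusterResourcePlacementRolloutStarted",
    "ClusterResourcePlacementOverridden",
    "ClusterResourcePlacementWorkSynchronized",
    "ClusterResourcePlacementApplied",
    "ClusterResourcePlacementAvailable",
]


def first_blocking_condition(conditions: list) -> Optional[Tuple[str, str]]:
    """Return (type, reason) of the first stage that is not True, or None."""
    by_type = {c["type"]: c for c in conditions}
    for cond_type in STAGE_ORDER:
        condition = by_type.get(cond_type)
        if condition is None:
            return cond_type, "NotReported"  # hypothetical marker, not a Fleet reason
        if condition.get("status") != "True":
            return cond_type, condition.get("reason", "")
    return None


conditions = [
    {"type": "ClusterResourcePlacementScheduled", "status": "True",
     "reason": "SchedulingPolicyFulfilled"},
    {"type": "ClusterResourcePlacementRolloutStarted", "status": "False",
     "reason": "ExampleRolloutBlocked"},  # illustrative reason code
]
print(first_blocking_condition(conditions))
```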
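Understanding resource snapshots
The observedResourceIndex field indicates which resource snapshot is currently being deployed:
observedResourceIndex: "1"
Fleet Manager creates a resource snapshot when:
- The resource selectors change
- A selected resource is modified
Each snapshot has a unique index. You can view a snapshot with the following command: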
kubectl get clusterresourcesnapshot --selector=kubernetes-fleet.io/resource-index=1
Understanding per-cluster placement status
The placementStatuses array contains the detailed status of each cluster where resources were placed or where placement was attempted:
placementStatuses:
  - clusterName: aks-member-1
    observedResourceIndex: "1"
    conditions:
      - type: ResourceScheduled
        status: "True"
        reason: ScheduleSucceeded
        message: "Successfully scheduled resources for placement in aks-member-1"
        lastTransitionTime: "2023-11-10T08:14:52Z"
        observedGeneration: 5
      - type: RolloutStarted
        status: "True"
        reason: RolloutStarted
        message: "Detected the new changes on the resources and started the rollout process"
        lastTransitionTime: "2023-11-10T08:15:30Z"
        observedGeneration: 5
      - type: Overridden
        status: "True"
        reason: NoOverrideSpecified
        message: "No override rules are configured for the selected resources"
        lastTransitionTime: "2023-11-10T08:15:45Z"
        observedGeneration: 5
      - type: WorkSynchronized
        status: "True"
        reason: AllWorkSynced
        message: "All of the works are synchronized to the latest"
        lastTransitionTime: "2023-11-10T08:16:00Z"
        observedGeneration: 5
      - type: Applied
        status: "True"
        reason: AllWorkHaveBeenApplied
        message: "All corresponding work objects are applied"
        lastTransitionTime: "2023-11-10T08:16:15Z"
        observedGeneration: 5
      - type: Available
        status: "True"
        reason: ResourceAvailable
        message: "All resources are available on the target cluster"
        lastTransitionTime: "2023-11-10T08:16:30Z"
        observedGeneration: 5
    failedPlacements: []
    driftedPlacements: []
    diffedPlacements: []
  - clusterName: aks-member-2
    observedResourceIndex: "1"
    conditions:
      - type: ResourceScheduled
        status: "True"
        reason: ScheduleSucceeded
        message: "Successfully scheduled resources for placement in aks-member-2"
        lastTransitionTime: "2023-11-10T08:14:52Z"
        observedGeneration: 5
      - type: Applied
        status: "False"
        reason: AppliedManifestFailedReason
        message: "Failed to apply some manifests"
        lastTransitionTime: "2023-11-10T08:16:15Z"
        observedGeneration: 5
    failedPlacements:
      - kind: Deployment
        name: web-app
        namespace: test
        group: apps
        version: v1
        condition:
          type: Applied
          status: "False"
          reason: AppliedManifestFailedReason
          message: "Failed to apply manifest: insufficient resources"
          lastTransitionTime: "2023-11-10T08:16:15Z"
Per-cluster condition types
Each cluster's status includes conditions that track the deployment lifecycle:
ResourceScheduled
Indicates whether the cluster was successfully selected for placement.
- type: ResourceScheduled
  status: "True"
  reason: ScheduleSucceeded
  message: "Successfully scheduled resources for placement in aks-member-1 (affinity score: 0, topology spread score: 0): picked by scheduling policy"
  lastTransitionTime: "2023-11-10T08:14:52Z"
  observedGeneration: 5
RolloutStarted
Indicates whether the rollout has started on this specific cluster.
- type: RolloutStarted
  status: "True"
  reason: RolloutStarted
  message: "Detected new changes on the resources and started the rollout process"
  lastTransitionTime: "2023-11-10T08:15:30Z"
  observedGeneration: 5
Overridden
Indicates whether resource overrides were applied for this cluster.
- type: Overridden
  status: "True"
  reason: NoOverrideSpecified
  message: "No override rules are configured for the selected resources"
  lastTransitionTime: "2023-11-10T08:15:45Z"
  observedGeneration: 5
WorkSynchronized
Indicates whether the Work objects were created for this cluster.
- type: WorkSynchronized
  status: "True"
  reason: AllWorkSynced
  message: "All of the works are synchronized to the latest"
  lastTransitionTime: "2023-11-10T08:16:00Z"
  observedGeneration: 5
Applied
Indicates whether all resources were successfully applied to this cluster.
- type: Applied
  status: "True"
  reason: AllWorkHaveBeenApplied
  message: "All corresponding work objects are applied"
  lastTransitionTime: "2023-11-10T08:16:15Z"
  observedGeneration: 5
Available
Indicates whether all resources are available and ready on this cluster.
- type: Available
  status: "True"
  reason: ResourceAvailable
  message: "All resources are available on the target cluster"
  lastTransitionTime: "2023-11-10T08:16:30Z"
  observedGeneration: 5
DiffReported
Indicates whether configuration differences were reported for this cluster.
- type: DiffReported
  status: "True"
  reason: DiffReportComplete
  message: "Configuration differences are reported for this cluster"
  lastTransitionTime: "2023-11-10T08:16:45Z"
  observedGeneration: 5
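To see at a glance which clusters are stuck and at which stage, the per-cluster conditions can be reduced to a small map. The following is a minimal sketch, assuming placementStatuses has been parsed into Python dictionaries; `clusters_not_ready` is a hypothetical helper name, and the sample data is a trimmed copy of the two-cluster example shown earlier.

```python
def clusters_not_ready(placement_statuses: list) -> dict:
    """Map each cluster name to the condition types whose status is not "True"."""
    result = {}
    for entry in placement_statuses:
        pending = [c["type"] for c in entry.get("conditions", [])
                   if c.get("status") != "True"]
        if pending:
            result[entry["clusterName"]] = pending
    return result


placement_statuses = [
    {"clusterName": "aks-member-1",
     "conditions": [{"type": "ResourceScheduled", "status": "True"},
                    {"type": "Applied", "status": "True"}]},
    {"clusterName": "aks-member-2",
     "conditions": [{"type": "ResourceScheduled", "status": "True"},
                    {"type": "Applied", "status": "False"}]},
]
print(clusters_not_ready(placement_statuses))
```

Clusters whose conditions are all True are left out of the result, so an empty dictionary means no reported condition is False or Unknown.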
Understanding failed placements
If a resource can't be applied to a cluster, the details are recorded in the failedPlacements array:
failedPlacements:
  - kind: Deployment
    name: my-app
    namespace: default
    group: apps
    version: v1
    condition:
      type: Applied
      status: "False"
      reason: AppliedManifestFailedReason
      message: "Failed to apply manifest: namespaces 'app' not found"
      lastTransitionTime: "2023-12-06T00:09:53Z"
    envelope:
      name: example
      namespace: app
      type: ResourceEnvelope
  - kind: Service
    name: my-service
    namespace: default
    version: v1
    condition:
      type: Applied
      status: "False"
      reason: AppliedManifestFailedReason
      message: "Failed to apply manifest: Service 'my-service' is forbidden: User 'system:serviceaccount:fleet-system:fleet-agent' cannot create resource 'services' in API group '' in the namespace 'default'"
      lastTransitionTime: "2023-12-06T00:10:15Z"
  - kind: ConfigMap
    name: app-config
    namespace: production
    version: v1
    condition:
      type: Applied
      status: "False"
      reason: AppliedManifestFailedReason
      message: "Failed to apply manifest: configmaps 'app-config' already exists"
      lastTransitionTime: "2023-12-06T00:10:30Z"
Each failed placement includes:
- Resource identity: group, version, kind, name, namespace
- condition: the specific failure condition
- envelope: the envelope information (if applicable)
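When several clusters report failures, flattening the failedPlacements arrays into one list makes them easier to scan or export. The following is a minimal sketch, assuming placementStatuses has been parsed into Python dictionaries; `failed_rows` is a hypothetical helper name, and the sample data is illustrative.

```python
def failed_rows(placement_statuses: list) -> list:
    """Return (cluster, kind, namespace, name, reason, message) for each failure."""
    rows = []
    for entry in placement_statuses:
        # failedPlacements may be missing or empty when nothing failed.
        for failed in entry.get("failedPlacements") or []:
            condition = failed.get("condition", {})
            rows.append((
                entry["clusterName"],
                failed["kind"],
                failed.get("namespace", ""),  # empty for cluster-scoped resources
                failed["name"],
                condition.get("reason", ""),
                condition.get("message", ""),
            ))
    return rows


placement_statuses = [
    {"clusterName": "aks-member-1", "failedPlacements": []},
    {"clusterName": "aks-member-2", "failedPlacements": [
        {"kind": "Deployment", "name": "web-app", "namespace": "test",
         "group": "apps", "version": "v1",
         "condition": {"type": "Applied", "status": "False",
                       "reason": "AppliedManifestFailedReason",
                       "message": "Failed to apply manifest: insufficient resources"}},
    ]},
]
for row in failed_rows(placement_statuses):
    print(" | ".join(row))
```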
Understanding drifted placements
Fleet Manager always reports resources that have drifted from their desired state:
driftedPlacements:
  - kind: Namespace
    name: web
    version: v1
    observationTime: "2025-03-19T06:50:25Z"
    firstDriftedObservedTime: "2025-03-19T06:49:54Z"
    targetClusterObservedGeneration: 12
    observedDrifts:
      - path: "/metadata/labels/owner"
        valueInHub: "simon"
        valueInMember: "chen"
      - path: "/metadata/annotations/purpose"
        valueInHub: "production"
        valueInMember: "testing"
  - kind: Deployment
    name: web-app
    namespace: web
    group: apps
    version: v1
    observationTime: "2025-03-19T06:50:25Z"
    firstDriftedObservedTime: "2025-03-19T06:49:54Z"
    targetClusterObservedGeneration: 8
    observedDrifts:
      - path: "/spec/replicas"
        valueInHub: "3"
        valueInMember: "5"
      - path: "/spec/template/spec/containers/0/image"
        valueInHub: "nginx:1.20"
        valueInMember: "nginx:1.21"
  - kind: ConfigMap
    name: app-config
    namespace: web
    version: v1
    observationTime: "2025-03-19T06:50:25Z"
    firstDriftedObservedTime: "2025-03-19T06:49:54Z"
    targetClusterObservedGeneration: 5
    observedDrifts:
      - path: "/data/environment"
        valueInHub: "production"
        valueInMember: "staging"
Each drifted placement includes:
- Resource identity: group, version, kind, name, namespace
- observationTime: when the drift was last observed
- firstDriftedObservedTime: when the drift was first detected
- targetClusterObservedGeneration: the generation of the resource on the member cluster
- observedDrifts: a detailed list of the configuration differences
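The drift entries can be turned into one line per drifted field, together with the time span between the first and the latest observation. The following is a minimal sketch, assuming a driftedPlacements list parsed into Python dictionaries and timestamps in the UTC format shown in the example; `drift_report` and the output layout are hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # matches timestamps such as 2025-03-19T06:50:25Z


def drift_report(drifted: list) -> list:
    """Return one readable line per drifted field path."""
    lines = []
    for item in drifted:
        first = datetime.strptime(item["firstDriftedObservedTime"], TIME_FORMAT)
        last = datetime.strptime(item["observationTime"], TIME_FORMAT)
        seconds = int((last - first).total_seconds())
        for drift in item.get("observedDrifts", []):
            lines.append(
                f"{item['kind']}/{item['name']} {drift['path']}: "
                f"hub={drift.get('valueInHub', '')!r} "
                f"member={drift.get('valueInMember', '')!r} "
                f"(observed over {seconds}s)"
            )
    return lines


drifted = [{
    "kind": "ConfigMap", "name": "app-config", "namespace": "web", "version": "v1",
    "observationTime": "2025-03-19T06:50:25Z",
    "firstDriftedObservedTime": "2025-03-19T06:49:54Z",
    "observedDrifts": [{"path": "/data/environment",
                        "valueInHub": "production", "valueInMember": "staging"}],
}]
for line in drift_report(drifted):
    print(line)
```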
Understanding diffed placements
When you use the ReportDiff apply strategy, Fleet Manager reports configuration differences:
diffedPlacements:
  - kind: Service
    name: my-service
    namespace: default
    version: v1
    observationTime: "2025-03-19T06:50:25Z"
    firstDiffedObservedTime: "2025-03-19T06:49:54Z"
    targetClusterObservedGeneration: 8
    observedDiffs:
      - path: "/spec/ports/0/nodePort"
        valueInHub: ""
        valueInMember: "30080"
      - path: "/spec/clusterIP"
        valueInHub: ""
        valueInMember: "10.96.100.200"
  - kind: Deployment
    name: web-app
    namespace: default
    group: apps
    version: v1
    observationTime: "2025-03-19T06:50:25Z"
    firstDiffedObservedTime: "2025-03-19T06:49:54Z"
    targetClusterObservedGeneration: 12
    observedDiffs:
      - path: "/status/replicas"
        valueInHub: ""
        valueInMember: "3"
      - path: "/status/readyReplicas"
        valueInHub: ""
        valueInMember: "3"
      - path: "/metadata/generation"
        valueInHub: "1"
        valueInMember: "2"
Diffed placements have a structure similar to drifted placements but are used in different scenarios:
- Drifted placements: used when a resource was applied but was changed afterward
- Diffed placements: used with the ReportDiff strategy or when the takeover conditions aren't met
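Monitoring placement progress
To monitor placement progress effectively, check the following key indicators:
- Resources placed: verify that ClusterResourcePlacementWorkSynchronized is True
- Overall health: check the ClusterResourcePlacementApplied condition
- Per-cluster status: check the conditions for each target cluster
- Failed placements: check for any entries in the failedPlacements array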
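The four indicators above can be collected into a single summary. The following is a minimal sketch, assuming the status section has been parsed into a Python dictionary (for example, from `kubectl get clusterresourceplacement <placement-name> -o json`). The name `placement_summary`, the keys of the returned dictionary, and the `NotReported` marker for missing conditions are hypothetical.

```python
def placement_summary(status: dict) -> dict:
    """Summarize the key indicators from a ClusterResourcePlacement status."""
    overall = {c["type"]: c["status"] for c in status.get("conditions", [])}
    per_cluster = status.get("placementStatuses", [])
    return {
        # Resources placed: Work objects synchronized on the hub cluster.
        "workSynchronized": overall.get(
            "ClusterResourcePlacementWorkSynchronized", "NotReported"),
        # Overall health: resources applied on the member clusters.
        "applied": overall.get("ClusterResourcePlacementApplied", "NotReported"),
        # Per-cluster status: clusters with any condition that is not True.
        "clustersWithIssues": sorted(
            p["clusterName"] for p in per_cluster
            if any(c.get("status") != "True" for c in p.get("conditions", []))),
        # Failed placements: total entries across all clusters.
        "failedPlacementCount": sum(
            len(p.get("failedPlacements") or []) for p in per_cluster),
    }


status = {
    "conditions": [
        {"type": "ClusterResourcePlacementWorkSynchronized", "status": "True"},
        {"type": "ClusterResourcePlacementApplied", "status": "False"},
    ],
    "placementStatuses": [
        {"clusterName": "aks-member-1",
         "conditions": [{"type": "Applied", "status": "True"}],
         "failedPlacements": []},
        {"clusterName": "aks-member-2",
         "conditions": [{"type": "Applied", "status": "False"}],
         "failedPlacements": [{"kind": "Deployment", "name": "web-app"}]},
    ],
}
print(placement_summary(status))
```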
Complete status example
The following comprehensive example shows the full status of a ClusterResourcePlacement:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: web-app-placement
  generation: 5
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      name: web-app
      version: v1
    - group: apps
      kind: Deployment
      name: web-server
      namespace: web-app
      version: v1
    - group: ""
      kind: Service
      name: web-service
      namespace: web-app
      version: v1
  policy:
    placementType: PickN
    numberOfClusters: 2
    affinity:
      clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          clusterSelectorTerms:
            - matchLabels:
                region: chinanorth3
status:
  conditions:
    - type: ClusterResourcePlacementScheduled
      status: "True"
      reason: SchedulingPolicyFulfilled
      message: "found all the clusters needed as specified by the scheduling policy"
      lastTransitionTime: "2023-11-10T08:14:52Z"
      observedGeneration: 5
    - type: ClusterResourcePlacementRolloutStarted
      status: "True"
      reason: RolloutStarted
      message: "All 2 cluster(s) start rolling out the latest resource"
      lastTransitionTime: "2023-11-10T08:15:30Z"
      observedGeneration: 5
    - type: ClusterResourcePlacementOverridden
      status: "True"
      reason: NoOverrideSpecified
      message: "No override rules are configured for the selected resources"
      lastTransitionTime: "2023-11-10T08:15:45Z"
      observedGeneration: 5
    - type: ClusterResourcePlacementWorkSynchronized
      status: "True"
      reason: SynchronizeSucceeded
      message: "All 2 cluster(s) are synchronized to the latest resources on the hub cluster"
      lastTransitionTime: "2023-11-10T08:16:00Z"
      observedGeneration: 5
    - type: ClusterResourcePlacementApplied
      status: "True"
      reason: ApplySucceeded
      message: "The selected resources are successfully applied to 2 clusters"
      lastTransitionTime: "2023-11-10T08:16:15Z"
      observedGeneration: 5
    - type: ClusterResourcePlacementAvailable
      status: "True"
      reason: ResourceAvailable
      message: "The selected resources in 2 cluster are available now"
      lastTransitionTime: "2023-11-10T08:16:30Z"
      observedGeneration: 5
  observedResourceIndex: "1"
  selectedResources:
    - group: ""
      kind: Namespace
      name: web-app
      version: v1
    - group: apps
      kind: Deployment
      name: web-server
      namespace: web-app
      version: v1
    - group: ""
      kind: Service
      name: web-service
      namespace: web-app
      version: v1
  placementStatuses:
    - clusterName: aks-chinanorth3-1
      observedResourceIndex: "1"
      conditions:
        - type: ResourceScheduled
          status: "True"
          reason: ScheduleSucceeded
          message: "Successfully scheduled resources for placement in aks-chinanorth3-1 (affinity score: 0, topology spread score: 0): picked by scheduling policy"
          lastTransitionTime: "2023-11-10T08:14:52Z"
          observedGeneration: 5
        - type: RolloutStarted
          status: "True"
          reason: RolloutStarted
          message: "Detected the new changes on the resources and started the rollout process"
          lastTransitionTime: "2023-11-10T08:15:30Z"
          observedGeneration: 5
        - type: Overridden
          status: "True"
          reason: NoOverrideSpecified
          message: "No override rules are configured for the selected resources"
          lastTransitionTime: "2023-11-10T08:15:45Z"
          observedGeneration: 5
        - type: WorkSynchronized
          status: "True"
          reason: AllWorkSynced
          message: "All of the works are synchronized to the latest"
          lastTransitionTime: "2023-11-10T08:16:00Z"
          observedGeneration: 5
        - type: Applied
          status: "True"
          reason: AllWorkHaveBeenApplied
          message: "All corresponding work objects are applied"
          lastTransitionTime: "2023-11-10T08:16:15Z"
          observedGeneration: 5
        - type: Available
          status: "True"
          reason: ResourceAvailable
          message: "All resources are available on the target cluster"
          lastTransitionTime: "2023-11-10T08:16:30Z"
          observedGeneration: 5
      failedPlacements: []
      driftedPlacements: []
      diffedPlacements: []
    - clusterName: aks-chinanorth3-2
      observedResourceIndex: "1"
      conditions:
        - type: ResourceScheduled
          status: "True"
          reason: ScheduleSucceeded
          message: "Successfully scheduled resources for placement in aks-chinanorth3-2 (affinity score: 0, topology spread score: 0): picked by scheduling policy"
          lastTransitionTime: "2023-11-10T08:14:52Z"
          observedGeneration: 5
        - type: RolloutStarted
          status: "True"
          reason: RolloutStarted
          message: "Detected new changes on the resources and started the rollout process"
          lastTransitionTime: "2023-11-10T08:15:30Z"
          observedGeneration: 5
        - type: Overridden
          status: "True"
          reason: NoOverrideSpecified
          message: "No override rules are configured for the selected resources"
          lastTransitionTime: "2023-11-10T08:15:45Z"
          observedGeneration: 5
        - type: WorkSynchronized
          status: "True"
          reason: AllWorkSynced
          message: "All of the works are synchronized to the latest"
          lastTransitionTime: "2023-11-10T08:16:00Z"
          observedGeneration: 5
        - type: Applied
          status: "True"
          reason: AllWorkHaveBeenApplied
          message: "All corresponding work objects are applied"
          lastTransitionTime: "2023-11-10T08:16:15Z"
          observedGeneration: 5
        - type: Available
          status: "True"
          reason: ResourceAvailable
          message: "All resources are available on the target cluster"
          lastTransitionTime: "2023-11-10T08:16:30Z"
          observedGeneration: 5
      failedPlacements: []
      driftedPlacements: []
      diffedPlacements: []
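This example shows:
- Successful scheduling: the placement found two clusters that match the placement policy
- Successful rollout: all resources were rolled out to both target clusters
- No overrides: no resource overrides were configured or needed
- Work synchronized: the Work objects were created and synchronized
- Resources applied: all resources were applied successfully
- Resources available: all resources are running and available
- Clean state: no failed, drifted, or diffed placements
Related content
To learn more about resource propagation, see the following resources: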