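Use ClusterStagedUpdateRun to orchestrate staged rollouts across member clusters

Azure Kubernetes Fleet Manager staged update runs provide a controlled way to deploy workloads across multiple member clusters, one stage at a time. Deploying to target clusters sequentially, with optional wait times and approval gates between stages, minimizes the risk of a bad rollout.

This article shows you how to create and execute staged update runs to deploy workloads progressively and, when needed, roll back to a previous version.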

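Prerequisites

  • You need an Azure account with an active subscription. Create an account.

  • To understand the concepts and terminology used in this article, read the conceptual overview of staged rollout strategies.

  • You need Azure CLI version 2.58.0 or later to complete this article. To install or upgrade, see Install Azure CLI.

  • If you don't already have the Kubernetes CLI (kubectl), you can install it by using this command: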

    az aks install-cli
    
  • You need the fleet Azure CLI extension. You can install it by running the following command:

    az extension add --name fleet
    

    Run the az extension update command to update to the latest version of the extension:

    az extension update --name fleet
    

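Configure the demo environment

This demo runs on a Fleet Manager with a hub cluster and three member clusters. If you don't have one, follow the quickstart to create a Fleet Manager with a hub cluster. Then join Azure Kubernetes Service (AKS) clusters as members.

This tutorial demonstrates staged update runs in a demo fleet environment with three member clusters that have the following labels: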

Cluster name   Labels
member1        environment=canary, order=2
member2        environment=staging
member3        environment=canary, order=1

These labels let us create stages that group clusters by environment and control the deployment order within each stage.
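If your member clusters don't carry these labels yet, one way to add them is to label the MemberCluster objects on the hub cluster with kubectl. The sketch below assumes that your kubectl context points at the hub cluster, that the member clusters are registered as member1, member2, and member3, and that you're allowed to label MemberCluster objects there. The function name label_demo_members is only illustrative.

```shell
# Illustrative helper (not part of Fleet Manager): applies the demo labels
# from the table above to the MemberCluster objects on the hub cluster.
label_demo_members() {
  kubectl label membercluster member1 environment=canary order=2 --overwrite
  kubectl label membercluster member2 environment=staging --overwrite
  kubectl label membercluster member3 environment=canary order=1 --overwrite
}

# Usage: label_demo_members && kubectl get membercluster --show-labels
```

The --overwrite flag makes the helper safe to rerun when a label already exists with a different value.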

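Prepare the workload for placement

Next, publish the workload to the hub cluster so that it can be placed on the member clusters.

Create a namespace and a configmap for the workload on the hub cluster: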

kubectl create ns test-namespace
kubectl create cm test-cm --from-literal=key=value1 -n test-namespace

To deploy the resources, create a ClusterResourcePlacement:

Note

Setting spec.strategy.type to External allows the rollout to be triggered by a ClusterStagedUpdateRun.

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: example-placement
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      name: test-namespace
      version: v1
  policy:
    placementType: PickAll
  strategy:
    type: External

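All three clusters should be scheduled because we use the PickAll policy. No resources are deployed on the member clusters yet, because we haven't created a ClusterStagedUpdateRun.

Verify that the placement is scheduled: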

kubectl get crp example-placement

Your output should look similar to the following example:

NAME                GEN   SCHEDULED   SCHEDULED-GEN   AVAILABLE   AVAILABLE-GEN   AGE
example-placement   1     True        1                                           51s
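For scripting, it can be handier to read the scheduling condition directly than to parse the table. The following sketch wraps a kubectl JSONPath query in a hypothetical helper named crp_scheduled_status. It assumes that the condition type is ClusterResourcePlacementScheduled, so confirm the type name against the status of your own placement (kubectl get crp example-placement -o yaml) before relying on it.

```shell
# Hypothetical helper: prints the status ("True", "False", or "Unknown") of the
# scheduling condition on a ClusterResourcePlacement.
# Assumption: the condition type is named ClusterResourcePlacementScheduled.
crp_scheduled_status() {
  # $1: ClusterResourcePlacement name
  kubectl get crp "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="ClusterResourcePlacementScheduled")].status}'
}

# Usage: crp_scheduled_status example-placement
```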

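Work with resource snapshots

Fleet Manager creates a resource snapshot whenever the selected resources change. Each snapshot has a unique index that you can use to reference a specific version of your resources.

Tip

For more information about resource snapshots and how they work, see Understanding resource snapshots.

Check current resource snapshots

To check the current resource snapshots: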

kubectl get clusterresourcesnapshots --show-labels

Your output should look similar to the following example:

NAME                           GEN   AGE   LABELS
example-placement-0-snapshot   1     60s   kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=0

We have only one version of the snapshot. It's the latest (kubernetes-fleet.io/is-latest-snapshot=true) and has resource index 0 (kubernetes-fleet.io/resource-index=0).
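Because the latest snapshot is identified purely by labels, you can look up its index with a label selector instead of reading the table. The sketch below is one way to do that. The helper name latest_snapshot_index is made up for this example, and the backslash in the JSONPath expression escapes the dot inside the label key.

```shell
# Hypothetical helper: prints the resource index of the latest snapshot
# that belongs to a given ClusterResourcePlacement.
latest_snapshot_index() {
  # $1: name of the ClusterResourcePlacement that owns the snapshots
  kubectl get clusterresourcesnapshots \
    -l "kubernetes-fleet.io/parent-CRP=$1,kubernetes-fleet.io/is-latest-snapshot=true" \
    -o jsonpath='{.items[0].metadata.labels.kubernetes-fleet\.io/resource-index}'
}

# Usage: latest_snapshot_index example-placement
```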

Create a new resource snapshot

Now modify the configmap with a new value:

kubectl edit cm test-cm -n test-namespace

Update the value from value1 to value2, then verify the change:

kubectl get configmap test-cm -n test-namespace -o yaml

Your output should look similar to the following example:

apiVersion: v1
data:
  key: value2 # value updated here, old value: value1
kind: ConfigMap
metadata:
  creationTimestamp: ...
  name: test-cm
  namespace: test-namespace
  resourceVersion: ...
  uid: ...
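If you prefer a non-interactive alternative to kubectl edit, for example in a script, a merge patch can make the same change. The following is a minimal sketch. The helper name set_demo_value is hypothetical, and the configmap name, namespace, and data key are the ones used in this tutorial.

```shell
# Hypothetical helper: sets data.key of the tutorial's configmap without
# opening an editor. The value is inserted into the JSON as-is, so it must
# not contain double quotes or backslashes.
set_demo_value() {
  # $1: new value for the "key" entry of test-cm
  kubectl patch configmap test-cm -n test-namespace \
    --type merge -p "{\"data\":{\"key\":\"$1\"}}"
}

# Usage: set_demo_value value2
```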

You should now see two versions of the resource snapshot, with index 0 and index 1:

kubectl get clusterresourcesnapshots --show-labels

Your output should look similar to the following example:

NAME                           GEN   AGE    LABELS
example-placement-0-snapshot   1     2m6s   kubernetes-fleet.io/is-latest-snapshot=false,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=0
example-placement-1-snapshot   1     10s    kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=1

The latest label is now set on example-placement-1-snapshot, which contains the latest configmap data:

kubectl get clusterresourcesnapshots example-placement-1-snapshot -o yaml

Your output should look similar to the following example:

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourceSnapshot
metadata:
  annotations:
    kubernetes-fleet.io/number-of-enveloped-object: "0"
    kubernetes-fleet.io/number-of-resource-snapshots: "1"
    kubernetes-fleet.io/resource-hash: 10dd7a3d1e5f9849afe956cfbac080a60671ad771e9bda7dd34415f867c75648
  creationTimestamp: "2025-07-22T21:26:54Z"
  generation: 1
  labels:
    kubernetes-fleet.io/is-latest-snapshot: "true"
    kubernetes-fleet.io/parent-CRP: example-placement
    kubernetes-fleet.io/resource-index: "1"
  name: example-placement-1-snapshot
  ownerReferences:
  - apiVersion: placement.kubernetes-fleet.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: ClusterResourcePlacement
    name: example-placement
    uid: e7d59513-b3b6-4904-864a-c70678fd6f65
  resourceVersion: "19994"
  uid: 79ca0bdc-0b0a-4c40-b136-7f701e85cdb6
spec:
  selectedResources:
  - apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        kubernetes.io/metadata.name: test-namespace
      name: test-namespace
    spec:
      finalizers:
      - kubernetes
  - apiVersion: v1
    data:
      key: value2 # latest value: value2, old value: value1
    kind: ConfigMap
    metadata:
      name: test-cm
      namespace: test-namespace

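Deploy a ClusterStagedUpdateStrategy

A ClusterStagedUpdateStrategy defines the orchestration pattern that groups clusters into stages and specifies the rollout sequence. It selects member clusters by labels. For our demonstration, we create a strategy with two stages, staging and canary: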

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateStrategy
metadata:
  name: example-strategy
spec:
  stages:
    - name: staging
      labelSelector:
        matchLabels:
          environment: staging
      afterStageTasks:
        - type: TimedWait
          waitTime: 1m
    - name: canary
      labelSelector:
        matchLabels:
          environment: canary
      sortingLabelKey: order
      afterStageTasks:
        - type: Approval
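Before you start a run, it can help to preview which member clusters each stage's labelSelector would match. The sketch below lists the MemberCluster objects on the hub cluster by their environment label and adds the order label as an extra column. The helper name preview_stage is hypothetical. kubectl doesn't sort the rows by the order label, so read the ORDER column to see the sequence that sortingLabelKey would produce.

```shell
# Hypothetical helper: lists the member clusters that a stage would select,
# with the "order" label shown as an extra column.
preview_stage() {
  # $1: value of the environment label that the stage selects on
  kubectl get membercluster -l "environment=$1" -L order
}

# Usage:
#   preview_stage staging
#   preview_stage canary
```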

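Deploy a ClusterStagedUpdateRun to roll out the latest change

A ClusterStagedUpdateRun executes the rollout of a ClusterResourcePlacement by following a ClusterStagedUpdateStrategy. To trigger the staged update run for our ClusterResourcePlacement (CRP), create a ClusterStagedUpdateRun that specifies the CRP name, the update strategy name, and the latest resource snapshot index ("1"):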

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
  name: example-run
spec:
  placementName: example-placement
  resourceSnapshotIndex: "1"
  stagedRolloutStrategyName: example-strategy
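Every run manifest has the same shape and differs only in its name and snapshot index, so you might prefer to generate it. The following sketch renders the manifest above from four arguments. The function name render_update_run is made up for this example.

```shell
# Hypothetical helper: prints a ClusterStagedUpdateRun manifest to stdout.
render_update_run() {
  # $1: run name, $2: CRP name, $3: resource snapshot index, $4: strategy name
  cat <<EOF
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
  name: $1
spec:
  placementName: $2
  resourceSnapshotIndex: "$3"
  stagedRolloutStrategyName: $4
EOF
}

# Usage:
#   render_update_run example-run example-placement 1 example-strategy | kubectl apply -f -
```

The index is rendered inside quotes because resourceSnapshotIndex is written as a string in the manifest.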

The staged update run is initialized and running:

kubectl get csur example-run

Your output should look similar to the following example:

NAME          PLACEMENT           RESOURCE-SNAPSHOT-INDEX   POLICY-SNAPSHOT-INDEX   INITIALIZED   SUCCEEDED   AGE
example-run   example-placement   1                         0                       True                      7s

Take a more detailed look at the status after the one-minute TimedWait has elapsed:

kubectl get csur example-run -o yaml

Your output should look similar to the following example:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
  ...
  name: example-run
  ...
spec:
  placementName: example-placement
  resourceSnapshotIndex: "1"
  stagedRolloutStrategyName: example-strategy
status:
  conditions:
  - lastTransitionTime: "2025-07-22T21:28:08Z"
    message: ClusterStagedUpdateRun initialized successfully
    observedGeneration: 1
    reason: UpdateRunInitializedSuccessfully
    status: "True" # the updateRun is initialized successfully
    type: Initialized
  - lastTransitionTime: "2025-07-22T21:29:53Z"
    message: The updateRun is waiting for after-stage tasks in stage canary to complete
    observedGeneration: 1
    reason: UpdateRunWaiting
    status: "False" # the updateRun is still progressing and waiting for approval
    type: Progressing
  deletionStageStatus:
    clusters: [] # no clusters need to be cleaned up
    stageName: kubernetes-fleet.io/deleteStage
  policyObservedClusterCount: 3 # number of clusters to be updated
  policySnapshotIndexUsed: "0"
  stagedUpdateStrategySnapshot: # snapshot of the strategy used for this update run
    stages:
    - afterStageTasks:
      - type: TimedWait
        waitTime: 1m0s
      labelSelector:
        matchLabels:
          environment: staging
      name: staging
    - afterStageTasks:
      - type: Approval
      labelSelector:
        matchLabels:
          environment: canary
      name: canary
      sortingLabelKey: order
  stagesStatus: # detailed status for each stage
  - afterStageTaskStatus:
    - conditions:
      - lastTransitionTime: "2025-07-22T21:29:23Z"
        message: Wait time elapsed
        observedGeneration: 1
        reason: AfterStageTaskWaitTimeElapsed
        status: "True" # the wait after-stage task has completed
        type: WaitTimeElapsed
      type: TimedWait
    clusters:
    - clusterName: member2 # stage staging contains member2 cluster only
      conditions:
      - lastTransitionTime: "2025-07-22T21:28:08Z"
        message: Cluster update started
        observedGeneration: 1
        reason: ClusterUpdatingStarted
        status: "True"
        type: Started
      - lastTransitionTime: "2025-07-22T21:28:23Z"
        message: Cluster update completed successfully
        observedGeneration: 1
        reason: ClusterUpdatingSucceeded
        status: "True" # member2 is updated successfully
        type: Succeeded
    conditions:
    - lastTransitionTime: "2025-07-22T21:28:23Z"
      message: All clusters in the stage are updated and after-stage tasks are completed
      observedGeneration: 1
      reason: StageUpdatingSucceeded
      status: "False"
      type: Progressing
    - lastTransitionTime: "2025-07-22T21:29:23Z"
      message: Stage update completed successfully
      observedGeneration: 1
      reason: StageUpdatingSucceeded
      status: "True" # stage staging has completed successfully
      type: Succeeded
    endTime: "2025-07-22T21:29:23Z"
    stageName: staging
    startTime: "2025-07-22T21:28:08Z"
  - afterStageTaskStatus:
    - approvalRequestName: example-run-canary # ClusterApprovalRequest name for this stage
      conditions:
      - lastTransitionTime: "2025-07-22T21:29:53Z"
        message: ClusterApprovalRequest is created
        observedGeneration: 1
        reason: AfterStageTaskApprovalRequestCreated
        status: "True"
        type: ApprovalRequestCreated
      type: Approval
    clusters:
    - clusterName: member3 # according to the labelSelector and sortingLabelKey, member3 is selected first in this stage
      conditions:
      - lastTransitionTime: "2025-07-22T21:29:23Z"
        message: Cluster update started
        observedGeneration: 1
        reason: ClusterUpdatingStarted
        status: "True"
        type: Started
      - lastTransitionTime: "2025-07-22T21:29:38Z"
        message: Cluster update completed successfully
        observedGeneration: 1
        reason: ClusterUpdatingSucceeded
        status: "True" # member3 update is completed
        type: Succeeded
    - clusterName: member1 # member1 is selected after member3 because of order=2 label
      conditions:
      - lastTransitionTime: "2025-07-22T21:29:38Z"
        message: Cluster update started
        observedGeneration: 1
        reason: ClusterUpdatingStarted
        status: "True"
        type: Started
      - lastTransitionTime: "2025-07-22T21:29:53Z"
        message: Cluster update completed successfully
        observedGeneration: 1
        reason: ClusterUpdatingSucceeded
        status: "True" # member1 update is completed
        type: Succeeded
    conditions:
    - lastTransitionTime: "2025-07-22T21:29:53Z"
      message: All clusters in the stage are updated, waiting for after-stage tasks
        to complete
      observedGeneration: 1
      reason: StageUpdatingWaiting
      status: "False" # stage canary is waiting for approval task completion
      type: Progressing
    stageName: canary
    startTime: "2025-07-22T21:29:23Z"
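When the full YAML is more than you need, a JSONPath query can condense it to one line per stage. The sketch below prints each stage name followed by the status of its Succeeded condition, using the field names that appear in the status output in this article. The helper name stage_summary is hypothetical. A stage that doesn't have a Succeeded condition yet should show an empty status.

```shell
# Hypothetical helper: prints "<stageName><TAB><Succeeded status>" for every
# stage recorded in the status of a ClusterStagedUpdateRun.
stage_summary() {
  # $1: ClusterStagedUpdateRun name
  kubectl get csur "$1" \
    -o jsonpath='{range .status.stagesStatus[*]}{.stageName}{"\t"}{.conditions[?(@.type=="Succeeded")].status}{"\n"}{end}'
}

# Usage: stage_summary example-run
```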

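We can see that the TimedWait period for the staging stage has elapsed, and that the ClusterApprovalRequest object for the approval task in the canary stage was created. We can check the generated ClusterApprovalRequest and see that it hasn't been approved yet: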

kubectl get clusterapprovalrequest

Your output should look similar to the following example:

NAME                 UPDATE-RUN    STAGE    APPROVED   APPROVALACCEPTED   AGE
example-run-canary   example-run   canary                                 2m39s

We can approve the ClusterApprovalRequest by creating a patch file in JSON format and applying it:

cat << EOF > approval.json
{
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
                "message": "lgtm",
                "observedGeneration": 1,
                "reason": "testPassed",
                "status": "True",
                "type": "Approved"
            }
        ]
    }
}
EOF

Submit a patch request to approve the request, using the JSON file you created:

kubectl patch clusterapprovalrequests example-run-canary --type='merge' --subresource=status --patch-file approval.json
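If you approve requests regularly, you can fold the file creation and the patch into one step. The sketch below builds the same merge patch inline and passes it with -p instead of --patch-file. The helper name approve_request is hypothetical, and the reason testPassed and observedGeneration 1 are simply carried over from the example above.

```shell
# Hypothetical helper: approves a ClusterApprovalRequest by patching an
# Approved condition into its status, as in the approval.json example.
# The message is inserted into the JSON as-is, so it must not contain
# double quotes or backslashes.
approve_request() {
  # $1: ClusterApprovalRequest name, $2: free-form approval message
  now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  patch="$(printf '{"status":{"conditions":[{"lastTransitionTime":"%s","message":"%s","observedGeneration":1,"reason":"testPassed","status":"True","type":"Approved"}]}}' "$now" "$2")"
  kubectl patch clusterapprovalrequests "$1" --type=merge --subresource=status -p "$patch"
}

# Usage: approve_request example-run-canary lgtm
```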

Then verify that it's approved:

kubectl get clusterapprovalrequest

Your output should look similar to the following example:

NAME                 UPDATE-RUN    STAGE    APPROVED   APPROVALACCEPTED   AGE
example-run-canary   example-run   canary   True       True               3m35s

The ClusterStagedUpdateRun can now proceed and complete:

kubectl get csur example-run

Your output should look similar to the following example:

NAME          PLACEMENT           RESOURCE-SNAPSHOT-INDEX   POLICY-SNAPSHOT-INDEX   INITIALIZED   SUCCEEDED   AGE
example-run   example-placement   1                         0                       True          True        5m28s
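Instead of rerunning kubectl get until the SUCCEEDED column shows True, you can block until the run finishes. The sketch below uses kubectl wait and assumes that the run reports a status condition of type Succeeded, which is what the SUCCEEDED column suggests. The helper name wait_for_run is hypothetical.

```shell
# Hypothetical helper: blocks until the update run reports Succeeded=True
# or the timeout expires (kubectl wait then exits with a non-zero status).
# Assumption: the run exposes a status condition of type "Succeeded".
wait_for_run() {
  # $1: ClusterStagedUpdateRun name, $2: timeout such as 10m
  kubectl wait "csur/$1" --for=condition=Succeeded --timeout="$2"
}

# Usage: wait_for_run example-run 10m
```

A run that is paused at an Approval task won't satisfy the condition until the request is approved, so choose a timeout that covers your approval process.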

The ClusterResourcePlacement also shows that the rollout is complete and that the resources are available on all member clusters:

kubectl get crp example-placement

Your output should look similar to the following example:

NAME                GEN   SCHEDULED   SCHEDULED-GEN   AVAILABLE   AVAILABLE-GEN   AGE
example-placement   1     True        1               True        1               8m55s

The configmap test-cm should now be deployed on all three member clusters with the latest data:

apiVersion: v1
data:
  key: value2
kind: ConfigMap
metadata:
  ...
  name: test-cm
  namespace: test-namespace
  ...
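To confirm this on the member clusters themselves, you can read the configmap from each of them in a loop. The sketch below assumes that your kubeconfig contains one context per member cluster. The context names you pass in, and the helper name check_members, are assumptions made for this example and aren't created by the tutorial.

```shell
# Hypothetical helper: prints the value of data.key in test-cm for each
# member cluster, addressed through its kubeconfig context.
check_members() {
  # $@: kubeconfig context names of the member clusters (assumed to exist)
  for ctx in "$@"; do
    value="$(kubectl --context "$ctx" get configmap test-cm -n test-namespace -o jsonpath='{.data.key}')"
    echo "$ctx: $value"
  done
}

# Usage: check_members member1 member2 member3
```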

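Deploy a second ClusterStagedUpdateRun to roll back to a previous version

Now suppose the workload admin wants to roll back the configmap change and revert the value from value2 to value1. Instead of manually updating the configmap on the hub, they can create a new ClusterStagedUpdateRun that references a previous resource snapshot index ("0" in our case) and reuse the same strategy: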

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
  name: example-run-2
spec:
  placementName: example-placement
  resourceSnapshotIndex: "0"
  stagedRolloutStrategyName: example-strategy

Let's check the new ClusterStagedUpdateRun:

kubectl get csur

Your output should look similar to the following example:

NAME            PLACEMENT           RESOURCE-SNAPSHOT-INDEX   POLICY-SNAPSHOT-INDEX   INITIALIZED   SUCCEEDED   AGE
example-run     example-placement   1                         0                       True          True        13m
example-run-2   example-placement   0                         0                       True                      9s

After the one-minute TimedWait has elapsed, we should see the ClusterApprovalRequest object created for the new ClusterStagedUpdateRun:

kubectl get clusterapprovalrequest

Your output should look similar to the following example:

NAME                   UPDATE-RUN      STAGE    APPROVED   APPROVALACCEPTED   AGE
example-run-2-canary   example-run-2   canary                                 75s
example-run-canary     example-run     canary   True       True               14m

To approve the new ClusterApprovalRequest object, reuse the same approval.json file to patch it:

kubectl patch clusterapprovalrequests example-run-2-canary --type='merge' --subresource=status --patch-file approval.json

Verify that the new object is approved:

kubectl get clusterapprovalrequest

Your output should look similar to the following example:

NAME                   UPDATE-RUN      STAGE    APPROVED   APPROVALACCEPTED   AGE
example-run-2-canary   example-run-2   canary   True       True               2m7s
example-run-canary     example-run     canary   True       True               15m

The configmap test-cm should now be deployed on all three member clusters, with the data reverted to value1:

apiVersion: v1
data:
  key: value1
kind: ConfigMap
metadata:
  ...
  name: test-cm
  namespace: test-namespace
  ...

Clean up resources

When you finish this tutorial, you can clean up the resources you created:

# Delete the staged update runs
kubectl delete csur example-run example-run-2

# Delete the staged update strategy
kubectl delete csus example-strategy

# Delete the cluster resource placement
kubectl delete crp example-placement

# Delete the test namespace (this will also delete the configmap)
kubectl delete namespace test-namespace

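Next steps

In this article, you learned how to use ClusterStagedUpdateRun to orchestrate staged rollouts across member clusters. You created a staged update strategy, executed a progressive rollout, and rolled back to a previous version.

To learn more about staged update runs and related concepts, see the following resources: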