Azure Kubernetes Fleet Manager staged update runs provide a controlled way to deploy workloads across multiple member clusters using a staged process. This approach lets you roll out to target clusters in sequence, with optional wait times and approval gates between stages, minimizing risk.
This article shows how to create and execute a staged update run to roll out a workload progressively, and how to roll back to a previous version when needed.
Prerequisites
- An Azure account with an active subscription. Create an account.
- Read the conceptual overview of staged rollout strategies for the concepts and terminology used in this article.
- Azure CLI version 2.58.0 or later. To install or upgrade, see Install the Azure CLI.
- The Kubernetes CLI (kubectl). If you don't already have it, install it with:
az aks install-cli
- The fleet Azure CLI extension. Install it by running:
az extension add --name fleet
Run the following command to update the extension to its latest version:
az extension update --name fleet
Set up the demo environment
This demo runs on a Fleet Manager with a hub cluster and three member clusters. If you don't have one, follow the quickstart to create a Fleet Manager with a hub cluster, then join Azure Kubernetes Service (AKS) clusters as members.
This tutorial demonstrates staged update runs using a demo fleet with three member clusters carrying the following labels:
| Cluster name | Labels |
|---|---|
| member1 | environment=canary, order=2 |
| member2 | environment=staging |
| member3 | environment=canary, order=1 |
These labels let us create stages that group clusters by environment and control the deployment order within each stage.
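If your member clusters don't carry these labels yet, they can be set on the MemberCluster objects on the hub cluster. A sketch of what member1's labels might look like (the apiVersion and the rest of the MemberCluster spec are assumptions; check the CRDs in your fleet):

```yaml
apiVersion: cluster.kubernetes-fleet.io/v1beta1
kind: MemberCluster
metadata:
  name: member1
  labels:
    environment: canary
    order: "2"   # label values are strings, so the number is quoted
```

Labels can also be added in place, for example with kubectl label membercluster member1 environment=canary order=2 run against the hub cluster.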
Prepare a workload for placement
Next, publish a workload to the hub cluster so it can be placed onto the member clusters.
Create a namespace and a configmap for the workload on the hub cluster:
kubectl create ns test-namespace
kubectl create cm test-cm --from-literal=key=value1 -n test-namespace
To deploy the resources, create a ClusterResourcePlacement:
Note
Setting spec.strategy.type to External allows the rollout to be triggered by a ClusterStagedUpdateRun.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: example-placement
spec:
resourceSelectors:
- group: ""
kind: Namespace
name: test-namespace
version: v1
policy:
placementType: PickAll
strategy:
type: External
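The manifests in this article can be saved to a file and applied against the hub cluster. A minimal sketch for this placement (the hub kubeconfig context name is an assumption; substitute your own):

```shell
# Save the ClusterResourcePlacement manifest to a file.
cat << 'EOF' > example-placement.yaml
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: example-placement
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      name: test-namespace
      version: v1
  policy:
    placementType: PickAll
  strategy:
    type: External
EOF

# Apply it against the hub cluster (context name "hub" is an assumption):
# kubectl apply -f example-placement.yaml --context hub
```

The same save-and-apply pattern works for the strategy and update run manifests later in this article.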
All three clusters should be scheduled because we use the PickAll placement policy, but no resources are deployed to the member clusters yet because we haven't created a ClusterStagedUpdateRun resource.
Verify that the placement is scheduled:
kubectl get crp example-placement
输出应类似于以下示例:
NAME GEN SCHEDULED SCHEDULED-GEN AVAILABLE AVAILABLE-GEN AGE
example-placement 1 True 1 51s
Work with resource snapshots
Fleet Manager creates a resource snapshot whenever the resources change. Each snapshot has a unique index that can be used to reference a specific version of the resources.
Tip
For more information about resource snapshots and how they work, see Understanding resource snapshots.
Check the current resource snapshot
View the current resource snapshot with its labels:
kubectl get clusterresourcesnapshots --show-labels
输出应类似于以下示例:
NAME GEN AGE LABELS
example-placement-0-snapshot 1 60s kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=0
There's only one snapshot version. It's the latest (kubernetes-fleet.io/is-latest-snapshot=true) and has resource index 0 (kubernetes-fleet.io/resource-index=0).
Create a new resource snapshot
Now modify the configmap with a new value:
kubectl edit cm test-cm -n test-namespace
Update the value from value1 to value2, then verify the change:
kubectl get configmap test-cm -n test-namespace -o yaml
The output should look similar to the following example:
apiVersion: v1
data:
key: value2 # value updated here, old value: value1
kind: ConfigMap
metadata:
creationTimestamp: ...
name: test-cm
namespace: test-namespace
resourceVersion: ...
uid: ...
Now you should see two resource snapshot versions, with resource indexes 0 and 1 respectively:
kubectl get clusterresourcesnapshots --show-labels
The output should look similar to the following example:
NAME GEN AGE LABELS
example-placement-0-snapshot 1 2m6s kubernetes-fleet.io/is-latest-snapshot=false,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=0
example-placement-1-snapshot 1 10s kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=1
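The resourceSnapshotIndex used later when creating an update run can be read off the is-latest-snapshot label. As a self-contained illustration using the sample LABELS output above (on a live hub you could query with a label selector instead, for example kubectl get clusterresourcesnapshots -l kubernetes-fleet.io/is-latest-snapshot=true --show-labels):

```shell
# Illustration only: pick the latest snapshot's resource-index out of the
# LABELS column above (sample data is inlined so the snippet runs anywhere).
latest_index=$(printf '%s\n' \
  'kubernetes-fleet.io/is-latest-snapshot=false,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=0' \
  'kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=1' |
  grep 'is-latest-snapshot=true' |
  sed 's/.*resource-index=//')
echo "latest resource snapshot index: $latest_index"
```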
The latest label is now set on example-placement-1-snapshot, which contains the latest configmap data:
kubectl get clusterresourcesnapshots example-placement-1-snapshot -o yaml
The output should look similar to the following example:
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourceSnapshot
metadata:
annotations:
kubernetes-fleet.io/number-of-enveloped-object: "0"
kubernetes-fleet.io/number-of-resource-snapshots: "1"
kubernetes-fleet.io/resource-hash: 10dd7a3d1e5f9849afe956cfbac080a60671ad771e9bda7dd34415f867c75648
creationTimestamp: "2025-07-22T21:26:54Z"
generation: 1
labels:
kubernetes-fleet.io/is-latest-snapshot: "true"
kubernetes-fleet.io/parent-CRP: example-placement
kubernetes-fleet.io/resource-index: "1"
name: example-placement-1-snapshot
ownerReferences:
- apiVersion: placement.kubernetes-fleet.io/v1beta1
blockOwnerDeletion: true
controller: true
kind: ClusterResourcePlacement
name: example-placement
uid: e7d59513-b3b6-4904-864a-c70678fd6f65
resourceVersion: "19994"
uid: 79ca0bdc-0b0a-4c40-b136-7f701e85cdb6
spec:
selectedResources:
- apiVersion: v1
kind: Namespace
metadata:
labels:
kubernetes.io/metadata.name: test-namespace
name: test-namespace
spec:
finalizers:
- kubernetes
- apiVersion: v1
data:
key: value2 # latest value: value2, old value: value1
kind: ConfigMap
metadata:
name: test-cm
namespace: test-namespace
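The kubernetes-fleet.io/resource-hash annotation above is how Fleet Manager detects that the selected resources changed. The exact input to the hash is internal to Fleet, but the principle, that any content change produces a different digest and therefore a new snapshot, can be illustrated with a plain checksum:

```shell
# Illustration only: different content yields a different digest,
# which is what triggers a new resource snapshot.
h1=$(printf 'key: value1' | sha256sum | cut -d' ' -f1)
h2=$(printf 'key: value2' | sha256sum | cut -d' ' -f1)
echo "value1 digest: $h1"
echo "value2 digest: $h2"
```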
Deploy a ClusterStagedUpdateStrategy
A ClusterStagedUpdateStrategy defines an orchestration pattern that groups clusters into stages and specifies the rollout sequence. It selects member clusters by label. For our demo, we create one with two stages, staging and canary:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateStrategy
metadata:
name: example-strategy
spec:
stages:
- name: staging
labelSelector:
matchLabels:
environment: staging
afterStageTasks:
- type: TimedWait
waitTime: 1m
- name: canary
labelSelector:
matchLabels:
environment: canary
sortingLabelKey: order
afterStageTasks:
- type: Approval
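In the canary stage, sortingLabelKey: order makes the rollout visit clusters in ascending order of their order label value. For the demo labels (member1 has order=2, member3 has order=1), the resulting sequence can be illustrated with a plain numeric sort:

```shell
# Illustration only: clusters in the canary stage sorted by their "order"
# label value; member3 (order=1) is updated before member1 (order=2).
sequence=$(printf '%s\n' 'member1 2' 'member3 1' | sort -k2 -n | awk '{print $1}' | xargs)
echo "canary update order: $sequence"
```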
Deploy a ClusterStagedUpdateRun to roll out the latest change
A ClusterStagedUpdateRun executes the rollout of a ClusterResourcePlacement according to a ClusterStagedUpdateStrategy. To trigger a staged update run for the ClusterResourcePlacement (CRP), create a ClusterStagedUpdateRun that specifies the CRP name, the update run strategy name, and the latest resource snapshot index ("1"):
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
name: example-run
spec:
placementName: example-placement
resourceSnapshotIndex: "1"
stagedRolloutStrategyName: example-strategy
The staged update run is initialized and running:
kubectl get csur example-run
The output should look similar to the following example:
NAME PLACEMENT RESOURCE-SNAPSHOT-INDEX POLICY-SNAPSHOT-INDEX INITIALIZED SUCCEEDED AGE
example-run example-placement 1 0 True 7s
Take a closer look at the status after the one-minute TimedWait elapses:
kubectl get csur example-run -o yaml
The output should look similar to the following example:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
...
name: example-run
...
spec:
placementName: example-placement
resourceSnapshotIndex: "1"
stagedRolloutStrategyName: example-strategy
status:
conditions:
- lastTransitionTime: "2025-07-22T21:28:08Z"
message: ClusterStagedUpdateRun initialized successfully
observedGeneration: 1
reason: UpdateRunInitializedSuccessfully
status: "True" # the updateRun is initialized successfully
type: Initialized
- lastTransitionTime: "2025-07-22T21:29:53Z"
message: The updateRun is waiting for after-stage tasks in stage canary to complete
observedGeneration: 1
reason: UpdateRunWaiting
status: "False" # the updateRun is still progressing and waiting for approval
type: Progressing
deletionStageStatus:
clusters: [] # no clusters need to be cleaned up
stageName: kubernetes-fleet.io/deleteStage
policyObservedClusterCount: 3 # number of clusters to be updated
policySnapshotIndexUsed: "0"
stagedUpdateStrategySnapshot: # snapshot of the strategy used for this update run
stages:
- afterStageTasks:
- type: TimedWait
waitTime: 1m0s
labelSelector:
matchLabels:
environment: staging
name: staging
- afterStageTasks:
- type: Approval
labelSelector:
matchLabels:
environment: canary
name: canary
sortingLabelKey: order
stagesStatus: # detailed status for each stage
- afterStageTaskStatus:
- conditions:
- lastTransitionTime: "2025-07-22T21:29:23Z"
message: Wait time elapsed
observedGeneration: 1
reason: AfterStageTaskWaitTimeElapsed
status: "True" # the wait after-stage task has completed
type: WaitTimeElapsed
type: TimedWait
clusters:
- clusterName: member2 # stage staging contains member2 cluster only
conditions:
- lastTransitionTime: "2025-07-22T21:28:08Z"
message: Cluster update started
observedGeneration: 1
reason: ClusterUpdatingStarted
status: "True"
type: Started
- lastTransitionTime: "2025-07-22T21:28:23Z"
message: Cluster update completed successfully
observedGeneration: 1
reason: ClusterUpdatingSucceeded
status: "True" # member2 is updated successfully
type: Succeeded
conditions:
- lastTransitionTime: "2025-07-22T21:28:23Z"
message: All clusters in the stage are updated and after-stage tasks are completed
observedGeneration: 1
reason: StageUpdatingSucceeded
status: "False"
type: Progressing
- lastTransitionTime: "2025-07-22T21:29:23Z"
message: Stage update completed successfully
observedGeneration: 1
reason: StageUpdatingSucceeded
status: "True" # stage staging has completed successfully
type: Succeeded
endTime: "2025-07-22T21:29:23Z"
stageName: staging
startTime: "2025-07-22T21:28:08Z"
- afterStageTaskStatus:
- approvalRequestName: example-run-canary # ClusterApprovalRequest name for this stage
conditions:
- lastTransitionTime: "2025-07-22T21:29:53Z"
message: ClusterApprovalRequest is created
observedGeneration: 1
reason: AfterStageTaskApprovalRequestCreated
status: "True"
type: ApprovalRequestCreated
type: Approval
clusters:
- clusterName: member3 # according to the labelSelector and sortingLabelKey, member3 is selected first in this stage
conditions:
- lastTransitionTime: "2025-07-22T21:29:23Z"
message: Cluster update started
observedGeneration: 1
reason: ClusterUpdatingStarted
status: "True"
type: Started
- lastTransitionTime: "2025-07-22T21:29:38Z"
message: Cluster update completed successfully
observedGeneration: 1
reason: ClusterUpdatingSucceeded
status: "True" # member3 update is completed
type: Succeeded
- clusterName: member1 # member1 is selected after member3 because of order=2 label
conditions:
- lastTransitionTime: "2025-07-22T21:29:38Z"
message: Cluster update started
observedGeneration: 1
reason: ClusterUpdatingStarted
status: "True"
type: Started
- lastTransitionTime: "2025-07-22T21:29:53Z"
message: Cluster update completed successfully
observedGeneration: 1
reason: ClusterUpdatingSucceeded
status: "True" # member1 update is completed
type: Succeeded
conditions:
- lastTransitionTime: "2025-07-22T21:29:53Z"
message: All clusters in the stage are updated, waiting for after-stage tasks
to complete
observedGeneration: 1
reason: StageUpdatingWaiting
status: "False" # stage canary is waiting for approval task completion
type: Progressing
stageName: canary
startTime: "2025-07-22T21:29:23Z"
We can see that the TimedWait period of the staging stage has elapsed, and that a ClusterApprovalRequest object has been created for the approval task in the canary stage. We can inspect the generated ClusterApprovalRequest and see that it's not yet approved:
kubectl get clusterapprovalrequest
The output should look similar to the following example:
NAME UPDATE-RUN STAGE APPROVED APPROVALACCEPTED AGE
example-run-canary example-run canary 2m39s
You can approve the ClusterApprovalRequest by creating a JSON patch file and applying it:
cat << EOF > approval.json
{
  "status": {
    "conditions": [
      {
        "lastTransitionTime": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
        "message": "lgtm",
        "observedGeneration": 1,
        "reason": "testPassed",
        "status": "True",
        "type": "Approved"
      }
    ]
  }
}
EOF
Submit the patch request to grant approval using the JSON file you created:
kubectl patch clusterapprovalrequests example-run-canary --type='merge' --subresource=status --patch-file approval.json
Then verify that it's approved:
kubectl get clusterapprovalrequest
输出应类似于以下示例:
NAME UPDATE-RUN STAGE APPROVED APPROVALACCEPTED AGE
example-run-canary example-run canary True True 3m35s
The ClusterStagedUpdateRun is now able to proceed and complete:
kubectl get csur example-run
The output should look similar to the following example:
NAME PLACEMENT RESOURCE-SNAPSHOT-INDEX POLICY-SNAPSHOT-INDEX INITIALIZED SUCCEEDED AGE
example-run example-placement 1 0 True True 5m28s
The ClusterResourcePlacement also shows that the rollout completed and the resources are available on all member clusters:
kubectl get crp example-placement
The output should look similar to the following example:
NAME GEN SCHEDULED SCHEDULED-GEN AVAILABLE AVAILABLE-GEN AGE
example-placement 1 True 1 True 1 8m55s
The configmap test-cm, containing the latest data, should now be deployed on all three member clusters:
apiVersion: v1
data:
key: value2
kind: ConfigMap
metadata:
...
name: test-cm
namespace: test-namespace
...
Deploy a second ClusterStagedUpdateRun to roll back to a previous version
Now suppose the workload admin wants to roll back the configmap change, reverting the value from value2 back to value1. Instead of manually updating the configmap on the hub, they can create a new ClusterStagedUpdateRun with the previous resource snapshot index, "0" in our context, reusing the same strategy:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
name: example-run-2
spec:
placementName: example-placement
resourceSnapshotIndex: "0"
stagedRolloutStrategyName: example-strategy
Let's check the new ClusterStagedUpdateRun:
kubectl get csur
The output should look similar to the following example:
NAME PLACEMENT RESOURCE-SNAPSHOT-INDEX POLICY-SNAPSHOT-INDEX INITIALIZED SUCCEEDED AGE
example-run example-placement 1 0 True True 13m
example-run-2 example-placement 0 0 True 9s
After the one-minute TimedWait elapses, you should see a ClusterApprovalRequest object created for the new ClusterStagedUpdateRun:
kubectl get clusterapprovalrequest
The output should look similar to the following example:
NAME UPDATE-RUN STAGE APPROVED APPROVALACCEPTED AGE
example-run-2-canary example-run-2 canary 75s
example-run-canary example-run canary True True 14m
To approve the new ClusterApprovalRequest object, reuse the same approval.json file to patch it:
kubectl patch clusterapprovalrequests example-run-2-canary --type='merge' --subresource=status --patch-file approval.json
Verify that the new object is approved:
kubectl get clusterapprovalrequest
The output should look similar to the following example:
NAME UPDATE-RUN STAGE APPROVED APPROVALACCEPTED AGE
example-run-2-canary example-run-2 canary True True 2m7s
example-run-canary example-run canary True True 15m
The configmap test-cm should now be deployed on all three member clusters, with its data reverted to value1:
apiVersion: v1
data:
key: value1
kind: ConfigMap
metadata:
...
name: test-cm
namespace: test-namespace
...
Clean up resources
When you're finished with this tutorial, clean up the resources you created:
# Delete the staged update runs
kubectl delete csur example-run example-run-2
# Delete the staged update strategy
kubectl delete csus example-strategy
# Delete the cluster resource placement
kubectl delete crp example-placement
# Delete the test namespace (this will also delete the configmap)
kubectl delete namespace test-namespace
Next steps
In this article, you learned how to use a ClusterStagedUpdateRun to orchestrate staged rollouts across member clusters. You created a staged update strategy, executed a progressive rollout, and performed a rollback to a previous version.
To learn more about staged update runs and related concepts, see the following resources: