How to update a cloud service

Updating a cloud service, including both its roles and guest OS, is a three-step process. First, the binaries and configuration files for the new cloud service or OS version must be uploaded. Next, Azure reserves compute and network resources for the cloud service based on the requirements of the new cloud service version. Finally, Azure performs a rolling upgrade to incrementally update the tenant to the new version or guest OS while preserving your availability. This article discusses the details of this last step: the rolling upgrade.

Update an Azure Service

Azure organizes your role instances into logical groups called upgrade domains (UDs). An upgrade domain is a set of role instances that are updated as a group. Azure updates a cloud service one UD at a time, which allows instances in the other UDs to continue serving traffic.

The default number of upgrade domains is 5. You can specify a different number of upgrade domains by including the upgradeDomainCount attribute in the service definition file (.csdef). For more information about the upgradeDomainCount attribute, see Azure Cloud Services Definition Schema (.csdef File).
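To illustrate where the setting lives, here is a minimal Python sketch that reads upgradeDomainCount from a service definition and falls back to the default of 5 when the attribute is absent. The sample document and its service and role names are hypothetical, and placing the attribute on the root ServiceDefinition element reflects our reading of the schema; confirm it against the .csdef schema reference.

```python
import xml.etree.ElementTree as ET

DEFAULT_UPGRADE_DOMAINS = 5   # default stated in this article
MAX_UPGRADE_DOMAINS = 20      # maximum stated later in this article

# Hypothetical service definition. upgradeDomainCount is assumed to be an
# attribute of the root ServiceDefinition element.
SAMPLE_CSDEF = """
<ServiceDefinition name="MyService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition"
    upgradeDomainCount="10">
  <WebRole name="WebRole1" vmsize="Small" />
</ServiceDefinition>
"""

def upgrade_domain_count(csdef_xml: str) -> int:
    """Return the configured upgrade domain count, or the default if unset."""
    root = ET.fromstring(csdef_xml.strip())
    # Unprefixed attributes carry no namespace, so a plain get() works.
    count = int(root.get("upgradeDomainCount", DEFAULT_UPGRADE_DOMAINS))
    if not 1 <= count <= MAX_UPGRADE_DOMAINS:
        raise ValueError(f"upgradeDomainCount {count} is outside 1..{MAX_UPGRADE_DOMAINS}")
    return count

print(upgrade_domain_count(SAMPLE_CSDEF))  # 10
```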

When you perform an in-place update of one or more roles in your service, Azure updates sets of role instances according to the upgrade domain to which they belong. Azure updates all of the instances in a given upgrade domain (stopping them, updating them, and bringing them back online), then moves on to the next domain. By stopping only the instances running in the current upgrade domain, Azure makes sure that an update occurs with the least possible impact on the running service. For more information, see How an upgrade proceeds later in this article.
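The walk can be pictured with the following Python sketch, which takes a hypothetical mapping of role instance names to upgrade domain IDs and yields, one domain at a time, the instances that would be stopped, updated, and restarted. It models only the ordering; it calls no Azure API, and the instance names are made up.

```python
from typing import Dict, Iterator, List, Tuple

def walk_upgrade_domains(
    assignment: Dict[str, int], ud_count: int
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (upgrade domain ID, instances taken offline) one domain at a time.

    `assignment` maps a role instance name to its zero-based upgrade domain.
    Only the instances in the current domain are stopped, updated, and brought
    back online; instances in every other domain keep serving traffic.
    """
    for ud in range(ud_count):
        offline = sorted(name for name, domain in assignment.items() if domain == ud)
        yield ud, offline

# Hypothetical role with four instances spread over two upgrade domains.
example = {"Web_IN_0": 0, "Web_IN_1": 1, "Web_IN_2": 0, "Web_IN_3": 1}
for ud, offline in walk_upgrade_domains(example, ud_count=2):
    print(f"UD {ud}: updating {offline}")
```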


While the terms update and upgrade have slightly different meanings in the context of Azure, they are used interchangeably in this article for the processes and feature descriptions.

Your service must define at least two instances of a role for that role to be updated in place without downtime. If the service consists of only one instance of one role, the service is unavailable until the in-place update has finished.
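Assuming instances are spread as evenly as possible across upgrade domains, the number of instances of a role that keep serving while one domain is offline can be estimated as below. This is an illustrative calculation, not an Azure API; a result of 0 means the role is unavailable during part of the update.

```python
import math

def min_available_during_update(instances: int, ud_count: int = 5) -> int:
    """Fewest instances of one role still serving while a single UD is offline.

    Assumes an even spread, so the largest upgrade domain holds
    ceil(instances / ud_count) instances of the role.
    """
    if instances < 1 or ud_count < 1:
        raise ValueError("instances and ud_count must be positive")
    largest_domain = math.ceil(instances / ud_count)
    return instances - largest_domain

print(min_available_during_update(1))  # 0: a single instance means downtime
print(min_available_during_update(2))  # 1: the two instances sit in different UDs
```

With the default of 5 domains, one instance gives 0 while two instances give 1, which is why two instances is the minimum for an in-place update without downtime.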

Allowed service changes during an update

The following table shows the changes allowed to a service during an update:

Changes permitted to hosting, services, and roles | In-place update | Staged (VIP swap) | Delete and re-deploy
Operating system version | Yes | Yes | Yes
.NET trust level | Yes | Yes | Yes
Virtual machine size (1) | Yes (2) | Yes | Yes
Local storage settings | Increase only (2) | Yes | Yes
Add or remove roles in a service | Yes | Yes | Yes
Number of instances of a particular role | Yes | Yes | Yes
Number or type of endpoints for a service | Yes (2) | No | Yes
Names and values of configuration settings | Yes | Yes | Yes
Values (but not names) of configuration settings | Yes | Yes | Yes
Add new certificates | Yes | Yes | Yes
Change existing certificates | Yes | Yes | Yes
Deploy new code | Yes | Yes | Yes

(1) Size changes are limited to the subset of sizes available for the cloud service.

(2) Requires Azure SDK 1.5 or later.


Changing the virtual machine size destroys local data.

The following changes are not supported during an update:

  • Changing the name of a role. Instead, remove the role and then add it with the new name.
  • Changing the upgrade domain count.
  • Decreasing the size of local resources.

If you are making other updates to your service definition, such as decreasing the size of a local resource, you must perform a VIP swap update instead. For more information, see Swap Deployment.
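One way to read the table and the list above together is as a decision rule: prefer an in-place update, fall back to a VIP swap when a change cannot be made in place, and delete and re-deploy only when neither works. The Python sketch below encodes that rule. The change labels are hypothetical names for rows of the table, the sketch assumes Azure SDK 1.5 or later (footnote 2), and treating an upgrade domain count change as VIP-swap-only is an interpretation of the paragraph above.

```python
from enum import Enum
from typing import Iterable

class Method(Enum):
    IN_PLACE = "in-place update"
    VIP_SWAP = "staged (VIP swap)"
    DELETE_AND_REDEPLOY = "delete and re-deploy"

# Hypothetical change labels. "decrease_local_storage" comes from the
# "Increase only" cell of the table; "change_ud_count" is listed as
# unsupported during an update and is assumed to need a VIP swap.
NOT_IN_PLACE = frozenset({"decrease_local_storage", "change_ud_count"})
# The only row that the Staged (VIP swap) column marks "No".
NOT_VIP_SWAP = frozenset({"change_endpoints"})

def choose_method(changes: Iterable[str]) -> Method:
    """Prefer in-place, then VIP swap, then delete and re-deploy."""
    requested = set(changes)
    if not requested & NOT_IN_PLACE:
        return Method.IN_PLACE
    if not requested & NOT_VIP_SWAP:
        return Method.VIP_SWAP
    # Every row of the table allows delete and re-deploy.
    return Method.DELETE_AND_REDEPLOY

print(choose_method(["deploy_code", "decrease_local_storage"]).value)
```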

How an upgrade proceeds

You can decide whether to update all of the roles in your service or a single role in the service. In either case, all instances of each role being upgraded that belong to the first upgrade domain are stopped, upgraded, and brought back online. Once they are back online, the instances in the second upgrade domain are stopped, upgraded, and brought back online. A cloud service can have at most one upgrade active at a time. The upgrade is always performed against the latest version of the cloud service.

The following diagram illustrates how the upgrade proceeds if you are upgrading all of the roles in the service:

Figure: Upgrade service

The next diagram illustrates how the update proceeds if you are upgrading only a single role:

Figure: Upgrade role

During an automatic update, the Azure Fabric Controller periodically evaluates the health of the cloud service to determine when it is safe to walk the next UD. This health evaluation is performed on a per-role basis and considers only instances in the latest version (that is, instances from UDs that have already been walked). It verifies that, for each role, a minimum number of role instances have reached a satisfactory terminal state.
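The per-role check can be sketched as follows. The Instance type and the min_ready thresholds are hypothetical: this article does not say how many instances Azure requires, so the sketch takes the minimum per role as a parameter. Only instances in domains that have already been walked are counted, matching the description above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set

@dataclass
class Instance:
    role: str
    upgrade_domain: int
    ready: bool  # True once the instance reached a satisfactory terminal state

def safe_to_walk_next(
    instances: Iterable[Instance], walked: Set[int], min_ready: Dict[str, int]
) -> bool:
    """Check each role separately, counting only instances on the latest version."""
    ready_counts: Dict[str, int] = {role: 0 for role in min_ready}
    for inst in instances:
        # Instances in domains not yet walked still run the old version: skip them.
        if inst.upgrade_domain in walked and inst.ready:
            ready_counts[inst.role] = ready_counts.get(inst.role, 0) + 1
    return all(ready_counts[role] >= needed for role, needed in min_ready.items())
```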

Role Instance Start Timeout

The Fabric Controller waits 30 minutes for each role instance to reach a Started state. If the timeout elapses, the Fabric Controller continues walking to the next role instance.
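A minimal sketch of the 30-minute wait, with the clock and sleep functions injectable so the logic can be tested without actually waiting. The is_started callback and the 30-second poll interval are hypothetical stand-ins for however the instance state is observed. A False result corresponds to the Fabric Controller giving up on that instance and moving on.

```python
import time
from typing import Callable

START_TIMEOUT_SECONDS = 30 * 60  # the 30-minute start timeout described above

def wait_for_started(
    is_started: Callable[[], bool],
    timeout: float = START_TIMEOUT_SECONDS,
    poll_interval: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the instance reports Started before the timeout, else False."""
    deadline = clock() + timeout
    while clock() < deadline:
        if is_started():
            return True
        sleep(poll_interval)
    # Timed out: the caller moves on to the next role instance.
    return False
```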

Impact to drive data during Cloud Service upgrades

When you upgrade a service from a single instance to multiple instances, your service is brought down while the upgrade is performed, because of the way Azure upgrades services. The service level agreement guaranteeing service availability applies only to services that are deployed with more than one instance. The following table describes how the data on each drive is affected by each Azure service upgrade scenario:

Scenario | C drive | D drive | E drive
VM reboot | Preserved | Preserved | Preserved
Portal reboot | Preserved | Preserved | Destroyed
Portal reimage | Preserved | Destroyed | Destroyed
In-place upgrade | Preserved | Preserved | Destroyed
Node migration | Destroyed | Destroyed | Destroyed

Note that in the table above, the E: drive represents the role's root drive and should not be hard-coded. Instead, use the %RoleRoot% environment variable to represent the drive.
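For example, code running in a role could build paths under the role root as in this Python sketch instead of writing E:\ literally. It assumes RoleRoot may contain a bare drive letter such as E: without a trailing backslash, which is why a separator is appended before joining; the approot folder in the usage line is illustrative.

```python
import ntpath
import os
from typing import Mapping, Optional

def role_root_path(*parts: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build a Windows path under the role root drive without hard-coding E:."""
    env = os.environ if env is None else env
    root = env["RoleRoot"]
    # Assumption: RoleRoot may be a bare drive such as "E:". ntpath.join("E:", "x")
    # gives the drive-relative "E:x", so add a separator first.
    if root.endswith(":"):
        root += "\\"
    return ntpath.join(root, *parts)

# Inside a role instance: role_root_path("approot")
```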

To minimize downtime when upgrading a single-instance service, deploy a new multi-instance service to the staging environment and perform a VIP swap.

Rollback of an update

Azure provides flexibility in managing services during an update by letting you initiate additional operations on a service after the Azure Fabric Controller has accepted the initial update request. A rollback can be performed only when an update (configuration change) or upgrade is in the in-progress state on the deployment. An update or upgrade is considered in progress as long as at least one instance of the service has not yet been updated to the new version. To test whether a rollback is allowed, check that the RollbackAllowed flag, returned by the Get Deployment and Get Cloud Service Properties operations, is set to true.
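A minimal Python sketch of that check against a Get Deployment response body. The namespace URI and the trimmed sample body are assumptions for illustration, not a captured response. A missing RollbackAllowed element (for example, when an older x-ms-version is used) is treated as not allowed.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of Service Management API response bodies.
WA_NS = {"wa": "http://schemas.microsoft.com/windowsazure"}

# Trimmed, illustrative Get Deployment response body (not a real capture).
SAMPLE_DEPLOYMENT = """
<Deployment xmlns="http://schemas.microsoft.com/windowsazure">
  <Name>my-deployment</Name>
  <Locked>true</Locked>
  <RollbackAllowed>true</RollbackAllowed>
</Deployment>
"""

def rollback_allowed(response_body: str) -> bool:
    """True only if the response explicitly reports RollbackAllowed = true."""
    root = ET.fromstring(response_body.strip())
    element = root.find("wa:RollbackAllowed", WA_NS)
    return element is not None and (element.text or "").strip().lower() == "true"
```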


It makes sense to call Rollback only on an in-place update or upgrade, because VIP swap upgrades involve replacing one entire running instance of your service with another.

Rolling back an in-progress update has the following effects on the deployment:

  • Any role instances that have not yet been updated or upgraded to the new version are left unchanged, because those instances are already running the target version of the service.
  • Any role instances that have already been updated or upgraded to the new version of the service package (*.cspkg) file, the service configuration (*.cscfg) file, or both are reverted to the pre-upgrade versions of these files.

This functionality is provided by the following features:

  • The Rollback Update Or Upgrade operation, which can be called on a configuration update (triggered by calling Change Deployment Configuration) or an upgrade (triggered by calling Upgrade Deployment) as long as at least one instance in the service has not yet been updated to the new version.

  • The Locked and RollbackAllowed elements, which are returned as part of the response body of the Get Deployment and Get Cloud Service Properties operations:

    1. The Locked element lets you detect when a mutating operation can be invoked on a given deployment.
    2. The RollbackAllowed element lets you detect when the Rollback Update Or Upgrade operation can be called on a given deployment.

    To perform a rollback, you do not have to check both the Locked and RollbackAllowed elements; it suffices to confirm that RollbackAllowed is set to true. These elements are returned only if these methods are invoked with the request header set to "x-ms-version: 2011-10-01" or a later version.
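As a sketch, a client could validate and pin the version header like this before calling Get Deployment or Get Cloud Service Properties. Only the x-ms-version header is shown; authentication and the request itself are omitted, and the helper name is hypothetical.

```python
from datetime import date

# Earliest x-ms-version that returns the Locked and RollbackAllowed elements.
MIN_VERSION = date(2011, 10, 1)

def request_headers(version: str = "2011-10-01") -> dict:
    """Return the version header, refusing versions that omit the two elements."""
    if date.fromisoformat(version) < MIN_VERSION:
        raise ValueError(f"x-ms-version {version} does not return Locked/RollbackAllowed")
    return {"x-ms-version": version}
```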

Rollback of an update or upgrade is not supported in the following situations:

  • Reduction in local resources: If the update increases the local resources for a role, the Azure platform does not allow rolling back, because the rollback would reduce them.
  • Quota limitations: If the update was a scale-down operation, you may no longer have sufficient compute quota to complete the rollback. Each Azure subscription has an associated quota that specifies the maximum number of cores that can be consumed by all hosted services belonging to that subscription. If rolling back a given update would put your subscription over quota, rollback is not enabled.
  • Race condition: If the initial update has completed, a rollback is not possible.
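The three exclusions can be summarized as a predicate, shown here as a Python sketch over a hypothetical UpdateState snapshot. In practice you would rely on the RollbackAllowed flag that Azure returns rather than recomputing these conditions; the sketch only makes the reasoning explicit.

```python
from dataclasses import dataclass

@dataclass
class UpdateState:
    """Hypothetical snapshot of an update; field names are illustrative."""
    increased_local_resources: bool
    cores_needed_for_rollback: int  # extra cores the rollback would consume
    cores_in_use: int
    core_quota: int
    instances_not_yet_updated: int

def rollback_supported(state: UpdateState) -> bool:
    if state.increased_local_resources:
        return False  # rolling back would reduce local resources
    if state.cores_in_use + state.cores_needed_for_rollback > state.core_quota:
        return False  # rolling back would exceed the subscription's core quota
    if state.instances_not_yet_updated == 0:
        return False  # the update already completed (race condition)
    return True
```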

One example of when rolling back an update might be useful is when you use the Upgrade Deployment operation in manual mode to control the rate at which a major in-place upgrade to your Azure hosted service is rolled out.

During the rollout of the upgrade, you call Upgrade Deployment in manual mode and begin to walk upgrade domains. If at some point, while monitoring the upgrade, you notice that some role instances in the first upgrade domains have become unresponsive, you can call the Rollback Update Or Upgrade operation on the deployment. This leaves the instances that have not yet been upgraded untouched and rolls back the upgraded instances to the previous service package and configuration.
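The manual-mode workflow can be outlined as a loop that advances one upgrade domain at a time and stops to roll back as soon as a walked domain looks unhealthy. The three callbacks are hypothetical hooks that you would wire to your own management calls and monitoring; no Azure SDK is used here.

```python
from typing import Callable, List

def manual_rollout(
    ud_count: int,
    upgrade_domain: Callable[[int], None],
    domain_healthy: Callable[[int], bool],
    rollback: Callable[[], None],
) -> List[int]:
    """Walk UDs one at a time; roll back if a walked domain is unhealthy.

    Returns the upgrade domains walked before the rollout finished or was rolled back.
    """
    walked: List[int] = []
    for ud in range(ud_count):
        upgrade_domain(ud)  # hypothetical hook: advance the manual upgrade to this UD
        walked.append(ud)
        if not domain_healthy(ud):
            rollback()      # hypothetical hook: e.g. call Rollback Update Or Upgrade
            break
    return walked
```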

Initiating multiple mutating operations on an ongoing deployment

In some cases you may want to initiate multiple simultaneous mutating operations on an ongoing deployment. For example, you may perform a service update and, while that update is being rolled out across your service, want to make some change, such as rolling the update back, applying a different update, or even deleting the deployment. This might be necessary if a service upgrade contains buggy code that causes an upgraded role instance to crash repeatedly. In this case, the Azure Fabric Controller cannot make progress in applying the upgrade, because too few instances in the upgraded domain are healthy. This state is referred to as a stuck deployment. You can unstick the deployment by rolling back the update or by applying a fresh update over the failing one.

Once the Azure Fabric Controller has received the initial request to update or upgrade the service, you can start subsequent mutating operations. That is, you do not have to wait for the initial operation to complete before starting another mutating operation.

Initiating a second update operation while the first update is ongoing behaves similarly to the rollback operation. If the second update is in automatic mode, the first upgrade domain is upgraded immediately, which can leave instances from multiple upgrade domains offline at the same time.

The mutating operations are Change Deployment Configuration, Upgrade Deployment, Update Deployment Status, Delete Deployment, and Rollback Update Or Upgrade.

Two operations, Get Deployment and Get Cloud Service Properties, return the Locked flag, which you can examine to determine whether a mutating operation can be invoked on a given deployment.
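A Python sketch of gating a mutating operation on the Locked flag, using the operation names listed above. The response namespace is assumed, and letting non-mutating operations through regardless of Locked is an assumption consistent with the flag's stated purpose.

```python
import xml.etree.ElementTree as ET

MUTATING_OPERATIONS = frozenset({
    "Change Deployment Configuration",
    "Upgrade Deployment",
    "Update Deployment Status",
    "Delete Deployment",
    "Rollback Update Or Upgrade",
})

WA_NS = {"wa": "http://schemas.microsoft.com/windowsazure"}  # assumed namespace

def can_invoke(operation: str, deployment_xml: str) -> bool:
    """Return False when a mutating operation would hit a locked deployment."""
    if operation not in MUTATING_OPERATIONS:
        return True  # assumption: Locked gates only mutating operations
    root = ET.fromstring(deployment_xml.strip())
    locked = root.find("wa:Locked", WA_NS)
    if locked is None:
        raise ValueError("Locked element missing; use x-ms-version 2011-10-01 or later")
    return (locked.text or "").strip().lower() != "true"
```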

To call the version of these methods that returns the Locked flag, you must set the request header to "x-ms-version: 2011-10-01" or a later version. For more information about versioning headers, see Service Management Versioning.

Distribution of roles across upgrade domains

Azure distributes the instances of a role evenly across a set number of upgrade domains, which can be configured as part of the service definition (.csdef) file. The maximum number of upgrade domains is 20, and the default is 5. For more information about how to modify the service definition file, see Azure Service Definition Schema (.csdef File).

For example, if your role has ten instances, by default each upgrade domain contains two instances. If your role has 14 instances, four of the upgrade domains contain three instances each and the fifth domain contains two.
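The counts in these examples can be reproduced with the sketch below, which assumes round-robin placement purely for illustration. Azure decides which instance goes to which domain, so only the per-domain counts are meaningful, not the specific assignments.

```python
from collections import Counter
from typing import List

def instances_per_domain(instances: int, ud_count: int = 5) -> List[int]:
    """Instance count in each upgrade domain, indexed by zero-based UD ID.

    Round-robin placement (instance i goes to UD i % ud_count) is used only to
    reproduce an even spread; it is not how Azure necessarily assigns instances.
    """
    counts = Counter(i % ud_count for i in range(instances))
    return [counts[ud] for ud in range(ud_count)]

print(instances_per_domain(14))  # [3, 3, 3, 3, 2]
```

For the two-domain example that follows, the same helper splits 8 web role instances 4 and 4, and 9 worker role instances 5 and 4.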

Upgrade domains are identified with a zero-based index: the first upgrade domain has an ID of 0, the second upgrade domain has an ID of 1, and so on.

The following diagram illustrates how a service that contains two roles is distributed when the service defines two upgrade domains. The service is running eight instances of the web role and nine instances of the worker role.

Figure: Distribution of Upgrade Domains


Note that Azure controls how instances are allocated across upgrade domains. It is not possible to specify which instances are allocated to which domain.

Next steps