Terminate notification for Azure virtual machine scale set instances

Scale set instances can opt in to receive instance termination notifications and set a predefined delay timeout for the terminate operation. The termination notification is sent through Azure Metadata Service - Scheduled Events, which provides notifications for, and delaying of, impactful operations such as reboots and redeployments. The solution adds another event, Terminate, to the list of Scheduled Events, and the delay associated with the Terminate event depends on the delay limit that users specify in their scale set model configuration.

Once enrolled in the feature, scale set instances don't need to wait for the specified timeout to expire before they are deleted. After receiving a Terminate notification, an instance can choose to be deleted at any time before the terminate timeout expires.

Enable terminate notifications

There are multiple ways to enable termination notifications on your scale set instances, as detailed in the examples below.

Azure portal

The following steps enable terminate notifications when creating a new scale set.

  1. Go to Virtual machine scale sets.
  2. Select + Add to create a new scale set.
  3. Go to the Management tab.
  4. Locate the Instance termination section.
  5. For Instance termination notification, select On.
  6. For Termination delay (minutes), set the desired default timeout.
  7. When you are done configuring the new scale set, select the Review + create button.

Note

You can't set terminate notifications on existing scale sets in the Azure portal.

REST API

The following example enables terminate notifications on the scale set model.

PUT on `/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/virtualMachineScaleSets/{vmScaleSetName}?api-version=2019-03-01`
{
  "properties": {
    "virtualMachineProfile": {
      "scheduledEventsProfile": {
        "terminateNotificationProfile": {
          "notBeforeTimeout": "PT5M",
          "enable": true
        }
      }
    }
  }
}

The above block specifies a timeout delay of 5 minutes (as indicated by PT5M) for any terminate operation on all instances in your scale set. The notBeforeTimeout field accepts any value between 5 and 15 minutes in ISO 8601 format. You can change the default timeout for the terminate operation by modifying the notBeforeTimeout property under terminateNotificationProfile, as described above.
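As a minimal sketch of the request body described above, the following Python helper builds the same JSON structure and checks the 5 to 15 minute range before formatting the ISO 8601 duration. The function and constant names are hypothetical and are not part of any Azure SDK; sending the PUT request and authenticating are out of scope here.

```python
import json

# Range stated in this article for notBeforeTimeout, in minutes.
MIN_TIMEOUT_MINUTES = 5
MAX_TIMEOUT_MINUTES = 15


def terminate_notification_body(minutes: int, enable: bool = True) -> dict:
    """Build the scale set model fragment that configures terminate notifications.

    Hypothetical helper for illustration; it only constructs the JSON body.
    """
    if not MIN_TIMEOUT_MINUTES <= minutes <= MAX_TIMEOUT_MINUTES:
        raise ValueError(
            f"notBeforeTimeout must be between {MIN_TIMEOUT_MINUTES} "
            f"and {MAX_TIMEOUT_MINUTES} minutes, got {minutes}"
        )
    return {
        "properties": {
            "virtualMachineProfile": {
                "scheduledEventsProfile": {
                    "terminateNotificationProfile": {
                        # ISO 8601 duration, for example PT5M for 5 minutes.
                        "notBeforeTimeout": f"PT{minutes}M",
                        "enable": enable,
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(terminate_notification_body(5), indent=2))
```

Validating the range locally is a design choice for the sketch: it surfaces an out-of-range value before the request is sent.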

After enabling scheduledEventsProfile on the scale set model and setting notBeforeTimeout, update the individual instances to the latest model to reflect the changes.

Note

Terminate notifications on scale set instances can only be enabled with API version 2019-03-01 and above.

Azure PowerShell

When creating a new scale set, you can enable termination notifications on the scale set by using the New-AzVmssConfig cmdlet.

This sample script walks through the creation of a scale set and associated resources using the configuration file: Create a complete virtual machine scale set. You can configure terminate notifications by adding the parameters TerminateScheduledEvents and TerminateScheduledEventNotBeforeTimeoutInMinutes to the configuration object used to create the scale set. The following example enables the feature with a delay timeout of 10 minutes.

New-AzVmssConfig `
  -Location "VMSSLocation" `
  -SkuCapacity 2 `
  -SkuName "Standard_DS2" `
  -UpgradePolicyMode "Automatic" `
  -TerminateScheduledEvents $true `
  -TerminateScheduledEventNotBeforeTimeoutInMinutes 10

Use the Update-AzVmss cmdlet to enable termination notifications on an existing scale set.

Update-AzVmss `
  -ResourceGroupName "myResourceGroup" `
  -VMScaleSetName "myScaleSet" `
  -TerminateScheduledEvents $true `
  -TerminateScheduledEventNotBeforeTimeoutInMinutes 15

The above example enables terminate notifications on an existing scale set and sets a 15-minute timeout for the terminate event.

After enabling scheduled events on the scale set model and setting the timeout, update the individual instances to the latest model to reflect the changes.

Azure CLI 2.0

The following example enables termination notifications while creating a new scale set.

az group create --name <myResourceGroup> --location <VMSSLocation>
az vmss create \
  --resource-group <myResourceGroup> \
  --name <myVMScaleSet> \
  --image UbuntuLTS \
  --admin-username <azureuser> \
  --generate-ssh-keys \
  --terminate-notification-time 10

The example above first creates a resource group, then creates a new scale set with terminate notifications enabled and a 10-minute default timeout.

The following example enables termination notifications on an existing scale set.

az vmss update \
  --resource-group <myResourceGroup> \
  --name <myVMScaleSet> \
  --enable-terminate-notification true \
  --terminate-notification-time 10

Get Terminate notifications

Terminate notifications are delivered through Scheduled Events, which is an Azure Metadata Service. The Azure Metadata Service exposes information about running virtual machines through a REST endpoint accessible from within the VM. The information is available via a non-routable IP so that it isn't exposed outside the VM.

Scheduled Events is enabled for your scale set the first time you make a request for events. Your first call can take up to two minutes to respond. Query the endpoint periodically to detect upcoming maintenance events and the status of ongoing maintenance activities.

Scheduled Events is disabled for your scale set if the scale set instances don't make a request for 24 hours.

Endpoint discovery

For VNET-enabled VMs, the Metadata Service is available from a static non-routable IP, 169.254.169.254.

The full endpoint for the latest version of Scheduled Events is:

'http://169.254.169.254/metadata/scheduledevents?api-version=2019-01-01'
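A query against this endpoint could be sketched in Python as follows, using only the standard library. The `Metadata: true` request header is not mentioned in this article; it is included on the assumption that the Azure Instance Metadata Service requires it. The function names and the 150-second timeout (chosen to cover the first call, which can take up to two minutes) are illustrative choices.

```python
import json
import urllib.request

SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2019-01-01"
)


def build_events_request(url: str = SCHEDULED_EVENTS_URL) -> urllib.request.Request:
    """Build the GET request for the Scheduled Events endpoint."""
    # Assumption: the metadata service expects the "Metadata: true" header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


def get_scheduled_events(timeout_seconds: float = 150.0) -> dict:
    """Query the endpoint and return the parsed JSON document.

    This only works from inside an Azure VM, because the IP is non-routable.
    """
    request = build_events_request()
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `get_scheduled_events()` periodically from each instance also keeps Scheduled Events enabled, since the service is disabled after 24 hours without requests.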

Query response

A response contains an array of scheduled events. An empty array means that there are currently no events scheduled.

When there are scheduled events, the response contains an array of events. For a "Terminate" event, the response looks as follows:

{
    "DocumentIncarnation": {IncarnationID},
    "Events": [
        {
            "EventId": {eventID},
            "EventType": "Terminate",
            "ResourceType": "VirtualMachine",
            "Resources": [{resourceName}],
            "EventStatus": "Scheduled",
            "NotBefore": {timeInUTC}
        }
    ]
}

The DocumentIncarnation is an ETag and provides an easy way to check whether the Events payload has changed since the last query.
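To illustrate how a response of this shape might be processed, the sketch below filters the Events array for Terminate events that name a given VM and compares DocumentIncarnation with the value seen on the previous query. The helper names and all sample values (event IDs, VM names, the incarnation number) are hypothetical.

```python
def terminate_events_for(document: dict, vm_name: str) -> list:
    """Return the Terminate events that list vm_name under Resources."""
    return [
        event
        for event in document.get("Events", [])
        if event.get("EventType") == "Terminate"
        and vm_name in event.get("Resources", [])
    ]


def payload_changed(document: dict, last_incarnation) -> bool:
    """Report whether DocumentIncarnation differs from the last value seen."""
    return document.get("DocumentIncarnation") != last_incarnation


# Hypothetical response for illustration only; every value here is made up.
sample = {
    "DocumentIncarnation": 2,
    "Events": [
        {
            "EventId": "event-a",
            "EventType": "Terminate",
            "ResourceType": "VirtualMachine",
            "Resources": ["myScaleSet_1"],
            "EventStatus": "Scheduled",
            "NotBefore": "<timeInUTC>",
        },
        {
            "EventId": "event-b",
            "EventType": "Reboot",
            "ResourceType": "VirtualMachine",
            "Resources": ["myScaleSet_1"],
            "EventStatus": "Scheduled",
            "NotBefore": "<timeInUTC>",
        },
        {
            "EventId": "event-c",
            "EventType": "Terminate",
            "ResourceType": "VirtualMachine",
            "Resources": ["myScaleSet_2"],
            "EventStatus": "Scheduled",
            "NotBefore": "<timeInUTC>",
        },
    ],
}

if __name__ == "__main__":
    for event in terminate_events_for(sample, "myScaleSet_1"):
        print(event["EventId"])
```

Checking the incarnation first lets a polling loop skip reprocessing when nothing has changed since the last query.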

For more information on each of the fields above, see the Scheduled Events documentation for Windows and Linux.

Respond to events

Once you've learned of an upcoming event and completed your logic for graceful shutdown, you can approve the outstanding event by making a POST call to the metadata service with the EventId. The POST call indicates to Azure that it can continue with the VM delete.

Below is the JSON expected in the POST request body. The request should contain a list of StartRequests. Each StartRequest contains the EventId for the event you want to expedite:

{
    "StartRequests" : [
        {
            "EventId": {EventId}
        }
    ]
}

Ensure that every VM in the scale set approves only the EventId relevant to that VM. A VM can get its own VM name through instance metadata. This name takes the form "{scale-set-name}_{instance-id}" and is displayed in the 'Resources' section of the query response described above.
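The approval call could be sketched in Python as shown here. The request body follows the StartRequests structure above; posting to the same Scheduled Events URL used for queries and sending the `Metadata: true` header are assumptions not stated in this article, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: approvals are posted to the same endpoint that is queried for events.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2019-01-01"
)


def build_approval_request(event_ids, url: str = SCHEDULED_EVENTS_URL):
    """Build the POST request that approves the given event IDs."""
    body = {"StartRequests": [{"EventId": event_id} for event_id in event_ids]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the metadata service expects the "Metadata: true" header.
        headers={"Metadata": "true", "Content-Type": "application/json"},
        method="POST",
    )


def approve_events(event_ids, timeout_seconds: float = 30.0) -> int:
    """Send the approval from inside the VM and return the HTTP status code."""
    request = build_approval_request(event_ids)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status
```

Pass only the event IDs whose Resources entry matches the VM's own name, so that an instance never approves another instance's deletion.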

You can also refer to sample scripts for querying and responding to events in Python.

Tips and best practices

  • Terminate notifications only on 'delete' operations - All delete operations (manual delete or Autoscale-initiated scale-in) generate Terminate events if your scale set has scheduledEventsProfile enabled. Other operations such as reboot, reimage, redeploy, and stop/deallocate do not generate Terminate events. Terminate notifications can't be enabled for low-priority VMs.
  • No mandatory wait for timeout - You can start the terminate operation at any time after the event has been received and before the event's NotBefore time expires.
  • Mandatory delete at timeout - The timeout value can't be extended after an event has been generated. Once the timeout expires, the pending terminate event is processed and the VM is deleted.
  • Modifiable timeout value - You can modify the timeout value at any time before an instance is deleted by modifying the notBeforeTimeout property on the scale set model and updating the VM instances to the latest model.
  • Approve all pending deletes - If there's a pending delete on VM_1 that isn't approved, and you've approved another terminate event on VM_2, then VM_2 isn't deleted until the terminate event for VM_1 is approved or its timeout has elapsed. Once you approve the terminate event for VM_1, both VM_1 and VM_2 are deleted.
  • Approve all simultaneous deletes - Extending the above example, if VM_1 and VM_2 have the same NotBefore time, then both terminate events must be approved, or neither VM is deleted before the timeout expires.

Troubleshoot

Failure to enable scheduledEventsProfile

If you get a 'BadRequest' error with an error message stating "Could not find member 'scheduledEventsProfile' on object of type 'VirtualMachineProfile'", check the API version used for the scale set operations. Compute API version 2019-03-01 or higher is required.

Failure to get Terminate events

If you are not getting any Terminate events through Scheduled Events, check the API version used for getting the events. Metadata Service API version 2019-01-01 or higher is required for Terminate events.

'http://169.254.169.254/metadata/scheduledevents?api-version=2019-01-01'
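Both API version checks in this troubleshooting section amount to comparing date-style version strings. A small sketch of such a check follows; the helper name is hypothetical, and it assumes every version string begins with a YYYY-MM-DD date, ignoring any suffix after it.

```python
from datetime import date


def api_version_at_least(api_version: str, minimum: str) -> bool:
    """Compare the leading YYYY-MM-DD date of two API version strings."""
    # Only the first 10 characters are parsed; anything after them is ignored.
    return date.fromisoformat(api_version[:10]) >= date.fromisoformat(minimum[:10])


if __name__ == "__main__":
    # Minimums stated in this article: 2019-03-01 (Compute), 2019-01-01 (Metadata Service).
    print(api_version_at_least("2019-03-01", "2019-03-01"))
    print(api_version_at_least("2017-08-01", "2019-01-01"))
```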

Getting Terminate event with incorrect NotBefore time

After enabling scheduledEventsProfile on the scale set model and setting notBeforeTimeout, update the individual instances to the latest model to reflect the changes.

Next steps

Learn how to deploy your application on virtual machine scale sets.