Azure Metadata Service: Scheduled Events for Linux VMs

Scheduled Events is an Azure Metadata Service that gives your application time to prepare for virtual machine (VM) maintenance. It provides information about upcoming maintenance events (for example, a reboot) so that your application can prepare for them and limit disruption. It's available for all Azure Virtual Machines types, including PaaS and IaaS, on both Windows and Linux.

For information about Scheduled Events on Windows, see Scheduled Events for Windows VMs.

Note

Scheduled Events is generally available in all Azure regions. See Version and Region Availability for the latest release information.

Why use Scheduled Events?

Many applications can benefit from time to prepare for VM maintenance. The time can be used to perform application-specific tasks that improve availability, reliability, and serviceability, including:

  • Checkpoint and restore.
  • Connection draining.
  • Primary replica failover.
  • Removal from a load balancer pool.
  • Event logging.
  • Graceful shutdown.

With Scheduled Events, your application can discover when maintenance will occur and trigger tasks to limit its impact.

Scheduled Events provides events in the following use cases:

  • Platform-initiated maintenance (for example, a VM reboot, live migration, or memory-preserving updates for the host).
  • The virtual machine is running on degraded host hardware that is predicted to fail soon.
  • User-initiated maintenance (for example, a user restarts or redeploys a VM).

The Basics

Metadata Service exposes information about running VMs by using a REST endpoint that's accessible from within the VM. The information is available via a nonroutable IP so that it's not exposed outside the VM.

Scope

Scheduled events are delivered to:

  • Standalone virtual machines.
  • All the VMs in a cloud service.
  • All the VMs in an availability set.
  • All the VMs in a scale set placement group.

Because an event can be delivered to VMs that it does not affect, check the Resources field in the event to identify which VMs are affected.
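As a rough illustration of checking the Resources field, the sketch below filters a list of events down to those that name a given VM. The function name `events_affecting` and the sample data are hypothetical; only the `Resources` and `EventId` field names come from this article.

```python
def events_affecting(events, resource_name):
    """Return the events whose Resources list contains resource_name."""
    return [evt for evt in events if resource_name in evt.get("Resources", [])]


# Hypothetical sample data: two events delivered to the same group of VMs.
sample_events = [
    {"EventId": "A", "Resources": ["FrontEnd_IN_0", "BackEnd_IN_0"]},
    {"EventId": "B", "Resources": ["FrontEnd_IN_1"]},
]

affected = events_affecting(sample_events, "FrontEnd_IN_0")
print([evt["EventId"] for evt in affected])
```

A VM that receives both events would act only on event A here, because event B lists a different resource.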

Endpoint Discovery

For VNET-enabled VMs, Metadata Service is available from a static nonroutable IP, 169.254.169.254. The full endpoint for the latest version of Scheduled Events is:

http://169.254.169.254/metadata/scheduledevents?api-version=2019-08-01

If the VM is not created within a virtual network (the default case for cloud services and classic VMs), additional logic is required to discover the IP address to use. To learn how to discover the host endpoint, see this sample.

Version and Region Availability

The Scheduled Events service is versioned. Versions are mandatory; the current version is 2019-08-01.

Version | Release Type | Regions | Release Notes
2019-08-01 | General Availability | All | Added support for EventSource
2019-04-01 | General Availability | All | Added support for Event Description
2019-01-01 | General Availability | All | Added support for virtual machine scale sets EventType 'Terminate'
2017-08-01 | General Availability | All | Removed prepended underscore from resource names for IaaS VMs; Metadata header requirement enforced for all requests
2017-03-01 | Preview | All | Initial release

Note

Previous preview releases of Scheduled Events supported {latest} as the api-version. This format is no longer supported and will be deprecated in the future.

Enabling and Disabling Scheduled Events

Scheduled Events is enabled for your service the first time you make a request for events. Expect the response to your first call to be delayed by up to two minutes.

Scheduled Events is disabled for your service if it does not make a request for 24 hours.
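One way to account for both timings in client code is to allow a long timeout on the first call and to keep the polling interval well under 24 hours. The sketch below only encodes those two numbers; the constant names, the 30-second margin, and the 10-second timeout for later calls are assumptions, not values defined by the service.

```python
from datetime import timedelta

# From the text: the first call can take up to two minutes.
# The extra 30 seconds of margin is an assumption.
FIRST_CALL_TIMEOUT = timedelta(minutes=2, seconds=30)
# Assumed timeout for later calls; tune for your environment.
LATER_CALL_TIMEOUT = timedelta(seconds=10)
# From the text: the service is disabled after 24 hours without a request.
DISABLE_AFTER = timedelta(hours=24)


def request_timeout(is_first_call):
    """Return the HTTP timeout, in seconds, to use for a query."""
    timeout = FIRST_CALL_TIMEOUT if is_first_call else LATER_CALL_TIMEOUT
    return timeout.total_seconds()


def keeps_service_enabled(poll_interval):
    """Return True if polling at this interval keeps Scheduled Events enabled."""
    return poll_interval < DISABLE_AFTER
```

In practice a polling interval of a minute or so is far below the 24-hour limit, so the second check mainly guards against misconfiguration.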

User-initiated Maintenance

User-initiated VM maintenance via the Azure portal, API, CLI, or PowerShell results in a scheduled event. You can use this to test the maintenance preparation logic in your application, and your application can prepare for user-initiated maintenance.

If you restart a VM, an event with the type Reboot is scheduled. If you redeploy a VM, an event with the type Redeploy is scheduled.

Use the API

Headers

When you query Metadata Service, you must provide the header Metadata:true to ensure the request wasn't unintentionally redirected. The Metadata:true header is required for all scheduled events requests. Failure to include the header in the request results in a "Bad Request" response from Metadata Service.
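For clients written in Python 3, the header can be attached when the request object is built. This is a minimal sketch using the standard library's `urllib.request`; the helper name `build_query_request` is hypothetical, and the URL is the 2019-08-01 endpoint given earlier.

```python
import urllib.request

ENDPOINT = "http://169.254.169.254/metadata/scheduledevents?api-version=2019-08-01"


def build_query_request(url=ENDPOINT):
    """Build a GET request that carries the required Metadata:true header."""
    return urllib.request.Request(url, headers={"Metadata": "true"})


req = build_query_request()
print(req.get_method(), req.get_header("Metadata"))
```

Building the request does not contact the service. From inside a VM, the request would then be passed to `urllib.request.urlopen`.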

Query for events

You can query for scheduled events by making the following call:

Bash

    curl -H Metadata:true "http://169.254.169.254/metadata/scheduledevents?api-version=2019-08-01"
    

A response contains an array of scheduled events. An empty array means that no events are currently scheduled. When events are scheduled, the response has the following form:

    {
        "DocumentIncarnation": {IncarnationID},
        "Events": [
            {
                "EventId": {eventID},
                "EventType": "Reboot" | "Redeploy" | "Freeze" | "Terminate",
                "ResourceType": "VirtualMachine",
                "Resources": [{resourceName}],
                "EventStatus": "Scheduled" | "Started",
                "NotBefore": {timeInUTC},
                "Description": {eventDescription},
                "EventSource": "Platform" | "User"
            }
        ]
    }
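To make the template above concrete, the following sketch parses a hypothetical response body and lists each event's ID, type, and status. The sample values are assembled from the examples in this article, and the integer `DocumentIncarnation` is an assumption; none of this is captured service output.

```python
import json

# Hypothetical response body, for illustration only.
SAMPLE_RESPONSE = """
{
    "DocumentIncarnation": 1,
    "Events": [
        {
            "EventId": "602d9444-d2cd-49c7-8624-8643e7171297",
            "EventType": "Reboot",
            "ResourceType": "VirtualMachine",
            "Resources": ["FrontEnd_IN_0"],
            "EventStatus": "Scheduled",
            "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT",
            "Description": "Host server is undergoing maintenance.",
            "EventSource": "Platform"
        }
    ]
}
"""


def summarize_events(body):
    """Return (EventId, EventType, EventStatus) for each event in a response body."""
    document = json.loads(body)
    return [
        (evt["EventId"], evt["EventType"], evt["EventStatus"])
        for evt in document.get("Events", [])
    ]


for summary in summarize_events(SAMPLE_RESPONSE):
    print(summary)
```

An empty `Events` array yields an empty list, which matches the "no events scheduled" case.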
    

Event Properties

EventId: Globally unique identifier for this event.
Example:
  • 602d9444-d2cd-49c7-8624-8643e7171297

EventType: Impact this event causes.
Values:
  • Freeze: The virtual machine is scheduled to pause for a few seconds. CPU and network connectivity may be suspended, but there is no impact on memory or open files.
  • Reboot: The virtual machine is scheduled for reboot (non-persistent memory is lost).
  • Redeploy: The virtual machine is scheduled to move to another node (ephemeral disks are lost).
  • Terminate: The virtual machine is scheduled to be deleted.

ResourceType: Type of resource this event affects.
Values:
  • VirtualMachine

Resources: List of resources this event affects. The list is guaranteed to contain machines from at most one update domain (UD), but it might not contain all machines in the UD.
Example:
  • ["FrontEnd_IN_0", "BackEnd_IN_0"]

EventStatus: Status of this event.
Values:
  • Scheduled: This event is scheduled to start after the time specified in the NotBefore property.
  • Started: This event has started.
No Completed or similar status is ever provided. The event is no longer returned when the event is finished.

NotBefore: Time after which this event can start.
Example:
  • Mon, 19 Sep 2016 18:29:47 GMT

Description: Description of this event.
Example:
  • Host server is undergoing maintenance.

EventSource: Initiator of the event.
Values:
  • Platform: This event is initiated by the platform.
  • User: This event is initiated by a user.
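The NotBefore example above uses the HTTP-style date format, which Python's standard library can parse directly. The sketch below assumes the service always returns that format; the helper name `parse_not_before` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_not_before(value):
    """Parse a NotBefore string such as 'Mon, 19 Sep 2016 18:29:47 GMT'.

    Returns a timezone-aware datetime in UTC.
    Assumption: the value always uses the format shown in the example.
    """
    return parsedate_to_datetime(value).astimezone(timezone.utc)


not_before = parse_not_before("Mon, 19 Sep 2016 18:29:47 GMT")
print(not_before.isoformat())
```

Working with an aware UTC datetime avoids mistakes when the VM's local time zone is not UTC.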

Event Scheduling

Each event is scheduled a minimum amount of time in the future based on the event type. This time is reflected in an event's NotBefore property.

EventType | Minimum notice
Freeze | 15 minutes
Reboot | 15 minutes
Redeploy | 10 minutes
Terminate | User configurable: 5 to 15 minutes
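The minimum notice values can be used to sanity-check how long your preparation logic is allowed to take. The sketch below is one possible encoding; the names are hypothetical, and Terminate is set to the lower bound of its configurable range as a conservative assumption.

```python
from datetime import datetime, timedelta, timezone

# Minimum notice per event type, taken from the table above.
# Terminate is user configurable (5 to 15 minutes); the lower bound is assumed.
MINIMUM_NOTICE = {
    "Freeze": timedelta(minutes=15),
    "Reboot": timedelta(minutes=15),
    "Redeploy": timedelta(minutes=10),
    "Terminate": timedelta(minutes=5),
}


def time_remaining(not_before, now=None):
    """Return the time left until NotBefore, never less than zero."""
    now = now or datetime.now(timezone.utc)
    return max(not_before - now, timedelta(0))


def fits_in_notice(event_type, preparation_time):
    """Return True if the preparation time fits within the minimum notice."""
    return preparation_time <= MINIMUM_NOTICE[event_type]
```

For example, a 12-minute connection drain fits within the notice for a Reboot but not for a Redeploy.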

Note

In some cases, Azure is able to predict host failure due to degraded hardware and will attempt to mitigate disruption to your service by scheduling a migration. Affected virtual machines will receive a scheduled event with a NotBefore that is typically a few days in the future. Azure tries to give 7 days' advance notice when possible, but the actual time depends on the predicted failure risk and might be shorter if the hardware is likely to fail imminently. To minimize risk to your service in case the hardware fails before the system-initiated migration, we recommend that you self-redeploy your virtual machine as soon as possible.

Start an event

After you learn of an upcoming event and finish your logic for graceful shutdown, you can approve the outstanding event by making a POST call to Metadata Service with EventId. This call indicates to Azure that it can shorten the minimum notification time (when possible).

The following JSON sample is expected in the POST request body. The request should contain a list of StartRequests. Each StartRequest contains the EventId for the event you want to expedite:

    {
        "StartRequests" : [
            {
                "EventId": {EventId}
            }
        ]
    }
    

Bash sample

    curl -H Metadata:true -X POST -d '{"StartRequests": [{"EventId": "f020ba2e-3bc0-4c40-a10b-86575a9eabd5"}]}' "http://169.254.169.254/metadata/scheduledevents?api-version=2019-01-01"
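The same approval call can be sketched in Python 3 with the standard library. The helper name `build_start_request` is hypothetical, and the endpoint uses the 2019-08-01 version given earlier in this article; the code only builds the request and does not send it.

```python
import json
import urllib.request

ENDPOINT = "http://169.254.169.254/metadata/scheduledevents?api-version=2019-08-01"


def build_start_request(event_ids, url=ENDPOINT):
    """Build a POST request that approves the given events."""
    body = {"StartRequests": [{"EventId": event_id} for event_id in event_ids]}
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Metadata": "true"}, method="POST"
    )


start_req = build_start_request(["f020ba2e-3bc0-4c40-a10b-86575a9eabd5"])
print(start_req.get_method(), start_req.data.decode("utf-8"))
```

From inside a VM, passing `start_req` to `urllib.request.urlopen` would submit the approval.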
    

Note

Acknowledging an event allows the event to proceed for all Resources in the event, not just the VM that acknowledges the event. Therefore, you can choose to elect a leader to coordinate the acknowledgement, which might be as simple as the first machine in the Resources field.
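The simplest form of leader election mentioned above can be sketched as follows. The function name `is_leader` is hypothetical, and the sketch assumes that each VM knows its own name exactly as it appears in the Resources field.

```python
def is_leader(this_resource, resources):
    """Return True if this VM is the first machine listed in Resources."""
    return bool(resources) and resources[0] == this_resource


resources = ["FrontEnd_IN_0", "BackEnd_IN_0"]
print(is_leader("FrontEnd_IN_0", resources))
print(is_leader("BackEnd_IN_0", resources))
```

Only the VM for which `is_leader` returns True would send the approval, after confirming by whatever means your application uses that the other listed VMs are ready.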

Python sample

The following sample queries Metadata Service for scheduled events and prints each event that affects the current host. Add your own handling logic, such as approving the event, where indicated:

    #!/usr/bin/env python3

    import json
    import socket
    import urllib.request

    metadata_url = "http://169.254.169.254/metadata/scheduledevents?api-version=2019-08-01"
    this_host = socket.gethostname()


    def get_scheduled_events():
        # The Metadata:true header is required on every request.
        req = urllib.request.Request(metadata_url)
        req.add_header('Metadata', 'true')
        with urllib.request.urlopen(req) as resp:
            data = json.loads(resp.read())
        return data


    def handle_scheduled_events(data):
        for evt in data['Events']:
            eventid = evt['EventId']
            status = evt['EventStatus']
            resources = evt['Resources']
            eventtype = evt['EventType']
            resourcetype = evt['ResourceType']
            notbefore = evt['NotBefore'].replace(" ", "_")
            description = evt['Description']
            eventSource = evt['EventSource']
            # Only report events that list this host in Resources.
            if this_host in resources:
                print("+ Scheduled Event. This host " + this_host +
                      " is scheduled for " + eventtype +
                      " by " + eventSource +
                      " with description " + description +
                      " not before " + notbefore)
                # Add logic for handling events here


    def main():
        data = get_scheduled_events()
        handle_scheduled_events(data)


    if __name__ == '__main__':
        main()
    

Next steps