Windows reboot loop on an Azure VM

This article describes a reboot loop that you may experience on a Windows virtual machine (VM) in Azure.

Symptom

When you use Boot diagnostics to view screenshots of the VM, you see that the VM starts booting, but the boot process is interrupted and starts over.

Screenshot: start screen

Cause

The reboot loop can have any of the following causes:

Cause 1

A third-party service that is flagged as critical cannot start. This causes the operating system to reboot.

Cause 2

Changes were made to the operating system. Usually, these are related to an update installation, an application installation, or a new policy. For more details, check the following logs:

  • Event logs
  • CBS.log
  • WindowsUpdate.log

Cause 3

File system corruption can cause this issue. However, it is difficult to diagnose and identify the change that corrupted the operating system.

Solution

To resolve this problem, back up the OS disk, attach it to a rescue VM, and then follow the solution for the matching cause, or try the solutions one by one.

Solution for cause 1

  1. After the OS disk is attached to the rescue VM, make sure that the disk is marked as Online in the Disk Management console, and note the drive letter of the partition that holds the \Windows folder.

  2. If the disk is set to Offline, set it to Online.

  3. Create a copy of the \Windows\System32\config folder in case you need to roll back the changes.

  4. On the rescue VM, open Registry Editor (regedit).

  5. Select the HKEY_LOCAL_MACHINE key, and then select File > Load Hive from the menu.

  6. Browse to the SYSTEM file in the \Windows\System32\config folder.

  7. Select Open, type BROKENSYSTEM as the name, and expand the HKEY_LOCAL_MACHINE key. You will see an additional key named BROKENSYSTEM.

  8. Check which ControlSet the computer boots from. Its number is stored in the following registry value:

    HKEY_LOCAL_MACHINE\BROKENSYSTEM\Select\Current

  9. Check the criticality of the VM agent service in the following registry value:

    HKEY_LOCAL_MACHINE\BROKENSYSTEM\ControlSet00x\Services\RDAgent\ErrorControl

  10. If the value is not set to 2, go to the next mitigation.

  11. If the value is set to 2, change it from 2 to 1.

  12. If any of the following keys exist and have a value of 2 or 3, change these values to 1:

    • HKEY_LOCAL_MACHINE\BROKENSYSTEM\ControlSet00x\Services\AzureWLBackupCoordinatorSvc\ErrorControl
    • HKEY_LOCAL_MACHINE\BROKENSYSTEM\ControlSet00x\Services\AzureWLBackupInquirySvc\ErrorControl
    • HKEY_LOCAL_MACHINE\BROKENSYSTEM\ControlSet00x\Services\AzureWLBackupPluginSvc\ErrorControl

  13. Select the BROKENSYSTEM key, and then select File > Unload Hive from the menu.

  14. Detach the OS disk from the troubleshooting VM.

  15. Remove the disk from the troubleshooting VM and wait about 2 minutes for Azure to release it.

  16. Create a new VM from the OS disk.

  17. If the issue is fixed, you may have to reinstall the RDAgent (WaAppAgent.exe).
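The registry checks in steps 8 through 12 can be sketched in Python. This is a minimal, hypothetical sketch rather than part of the official procedure: it only builds the key paths and works out which ErrorControl values the steps above would change, given values you have read in regedit. It does not read or write the registry. The helper names (`control_set_name`, `error_control_path`, `plan_changes`) are illustrative, and the meaning of the ErrorControl values (0 = ignore, 1 = normal, 2 = severe, 3 = critical) is the standard Windows service convention, not something stated in the steps.

```python
from typing import Dict

# Name given to the loaded SYSTEM hive in step 7.
HIVE = "HKEY_LOCAL_MACHINE\\BROKENSYSTEM"

# ErrorControl 1 = "normal": a start failure is logged but does not force a reboot.
NORMAL = 1

# Service name -> ErrorControl values that the steps above lower to 1.
# RDAgent is changed only when it is 2 (step 11); the backup services
# are changed when they are 2 or 3 (step 12).
RULES = {
    "RDAgent": {2},
    "AzureWLBackupCoordinatorSvc": {2, 3},
    "AzureWLBackupInquirySvc": {2, 3},
    "AzureWLBackupPluginSvc": {2, 3},
}


def control_set_name(current: int) -> str:
    """Map the Select\\Current data (for example 1) to a key name (ControlSet001)."""
    if not 1 <= current <= 999:
        raise ValueError(f"unexpected Select\\Current value: {current}")
    return f"ControlSet{current:03d}"


def error_control_path(current: int, service: str) -> str:
    """Full path of a service's ErrorControl value in the loaded hive."""
    return "\\".join(
        [HIVE, control_set_name(current), "Services", service, "ErrorControl"]
    )


def plan_changes(current: int, observed: Dict[str, int]) -> Dict[str, int]:
    """Return {registry path: new value} for every service whose observed
    ErrorControl should be lowered to 1. Services absent from `observed`
    are skipped, because the backup services exist only on some VMs."""
    changes = {}
    for service, trigger_values in RULES.items():
        if observed.get(service) in trigger_values:
            changes[error_control_path(current, service)] = NORMAL
    return changes
```

For example, if Select\Current is 1 and RDAgent has ErrorControl 2, `plan_changes(1, {"RDAgent": 2})` returns a single entry for `...\ControlSet001\Services\RDAgent\ErrorControl` with the value 1, which you would then set by hand in regedit before unloading the hive.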

Solution for cause 2

Restore the VM to the last known good configuration by following the steps in How to start Azure Windows VM with Last Known Good Configuration.

Solution for cause 3

Note

The following procedure should be used only as a last resort. Although restoring from RegBack may restore access to the machine, the OS is not considered stable, because any registry changes made between the timestamp of the backed-up hives and the current day are lost. Build a new VM and plan to migrate your data.

  1. After the disk is attached to a troubleshooting VM, make sure that the disk is marked as Online in the Disk Management console.

  2. Create a copy of the \Windows\System32\config folder in case you need to roll back the changes.

  3. Copy the files from the \Windows\System32\config\regback folder into the \Windows\System32\config folder, replacing the existing files.

  4. Remove the disk from the troubleshooting VM and wait about 2 minutes for Azure to release it.

  5. Create a new VM from the OS disk.
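The file operations in steps 2 and 3 can be sketched in Python, assuming the attached disk's Windows folder is reachable on the troubleshooting VM at a path such as F:\Windows (the drive letter is hypothetical; use the one shown in Disk Management). This minimal sketch copies config to a sibling config.bak folder, then copies the RegBack hives over the live ones. As an extra safeguard that the steps above do not mention, it refuses to restore if RegBack is missing, empty, or contains zero-byte files. Zero-byte files can occur on newer Windows versions that no longer populate RegBack by default. The function names are illustrative.

```python
import shutil
from pathlib import Path
from typing import List


def backup_config(windows_dir: Path, suffix: str = ".bak") -> Path:
    """Copy System32\\config to a sibling folder (for example config.bak)
    so the original hives can be put back if the restore makes things worse."""
    config = windows_dir / "System32" / "config"
    backup = config.with_name(config.name + suffix)
    if backup.exists():
        # Never overwrite an earlier backup; it may be the only good copy.
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.copytree(config, backup)
    return backup


def restore_from_regback(windows_dir: Path) -> List[str]:
    """Overwrite the hives in System32\\config with the copies in RegBack.
    Returns the sorted names of the files that were copied."""
    config = windows_dir / "System32" / "config"
    regback = config / "RegBack"
    hives = [p for p in regback.iterdir() if p.is_file()] if regback.is_dir() else []
    if not hives:
        raise FileNotFoundError(f"no hive files in {regback}")
    # A zero-byte hive is not a usable backup; copying it would make things worse.
    empty = [p.name for p in hives if p.stat().st_size == 0]
    if empty:
        raise ValueError(f"zero-byte hives in RegBack, not restoring: {empty}")
    for hive in hives:
        shutil.copy2(hive, config / hive.name)
    return sorted(p.name for p in hives)
```

On the troubleshooting VM you would call `backup_config(Path(r"F:\Windows"))` first and then `restore_from_regback(Path(r"F:\Windows"))`, again assuming F: is the drive letter of the attached OS disk.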