Troubleshoot a Linux VM by attaching the OS disk to a recovery VM using the Azure portal

If your Linux virtual machine (VM) encounters a boot or disk error, you may need to perform troubleshooting steps on the virtual hard disk itself. A common example is an invalid entry in /etc/fstab that prevents the VM from booting successfully. This article describes how to use the Azure portal to connect your virtual hard disk to another Linux VM, fix any errors, and then restore the original VM.

Recovery process overview

The troubleshooting process is as follows:

  1. Stop the affected VM.
  2. Take a snapshot of the VM's OS disk.
  3. Create a virtual hard disk from the snapshot.
  4. Attach and mount the virtual hard disk on another Linux VM for troubleshooting purposes.
  5. Connect to the troubleshooting VM. Edit files or run any tools to fix issues on the virtual hard disk.
  6. Unmount and detach the virtual hard disk from the troubleshooting VM.
  7. Swap the OS disk for the VM.

Note

This article does not apply to VMs that use unmanaged disks.

Determine boot issues

Examine the boot diagnostics and VM screenshot to determine why your VM is not able to boot correctly. Common causes include an invalid entry in /etc/fstab, or an underlying virtual hard disk that has been deleted or moved.

Select your VM in the portal and then scroll down to the Support + Troubleshooting section. Click Boot diagnostics to view the console messages streamed from your VM. Review the console logs to see if you can determine why the VM is encountering an issue. The following example shows a VM stuck in maintenance mode that requires manual interaction:

Screenshot: VM boot diagnostics console logs

You can also click Screenshot across the top of the boot diagnostics log to download a capture of the VM screen.

Take a snapshot of the OS disk

A snapshot is a full, read-only copy of a virtual hard disk (VHD). We recommend that you cleanly shut down the VM before taking a snapshot, to clear out any processes that are in progress. To take a snapshot of an OS disk, follow these steps:

  1. Go to the Azure portal. Select Virtual machines from the sidebar, and then select the VM that has the problem.
  2. On the left pane, select Disks, and then select the name of the OS disk.
  3. On the Overview page of the OS disk, select Create snapshot.
  4. Create the snapshot in the same location as the OS disk.

Create a disk from the snapshot

To create a disk from the snapshot, follow these steps:

  1. Run the following PowerShell commands to create a managed disk from the snapshot. Replace the sample names with the names of your own resources.

    
    # Sign in to the Azure China cloud
    Connect-AzAccount -Environment AzureChinaCloud
    
    # Provide the name of your resource group
    $resourceGroupName = 'myResourceGroup'
    
    # Provide the name of the snapshot that will be used to create the managed disk
    $snapshotName = 'mySnapshot'
    
    # Provide the name of the managed disk
    $diskName = 'newOSDisk'
    
    # Provide the size of the disk in GB. It should be greater than the VHD file size.
    # In this sample, the size of the snapshot is 127 GB, so the disk size is set to 128 GB.
    $diskSize = 128
    
    # Provide the storage type for the managed disk: Premium_LRS or Standard_LRS
    $storageType = 'Standard_LRS'
    
    # Provide the Azure region (for example, chinanorth) where the managed disk will be located.
    # This location should be the same as the snapshot location.
    # To list all Azure locations, run:
    # Get-AzLocation
    $location = 'chinanorth'
    
    # Look up the snapshot so that its resource ID can be used as the copy source
    $snapshot = Get-AzSnapshot -ResourceGroupName $resourceGroupName -SnapshotName $snapshotName
    
    # Build the disk configuration; -DiskSizeGB applies the size chosen above
    $diskConfig = New-AzDiskConfig -AccountType $storageType -Location $location -CreateOption Copy -SourceResourceId $snapshot.Id -DiskSizeGB $diskSize
    
    New-AzDisk -Disk $diskConfig -ResourceGroupName $resourceGroupName -DiskName $diskName
    
  2. If the commands run successfully, the new disk appears in the resource group that you specified.

Attach disk to another VM

For the next few steps, you use another VM for troubleshooting purposes. After you attach the disk to the troubleshooting VM, you can browse and edit the disk's content. This process allows you to correct any configuration errors or review additional application or system log files. To attach the disk to another VM, follow these steps:

  1. Select your resource group from the portal, then select your troubleshooting VM. Select Disks, select Edit, and then click Add data disk:

    Screenshot: attaching an existing disk in the portal

  2. In the Data disks list, select the disk that you created from the OS disk snapshot. If you do not see the disk, make sure that the troubleshooting VM and the disk are in the same region (location).

  3. Select Save to apply the changes.

Mount the attached data disk

Note

The following examples detail the steps required on an Ubuntu VM. If you are using a different Linux distro, such as CentOS or SUSE, the log file locations and mount commands may be a little different. Refer to the documentation for your specific distro for the appropriate changes in commands.

  1. SSH to your troubleshooting VM using the appropriate credentials. If this disk is the first data disk attached to your troubleshooting VM, it is likely connected to /dev/sdc. Use dmesg to list attached disks:

    dmesg | grep SCSI
    

    The output is similar to the following example:

    [    0.294784] SCSI subsystem initialized
    [    0.573458] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
    [    7.110271] sd 2:0:0:0: [sda] Attached SCSI disk
    [    8.079653] sd 3:0:1:0: [sdb] Attached SCSI disk
    [ 1828.162306] sd 5:0:0:0: [sdc] Attached SCSI disk
    

    In the preceding example, the OS disk is at /dev/sda, the temporary disk provided for each VM is at /dev/sdb, and the newly attached disk is at /dev/sdc. If you had multiple data disks, they should be at /dev/sdd, /dev/sde, and so on.

  2. Create a directory to mount your existing virtual hard disk. The following example creates a directory named troubleshootingdisk:

    sudo mkdir /mnt/troubleshootingdisk
    
  3. If you have multiple partitions on your existing virtual hard disk, mount the required partition. The following example mounts the first primary partition at /dev/sdc1:

    sudo mount /dev/sdc1 /mnt/troubleshootingdisk
    

    Note

    Best practice is to mount data disks on VMs in Azure using the universally unique identifier (UUID) of the virtual hard disk. For this short troubleshooting scenario, mounting the virtual hard disk using the UUID is not necessary. However, under normal use, editing /etc/fstab to mount virtual hard disks using device name rather than UUID may cause the VM to fail to boot.
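
As a minimal sketch of the device lookup in step 1, the following shell function extracts the name of the most recently attached disk from kernel messages. It assumes the message format shown in the sample dmesg output above. The function name last_attached_disk and the sample_log variable are illustrative and are not part of any Azure tooling.

```shell
# Print the device name from the last "Attached SCSI disk" line read on stdin.
# Assumes kernel messages in the format shown in the sample output.
last_attached_disk() {
  grep 'Attached SCSI disk' \
    | sed -n 's/.*\[\(sd[a-z]*\)] Attached SCSI disk.*/\1/p' \
    | tail -n 1
}

# Demo on the sample output from this article.
sample_log='[    7.110271] sd 2:0:0:0: [sda] Attached SCSI disk
[    8.079653] sd 3:0:1:0: [sdb] Attached SCSI disk
[ 1828.162306] sd 5:0:0:0: [sdc] Attached SCSI disk'

printf '%s\n' "$sample_log" | last_attached_disk   # prints: sdc
```

On the troubleshooting VM, piping dmesg into the function applies the same filter to live kernel messages (reading them may require root privileges on some distributions). Running lsblk afterwards lists the partitions on the reported device.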

Fix issues on original virtual hard disk

With the existing virtual hard disk mounted, you can now perform any maintenance and troubleshooting steps as needed. Once you have addressed the issues, continue with the following steps.
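
For the invalid /etc/fstab case mentioned earlier, one possible repair is to comment out the offending entry on the mounted disk. The sketch below is illustrative only: the function names, the /datadrive mount point, and the UUID value are hypothetical, and the demo runs against a temporary file rather than a real disk.

```shell
# List entries that identify a file system by /dev/sd* device name
# (prints the device and the mount point). Comments and blank lines are skipped.
fstab_device_name_entries() {
  awk '$1 ~ /^\/dev\/sd/ { print $1, $2 }' "$1"
}

# Comment out every active entry whose mount point (second field) equals $2,
# after saving a backup copy as <file>.bak.
disable_fstab_entry() {
  cp "$1" "$1.bak" || return 1
  awk -v mp="$2" '
    $1 !~ /^#/ && $2 == mp { print "#" $0; next }
    { print }
  ' "$1.bak" > "$1"
}

# Demo on a temporary file with hypothetical entries.
demo_fstab=$(mktemp)
cat > "$demo_fstab" <<'EOF'
UUID=0a1b2c3d-1111-2222-3333-444455556666 / ext4 defaults 0 1
/dev/sdc1 /datadrive ext4 defaults 0 2
EOF

fstab_device_name_entries "$demo_fstab"   # prints: /dev/sdc1 /datadrive
disable_fstab_entry "$demo_fstab" /datadrive
cat "$demo_fstab"                         # the /datadrive entry now starts with '#'
```

With the disk mounted as described in this article, the file to inspect would be /mnt/troubleshootingdisk/etc/fstab. Changing it normally requires root privileges, so the functions would need to run from a root shell.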

Unmount and detach original virtual hard disk

Once your errors are resolved, unmount and detach the existing virtual hard disk from your troubleshooting VM. You cannot use the virtual hard disk with any other VM until the lease that attaches it to the troubleshooting VM is released.

  1. From the SSH session to your troubleshooting VM, unmount the existing virtual hard disk. First change out of the mount point directory so that the file system is not in use:

    cd /
    

    Now unmount the existing virtual hard disk. The following example unmounts the device at /dev/sdc1:

    sudo umount /dev/sdc1
    
  2. Now detach the virtual hard disk from the VM. Select your troubleshooting VM in the portal and click Disks. Select your existing virtual hard disk and then click Detach:

    Screenshot: detaching the existing virtual hard disk

    Wait until the VM has successfully detached the data disk before continuing.
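
Before detaching the disk in the portal (step 2 above), it can be worth confirming that the unmount in step 1 succeeded. The following sketch checks a mount table for the mount point. The function name is_mounted and the sample table are illustrative assumptions; on a live system the function reads /proc/mounts by default.

```shell
# Return success if mount point $1 appears in the mount table $2
# (defaults to /proc/mounts, which Linux maintains for the running system).
is_mounted() {
  awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Demo against a sample mount table so the check can be tried anywhere.
sample_mounts=$(mktemp)
cat > "$sample_mounts" <<'EOF'
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdc1 /mnt/troubleshootingdisk ext4 rw,relatime 0 0
EOF

if is_mounted /mnt/troubleshootingdisk "$sample_mounts"; then
  echo "still mounted: unmount before detaching"
else
  echo "not mounted: safe to detach"
fi
```

On the troubleshooting VM, calling is_mounted /mnt/troubleshootingdisk with no second argument checks the live mount table instead of the sample file.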

Swap the OS disk for the VM

The Azure portal now supports changing the OS disk of a VM. To do this, follow these steps:

  1. Go to the Azure portal. Select Virtual machines from the sidebar, and then select the VM that has the problem.

  2. On the left pane, select Disks, and then select Swap OS disk.

  3. Choose the new disk that you repaired, and then type the name of the VM to confirm the change. If you do not see the disk in the list, wait 10 to 15 minutes after you detach the disk from the troubleshooting VM. Also make sure that the disk is in the same location as the VM.

  4. Select OK.

Next steps

If you are having issues connecting to your VM, see Troubleshoot SSH connections to an Azure VM. For issues with accessing applications running on your VM, see Troubleshoot application connectivity issues on a Linux VM.

For more information about using Resource Manager, see Azure Resource Manager overview.