Troubleshoot Linux VM startup issues caused by file system errors

You cannot connect to an Azure Linux virtual machine (VM) by using Secure Shell (SSH). When you run the Boot Diagnostics feature in the Azure portal, you see log entries that resemble the following examples.

Examples

The following are examples of possible errors.

Example 1

Checking all file systems.
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda1
/dev/sda1 contains a file system with errors, check forced.
/dev/sda1: Inodes that were part of a corrupted orphan linked list found.
/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY

Example 2

EXT4-fs (sda1): INFO: recovery required on readonly filesystem
EXT4-fs (sda1): write access will be enabled during recovery
EXT4-fs warning (device sda1): ext4_clear_journal_err:4531: Filesystem error recorded from previous mount: IO failure
EXT4-fs warning (device sda1): ext4_clear_journal_err:4532: Making fs in need of filesystem check.

Example 3

[  14.252404] EXT4-fs (sda1): Couldn't remount RDWR because of unprocessed orphan inode list.  Please unmount/remount instead
An error occurred while mounting /.

Example 4

This example shows the output of a clean fsck run. In this case, additional data disks (/dev/sdc1 and /dev/sde1) are also attached to the VM.

Checking all file systems.
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda1
/dev/sda1: clean, 65405/1905008 files, 732749/7608064 blocks
[/sbin/fsck.ext4 (1) -- /tmp] fsck.ext4 -a /dev/sdc1
[/sbin/fsck.ext4 (2) -- /backup] fsck.ext4 -a /dev/sde1
/dev/sdc1: clean, 12/1048576 files, 109842/4192957 blocks
/dev/sde1: clean, 51/67043328 files, 4259482/268173037 blocks

This problem may occur if the file system was not shut down cleanly or if there are storage-related issues. These issues include hardware or software errors, problems with drivers or programs, and write errors. It is always important to have a backup of critical data. The tools described in this article may help recover file systems, but data loss can still occur.

Linux has several file system checkers available. The most common ones for the distributions in Azure are fsck, e2fsck, and xfs_repair.
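Which checker applies depends on the file system type on the affected partition. The following is a minimal sketch of that choice, not a definitive mapping. It assumes that `blkid` is available on the recovery VM. The function name `checker_for_fstype` is hypothetical, and /dev/sdc1 is only an example device name.

```shell
#!/bin/sh
# Map a file system type (as reported by blkid or lsblk -f) to a checker.
# checker_for_fstype is a hypothetical helper used for illustration only.
checker_for_fstype() {
    case "$1" in
        ext2|ext3|ext4) echo "e2fsck" ;;      # ext family
        xfs)            echo "xfs_repair" ;;  # XFS
        *)              echo "fsck" ;;        # generic front end for other types
    esac
}

# Example usage on the recovery VM (the device name is an assumption):
#   fstype=$(blkid -o value -s TYPE /dev/sdc1)
#   checker_for_fstype "$fstype"
```

The repair steps in this article use xfs_repair, so they apply as written only when the reported type is xfs.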

Resolution

To resolve this problem, follow the steps in the Repair the VM offline section of this article.

Repair the VM offline

  1. Attach the system disk of the VM as a data disk to a recovery VM (any working Linux VM). To do this, you can use CLI commands, or you can automate setting up the recovery VM by using the VM repair commands.

  2. Locate the drive label of the system disk that you attached. In this example, we assume that the drive label of the attached system disk is /dev/sdc1. Replace it with the appropriate value for your VM.

  3. Use xfs_repair with the -n option to detect errors in the file system without modifying it.

    xfs_repair -n /dev/sdc1
    
  4. Run the following command to repair the file system:

    xfs_repair /dev/sdc1
    
  5. If you receive the error message "ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed", create a temporary directory and mount the file system:

    mkdir /temp
    
    mount /dev/sdc1 /temp
    

    If the disk fails to mount, run the xfs_repair command with the -L option (force log zeroing):

    xfs_repair -L /dev/sdc1
    
  6. Next, try to mount the file system. If the disk is mounted successfully, you will receive the following output:

    XFS (sdc1): Mounting V1 Filesystem
    
    XFS (sdc1): Ending clean mount
    
  7. Unmount and detach the original virtual hard disk, and then create a VM from the original system disk. To do this, you can use CLI commands or, if you used them to create the recovery VM, the VM repair commands.

  8. Check whether the problem is resolved.
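The xfs_repair steps above can be sketched as one shell function. This is only an illustration under stated assumptions, not a supported script. The function name `repair_xfs` is hypothetical, and any non-zero exit status from xfs_repair is treated as a failure rather than matching the specific log-replay error message. After a successful mount, the sketch unmounts the disk and reruns xfs_repair. That rerun is an assumption added here and is not part of the steps above. Run it as root on the recovery VM, and only against the attached disk.

```shell
#!/bin/sh
# Sketch of the xfs_repair steps for an XFS system disk attached to a recovery VM.
# repair_xfs is a hypothetical helper; pass your own device and mount point.
repair_xfs() {
    device=$1
    mountpoint=$2

    # Dry run: -n reports problems without modifying the disk.
    if xfs_repair -n "$device"; then
        echo "no errors detected on $device"
        return 0
    fi

    # Attempt the repair.
    if xfs_repair "$device"; then
        echo "repaired $device"
        return 0
    fi

    # The repair did not complete, possibly because the log must be replayed.
    # Mounting the file system replays the log.
    mkdir -p "$mountpoint"
    if mount "$device" "$mountpoint"; then
        # Assumption: unmount and rerun the repair once the log is replayed.
        umount "$mountpoint"
        xfs_repair "$device"
        return $?
    fi

    # Last resort: zero the log. Metadata changes held in the log are lost.
    echo "mount failed; zeroing the log on $device" >&2
    xfs_repair -L "$device"
}

# Example usage (device and directory taken from the steps above):
#   repair_xfs /dev/sdc1 /temp
```

Keeping the -L call as the final branch mirrors the order of the steps: the log is zeroed only after both the normal repair and the mount have failed.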

Next steps