Scenario: Local HDFS stuck in safe mode on Azure HDInsight cluster

This article describes troubleshooting steps and possible resolutions for issues that arise when interacting with Azure HDInsight clusters.

Issue

The local Apache Hadoop Distributed File System (HDFS) is stuck in safe mode on the HDInsight cluster. You receive an error message similar to the following:

hdiuser@spark2:~$ hdfs dfs -D "fs.default.name=hdfs://mycluster/" -mkdir /temp
17/04/05 16:20:52 WARN retry.RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.mkdirs over spark2.2oyzcdm4sfjuzjmj5dnmvscjpg.dx.internal.chinacloudapp.cn/10.0.0.22:8020. Not retrying because try once and fail.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /temp. Name node is in safe mode.
It was turned on manually. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1359)
...
mkdir: Cannot create directory /temp. Name node is in safe mode.

Cause

The HDInsight cluster has been scaled down to very few nodes, so the number of nodes is below or close to the HDFS replication factor.
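
To check whether a cluster is in this situation, the configured replication factor can be compared with the number of live DataNodes. The following is a minimal sketch, not a definitive check. It assumes that `hdfs dfsadmin -report` prints a line of the form `Live datanodes (N):` (other Hadoop versions may format this differently), and the helper name `live_datanodes` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the live DataNode count with the HDFS replication factor.
# Assumption: the report contains a line such as "Live datanodes (3):".

# Hypothetical helper: reads `hdfs dfsadmin -report` output on stdin and
# prints the number of live DataNodes (prints nothing if no such line exists).
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Only query the cluster when the hdfs client is available on this machine.
if command -v hdfs >/dev/null 2>&1; then
    replication=$(hdfs getconf -confKey dfs.replication)
    live=$(hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -report | live_datanodes)
    echo "replication factor: ${replication}, live DataNodes: ${live:-unknown}"
fi
```

If the live DataNode count is lower than or close to the replication factor, the cause described above is a likely explanation.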

Resolution

  1. Report on the status of HDFS on the HDInsight cluster with the following command:

    hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -report
    
  2. Check the integrity of HDFS on the HDInsight cluster with the following command:

    hdfs fsck -D "fs.default.name=hdfs://mycluster/" /
    
  3. If you determine that there are no missing, corrupt, or under-replicated blocks, or that those blocks can be ignored, run the following command to take the name node out of safe mode:

    hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -safemode leave
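
The three steps above can be combined into a small script that leaves safe mode only when the integrity check looks clean. This is a minimal sketch under stated assumptions: it assumes the `hdfs fsck` summary contains the phrase `is HEALTHY` for a healthy filesystem, and the helper name `fsck_reports_healthy` is hypothetical. Note that fsck typically still reports HEALTHY when blocks are merely under-replicated, so the summary counts should be reviewed as well.

```shell
#!/usr/bin/env bash
# Sketch: leave safe mode only if fsck reports a healthy filesystem.
# Assumption: a healthy fsck summary contains the phrase "is HEALTHY".

# Hypothetical helper: reads `hdfs fsck` output on stdin and succeeds only
# if some line of that output contains "is HEALTHY".
fsck_reports_healthy() {
    grep "is HEALTHY" >/dev/null
}

# Only touch the cluster when the hdfs client is available on this machine.
if command -v hdfs >/dev/null 2>&1; then
    # Show the current safe mode state for reference.
    hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -safemode get

    if hdfs fsck -D "fs.default.name=hdfs://mycluster/" / | fsck_reports_healthy; then
        hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -safemode leave
    else
        echo "fsck did not report HEALTHY; review its output before leaving safe mode" >&2
    fi
fi
```

The script deliberately does nothing when fsck does not report a healthy state, because deciding that missing or corrupt blocks can be ignored is a judgment that should be made by an operator.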
    

Next steps

If you didn't see your problem or are unable to solve your issue, visit the following channel for more support:

  • If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub.