Troubleshoot Apache Hadoop HDFS by using Azure HDInsight

Learn about the top issues and their resolutions when working with Hadoop Distributed File System (HDFS) payloads in Apache Ambari.

How do I access the local HDFS from inside a cluster?

Issue

From inside the HDInsight cluster, access the local HDFS from the command line and from application code instead of through Azure Blob storage.

Resolution steps

  1. At the command prompt, use hdfs dfs -D "fs.default.name=hdfs://mycluster/" ... literally, as in the following command:

    hdfs dfs -D "fs.default.name=hdfs://mycluster/" -ls /
    Found 3 items
    drwxr-xr-x   - hdiuser hdfs          0 2017-03-24 14:12 /EventCheckpoint-30-8-24-11102016-01
    drwx-wx-wx   - hive    hdfs          0 2016-11-10 18:42 /tmp
    drwx------   - hdiuser hdfs          0 2016-11-10 22:22 /user
    
  2. From source code, use the URI hdfs://mycluster/ literally, as in the following sample application:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    
    public class JavaUnitTests {
    
        public static void main(String[] args) throws Exception {
    
            // Point the default file system at the local HDFS instead of Azure Blob storage.
            Configuration conf = new Configuration();
            String hdfsUri = "hdfs://mycluster/";
            conf.set("fs.defaultFS", hdfsUri);
            FileSystem fileSystem = FileSystem.get(URI.create(hdfsUri), conf);
    
            // Recursively list every file under /tmp and print its full HDFS path.
            RemoteIterator<LocatedFileStatus> fileStatusIterator = fileSystem.listFiles(new Path("/tmp"), true);
            while (fileStatusIterator.hasNext()) {
                System.out.println(fileStatusIterator.next().getPath().toString());
            }
        }
    }
    
  3. Run the compiled .jar file (for example, a file named java-unit-tests-1.0.jar) on the HDInsight cluster with the following command (a sketch for compiling the sample appears after these steps):

    hadoop jar java-unit-tests-1.0.jar JavaUnitTests
    hdfs://mycluster/tmp/hive/hive/5d9cf301-2503-48c7-9963-923fb5ef79a7/inuse.info
    hdfs://mycluster/tmp/hive/hive/5d9cf301-2503-48c7-9963-923fb5ef79a7/inuse.lck
    hdfs://mycluster/tmp/hive/hive/a0be04ea-ae01-4cc4-b56d-f263baf2e314/inuse.info
    hdfs://mycluster/tmp/hive/hive/a0be04ea-ae01-4cc4-b56d-f263baf2e314/inuse.lck
    
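If you need to build the .jar for step 3 first, a minimal sketch, assuming the JDK and the Hadoop client are available on a cluster head node and JavaUnitTests.java is in the current directory:

    # Compile against the Hadoop client libraries, then package the class into a .jar.
    javac -cp $(hadoop classpath) JavaUnitTests.java
    jar cf java-unit-tests-1.0.jar JavaUnitTests.class

The hadoop classpath command prints the classpath of the installed Hadoop client libraries, which is what javac needs at compile time.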

du

The -du command displays the sizes of the files and directories contained in the given directory, or the length of a file if the path is just a file.

The -s option produces an aggregate summary of the file lengths being displayed.
The -h option formats the file sizes in a human-readable way (for example, 64.0m instead of 67108864).

Example:

    hdfs dfs -du -s -h hdfs://mycluster/
    hdfs dfs -du -s -h hdfs://mycluster/tmp
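The same aggregate summary is available from application code. Below is a minimal sketch, assuming the hdfs://mycluster/ URI from the steps above (the DuExample class name is hypothetical), that uses Hadoop's FileSystem.getContentSummary API:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    
    public class DuExample {
    
        public static void main(String[] args) throws Exception {
    
            // Point the default file system at the local HDFS, as in the steps above.
            Configuration conf = new Configuration();
            String hdfsUri = "hdfs://mycluster/";
            conf.set("fs.defaultFS", hdfsUri);
            FileSystem fileSystem = FileSystem.get(URI.create(hdfsUri), conf);
    
            // Aggregate summary for /tmp, comparable to: hdfs dfs -du -s /tmp
            ContentSummary summary = fileSystem.getContentSummary(new Path("/tmp"));
            System.out.println("Total length (bytes): " + summary.getLength());
            System.out.println("File count:           " + summary.getFileCount());
            System.out.println("Directory count:      " + summary.getDirectoryCount());
        }
    }

Here getContentSummary returns the totals in a single call to the NameNode, which avoids parsing shell output when you need the numbers programmatically.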

rm

The -rm command deletes the files specified as arguments.

Example:

    hdfs dfs -rm hdfs://mycluster/tmp/testfile
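Two commonly used flags, shown in a quick sketch below (the testdir path is hypothetical): -r deletes a directory and everything under it, and -skipTrash deletes the files immediately instead of moving them to the trash (when the trash is enabled):

    hdfs dfs -rm -r hdfs://mycluster/tmp/testdir
    hdfs dfs -rm -skipTrash hdfs://mycluster/tmp/testfile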

Next steps

If you didn't see your problem or are unable to solve your issue, visit the following channel for more support:

  • If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub.