Use MapReduce in Apache Hadoop on HDInsight
Learn how to run MapReduce jobs on HDInsight clusters.
Example data
HDInsight provides various example data sets, which are stored in the /example/data and /HdiSamples directories. These directories are in the default storage for your cluster. In this document, we use the /example/data/gutenberg/davinci.txt file. This file contains the notebooks of Leonardo Da Vinci.
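To confirm that the sample data is present before you run a job, you can check it from an SSH session to the cluster. The following is a minimal sketch that uses the standard `hdfs dfs` file system shell, which resolves these paths against the cluster's default storage:

```bash
# List the Gutenberg sample data in the default storage for the cluster
hdfs dfs -ls /example/data/gutenberg

# Display the last kilobyte of the davinci.txt sample file
hdfs dfs -tail /example/data/gutenberg/davinci.txt
```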
Example MapReduce
An example MapReduce word count application is included with your HDInsight cluster. This example is located at /example/jars/hadoop-mapreduce-examples.jar on the default storage for your cluster.

The following Java code is the source of the MapReduce application contained in the hadoop-mapreduce-examples.jar file:
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
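In this source, the same IntSumReducer class is used as both the combiner and the reducer, so partial word counts are summed on the map side before they are shuffled to the reducers.

As a usage sketch, the example is typically run from an SSH session on the cluster head node with the `yarn jar` command. The local jar path and the output directory name below are assumptions for illustration; the jar location can vary by cluster version, and the output directory must not already exist:

```bash
# Run the built-in wordcount example against the davinci.txt sample file.
# The jar path below is a common location on Linux-based clusters (an
# assumption; adjust it for your cluster version).
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    wordcount \
    /example/data/gutenberg/davinci.txt \
    /example/data/davinciwordcount

# View the word counts produced by the job. davinciwordcount is a made-up
# output directory name; it must not exist before the job runs.
hdfs dfs -cat /example/data/davinciwordcount/part-r-00000
```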
For instructions to write your own MapReduce applications, see the following document:
Run the MapReduce
HDInsight can run MapReduce jobs by using various methods. Use the following table to decide which method is right for you, then follow the link for a walkthrough. A brief curl sketch of the REST approach also follows the table.
Use this... | ...to do this | ...from this client operating system
---|---|---
SSH | Use the Hadoop command through SSH | Linux, Unix, Mac OS X, or Windows
Curl | Submit the job remotely by using REST | Linux, Unix, Mac OS X, or Windows
Windows PowerShell | Submit the job remotely by using Windows PowerShell | Windows
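The following is a hedged sketch of the curl (REST) method from the table, using the cluster's WebHCat (Templeton) endpoint. CLUSTERNAME, PASSWORD, and the output and status paths are placeholders to replace with your own values:

```bash
# Submit the built-in word count example through the WebHCat REST API.
# The jar path refers to the examples jar in the cluster's default storage.
curl -u admin:PASSWORD \
    -d user.name=admin \
    -d jar=/example/jars/hadoop-mapreduce-examples.jar \
    -d class=wordcount \
    -d arg=/example/data/gutenberg/davinci.txt \
    -d arg=/example/curl/output \
    -d statusdir=/example/curl/status \
    https://CLUSTERNAME.azurehdinsight.net/templeton/v1/mapreduce/jar
```

The response is a JSON document that contains a job ID, which you can use to check the status of the job through the same WebHCat endpoint.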
Next steps
To learn more about working with data in HDInsight, see the following documents: