This article uses Hadoop's MapReduce framework to solve the classic word-count problem: counting how many times each word occurs in a text file. The project structure is as follows:
```
wordcount/
    WordcountMapper.java
    WordcountReducer.java
    JobSubmitter.java
```
- WordcountMapper.java: processes each line of the text file and splits the line into individual words.
- WordcountReducer.java: counts the number of occurrences of each word (it receives the data grouped by word as the key).
- JobSubmitter.java: configures the job and submits it to the cluster.
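To make the data flow concrete, take a hypothetical input line "hello world hello" (invented for this illustration):

```
input line:      hello world hello
mapper emits:    (hello, 1), (world, 1), (hello, 1)
after shuffle:   hello -> [1, 1]    world -> [1]
reducer emits:   (hello, 2), (world, 1)
```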
The detailed code is as follows.

WordcountMapper.java
```java
package wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key is the byte offset of this line in the file; value is the line itself
        String line = value.toString();
        String[] words = line.split(" ");
        // Emit (word, 1) for every word on the line
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
```
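A small stylistic note: this mapper allocates a new Text and a new IntWritable for every word. A common idiom in Hadoop examples reuses the Writable instances across calls, which is safe because the framework serializes the pair as soon as write() returns. A minimal sketch of that variant:

```java
package wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused across map() calls to avoid allocating one object per word
    private final Text wordOut = new Text();
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String word : value.toString().split(" ")) {
            wordOut.set(word);
            context.write(wordOut, ONE); // serialized immediately, so reuse is safe
        }
    }
}
```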
WordcountReducer.java
```java
package wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // values holds all the 1s emitted for this word; sum them up
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}
```
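Because summing counts is associative and commutative, this same reducer class can also serve as a combiner, pre-aggregating map output on each node before the shuffle and cutting network traffic. It takes one extra line in the job setup, sketched here as an optional addition to JobSubmitter below:

```java
// Optional: pre-aggregate map output locally before the shuffle
job.setCombinerClass(WordcountReducer.class);
```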
JobSubmitter.java
```java
package wordcount;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobSubmitter {

    public static void main(String[] args) throws Exception {
        // Run the job as the HDFS user "student"
        System.setProperty("HADOOP_USER_NAME", "student");

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // The jar containing the mapper and reducer classes
        job.setJarByClass(JobSubmitter.class);
        job.setMapperClass(WordcountMapper.class);
        job.setReducerClass(WordcountReducer.class);

        // Types of the map output and of the final output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Delete the output directory if it already exists,
        // otherwise the job fails at startup
        Path output = new Path("/user/joe/wordcount/output2");
        FileSystem fs = FileSystem.get(new URI("hdfs://hdp-01:9000"), conf, "student");
        if (fs.exists(output)) {
            fs.delete(output, true);
        }

        FileInputFormat.setInputPaths(job, new Path("/user/joe/wordcount/input"));
        FileOutputFormat.setOutputPath(job, output);

        // Split the work across two reduce tasks
        job.setNumReduceTasks(2);

        boolean res = job.waitForCompletion(true);
        System.exit(res ? 0 : 1);
    }
}
```
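A note on setNumReduceTasks(2): with two reduce tasks, the result is written as two files, part-r-00000 and part-r-00001, in the output directory, each holding a disjoint subset of the words. By default Hadoop routes each word to a reducer with HashPartitioner; the line below is not required, but shows what the default is equivalent to:

```java
// The default partitioner, shown only to make the routing explicit
job.setPartitionerClass(org.apache.hadoop.mapreduce.lib.partition.HashPartitioner.class);
```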
Running steps
1. Upload the text file to be counted to HDFS:

```
hadoop fs -put <local file path> <HDFS absolute path>
```

2. Package the project as wordcount.jar.

3. Run the job on the cluster:

```
hadoop jar wordcount.jar wordcount.JobSubmitter
```
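Once the job finishes, the counts can be read straight from HDFS (using the output path hard-coded in JobSubmitter above; each line is a word and its count, separated by a tab):

```
hadoop fs -cat /user/joe/wordcount/output2/part-r-*
```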