Through the previous article, Hadoop Trip 5 – Building an HDFS Environment in IDEA with Maven, you should now be able to access the HDFS file system from IDEA during development. A cloud disk service could be built on top of such a system; if you're interested, try it yourself.

Today we implement MapReduce locally to count word occurrences, which is the usual way to debug a job. Running it on a cluster is also simple: package the code into a JAR and submit it with `hadoop jar xxx.jar <package>.<MainClass>`; feel free to explore that on your own.
MapReduce
MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB). Together with HDFS it forms the core of Hadoop: HDFS is the distributed file system, while MapReduce is the distributed computing framework designed for computation.
The simplest flow chart
- Read the input file
- Run the map operation
- Run the reduce operation
- Output the result file
Detailed process diagram
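To make these stages concrete, here is a rough sketch (my own illustration, not from the original article) of how the three-line sample input used in this article flows through them:

```
read:    java,c++ | c,java | python,c        (one map call per line)
map:     (java,1) (c++,1) (c,1) (java,1) (python,1) (c,1)
shuffle: c -> [1,1]   c++ -> [1]   java -> [1,1]   python -> [1]
reduce:  (c,2) (c++,1) (java,2) (python,1)
```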
Let's implement it
1. Prepare the input file

First, create the project (I used a Spring Boot project) and prepare an input file, input.txt, in any directory of the project; I put mine in map_input under resources. Mine holds the three comma-separated lines shown in the Mapper comments below (java,c++ / c,java / python,c), so counting its words should produce:

```
c 2
c++ 1
java 2
python 1
```
2. Import dependencies
```xml
<properties>
    <hadoop.version>2.7.3</hadoop.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>
```
3. Implement the Map side

Implementing the Map side is actually quite simple: create a class that extends the Mapper class and override its map method.
```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Object, Text, Text, LongWritable> {

    /**
     * @param key     the offset of the current line
     * @param text    the value of the current line
     * @param context the job context
     *
     * The content of the file:
     * java,c++
     * c,java
     * python,c
     */
    @Override
    protected void map(Object key, Text text, Context context) throws IOException, InterruptedException {
        // Get the current input line.
        String line = text.toString();
        // Split the line into words.
        String[] words = line.split(",");
        // Emit (word, 1) for each word.
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
```
4. Implement the Reduce side

Similarly, extend the Reducer class and override the reduce method.
```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * The first two type parameters are the reducer's input: the key output by the
 * map side and the collection of its values (values with the same key are merged together).
 * The last two are the reducer's output: key and value, i.e. the final result.
 * Accumulating inside reduce counts the number of occurrences of each word.
 */
public class MyReduce extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        /*
         * key/values are the map output grouped by key:
         * c:      [1,1]
         * c++:    [1]
         * java:   [1,1]
         * python: [1]
         */
        long sum = 0; // the total number of occurrences of the word
        for (LongWritable value : values) {
            sum += value.get();
        }
        context.write(key, new LongWritable(sum)); // output results such as c 2, java 2 ...
    }
}
```
5. Implement the client code

The driver code is fairly generic; it follows basically the same template every time.
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapReduceClient {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        //conf.set("fs.defaultFS", "hdfs://master:9000/");      // master access path
        //conf.set("mapreduce.framework.name", "yarn");         // run on yarn
        //conf.set("yarn.resourcemanager.hostname", "master");  // set the host

        Job job = Job.getInstance(conf);
        job.setJarByClass(MapReduceClient.class); // set the main class to run
        job.setJobName("wordCount");              // set the application name

        // Set the location of the input and output files.
        FileInputFormat.addInputPaths(job, "J:\\IDEA\\springboot-hadoop\\src\\main\\resources\\map_input");
        FileOutputFormat.setOutputPath(job, new Path("J:\\IDEA\\springboot-hadoop\\src\\main\\resources\\map_output"));

        // Set the mapper and reducer.
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReduce.class);

        // Set the output key/value types (the same for map and reduce here).
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Run the job.
        job.waitForCompletion(true);
    }
}
```
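One optional tweak (my addition, not part of the original article): because the reduce step just sums values, and addition is associative and commutative, the same reducer class can be registered as a combiner so each map task pre-aggregates its output before the shuffle. If you want to try it, add this line before waitForCompletion:

```java
// Optional: pre-aggregate map output locally; valid for word count because
// summing partial counts gives the same final result.
job.setCombinerClass(MyReduce.class);
```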
To run the job locally on Windows, the Windows executables (winutils.exe and its companions) must be present in the bin directory of a local Hadoop installation. You can simply unzip the Hadoop distribution and copy the Windows binaries into its bin directory, then make sure the JVM can find that installation, for example through the HADOOP_HOME environment variable.
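If you prefer not to set an environment variable, pointing the JVM at the installation from code also works. A minimal sketch, assuming (hypothetically) that Hadoop was unzipped to D:\hadoop-2.7.3; run this before creating the Job:

```java
// Assumed path: replace with your own Hadoop unpack directory.
// Must run before the job starts so Hadoop can locate bin\winutils.exe.
System.setProperty("hadoop.home.dir", "D:\\hadoop-2.7.3");
```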
The final result: the output directory map_output contains the word counts (by default in a part-r-00000 file), matching the expected counts shown at the start. Note that the job fails if the output directory already exists, so delete map_output before re-running.