This article has been included in my GitHub collection; a Star is welcome: github.com/yehongzhi/l…

Introduction to MapReduce

MapReduce is divided into two phases: Map and Reduce. The Map phase splits a large task into several smaller tasks that can be processed in parallel, while the Reduce phase aggregates the results of the Map phase.

For example, suppose we want to count how often each word appears in a large text file, the classic WordCount problem. How does it work? See the picture below:

In the Map stage, the input text is split into individual words, and each word is emitted as a key with a value of 1. The framework then shuffles these pairs so that all values belonging to the same key arrive at the same reducer. In the Reduce stage, the values for each key are summed to get the total count. Finally, the result is written to an output file.
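As a concrete trace, suppose one line of input is "hello world hello world". The data flows roughly like this:

Map output:      <hello,1> <world,1> <hello,1> <world,1>
Shuffle (group): <hello,[1,1]> <world,[1,1]>
Reduce output:   <hello,2> <world,2>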

WordCount example

Now, how do I implement WordCount?

Create a project

First we need to create a Maven project with the following dependencies:


      
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>io.github.yehongzhi</groupId>
    <artifactId>hadooptest</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <repositories>
        <repository>
            <id>apache</id>
            <url>http://maven.apache.org</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.6.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.6.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>2.6.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>2.6.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.0</version>
        </dependency>
    </dependencies>
</project>
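As an aside, the hadoop-client artifact already pulls in the common, HDFS, and MapReduce client modules as transitive dependencies, so the explicit list above is partly redundant (though harmless as long as the versions agree). Also note that hadoop-core is the old monolithic artifact name from Hadoop 1.x; if mixing it with the 2.6.5 modules ever causes classpath conflicts, it is the first dependency to drop.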

The first step is the Map stage. Create the WordcountMapper class:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Mapper takes four generic parameters:
 * The first, KEYIN: by default, the starting offset of the line of text read by the MR framework, of type LongWritable.
 * The second, VALUEIN: by default, the content of that line of text, of type Text.
 * The third, KEYOUT: the key of the data output once the logic has run, here each word, of type Text.
 * The fourth, VALUEOUT: the value of the data output once the logic has run, here the count, of type IntWritable.
 */
public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Convert the input line to a String
        String string = value.toString();
        // Split the line into words on spaces
        String[] words = string.split(" ");
        // Output <word, 1>: the word is the key, the count 1 is the value
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
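An aside before moving on: splitting on a single space leaves punctuation attached to words and mishandles tabs or repeated spaces. The WordCount example that ships with Hadoop tokenizes with StringTokenizer and reuses its writable objects instead; a minimal sketch of that variant (the class name TokenizingWordcountMapper is mine):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizingWordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reuse the output objects instead of allocating new ones per word
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // StringTokenizer splits on any whitespace (spaces, tabs, repeated spaces)
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}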

Next comes the Reduce phase. Create the WordcountReduce class:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * KEYIN and VALUEIN correspond to the KEYOUT and VALUEOUT types of the Mapper stage.
 * KEYOUT and VALUEOUT are the output types of the reduce logic:
 * KEYOUT is the word, of type Text.
 * VALUEOUT is the total count, of type IntWritable.
 */
public class WordcountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int count = 0;
        // Sum the counts
        for (IntWritable value : values) {
            count += value.get();
        }
        // Output <word, total count>
        context.write(key, new IntWritable(count));
    }
}
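One detail worth remembering about this method: the values iterable can only be traversed once, and the framework reuses the same IntWritable instance across iterations, so copy the primitive out (via value.get(), as above) rather than storing references to the objects themselves.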

Finally, create the WordCount class to provide the entry point:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);
        // Record the jar that contains this class so it can be submitted to YARN
        job.setJarByClass(WordCount.class);
        // Tell the framework which Mapper/Reducer business classes this job should use
        job.setMapperClass(WordcountMapper.class);
        job.setReducerClass(WordcountReduce.class);
        // The KV types of the Mapper's output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // The KV types of the final output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // The directory containing the job's input files
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // The directory the job writes its output to
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean completion = job.waitForCompletion(true);
        System.exit(completion ? 0 : 1);
    }
}
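Two optional touches are easy to add to main(); a sketch, assuming the job, configuration, and args variables declared above:

// A combiner is safe for WordCount because summing is associative and
// commutative; it pre-aggregates <word,1> pairs on the map side and
// shrinks the data shuffled to the reducers.
job.setCombinerClass(WordcountReduce.class);

// MapReduce refuses to start if the output directory already exists,
// so delete any leftover output from a previous run first.
Path outputPath = new Path(args[1]);
outputPath.getFileSystem(configuration).delete(outputPath, true);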

That completes the code. Next, package the project into a jar with Maven and upload it to the server where Hadoop is deployed.
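Assuming a standard Maven layout, packaging is a single command run from the project root; the jar ends up at target/hadooptest-1.0-SNAPSHOT.jar:

mvn clean package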

Upload files to Hadoop

Then upload the text file to be counted to Hadoop. Here I grab a random Redis configuration file as the input.

Rename it to a .txt file (assumed to be redis.txt below) and upload it to /usr/local/hadoop-3.2.2/input over FTP. Then create the /user/root folder in HDFS.

hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/root
# Create the input directory in HDFS
hadoop fs -mkdir input
# Upload the file to HDFS
hadoop fs -put redis.txt input/
# After the upload succeeds, list the directory to verify
hadoop fs -ls /user/root/input
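A small but important detail: a relative HDFS path such as input resolves to /user/<current user>/input, which is why the /user/root directory has to exist first when running as root.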

Execute the program

The first step is to start Hadoop. Run the./start-all.sh command in the sbin directory.
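To verify that the daemons are up, you can run jps; on a single-node setup it should list processes such as NameNode, DataNode, ResourceManager, and NodeManager.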

Run the following command to execute the jar package:

hadoop jar /usr/local/hadoop-3.2.2/jar/hadooptest-1.0-SNAPSHOT.jar WordCount input output
# /usr/local/hadoop-3.2.2/jar/hadooptest-1.0-SNAPSHOT.jar is the location of the jar package
# WordCount is the name of the main class
# input is the folder containing the input files
# output is the output folder
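One caveat: hadoop jar expects the fully qualified main class name. If WordCount was declared inside a package (the pom's groupId suggests something like io.github.yehongzhi), the command needs that full name, e.g. io.github.yehongzhi.WordCount; the bare name only works for a class in the default package.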

If the job log ends without errors, the run has succeeded. Open the web administration interface and find the output folder.

The output is written to a file in that folder; download it.

Then open the file and you can see the word-count statistics.
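For reference, each reducer writes its results to a file named like part-r-00000 in the output directory, one tab-separated word and count per line, sorted by key. A purely illustrative sketch of the format (the actual words depend on the input file):

# part-r-00000 (illustrative format only)
word1	3
word2	1
word3	7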

Problems encountered

If the job hangs at "Running Job" and never makes progress, modify the mapred-site.xml file:

Before the change:

<configuration>
    <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
    </property>
</configuration>

After the change:

<configuration>
    <property>
          <name>mapreduce.job.tracker</name>
          <value>hdfs://192.168.1.4:8001</value>
          <final>true</final>
     </property>
</configuration>

Then restart Hadoop and rerun the jar command above.

Conclusion

WordCount is the "Hello World" of big data, and working through this example is very helpful for beginners learning the basic operation of MapReduce. Next, I will continue to study big-data-related topics. I hope this article has been helpful to you.

If you found it useful, please give me a thumbs-up. Your thumbs-up is the biggest motivation for my writing!

I'm a programmer trying hard to be remembered. See you next time!!

My ability is limited; if there is anything wrong or improper, please point it out so we can learn together!