First, the development environment: Hadoop is deployed on Ali Cloud, and development is done locally in IDEA. The goal is to implement a simple WordCount (WC) with MapReduce (MR).

1. Mapper, Reducer, and Driver code

1.1 Mapper code

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * KEYIN:    key type of the data read by the Map task; the offset of the start of each line (Long)
 * VALUEIN:  value type of the data read by the Map task; one line of text (String), e.g.
 *           "hello world welcome"
 *           "hello welcome"
 * KEYOUT:   key type of the custom Mapper's output (String)
 * VALUEOUT: value type of the custom Mapper's output (Integer)
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the row corresponding to value with the specified delimiter
        String[] words = value.toString().split("\t");
        for(String word : words){
            context.write(new Text(word), new IntWritable(1));
        }
    }
}

1.2 Reducer code

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Iterator;


public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    /**
     * Map output:
     *   (hello,1) (world,1)
     *   (hello,1) (world,1)
     *   (welcome,1)
     * <p>
     * The map output reaches reduce grouped by key:
     *   reduce1: (hello,1) (hello,1) (hello,1) --> (hello, <1,1,1>)
     *   reduce2: (world,1) (world,1) (world,1) --> (world, <1,1,1>)
     *   reduce3: (welcome,1)                   --> (welcome, <1>)
     *
     * @param key
     * @param values
     * @param context
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int count = 0;
        Iterator<IntWritable> iterator = values.iterator();
        while (iterator.hasNext()) {
            IntWritable value = iterator.next();
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}

1.3 Driver code

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;

/**
 * Use MR to count the word frequency of files in HDFS.
 */
public class WordCountApp {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // System.setProperty("HADOOP_USER_NAME","root");
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS"."hdfs://121.**.***.81:8020/");
        // configuration.set("dfs.client.use.datanode.hostname", "true");

        // Create a job
        Job job = Job.getInstance(configuration);
        // Set the main class of the job
        job.setJarByClass(WordCountApp.class);
        // Set job parameters: customize Mapper and Reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // Set Job parameters: Mapper Output key and value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Set parameters corresponding to the Job: Reducer output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Set Job parameters: job input and output paths
        FileInputFormat.setInputPaths(job, new Path("/wordcount/input"));
        FileOutputFormat.setOutputPath(job, new Path("/wordcount/output"));
        // Submit the job
        boolean res = job.waitForCompletion(true);
        System.exit(res ? 0 : -1);
    }
}
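
To give the job something to read, the input under /wordcount/input can be created with the hdfs command line or from Java. Below is a minimal sketch using the HDFS FileSystem API; it assumes the same fs.defaultFS address as the driver above, and the file name words.txt is only a placeholder. Note that the Mapper splits lines on tabs, and that /wordcount/output must not exist before the job is submitted, otherwise FileOutputFormat will refuse to run.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

/** Sketch: write a small tab-separated test file to /wordcount/input. */
public class PrepareWordCountInput {
    public static void main(String[] args) throws Exception {
        System.setProperty("HADOOP_USER_NAME", "root");
        Configuration configuration = new Configuration();
        // Same NameNode address as in WordCountApp (masked as in the original).
        configuration.set("fs.defaultFS", "hdfs://121.**.***.81:8020/");

        try (FileSystem fs = FileSystem.get(configuration);
             FSDataOutputStream out = fs.create(new Path("/wordcount/input/words.txt"), true)) {
            // The Mapper splits on '\t', so the words are tab-separated.
            out.write("hello\tworld\twelcome\nhello\twelcome\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}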

2. Summary of relevant errors and solutions

2.1 The first error

Solution:

(1) Download WinUtils and make sure it matches the Hadoop version.

Hadoop 2.2 can be downloaded here: Github.com/srccodes/ha…

Hadoop 2.6 can be downloaded here: Github.com/amihalik/ha…

Since the test cluster runs Hadoop 2.6.0, I downloaded version 2.6.0 here. After downloading, unzip it.

(2) Set the HADOOP_HOME environment variable to the unzipped directory and add %HADOOP_HOME%\bin to PATH. Once the environment variables take effect, this error is resolved.
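
If editing Windows environment variables is inconvenient, a commonly used alternative is to point the hadoop.home.dir system property at the unzipped directory from Java before any Hadoop class is used (the path below is just a placeholder for wherever the 2.6.0 package was extracted):

// Placed at the very beginning of main() in WordCountApp, before the Configuration is created.
System.setProperty("hadoop.home.dir", "D:/hadoop-2.6.0");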

2.2 The second error

Solution: on top of the previous step, delete all hadoop.dll files inside that folder.

2.3 The third error

Workaround: copy the NativeIO class into the project and make its access method simply return true, as sketched below.
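
A sketch of the edited method, assuming the Hadoop 2.6.x source layout: copy org.apache.hadoop.io.nativeio.NativeIO into the project under the same package (so it shadows the version inside the jar), keep everything else unchanged, and only short-circuit the access check in the Windows inner class:

// Inside the copied NativeIO.java, class NativeIO.Windows; the rest of the file
// stays identical to the Hadoop source.
public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
    // Skip the native Windows permission check that fails when running from IDEA.
    return true;
}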

2.4 The fourth error

The error message indicates that the local Windows user has insufficient permissions. Add one line of code so that operations are performed as the Hadoop user on the Linux side:

System.setProperty("HADOOP_USER_NAME","root");

2.5 The fifth error

Connection timeout. Solution: make the HDFS client connect to DataNodes by hostname rather than by the internal IPs returned by the NameNode (the DataNode hostnames must also resolve on the local machine, e.g. via the hosts file):

 configuration.set("dfs.client.use.datanode.hostname", "true");