
I've been busy lately and haven't blogged for two weeks, but this week I finally got my Hadoop instance running and ran the official WordCount example, which counts how many times each word appears in a file. What follows is a record of that successful run. It assumes Hadoop is already installed and configured (see my previous post: Hadoop pseudo-distributed Installation record).

Operation steps:

1. Prepare a file containing some words and upload it to the Linux server. Contents of the file:

hello world hello hadoop
abc hadoop aabb hello word
count test hdfs mapreduce

2. Run HDFS commands to create a directory for the input file (HDFS commands are similar to Linux commands). The -p option also creates the parent directory /input if it does not exist yet:

hadoop fs -mkdir -p /input/wordcount

Then create an output directory, /output, in the same way to hold the results of later Hadoop runs.

3. Then put the file into the Hadoop file system:

hadoop fs -put /home/file1 /input/wordcount

After uploading, list the file system recursively to check that the file is there:

hadoop fs -ls -R /
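Steps 2 and 3 can also be collected into a short script. This is only a sketch: it assumes the sample file is at /home/file1 and that the hadoop command is on the PATH, and it uses hadoop fs -test -e as a quieter existence check than a full recursive listing.

```shell
# Assumed locations: adjust both to your own setup.
local_file=/home/file1
input_dir=/input/wordcount

if command -v hadoop >/dev/null 2>&1; then
  # Create the input directory (and its parent) in HDFS.
  hadoop fs -mkdir -p "$input_dir"
  # Upload the local file into that directory.
  hadoop fs -put "$local_file" "$input_dir"
  # -test -e exits with status 0 when the path exists in HDFS.
  hadoop fs -test -e "$input_dir/$(basename "$local_file")" && echo "upload OK"
else
  echo "hadoop not found on PATH; skipping HDFS steps"
fi
```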

4. Go to the share/hadoop/mapreduce directory of the Hadoop installation, which contains hadoop-mapreduce-examples-3.1.2.jar, and use hadoop jar to run the wordcount program from that jar:

hadoop jar hadoop-mapreduce-examples-3.1.2.jar wordcount /input/wordcount /output/wordcount

The last two parameters are HDFS paths. The first is the input path, which is the directory we created earlier, and the second is the output path. The output directory does not need to be created beforehand: Hadoop creates it itself, and the job fails if it already exists.

5. The job then runs map first and reduce afterwards. The process can be understood as divide and conquer: map processes pieces of the input on multiple machines and produces intermediate results, and reduce then summarizes those results (reduction and aggregation).
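To get a feel for what map and reduce do, the same counting can be imitated locally with an ordinary shell pipeline. This is only an analogy, not how Hadoop runs the job: tr plays the role of map (emit one word per line), sort plays the role of the shuffle that brings identical words together, and uniq -c plays the role of reduce. The temporary file location is an assumption for illustration.

```shell
# Recreate the sample input in a temporary directory (location is arbitrary).
workdir=$(mktemp -d)
printf '%s\n' \
  'hello world hello hadoop' \
  'abc hadoop aabb hello word' \
  'count test hdfs mapreduce' > "$workdir/file1"

# "Map":     tr splits each line into one word per line.
# "Shuffle": sort puts identical words next to each other.
# "Reduce":  uniq -c counts each group; awk prints "word<TAB>count".
counts=$(tr -s ' ' '\n' < "$workdir/file1" | sort | uniq -c | awk '{print $2 "\t" $1}')
printf '%s\n' "$counts"
```

For the sample file this reports hello 3 times, hadoop twice, and every other word once. Note that "word" and "world" are different words and are counted separately.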


Although there are not many steps and the content is fairly simple, there are still quite a few pitfalls. Points to note:

1. Set the hostname of the pseudo-distributed Hadoop machine so that it is consistent with the configuration files; a mismatch can make the configuration fail.

2. If the memory setting is too small, the job stays stuck at "Running job" or at map 0%. In that case, set the memory size in yarn-site.xml according to the server's actual memory (I set it to 2048 MB).

3. If the job gets stuck at some stage, check the logs directory under the Hadoop installation directory. It contains several kinds of logs, including the NodeManager and ResourceManager logs, and their messages help you find the problem.
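The three checks above could be scripted roughly as follows. This is a sketch under assumptions: HADOOP_HOME is taken to point at the installation directory (with /opt/hadoop as a placeholder fallback), the configuration is assumed to live in etc/hadoop under it, the log file names are assumed to contain "resourcemanager" or "nodemanager", and recent_problems is a hypothetical helper, not a Hadoop command.

```shell
# Assumption: HADOOP_HOME points at the Hadoop installation directory;
# /opt/hadoop is only a placeholder fallback.
hadoop_home="${HADOOP_HOME:-/opt/hadoop}"

# Hypothetical helper: print the last ERROR/WARN lines from the
# ResourceManager and NodeManager logs found in the given directory.
recent_problems() {
  for f in "$1"/*resourcemanager*.log "$1"/*nodemanager*.log; do
    if [ -f "$f" ]; then
      echo "== $f"
      grep -E 'ERROR|WARN' "$f" | tail -n 20 || true
    fi
  done
}

# Point 1: the machine's hostname, to compare with the configuration files.
echo "hostname: $(uname -n)"

# Point 2: memory-related properties currently present in yarn-site.xml, if any.
yarn_site="$hadoop_home/etc/hadoop/yarn-site.xml"
if [ -f "$yarn_site" ]; then
  grep -n -A 1 'memory' "$yarn_site" || echo "no memory properties set in $yarn_site"
fi

# Point 3: recent problems reported in the daemon logs.
recent_problems "$hadoop_home/logs"
```

Filtering on ERROR and WARN keeps the output short; when that shows nothing useful, reading the tail of the full log file is the next step.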