Code repository: https://gitee.com/jikeh/BigData

Development environment: Maven, IntelliJ IDEA

Environment setup: Hadoop single-node cluster setup

Introductory project: Hadoop in Practice: MapReduce WordCount example

Advanced projects: Hadoop in Practice: data preprocessing, data sorting example; Hadoop in Practice: data preprocessing, data deduplication example (IP deduplication, URL deduplication)

Note: This example is similar to WordCount.
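To make the WordCount comparison concrete, here is a minimal plain-Java sketch of the idea, with no Hadoop dependency. It assumes each input line holds a student name followed by a score, and it assumes the job averages the scores per student; the source does not say which aggregate datascore actually computes, so the average is only a guess. The class and method names (ScoreAverageSketch, map, reduce, averageByStudent) are hypothetical and are not taken from the repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local stand-in for the job; not the repository's code.
class ScoreAverageSketch {

    // "Map" step: turn one input line into a (name, score) pair.
    // The last whitespace-separated token is taken as the score,
    // everything before it as the student name.
    static Map.Entry<String, Integer> map(String line) {
        String[] tokens = line.trim().split("\\s+");
        int score = Integer.parseInt(tokens[tokens.length - 1]);
        String name = String.join(" ", Arrays.copyOf(tokens, tokens.length - 1));
        return Map.entry(name, score);
    }

    // "Reduce" step: combine all scores collected for one student.
    // ASSUMPTION: the aggregate is the arithmetic mean.
    static double reduce(List<Integer> scores) {
        int sum = 0;
        for (int s : scores) {
            sum += s;
        }
        return (double) sum / scores.size();
    }

    // Local replacement for the shuffle phase: group by name, then reduce.
    static Map<String, Double> averageByStudent(List<String> lines) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            Map.Entry<String, Integer> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Double> result = new LinkedHashMap<>();
        grouped.forEach((name, scores) -> result.put(name, reduce(scores)));
        return result;
    }

    public static void main(String[] args) {
        // Sample lines in the same name-then-score shape as the test data.
        List<String> lines = List.of(
                "Zhang San 78", "Li Si 89",
                "Zhang San 88", "Li Si 99",
                "Zhang San 80", "Li Si 82");
        averageByStudent(lines)
                .forEach((name, avg) -> System.out.println(name + "\t" + avg));
    }
}
```

Where WordCount emits (word, 1) and sums the ones, this sketch emits (name, score) and averages the scores. In a real job the grouping is done by the framework's shuffle, not by an in-memory map.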

1. Start Hadoop

cd /usr/local/Env/Hadoop/hadoop-2.6.5/sbin/

./start-all.sh

2. Create a directory in HDFS

hadoop fs -mkdir -p /jikeh/datascore/input

3. Upload the text file to HDFS

Test data:

Chinese:

Zhang San 78

Li Si 89

Wang Wu 96

Zhao Liu 67

Math:

Zhang San 88

Li Si 99

Wang Wu 66

Zhao Liu 77

English:

Zhang San 80

Li Si 82

Wang Wu 84

Zhao Liu 86

hadoop fs -copyFromLocal /usr/local/src/* /jikeh/datascore/input

4. Run datascore

Syntax: hadoop jar ***.jar [main class] [input file]

hadoop jar datascore-0.0.1-SNAPSHOT.jar com/jikeh/hadoop/datascore/datascore

5. View the running result

hadoop fs -ls /jikeh/datascore/output

hadoop fs -cat /jikeh/datascore/output/part-r-00000

Appendix: commonly used HDFS commands

hadoop fs -mkdir: creates an HDFS directory

hadoop fs -ls: lists the contents of an HDFS directory

hadoop fs -copyFromLocal: copies local files to HDFS

hadoop fs -put: copies local files to HDFS

hadoop fs -cat: prints the contents of a file in HDFS

hadoop fs -copyToLocal: copies files from HDFS to a local directory

hadoop fs -get: copies files from HDFS to a local directory

hadoop fs -cp: copies files within HDFS

hadoop fs -rm: deletes HDFS files

hadoop fs -rm -R /jikeh/datascore/output