Note: I previously wrote a post at blog.csdn.net/Allenzyg/ar…, "CDH6.3.1 Offline Deployment and Installation (3 nodes)", so why CDH6.2.1 here? The CDH6.3.1 cluster I installed earlier was given very little memory (my computer itself has little memory, so I allocated less to keep the machine from being eaten up and freezing), which makes it very slow after startup and far too time-consuming to run anything. The CDH6.2.1 cluster is my company's environment, and that cluster is quite powerful; I connect to it remotely over VPN, and it is the cluster that will be used in future tests and projects.
Use Eclipse to connect to Hadoop on CDH6.2.1
1. Configure JDK and Hadoop environment variables on your PC (Windows)
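For reference, on my machine the variables look roughly like this (the JDK path is only an example; use your own install locations, and point HADOOP_HOME at the unpacked Hadoop directory used in step 5):
JAVA_HOME   = C:\Program Files\Java\jdk1.8.0   (example path)
HADOOP_HOME = E:\hadoop-2.7.3
Path        = ...;%JAVA_HOME%\bin;%HADOOP_HOME%\bin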
2. After configuring the environment variables, verify them:
Press Win+R and type cmd to open a command window.
Run java -version to check the JDK.
Change to the Hadoop bin directory and run hadoop version to check Hadoop.
3. Download and install Eclipse.
4. Copy the hadoop-eclipse-plugin-2.6.0.jar file to the plugins directory of Eclipse.
5. Copy hadoop.dll from E:\hadoop-2.7.3\bin to C:\Windows\System32. Reason: if you skip this, the job fails with Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V
6. Modify the C:\Windows\System32\drivers\etc\hosts file and add the remote cluster's IP addresses and hostnames, as shown below.
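For example, the entry for the NameNode host used later in this post might look like the following line (the IP address is a placeholder; add one line per cluster node with its real address):
192.168.0.101  manager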
7. Open Eclipse, go to Window > Preferences > Hadoop Map/Reduce, and fill in the Hadoop directory you used when configuring the environment variables.
8. Click the little blue elephant icon in the lower right corner
Before the change: (screenshot)
After the modification: (screenshot)
Click “Finish” in the lower right corner.
Note: a plain Apache Hadoop installation typically uses ports 9000 and 9001 (the plugin's defaults), while CDH managed by Cloudera Manager uses port 8020 for HDFS, so that is the port to enter here.
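In other words, the essential change in the Hadoop location dialog is the DFS port; a sketch of the values I end up with (field labels may differ slightly between plugin versions):
Location name: any label you like, e.g. CDH6.2.1
DFS Master:    Host = manager, Port = 8020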
9. When you're done, DFS Locations will appear on the left, showing exactly the same directory tree as your HDFS.
If DFS Locations does not appear on the left side, go to Window > Show View > Other…, select the view listed under MapReduce Tools, and click Open.
Start writing the WordCount code and create a new project.
Data preparation:
Create a local wc.txt file with the following contents:
hello nihao
allen I love you
zyg I love you
hello hadoop
bye hadoop
bye allen
bye zyg
Upload to HDFS:
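You can upload the file with the hdfs command line on the cluster (hdfs dfs -mkdir -p /test/input, then hdfs dfs -put wc.txt /test/input/), or programmatically from Eclipse. Below is a minimal sketch using the Hadoop FileSystem API; the class name and the local path D:/wc.txt are made up for illustration, and the NameNode address matches the one used in the WordCount job below:

package Hadoop;

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadWcTxt {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connect to the CDH NameNode; connecting as the "hdfs" user here to sidestep
        // permission errors (adjust the user to your environment)
        FileSystem fs = FileSystem.get(URI.create("hdfs://manager:8020"), conf, "hdfs");
        // Create the input directory and copy the local file (example path) into HDFS
        fs.mkdirs(new Path("/test/input"));
        fs.copyFromLocalFile(new Path("D:/wc.txt"), new Path("/test/input/wc.txt"));
        fs.close();
    }
}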
The code is as follows:
package Hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    /*
     * The mapper runs first, followed by the reducer.
     * Inner class: Mapper<Object, Text, Text, IntWritable>.
     * It reads the source text line by line.
     */
    public static class WcMap extends Mapper<Object, Text, Text, IntWritable> {
        // the count "1" emitted for every word
        private final static IntWritable one = new IntWritable(1);
        // the current word
        private Text word = new Text();

        /*
         * Each call to map() is passed one split line of data.
         * key: the offset of the line in the file; value: the contents of the line;
         * context: the context object, which survives the entire WordCount run and receives the [K, V] pairs.
         * There is only one instance of WcMap, but its map() method runs for every line.
         */
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, one); // emit [word, 1]
            }
        }
    }

    /*
     * After the mapper is complete, the reducer runs.
     * Inner class: Reducer<Text, IntWritable, Text, IntWritable>.
     */
    public static class WcReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        // counter for the current word
        private IntWritable times = new IntWritable();

        /*
         * There is also only one instance of WcReduce, but its reduce() method runs until the counting is complete.
         * key: a word; values: the set of counts for that word, i.e. [1, 1, 1, ...],
         * so the input looks like [K, [1, 1, 1, ...]]. On each call, key is a new word and values holds all of its counts.
         */
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get(); // count the occurrences of this word
            }
            times.set(sum);
            context.write(key, times); // written to the result file under /test/output/wc on HDFS
        }
    }

    public static void main(String[] args) throws Exception {
        // HDFS configuration
        Configuration conf = new Configuration();
        // the job (environment)
        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);
        // class that reads the source data and performs the map operation
        job.setMapperClass(WcMap.class);
        /*
         * Combiner: each map may generate a large amount of output; the combiner merges the output
         * on the map side to reduce the data transferred to the reducer. The combiner's input and
         * output types must match the mapper's output and the reducer's input.
         */
        // job.setCombinerClass(WcReduce.class);
        job.setReducerClass(WcReduce.class);
        // the output key/value types must match what the context object writes
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // job.setNumReduceTasks(1); // optionally set the number of reduce tasks
        // input path: an existing file or directory
        FileInputFormat.addInputPath(job, new Path("hdfs://manager:8020/test/input/wc.txt"));
        // output path: the result directory (must not exist yet)
        FileOutputFormat.setOutputPath(job, new Path("hdfs://manager:8020/test/output/wc"));
        // wait until the job is submitted to the cluster and completes; exit with 0 on success, 1 otherwise
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The result is as follows:
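Refreshing DFS Locations should now show the /test/output/wc directory; based on the input file above, its part-r-00000 result file should contain (word and count separated by a tab):
I	2
allen	2
bye	3
hadoop	2
hello	2
love	2
nihao	1
you	2
zyg	2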