Hadoop 2.6.0 Environment Setup: A Streamlined Guide

 

HopToad

Follow the WeChat public account: HopToad. Please credit the source when reprinting.

1. Software download

Download Hadoop from the Apache mirror: apache.fayea.com/hadoop/comm…

Download JDK 8u25 from the Oracle website: www.oracle.com/technetwork…

The Hadoop examples JAR is used for a simple test at the end: www.java2s.com/Code/Jar/h/…

2. Hardware preparation

Prepare three or four machines. I prepared three virtual machines this time: one master and two slaves.

 

3. Operation steps

A) Install a 64-bit operating system (such as RHEL 6.5)

B) Set the host names (planned as follows):

192.168.1.200 master

192.168.1.201 slave1

192.168.1.202 slave2

Add these IP-to-hostname mappings to /etc/hosts on every machine; a sketch follows.
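A minimal sketch of the hosts setup, run as root on each machine (addresses taken from the plan above):

cat >> /etc/hosts <<'EOF'
192.168.1.200 master
192.168.1.201 slave1
192.168.1.202 slave2
EOF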

C) Set up passwordless SSH (the master node must reach every slave node without a password)

A) Run ssh-keygen -t rsa on the master and append the contents of the master's ~/.ssh/id_rsa.pub to the ~/.ssh/authorized_keys file on every node (including the master itself, since it also runs a DataNode). See the sketch below.
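A minimal sketch, run as root on the master while password logins still work (ssh-copy-id appends the public key to authorized_keys on the target):

ssh-keygen -t rsa                   # accept the defaults; leave the passphrase empty
for host in master slave1 slave2; do
  ssh-copy-id root@$host            # installs ~/.ssh/id_rsa.pub on $host
done
ssh slave1 hostname                 # should print "slave1" with no password prompt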

 

D) Install the JDK

Decompress the JDK package:

tar -zxvf jdk-8u25-linux-x64.gz
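The profile below assumes JAVA_HOME=/opt/jdk, so move (or symlink) the extracted tree there; jdk1.8.0_25 is the directory name the 8u25 tarball typically unpacks to:

mv jdk1.8.0_25 /opt/jdk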

Edit the configuration file:

vi /etc/profile

Add the following:

JAVA_HOME=/opt/jdk
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH

Reload the profile and confirm the JDK installation:

source /etc/profile
java -version

E) Install Hadoop

A) Decompress:

tar -zxvf hadoop-2.6.0.tar.gz
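The profile entries below assume HADOOP_HOME=/opt/hadoop, so move the extracted tree there (assuming the tarball unpacked to hadoop-2.6.0):

mv hadoop-2.6.0 /opt/hadoop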

B) Edit the configuration file:

vi /etc/profile

HADOOP_HOME=/opt/hadoop
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_HOME HADOOP_CONF_DIR PATH

C) Apply the configuration:

source /etc/profile

D) Modify core-site.xml

Edit core-site.xml under /opt/hadoop/etc/hadoop as follows:

<configuration>

 <property>

   <name>fs.defaultFS</name>

   <value>hdfs://master:9000</value>

   <description>NameNode URI.</description>

 </property>

 <property>

   <name>io.file.buffer.size</name>

   <value>131072</value>

   <description>Size of read/write buffer used in SequenceFiles.</description>

 </property>

 <property>

   <name>hadoop.tmp.dir</name>

   <value>/data/hadoop/tmp</value>

   <description>A base for other temporary directories.</description>

 </property>

</configuration>

 

E) Edit hdfs-site.xml

As follows:

<configuration>

 <property>

   <name>dfs.namenode.secondary.http-address</name>

   <value>master:50090</value>

   <description>The secondary namenode http server address and port.</description>

 </property>

 <property>

   <name>dfs.namenode.name.dir</name>

   <value>file:///data/dfs/name</value>

   <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>

 </property>

 <property>

   <name>dfs.datanode.data.dir</name>

   <value>file:///data/dfs/data</value>

   <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>

 </property>

 <property>

   <name>dfs.namenode.checkpoint.dir</name>

   <value>file:///data/dfs/namesecondary</value>

   <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>

 </property>

</configuration>
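The paths referenced above must be writable; a hedged sketch that pre-creates them, run as root (the name and namesecondary directories matter on the master, the data directory on every DataNode):

mkdir -p /data/hadoop/tmp /data/dfs/name /data/dfs/data /data/dfs/namesecondary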

 

F) Edit the slaves file (/opt/hadoop/etc/hadoop/slaves). Note that master is listed too, so it will also run a DataNode:

master

slave1

slave2
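Every node needs the same JDK, Hadoop tree, and profile; a minimal sketch that pushes them from the master, assuming identical paths on all machines:

for host in slave1 slave2; do
  scp -r /opt/jdk /opt/hadoop root@$host:/opt/
  scp /etc/profile root@$host:/etc/profile
done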

 

G) Start HDFS

Format the NameNode first:

hdfs namenode -format

Then start DFS. Before doing so, set the JAVA_HOME variable in /opt/hadoop/etc/hadoop/hadoop-env.sh (daemons launched over SSH do not pick it up from /etc/profile); a sketch follows.
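A minimal sketch of the hadoop-env.sh change, assuming the JDK from step D) lives at /opt/jdk:

# in /opt/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/jdk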

start-dfs.sh

Check the process

jps

Master node:

[root@master sbin]# jps

2291 DataNode

2452 SecondaryNameNode

2170 NameNode

2573 Jps

Slave node:

[root@slave1 .ssh]# jps

1841 DataNode

1917 Jps

 

H) Edit yarn-site.xml

As follows:

<configuration>

 <property>

   <name>yarn.resourcemanager.hostname</name>

   <value>master</value>

   <description>The hostname of the RM.</description>

 </property>

 <property>

   <name>yarn.nodemanager.aux-services</name>

   <value>mapreduce_shuffle</value>

   <description>Shuffle service that needs to be set for Map Reduceapplications.</description>

 </property>

</configuration>

 

I) Edit mapred-site.xml
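Hadoop 2.6.0 ships only mapred-site.xml.template, so create the file from the template first:

cd $HADOOP_CONF_DIR
cp mapred-site.xml.template mapred-site.xml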

As follows:

<configuration>

  <property>

   <name>mapreduce.framework.name</name>

   <value>yarn</value>

   <description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>

  </property>

  <property>

   <name>mapreduce.jobhistory.address</name>

    <value>master:10020</value>

    <description>MapReduce JobHistoryServer IPC host:port</description>

  </property>

  <property>

   <name>mapreduce.jobhistory.webapp.address</name>

    <value>master:19888</value>

    <description>MapReduce JobHistoryServer Web UI host:port</description>

  </property>

</configuration>
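Note that start-yarn.sh does not launch the JobHistory server configured above; to get the history UI on master:19888, start it separately:

mr-jobhistory-daemon.sh start historyserver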


J) Start YARN resource management

Execute as follows:

start-yarn.sh

Run jps again to verify the daemons:

Master node:

[root@master sbin]# jps

2720 NodeManager

2291 DataNode

2452 SecondaryNameNode

2953 Jps

2170 NameNode

2621 ResourceManager

Slave node:

[root@slave1 .ssh]# jps

1841 DataNode

2082 Jps

1958 NodeManager

4. Simple tests

For example, run the following command on the master:

# hadoop jar hadoop-examples-1.2.1.jar pi 1 10

This command exercises distributed computation by estimating the value of pi. The first argument (1) is the number of map tasks to run; the second (10) is the number of samples each map task computes.
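If the standalone hadoop-examples-1.2.1.jar is not at hand, Hadoop 2.6.0 bundles an equivalent examples JAR; a sketch assuming HADOOP_HOME=/opt/hadoop:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 1 10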

You can also observe the cluster graphically in a browser; by default the NameNode web UI is at http://master:50070 and the ResourceManager UI at http://master:8088.

Figure 1: the Hadoop web UI.
