No matter how you approach big data, you can never get around the little yellow elephant, Hadoop. Installing Hadoop is arguably the first step into the field. As a student still studying big data at school, I have accumulated some experience over years of learning, so I want to share a nanny-level Hadoop installation tutorial.
This guide assumes you know some basic Linux and have VMware Workstation or similar virtualization software installed on your computer (of course, if you have the money to buy a cloud server, forget what I just said).
Linux VM installation
- Here is the download link for the CentOS 7.6 image
- Install the CentOS VM
- Click Create VM
- In the installer disc (CD/DVD) settings, select the image we downloaded; it will detect the system to be installed
- Click Next and click Finish. This will open the virtual machine automatically
- Press Enter until the visual interface is displayed. Set the installation language to Chinese
- We'd better choose the minimal installation, for speed
- The next step is to set the password of the root account and create the user we normally use
- Then there is a slow wait (about five or six minutes)
- Then click Reboot, and the installation is done!
Preparations before Hadoop installation
- Static IP address and host name configuration
- Open the ifcfg-ens33 file and modify the configuration

vi /etc/sysconfig/network-scripts/ifcfg-ens33
............
BOOTPROTO=static        # change dhcp to static
ONBOOT=yes              # change no to yes
IPADDR=192.168.10.200   # add the IPADDR attribute and the IP address
PREFIX=24
GATEWAY=                # set this to your network's gateway address
DNS1=114.114.114.114    # add DNS1 and a backup DNS
DNS2=8.8.8.8
- Restart the Network service
systemctl restart network    # or: service network restart
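After restarting the network service, it is worth confirming that the static address actually took effect. A minimal check, assuming the 192.168.10.200/24 address configured above:

```bash
ip addr show ens33            # should show "inet 192.168.10.200/24"
ping -c 3 114.114.114.114     # confirm outbound connectivity through the new configuration
```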
- Changing the host name

hostnamectl set-hostname master
Note: After configuring the IP address and host name, reboot
- Configure the /etc/hosts file

vi /etc/hosts
# append the following line (the master IP configured earlier)
192.168.10.200 master
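A quick check that the host name and mapping are in place (host name and IP as set above):

```bash
hostname            # should print: master
ping -c 2 master    # should resolve to 192.168.10.200 via /etc/hosts
```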
- Disabling the firewall

systemctl stop firewalld
systemctl disable firewalld

# It is also a good idea to turn off SELinux, a security mechanism on Linux systems.
# Open its config file and set SELINUX to disabled:
vi /etc/selinux/config
SELINUX=disabled
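To double-check that both are really off (note that the SELinux change in the config file only takes effect after a reboot):

```bash
systemctl is-active firewalld    # should print: inactive
systemctl is-enabled firewalld   # should print: disabled
getenforce                       # prints Disabled after a reboot; before that it may still say Enforcing
```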
- Time synchronization
- Enter tzselect and select 5, 9, 1, and 1 (Asia, China, Beijing Time, then confirm)
- Install NTP

yum install -y ntp
- Configure /etc/ntp.conf

vim /etc/ntp.conf
# add the following lines to use the local clock as the time source
server 127.127.1.0
fudge 127.127.1.0 stratum 10
- Start the service: /bin/systemctl restart ntpd.service
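To confirm that ntpd is running and using the local clock we just configured:

```bash
systemctl status ntpd    # should show the service as active (running)
ntpq -p                  # the peer list should include LOCAL(0), the local clock source
date                     # confirm the system time and time zone look right
```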
- Set up passwordless (no-password) SSH login
- Run ssh-keygen and press Enter at every prompt
- Run ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@master and enter the password when prompted
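To verify that passwordless login works (this also clears the first-connection host-key prompt):

```bash
ssh master    # should log in without asking for a password
exit          # return to the original shell
```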
Hadoop single-node installation and configuration
- JDK installation
- Check whether a JDK is already installed or built into the system; if so, uninstall it

rpm -qa | grep jdk          # query for a built-in JDK
rpm -e xxxxxxxx --nodeps    # force-uninstall whatever the query found
- Upload the JDK to /opt/software/
- Decompress the JDK to /opt/apps/

cd /opt/software
tar -zxvf jdk-8u152-linux-x64.tar.gz -C /opt/apps/
- Rename the JDK directory

cd /opt/apps
mv jdk1.8.0_152/ jdk
- Configure the JDK environment variables in /etc/profile

vim /etc/profile
# append at the end
#jdk environment
export JAVA_HOME=/opt/apps/jdk
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
- Make it take effect in the current shell

source /etc/profile
- Verify the Java environment

java -version
javac
- Hadoop Standalone Installation
- Upload Hadoop to /opt/software/
- Decompress Hadoop to /opt/apps/

cd /opt/software/
tar -zxvf hadoop-2.7.6.tar.gz -C /opt/apps/
- Rename the Hadoop directory

cd /opt/apps
mv hadoop-2.7.6/ hadoop
- Configure environment variables for Hadoop

vi /etc/profile
# append at the end
#hadoop environment
export HADOOP_HOME=/opt/apps/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
- Make it take effect in the current shell

source /etc/profile
- Verify Hadoop

hadoop version
- Configure the hadoop-env.sh file

vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# change the following line to the JDK path we set up earlier
export JAVA_HOME=/opt/apps/jdk
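At this point standalone mode already works. A quick smoke test using the example jar that ships with Hadoop (the jar name matches the 2.7.6 release used in this article; the output directory must not exist beforehand):

```bash
cd /opt/apps/hadoop
mkdir input
cp etc/hadoop/*.xml input/
# run the bundled "grep" example against the copied config files
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar grep input output 'dfs[a-z.]+'
cat output/*    # should print the matched property names and their counts
```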
Hadoop pseudo-distributed installation and configuration
Introduction to pseudo-distributed mode
First of all, we need to understand the characteristics of pseudo-distributed mode.
1. Characteristics
- Installed on a single machine, but built on the distributed idea: it uses a distributed file system rather than the local file system.
- All the HDFS daemons (namenode, datanode, secondarynamenode) run on one machine, each as an independent Java process.
- Easier to debug than standalone mode: you can check memory usage, HDFS input/output, and the interaction between daemons.
Since we have already configured passwordless login, the static IP, and the host mapping, and installed the JDK and Hadoop, we can go straight to the file configuration.
Configuration file
- core-site.xml configuration

[root@master ~]# cd $HADOOP_HOME/etc/hadoop
[root@master hadoop]# vi core-site.xml

<configuration>
    <!-- Configure the default file system -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost/</value>
    </property>
</configuration>

Extension: the default port in hadoop 1.x is 9000, and in hadoop 2.x it is 8020; either can be used.
- hdfs-site.xml configuration

[root@master hadoop]# vi hdfs-site.xml

<configuration>
    <!-- Configure the number of replicas. Note: in pseudo-distributed mode it can only be 1 -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
- Configuration of hadoop-env.sh: specify the JDK environment (as in single-machine mode, not detailed here)
Format the NameNode
hdfs namenode -format
Start the cluster.
start-dfs.sh
- Run jps to view the processes
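If jps shows the namenode, datanode, and secondarynamenode, a small HDFS smoke test confirms the pseudo-distributed file system is usable (the paths here are only examples):

```bash
hdfs dfs -mkdir -p /user/root/input                            # create a test directory in HDFS
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/root/input   # upload a few files
hdfs dfs -ls /user/root/input                                  # list them back
```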
WebUI_50070
You can enter 192.168.10.200:50070 in the browser to view the pseudo-distributed cluster information.
- 1. Browse the page for the Cluster ID and Block Pool ID
- 2. Check the number of Live Nodes; it should be 1
Simple explanation:
- Compiled: this Hadoop build was compiled by kshvachk
- Cluster ID: the ID of the cluster
- Block Pool ID: the ID of the datanodes' block pool
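If you are working in a terminal without a browser, a rough check that the web UI is up (address as above):

```bash
curl -s http://192.168.10.200:50070/ | head -n 5    # should return the namenode web page's HTML
```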
Fully distributed cluster installation and configuration
Standalone and pseudo-distributed modes cannot be used in a production environment; they are only for daily debugging and learning. What we really use is the fully distributed cluster.
VM description
Use VM cloning to create two more VMs. The configuration of the three VMs is as follows:

| Host name | IP             |
| --------- | -------------- |
| master    | 192.168.10.200 |
| slave1    | 192.168.10.201 |
| slave2    | 192.168.10.202 |

Note: since slave1 and slave2 are clones, you do not need to turn off the firewall again; just add the mappings for both clones in /etc/hosts and change their IP addresses.
Note, note, note:
1. If you are coming from the pseudo-distributed setup, it is best to stop its daemons first: stop-all.sh
2. Delete the old namenode and datanode data directories
Daemon layout
Let’s set up the full distribution of HDFS and set up YARN. The layout of HDFS and YARN daemons is as follows:
master: namenode, datanode, resourcemanager, nodemanager
slave1: datanode, nodemanager, secondarynamenode
slave2: datanode, nodemanager
Hadoop configuration file configuration
- Note before configuration:
  1. We first configure the Hadoop-related properties on the master machine node
  2. <value></value>
  3. After the master is configured, clone two VMs and change their IP addresses
- Configure the core-site.xml file
[root@master ~]# cd $HADOOP_HOME/etc/hadoop/
[root@master hadoop]# vi core-site.xml

<configuration>
    <!-- The default file system -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
    </property>
    <!-- The HDFS base path, which other properties depend on -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/apps/tmp</value>
    </property>
</configuration>
- Configure the hdfs-site.xml file

[root@master hadoop]# vi hdfs-site.xml

<configuration>
    <!-- Location of the fsimage metadata files managed by the namenode daemon -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/name</value>
    </property>
    <!-- Where the datanode should store its blocks in the local file system -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/data</value>
    </property>
    <!-- Number of replicas of each block -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Block size (128 MB) -->
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
    <!-- The secondarynamenode HTTP address: host name and port of the daemon; see the daemon layout -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave1:50090</value>
    </property>
    <!-- Checkpoint directory -->
    <property>
        <name>fs.checkpoint.dir</name>
        <value>file:///${hadoop.tmp.dir}/checkpoint/dfs/cname</value>
    </property>
    <!-- Checkpoint edits directory -->
    <property>
        <name>fs.checkpoint.edits.dir</name>
        <value>file:///${hadoop.tmp.dir}/checkpoint/dfs/cname</value>
    </property>
    <!-- The namenode web UI address -->
    <property>
        <name>dfs.http.address</name>
        <value>master:50070</value>
    </property>
</configuration>
- Configure the mapred-site.xml file
Strictly speaking, only the core-site.xml and hdfs-site.xml files are required. However, to learn MapReduce we also need the YARN resource manager, so we configure the related files in advance.
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@master hadoop]# vi mapred-site.xml

<configuration>
    <!-- Specify that MapReduce uses the YARN resource manager -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Configure the job history server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <!-- Configure the job history server web address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
- Configure the yarn-site.xml file

[root@master hadoop]# vi yarn-site.xml

<configuration>
    <!-- Specify the YARN shuffle service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Specify the resourcemanager host name -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>

    <!-- The following are optional -->
    <!-- Specify the class that implements the shuffle service -->
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <!-- The resourcemanager internal address -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <!-- The internal address of the scheduler inside the resourcemanager -->
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <!-- The resource-tracker internal address used for resource scheduling -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <!-- The resourcemanager admin internal address -->
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <!-- The resourcemanager web UI monitoring page -->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>
- Configure the hadoop-env.sh script (the same as the single-machine deployment mode).
- Configure the slaves file, which is used to specify the host name of the machine node where the Datanode daemon is located
[root@master hadoop]# vi slaves
master
slave1
slave2
- Configure the yarn-env.sh file. This is optional, but it is better to set the JDK path for YARN here as well.
- Configuration instructions for the other two machines
After configuring hadoop related files on the Master machine, we have the following two ways to configure Hadoop on other machines.
- scp synchronization (this method applies when the other VMs have already been created in advance)
- VM cloning
It's still a hassle to install two more virtual machines from scratch, so let's go with cloning (a sketch of the scp route is included after the cloning steps, for reference).
- Open a newly cloned VM and change the host name
- Changing an IP Address
- Restart the Network service
- Repeat steps 1 to 3 for other newly cloned VMS
- Verify passwordless login by connecting from the master machine to every other node; this confirms it works and also clears the first-connection confirmation prompt
- Suggestion: reboot each machine after changing its network configuration
The specific operation steps above are not repeated here
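For reference, if you take the scp route instead of cloning, a minimal sketch is shown below. The paths follow the layout used in this article, and it assumes passwordless SSH from master to the slaves is already set up:

```bash
# Run on master: make sure the target directory exists, then copy the software
ssh root@slave1 'mkdir -p /opt/apps'
ssh root@slave2 'mkdir -p /opt/apps'
scp -r /opt/apps/jdk /opt/apps/hadoop root@slave1:/opt/apps/
scp -r /opt/apps/jdk /opt/apps/hadoop root@slave2:/opt/apps/
# Copy the environment variables and the host mappings as well
scp /etc/profile /etc/hosts root@slave1:/etc/
scp /etc/profile /etc/hosts root@slave2:/etc/
# Then run "source /etc/profile" on each slave
```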
Format the NameNode
# Perform this operation on the master
hdfs namenode -format
If you are successful, you are ready to start your Hadoop cluster!
This section describes the start and stop scripts
1. Start scripts
   - start-dfs.sh: starts the HDFS cluster
   - start-yarn.sh: starts the YARN daemons
   - start-all.sh: starts both HDFS and YARN
2. Stop scripts
   - stop-dfs.sh: shuts down the HDFS cluster
   - stop-yarn.sh: stops the YARN daemons
   - stop-all.sh: shuts down both HDFS and YARN
3. Single-daemon scripts
   - hadoop-daemons.sh: starts or stops a given type of HDFS daemon on all nodes
   - hadoop-daemon.sh: starts or stops an HDFS daemon on the local node
     e.g. hadoop-daemon.sh [start|stop] [namenode|datanode|secondarynamenode]
   - yarn-daemons.sh: starts or stops a given type of YARN daemon on all nodes
   - yarn-daemon.sh: starts or stops a YARN daemon on the local node
     e.g. yarn-daemon.sh [start|stop] [resourcemanager|nodemanager]
Finally, run jps on each host to check the processes. If the running processes match our daemon layout, then congratulations, your Hadoop cluster has been set up successfully!
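One way to run that check from the master without logging in to each machine by hand, assuming the passwordless SSH set up earlier (a sketch, not the only way):

```bash
# Start everything, then inspect the daemons on every node from the master
start-all.sh
for host in master slave1 slave2; do
    echo "==== $host ===="
    ssh "$host" 'source /etc/profile; jps'
done

# HDFS-level view of the datanodes that have joined the cluster
hdfs dfsadmin -report | grep -E 'Live datanodes|Name:'
```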
Of course, if some process fails to start, go back and fix its corresponding configuration file, then try again.
You are welcome to reach out to exchange ideas and learn together.
Personal blog
CSDN home page