Premise: White Bear has three laptops, all installed with CentOS 7.6. Here is the detailed installation process for the Hadoop cluster.
1. Install the ifconfig tool (net-tools)
- All three machines need to execute the following command
yum install -y net-tools.x86_64
2. Change the IP addresses of the three servers to static IP addresses
- Edit the network configuration file and add the following
- MyNode01
# /etc/sysconfig/network-scripts/ifcfg-ens33
BOOTPROTO="static"
ONBOOT=yes
BROADCAST=192.168.2.255
IPADDR=192.168.2.114
NETMASK=255.255.255.0
GATEWAY=192.168.2.1
# Add DNS
DNS1=192.168.2.1
- MyNode02
# /etc/sysconfig/network-scripts/ifcfg-ens33
BOOTPROTO="static"
ONBOOT=yes
BROADCAST=192.168.2.255
IPADDR=192.168.2.115
NETMASK=255.255.255.0
GATEWAY=192.168.2.1
# Add DNS
DNS1=192.168.2.1
- MyNode03
# /etc/sysconfig/network-scripts/ifcfg-ens33
BOOTPROTO="static"
ONBOOT=yes
BROADCAST=192.168.2.255
IPADDR=192.168.2.116
NETMASK=255.255.255.0
GATEWAY=192.168.2.1
# Add DNS
DNS1=192.168.2.1
- Restart the network service
service network restart
- The IP addresses of the three servers are:
- 192.168.2.114
- 192.168.2.115
- 192.168.2.116
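- To verify that the static addresses took effect and the nodes can reach each other, a quick check like the following can be run (a minimal sketch from MyNode01; adjust the target IPs on the other nodes and the interface name if it is not ens33):
ifconfig ens33
ping -c 3 192.168.2.115
ping -c 3 192.168.2.116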
3. Disable the firewall
- Run the following on all three machines
systemctl disable firewalld.service
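- Note that disable only keeps firewalld from starting at the next boot. To stop the running service immediately and confirm its state, the following standard systemd commands can also be used (not part of the original steps):
systemctl stop firewalld.service
systemctl status firewalld.service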
4. Disable SELinux
- Run the following on all three machines
vim /etc/selinux/config
SELINUX=disabled
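- As an alternative to editing the file by hand, the change can be scripted; getenforce shows the current mode, which only becomes Disabled after the reboot performed in a later step (a sketch):
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
getenforce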
5. Change the host name
- 192.168.2.114
vim /etc/hostname
MyNode01
- 192.168.2.115
vim /etc/hostname
MyNode02
- 192.168.2.116
vim /etc/hostname
MyNode03
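- On CentOS 7 the host name can also be changed with hostnamectl, which applies immediately without editing the file (shown for MyNode01 as an example; run the matching command on each node):
hostnamectl set-hostname MyNode01
hostname   # verify the new name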
6. Add the mapping between host names and IP addresses
- Run the following on all three machines
vim /etc/hosts
192.168.2.114 MyNode01
192.168.2.115 MyNode02
192.168.2.116 MyNode03
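- A quick sanity check that the name resolution works (run from any node):
ping -c 2 MyNode02
ping -c 2 MyNode03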
7. Add a common user
- Run the following on all three machines
- The password is 123456 for both root and the common user (for testing only)
- Add the common user icebear
useradd icebear
passwd icebear   # enter 123456 when prompted
- Grant the common user sudo permission by adding the following line at the end of the sudoers file
visudo
icebear ALL=(ALL) ALL
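- To confirm the sudo grant works, switch to the new user and run a harmless command; it should print root after the icebear password is entered:
su - icebear
sudo whoami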
8. Set up passwordless SSH login
- It is best to set this up for both the root and icebear users
- Run the following on all three machines
su icebear
# go back to the home directory
cd
# generate a key pair
ssh-keygen -t rsa
# copy the public key into authorized_keys on MyNode01
cd .ssh
ssh-copy-id MyNode01
- Run on the MyNode01 machine
- Copy the authorized_keys file to the other machines
cd /home/icebear/.ssh
scp authorized_keys MyNode02:$PWD
scp authorized_keys MyNode03:$PWD
- Reboot
- Run the following on all three machines
reboot
- Test passwordless login
- Connect to all three machines from each node, because the first connection must be confirmed manually
ssh MyNode01   # type exit after each login
ssh MyNode02
ssh MyNode03
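- Once the keys have been exchanged and each host has been confirmed once, the logins can be re-checked non-interactively with a small loop such as the following (a sketch for the icebear user):
for host in MyNode01 MyNode02 MyNode03; do
  ssh -o BatchMode=yes "$host" hostname
done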
9. Create the installation directories
- Execute on all three machines (root user)
mkdir -p /home/bgd/soft      # for uploaded installation packages
mkdir -p /home/bgd/install   # for installed software
10. Install the JDK
- Run on the MyNode01 machine (icebear user)
- Upload the JDK package to the soft folder on the MyNode01 machine
- Unpack the package
cd /home/bgd/soft
# decompress the package into the installation directory
tar -zxf jdk-8u141-linux-x64.tar.gz -C /home/bgd/install/
- Modify environment variables
sudo vim /etc/profile
# configure the JDK environment variables
export JAVA_HOME=/home/bgd/install/jdk1.8.0_141
export PATH=$JAVA_HOME/bin:$PATH
- Enable environment variables
source /etc/profile
- View the Java version
java -version
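- If the version is not shown, confirm that the profile changes were picked up; both commands below should point at the JDK installed above:
echo $JAVA_HOME
which java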
11. Install and configure Hadoop
- Run on the MyNode01 machine (icebear user)
- Upload the Hadoop installation package and decompress it
tar -xzvf hadoop-2.6.0-cdh5.14.2.tar.gz -C /home/bgd/install/
- Configure hadoop environment variables
sudo vim /etc/profile
# add the following content
JAVA_HOME=/home/bgd/install/jdk1.8.0_141
HADOOP_HOME=/home/bgd/install/hadoop-2.6.0-cdh5.14.2
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME
export HADOOP_HOME
export PATH
- Enable environment variables
source /etc/profile
- View the Hadoop version information
hadoop version
- Configure hadoop-env.sh
sudo vim /home/bgd/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/bgd/install/jdk1.8.0_141
- Configure core-site.xml
sudo vim /home/bgd/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://MyNode01:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas</value>
    </property>
    <!-- Buffer size; adjust according to server performance in real deployments -->
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
        <description>Number of minutes after which a checkpoint is deleted. If zero, the trash feature is disabled. This option can be configured on both the server and the client. If trash is disabled on the server side, the client configuration is checked. If trash is enabled on the server side, the value configured on the server is used and the client value is ignored.</description>
    </property>
    <property>
        <name>fs.trash.checkpoint.interval</name>
        <value>0</value>
        <description>Number of minutes between trash checkpoints. It should be less than or equal to fs.trash.interval. If zero, the value is set to fs.trash.interval. Each time the checkpointer runs, it creates a new checkpoint from the current one and removes checkpoints older than fs.trash.interval.</description>
    </property>
</configuration>
- Configure hdfs-site.xml
sudo vim /home/bgd/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/hdfs-site.xml
<configuration>
    <!-- NameNode metadata storage path. In practice, determine the disk mount directories first. -->
    <!-- Cluster dynamic decommissioning (optional, commented out)
    <property>
        <name>dfs.hosts</name>
        <value>/home/bgd/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/accept_host</value>
    </property>
    <property>
        <name>dfs.hosts.exclude</name>
        <value>/home/bgd/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/deny_host</value>
    </property>
    -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>MyNode02:50090</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>MyNode01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas</value>
    </property>
    <!-- DataNode data storage path. In practice, determine the disk mount directories first, then use multiple directories. -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas</value>
    </property>
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>file:///home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.edits.dir</name>
        <value>file:///home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
</configuration>
- Configure mapred-site.xml
cd /home/bgd/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
sudo vim /home/bgd/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/mapred-site.xml
<!-- Run MapReduce on YARN -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>MyNode01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>MyNode01:19888</value>
    </property>
</configuration>
- Configure yarn-site.xml
sudo vim /home/bgd/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>MyNode01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://MyNode01:19888/jobhistory/logs</value>
    </property>
    <!-- How long (in seconds) to keep aggregated logs -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>2592000</value><!-- 30 days -->
    </property>
    <!-- How long (in seconds) to keep user logs on the NodeManager; applies only when log aggregation is disabled -->
    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>604800</value><!-- 7 days -->
    </property>
    <!-- Compression type for aggregated logs -->
    <property>
        <name>yarn.nodemanager.log-aggregation.compression-type</name>
        <value>gz</value>
    </property>
    <!-- NodeManager local file storage directory -->
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/yarn/local</value>
    </property>
    <!-- Maximum number of completed applications kept by the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.max-completed-applications</name>
        <value>1000</value>
    </property>
</configuration>
- Edit the slaves file
sudo vim /home/bgd/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/slaves
MyNode01
MyNode02
MyNode03
- Create the data storage directories
mkdir -p /home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas
mkdir -p /home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas
mkdir -p /home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas
mkdir -p /home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits
mkdir -p /home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name
mkdir -p /home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits
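- Equivalently, the same directories can be created in a single command with brace expansion (a sketch; same paths as above):
HADOOP_DIR=/home/bgd/install/hadoop-2.6.0-cdh5.14.2
mkdir -p $HADOOP_DIR/hadoopDatas/{tempDatas,namenodeDatas,datanodeDatas,dfs/nn/edits,dfs/snn/name,dfs/nn/snn/edits}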
12. Copy the configured Hadoop to the other machines
- Run on the MyNode01 machine (icebear user)
- Delete the doc documentation directory
cd /home/bgd/install/hadoop-2.6.0-cdh5.14.2/share/
rm -rf doc/
- Copy Hadoop to the other machines
cd /home/bgd/install/
sudo scp -r hadoop-2.6.0-cdh5.14.2 MyNode02:$PWD
sudo scp -r hadoop-2.6.0-cdh5.14.2 MyNode03:$PWD
- Copy the global environment variables to the other machines
sudo scp /etc/profile MyNode02:/etc/
sudo scp /etc/profile MyNode03:/etc/
- Run on all three machines (icebear user)
- Enable environment variables
source /etc/profile
- View the Hadoop version
hadoop version
13. Add directory permissions for the common user
- Run the following on all three machines (root user)
chown -R icebear:icebear /home/bgd
chmod -R 755 /home/bgd
14. Format Hadoop
- Run on the MyNode01 machine (icebear user)
su icebear
hdfs namenode -format
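- Formatting should be done only once; it initializes the metadata directory configured as dfs.namenode.name.dir in hdfs-site.xml. A simple way to confirm it succeeded is to check that the VERSION file was created under that directory:
cat /home/bgd/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas/current/VERSION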
15. Start/stop the Hadoop cluster
- Run on the MyNode01 machine (icebear user)
- Start
start-all.sh
- Stop
stop-all.sh
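- After start-all.sh, jps on each node shows which daemons came up. With the configuration above one would expect NameNode, DataNode, ResourceManager and NodeManager on MyNode01; SecondaryNameNode, DataNode and NodeManager on MyNode02; and DataNode and NodeManager on MyNode03. The JobHistoryServer configured in mapred-site.xml is started separately:
# run on each node
jps
# run on MyNode01 to start the job history server (optional)
mr-jobhistory-daemon.sh start historyserver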
16. View the Web interface
- Configure the hosts file on your local machine (using macOS as an example)
vim /etc/hosts
192.168.2.114 MyNode01
192.168.2.115 MyNode02
192.168.2.116 MyNode03
- Type in the browser
- http://mynode01:50070/dfshealth.html#tab-overview
- If the hosts file is not configured on the user host
- http://192.168.2.114:50070/dfshealth.html#tab-overview
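- As a final end-to-end check, the example MapReduce job bundled with the distribution can be submitted from MyNode01 (a sketch; the wildcard matches the examples jar shipped with this CDH release). The running job is also visible in the YARN ResourceManager web UI, which listens on port 8088 by default (e.g. http://mynode01:8088):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10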