Note 1 — Building Hadoop on Servers, 2021
Internship project: learn to build a Hadoop cluster environment.
[Hadoop build list]
- [Hadoop 3.2.1]
- [OpenJDK 1.8.0]
Configuration Flowchart
Hadoop Setup – Initial preparation
1. Provide an environment
10.0.20.181, 10.0.20.182, 10.0.20.183 // User: root, password: hadoop
2. Check the environment
1) Can the servers be logged in to?
All three machines are up. Log in with ssh user@ip and enter the password.
ssh root@10.0.20.181, password: hadoop (verify all three machines: 181, 182, 183)
2) Do you have permission to perform write operations?
- Yes, we have permission.
3) Does the hard disk have enough free space?
df -lh
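A quick sketch of checks 2) and 3) together (the test file name here is arbitrary, just an illustration):
df -lh                                            # free space on the local filesystems, human-readable
touch /root/.write_test && rm -f /root/.write_test && echo "write permission OK"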
3. Establish mappings between host names and IP addresses
Add the following entries (IP address and host name): vim /etc/hosts
Use cat /etc/hosts to view the result.
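For reference, a minimal sketch of the entries to add to /etc/hosts, assuming the host names master, slave1 and slave2 that are used later in this note:
10.0.20.181 master
10.0.20.182 slave1
10.0.20.183 slave2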
Attention! After this change, the host name itself has not yet changed.
vi /etc/sysconfig/network
(vim basics: press i to enter insert mode; to save and exit, type :wq!)
This step is especially important; if it is skipped, bugs can appear later.
After a reboot, the host name change takes effect and the server rename is complete! Do the same for the rest of the servers, changing each to its corresponding name.
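A sketch of /etc/sysconfig/network on the master node. This assumes a CentOS 6-style system, which matches the service/chkconfig commands used later; on CentOS 7 you would run hostnamectl set-hostname master instead:
NETWORKING=yes
HOSTNAME=master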
After the configuration is complete, ping the machines from each other: ping <hostname> -c <count>, e.g. ping slave1 -c 3.
4. Install the JDK (Java 1.8.0)
It is recommended to install the JDK with yum.
yum -y install java-1.8.0-openjdk*
The default installation path is /usr/lib/jvm.
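After the yum install, the exact JDK directory name (needed for JAVA_HOME below) can be found like this, a small sketch:
ls /usr/lib/jvm                      # list the installed JDK directories
readlink -f $(which java)            # resolves the java symlink down to the real installation path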
Configure the environment variables by editing the configuration file: vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Attention! The exact directory name depends on the installed Java version; adjust JAVA_HOME to match what is actually under /usr/lib/jvm.
Run source /etc/profile so the change takes effect immediately.
Verify the Java version: java -version
Java installation successful!
5. Passwordless login
1. First turn off the firewall and SELinux on all three servers; copy and paste the commands step by step.
View the firewall status: service iptables status
Disable the firewall: service iptables stop, then chkconfig iptables off
Turning off SELinux requires editing its config file and then restarting the server.
vim /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled (leave SELINUXTYPE=targeted as it is).
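The whole step as one sketch, assuming the CentOS 6-style service management used above (the sed line is just a shortcut for the manual edit of /etc/selinux/config):
service iptables status                                         # check the firewall
service iptables stop && chkconfig iptables off                 # stop it now and keep it off after reboot
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
reboot                                                          # the SELinux change only takes effect after a restart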
2. Passwordless login
2.1) Configure master to log in to itself without a password. Copy and paste step by step.
Generate the key: ssh-keygen -t rsa
Append the public key to the authorized_keys file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Grant permissions: chmod 600 ~/.ssh/authorized_keys
Verify: ssh master should now log in without a password.
2.2) Passwordless access between the machines
Log in to slave1 and copy the master server's public key id_rsa.pub to slave1's /root directory:
scp root@master:/root/.ssh/id_rsa.pub /root/
Append master's public key (id_rsa.pub) to slave1's authorized_keys:
cat /root/id_rsa.pub >> ~/.ssh/authorized_keys, then rm -f /root/id_rsa.pub to delete the copied file.
ssh slave1 should now log in to slave1 without a password. Repeat the same steps so that all three machines can log in to each other without a password.
Note: scp is a very handy command for copying files between machines. Besides the method written above, the three machines can also exchange keys with direct cross-machine copies; search online if you are interested. This command gets used a lot, especially when several machines work together.
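As an alternative to the manual scp-and-append steps above, the standard ssh-copy-id tool does the same thing in one command; a sketch, run from master (and likewise from the other machines):
ssh-copy-id root@slave1         # appends master's public key to slave1's authorized_keys
ssh-copy-id root@slave2
ssh slave1                      # should now log in without a password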
At this point, our basic preparation work is over, and the next step is to build the Hadoop environment.
Hadoop Setup – Version 3.2.1
1. Decompress the installation package on master and create a base directory
Method 1. Download directly from the Apache site. From a server in China (even through a VPN) this is very, very slow; on the company network it took me about 6 hours.
wget http://apache.claz.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
Method 2. Use a domestic mirror. I have not looked into this, but a quick search will turn up plenty; do your own research.
Method 3. This is what I used: with Xshell and Xftp, upload the Hadoop installation package downloaded on Windows to the server. Very fast, simple, and easy to use. Search for how to use Xshell/Xftp, connect to the IP, select the file, and you are done.
Extract to the /usr/local directory: tar -xzvf hadoop-3.2.1.tar.gz -C /usr/local
Rename it (mv oldname newname): mv hadoop-3.2.1 hadoop
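The whole download-and-install step as one sketch (the URL is the one from method 1; substitute your own source if you use a domestic mirror or an Xftp upload):
wget http://apache.claz.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar -xzvf hadoop-3.2.1.tar.gz -C /usr/local         # extract into /usr/local
cd /usr/local && mv hadoop-3.2.1 hadoop             # rename to the path used by the configs below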
2. Configure hadoop environment variables for hadoop-master
2.1) Configure the environment variables; edit the configuration file: vi /etc/profile
#hadoop
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
#export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Important! Run source /etc/profile so that the hadoop commands take effect in the current terminal.
Check whether Hadoop is installed successfully: hadoop version
2.2) Configure the environment variables on the other hosts, slave1 and slave2
Copy the environment variables to slave1 and slave2:
scp -r /etc/profile root@slave1:/etc/profile
scp -r /etc/profile root@slave2:/etc/
Make the environment variables take effect on slave1 and slave2: source /etc/profile
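A sketch of distributing and checking the profile, assuming the passwordless SSH from step 5 is in place (the remote java -version call is just a quick sanity check, not part of the original steps):
scp /etc/profile root@slave1:/etc/profile
scp /etc/profile root@slave2:/etc/profile
ssh root@slave1 'source /etc/profile && java -version'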
Hadoop configuration files
hadoop-env.sh (configures the home path of our JDK),
core-site.xml (the core configuration file; it basically defines whether our cluster runs distributed or locally),
hdfs-site.xml (the core configuration of the distributed file system; it determines where our data is stored, the number of replicas, the block size, and so on),
mapred-site.xml (defines the parameters for how our MapReduce jobs run),
yarn-site.xml (the core configuration file for our YARN cluster, the resource management framework).
Good! Start configuration!
Hadoop configuration files – master
All of these files are configured in the /usr/local/hadoop/etc/hadoop directory. The configuration files below are for reference only.
cd into the directory: cd /usr/local/hadoop/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
vim core-site.xml
The fs.defaultFS address is the path that Java code uses to access HDFS; in Java code it must be configured as IP:9000, not localhost.
<configuration>
  <!-- HDFS access address -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <!-- temporary file directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
vim hdfs-site.xml
dfs.namenode.secondary.http-address / dfs.http.address is the address the browser uses to access the file system.
The primary NameNode's HDFS web address: http://10.0.20.181:50070
<configuration>
  <!-- number of replicas: 2 -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- namenode storage path -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name</value>
  </property>
  <!-- datanode storage path -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
  <!-- port for web access to HDFS -->
  <property>
    <name>dfs.http.address</name>
    <value>master:50070</value>
  </property>
  <!-- disable permission checks for users and groups -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
vim mapred-site.xml
mapreduce.framework.name (and the legacy mapred.job.tracker) determines how MapReduce programs are executed; here they run on YARN.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://master:9001</value>
  </property>
</configuration>
vim yarn-site.xml
Configure resources. You can configure many resources on YARN.
<configuration>
  <!-- ResourceManager host name -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- Comma-separated list of services; service names may contain only a-zA-Z0-9_ and must not start with a number -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
2. Configure the masters, slaves, and workers files (on master only!)
cd into the Hadoop config directory: cd /usr/local/hadoop/etc/hadoop. The masters file specifies the NameNode server. Delete localhost and add the NameNode host name, master. Using the host name rather than the IP address is recommended, because the IP may change while the host name generally does not.
vim masters and add:
master
vim slaves and add:
slave1
slave2
vim workers and add:
master
slave1
slave2
3. Hadoop configuration — slaves
Copy Hadoop from Master to Slave1 node
scp -r /usr/local/hadoop slave1:/usr/local/
Log in to the slave1 server and delete the slaves file that was copied over:
rm -rf /usr/local/hadoop/etc/hadoop/slaves
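slave2 needs the same treatment; a sketch mirroring the slave1 step (the remote rm assumes passwordless SSH is working):
scp -r /usr/local/hadoop slave2:/usr/local/
ssh slave2 'rm -rf /usr/local/hadoop/etc/hadoop/slaves'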
Start Hadoop
1. Format the HDFS file system: hadoop namenode -format. This formats the NameNode; it is done once before starting the services for the first time and does not need to be repeated afterwards.
Attention! Use formatting carefully! After the first time, do not run it again unless something goes wrong!
2. Start Hadoop: start-all.sh
3. Run the jps command to check the running status.
Also run jps on slave1 and slave2 to check them.
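Roughly what jps should report if everything started, a sketch based on this configuration (master is also listed in workers, so it runs a DataNode and NodeManager too; the process IDs will of course differ):
# on master:          NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager, Jps
# on slave1 / slave2: DataNode, NodeManager, Jps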
Web and process view
Web view: http://master:50070, the NameNode/HDFS web UI (lists the DataNodes)
http://10.0.20.181:8088, the YARN ResourceManager web UI
Recommended references
Three cloud servers set up a fully distributed Hadoop environment:
blog.csdn.net/weixin_4393…
Hadoop distributed cluster setup: www.ityouknow.com/hadoop/2017…
Troubleshooting: problems with formatting or JAVA_HOME
Solution link: bbs.huaweicloud.com/blogs/24226…
Tested again a month later
There are basically no problems with this configured version. The problem I hit was with formatting: delete Hadoop's tmp directory (rm -rf tmp), recreate the logs directory (mkdir logs), and rerun start-all.sh; the problem was resolved and port 50070 and the NameNode came back.
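The recovery described above as one sketch, assuming the directory layout used in this note (stopping the cluster first is my addition for safety; only re-format if the NameNode still refuses to start, since formatting wipes the HDFS metadata):
cd /usr/local/hadoop
stop-all.sh
rm -rf tmp
mkdir -p logs
# hdfs namenode -format      # only if needed; see the formatting warning above
start-all.sh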