Preface
This post covers the preparation of the server environment for a big data cluster and the installation of a Hadoop cluster on three machines.
1. Prepare the environment before installing the big data cluster
- Disable the firewall on the three VMs
Run the following commands on all three machines as root:
systemctl stop firewalld
systemctl disable firewalld
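Optionally, verify that the firewall is really stopped and will not come back after a reboot:
systemctl status firewalld        # should show "inactive (dead)"
systemctl is-enabled firewalld    # should print "disabled"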
- Disable SELinux on the three machines
Run the following command on all three machines to disable SELinux:
vi /etc/sysconfig/selinux
Change the SELINUX line in the file to:
SELINUX=disabled
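Disabling SELinux this way only fully takes effect after a reboot. Until then, you can check the current mode and optionally switch to permissive right away:
getenforce          # prints Enforcing, Permissive, or Disabled
setenforce 0        # optional: switch to permissive immediately, without a reboot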
- Change the host names of the three machines
Run the following command on each machine to edit its host name:
vi /etc/hostname
On the first machine, set the content to
node01
On the second machine, set the content to
node02
On the third machine, set the content to
node03
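On CentOS 7 the same change can also be made with hostnamectl, which updates /etc/hostname for you (shown here for the first machine; use node02 and node03 on the others):
hostnamectl set-hostname node01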
- Map host names to IP addresses on the three machines
Run the following command on all three machines to edit the host-name-to-IP mapping:
vi /etc/hosts
Add the following entries:
192.168.52.100 node01
192.168.52.110 node02
192.168.52.120 node03
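A quick check is to ping each host name from any of the three machines; every name should resolve to the IP configured above:
ping -c 2 node01
ping -c 2 node02
ping -c 2 node03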
- Synchronize the clocks of the three machines
The VMs can synchronize time over the Internet; first make sure ntpdate is installed on all three machines:
yum -y install ntpdate
Then synchronize against the Alibaba Cloud time server:
ntpdate ntp4.aliyun.com
Set up a scheduled task on all three machines to keep the clocks in sync:
crontab -e
Add the following
*/1 * * * * /usr/sbin/ntpdate ntp4.aliyun.com
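Afterwards you can confirm the scheduled task and compare the clocks:
crontab -l    # the ntpdate entry should be listed
date          # run on all three machines; the times should now match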
- Add a regular user on the three machines
Add a regular user named hadoop on all three Linux servers and grant it sudo permission; all big data software will be installed as this user later. Set the password of the hadoop user to ==123456==:
useradd hadoop
passwd hadoop
Enter 123456 as the password when prompted.
Grant sudo permission to the hadoop user on all three machines:
visudo
Add the following line:
hadoop ALL=(ALL) ALL
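While still root, you can list the sudo rules that now apply to the hadoop user:
sudo -l -U hadoop    # should show (ALL) ALL for the hadoop user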
- Define a unified directory layout on the three machines
Use the same directories on all three Linux servers for the compressed software packages and the unpacked installations. Run the following commands on all three machines to create the two folders: one for the compressed packages, the other for the unpacked software:
mkdir -p /kkb/soft      # directory for the compressed software packages
mkdir -p /kkb/install   # directory for the unpacked software
chown -R hadoop:hadoop /kkb   # give the hadoop user ownership of the folders
Now that the hadoop user exists, all three machines are operated as the hadoop user from here on; the root user is no longer needed.
Switch to the hadoop user on the three machines:
su hadoop
- Set up passwordless SSH login for the hadoop user on the three machines
Step 1: As the hadoop user, run the following command on all three machines to generate a public/private key pair:
ssh-keygen -t rsa
After running the command, press Enter three times to accept the defaults and generate the keys.
Step 2: As the hadoop user, run the following command on all three machines to copy each machine's public key to node01:
ssh-copy-id node01
Step 3: Copy the collected public keys from node01 to node02 and node03
On node01, run the following commands as the hadoop user to copy authorized_keys to node02 and node03:
cd /home/hadoop/.ssh/
scp authorized_keys node02:$PWD
scp authorized_keys node03:$PWD
Step 4: Verify that any node can log in to the other nodes without a password. For example, log in to node02 from node01:
ssh node02
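A small loop run from any of the three machines (as the hadoop user) makes this check systematic; each ssh call should print the remote host name without asking for a password:
for host in node01 node02 node03; do
  ssh $host hostname
done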
- Reboot the three machines
Run the following command as the hadoop user to reboot all three machines:
sudo reboot
- Install the JDK on the three machines
Reconnect to the three machines as the hadoop user and install the JDK as that user.
Download the JDK 8 package, upload it to /kkb/soft on the first server, unpack it, and configure the environment variables; the JDK must be installed on all three servers.
cd /kkb/soft/
tar -xzvf jdk-8u141-linux-x64.tar.gz -C /kkb/install/
sudo vim /etc/profile
Add the following configuration to configure the JDK environment variables
export JAVA_HOME=/kkb/install/jdk1.8.0_141
export PATH=$PATH:$JAVA_HOME/bin
Let the changes take effect immediately
source /etc/profile
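Verify the JDK on each machine:
echo $JAVA_HOME    # should print /kkb/install/jdk1.8.0_141
java -version      # should report java version "1.8.0_141"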
Suggestion: once the three machines are ready, take a VM snapshot so it is easy to roll back if something goes wrong later.
Now that the three machines are fully connected and the JDK has been installed, you can start installing the Hadoop and ZooKeeper clusters.
A note on the Hadoop installation package (self-compiled)
CDH provides matching versions of all its components, so normally there is no need to compile anything. However, the stock CDH Hadoop package does not ship the native libraries (the C interface used for native compression and similar features), which causes problems when those native libraries are needed.
Note: compiling the Hadoop source code yourself is not recommended. You can directly install and deploy the cluster with the provided package hadoop-2.6.0-cdh5.14.2_after_compile.tar.gz, or find the resource yourself.
Compiling is not difficult in itself; success mostly depends on how good your network is.
Link: pan.baidu.com/s/1V_Y6uoNX… Password: 3t7m
2. Install the Hadoop cluster
Plan the service deployment across the servers:
Service | 192.168.52.100 (node01) | 192.168.52.110 (node02) | 192.168.52.120 (node03) |
---|---|---|---|
HDFS | NameNode | | |
HDFS | SecondaryNameNode | | |
HDFS | DataNode | DataNode | DataNode |
YARN | ResourceManager | | |
YARN | NodeManager | NodeManager | NodeManager |
History log server | JobHistoryServer | | |
Step 1: Upload and decompress the package
- Upload the recompiled Hadoop package (with Snappy compression support) to the first server and unpack it. On the first machine, run the following commands:
cd /kkb/soft/
tar -xzvf hadoop-2.6.0-cdh5.14.2_after_compile.tar.gz -C /kkb/install/
Step 2: Check the compression codecs and native libraries supported by Hadoop
On the first machine, run the following commands:
cd /kkb/install/hadoop-2.6.0-cdh5.14.2
bin/hadoop checknative
If openssl is reported as false, install openssl-devel online on all three VMs:
sudo yum -y install openssl-devel
Step 3: Modify the configuration file
Modify hadoop-env.sh
On the first machine, run the following commands:
cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
vim hadoop-env.sh
Add the following
export JAVA_HOME=/kkb/install/jdk1.8.0_141
Modify core-site.xml
On the first machine, run the following commands:
cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node01:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas</value>
</property>
<!-- Buffer size; adjust it according to server performance -->
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<!-- Enable the HDFS trash mechanism; deleted data can be recovered from the trash within this interval (in minutes) -->
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
</configuration>
Modify hdfs-site.xml
On the first machine, run the following commands:
cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
vim hdfs-site.xml
<configuration>
<!-- NameNode metadata storage path. In practice, determine the disk mount directories first. -->
<!-- Cluster dynamic commissioning/decommissioning of nodes
<property>
<name>dfs.hosts</name>
<value>/kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/accept_host</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/deny_host</value>
</property>
-->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node01:50090</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>node01:50070</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas</value>
</property>
<!-- DataNode data storage path. In practice, determine the disk mount directories first, then split the data across multiple directories. -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas</value>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name</value>
</property>
<property>
<name>dfs.namenode.checkpoint.edits.dir</name>
<value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
</configuration>
Modify mapred-site.xml
On the first machine, run the following commands:
cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
mv mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node01:19888</value>
</property>
</configuration>
Modify yarn-site.xml
On the first machine, run the following commands:
cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
vim yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Modify the slaves file
On the first machine, run the following commands:
cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
vim slaves
Replace the original content with
node01
node02
node03
Step 4: Create a directory for storing files
On node01, create the following directories:
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits
Step 5: Distribute the installation package with scp
scp (secure copy) copies files or folders between servers over SSH. The syntax is:
scp -r sourceFile username@host:destpath
On node01, run the following commands to copy the installation to the other nodes:
cd /kkb/install/
scp -r hadoop-2.6.0-cdh5.14.2/ node02:$PWD
scp -r hadoop-2.6.0-cdh5.14.2/ node03:$PWD
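Once the copy finishes, you can confirm that the directory now exists on the other two nodes:
ssh node02 ls -d /kkb/install/hadoop-2.6.0-cdh5.14.2
ssh node03 ls -d /kkb/install/hadoop-2.6.0-cdh5.14.2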
Step 6: Configure the Hadoop environment variables
Hadoop environment variables need to be configured on all three machines
Run the following command on all three machines:
sudo vim /etc/profile
export HADOOP_HOME=/kkb/install/hadoop-2.6.0-cdh5.14.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Make the configuration take effect:
source /etc/profile
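Confirm the Hadoop environment on each machine:
hadoop version       # should report Hadoop 2.6.0-cdh5.14.2
echo $HADOOP_HOME    # should print /kkb/install/hadoop-2.6.0-cdh5.14.2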
Step 7: Start the cluster
1. Format the cluster
To start the Hadoop cluster, you need to start HDFS and YARN clusters.
Note: HDFS must be formatted before it is started for the first time. This is essentially cleanup and preparation, since HDFS does not physically exist yet at that point. Formatting is only needed before the first startup and must never be repeated afterwards.
Run the format command once on node01:
hdfs namenode -format
If the output reports that the storage directory has been successfully formatted, the formatting succeeded.
You can start the cluster in either of the following ways: ① start everything with the one-click scripts, or ② start each process one by one.
2. One-click start with the scripts
If etc/hadoop/slaves and passwordless SSH login are configured, the bundled scripts can start all of the Hadoop cluster processes; run them on the machine designated as the master node.
Start the cluster
Run the following commands on node01:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
Stopping a cluster:
stop-dfs.sh
stop-yarn.sh
- Start each process one by one
On the master node, start HDFS NameNode: hadoop-daemon.sh start namenode
On each slave node, start HDFS DataNode: hadoop-daemon.sh start datanode
On the master node, start YARN ResourceManager: yarn-daemon.sh start resourcemanager
On each slave node, start YARN NodeManager: yarn-daemon.sh start nodemanager
The scripts above live in the $HADOOP_HOME/sbin/ directory. To stop a role on a node, change start to stop.
Step 8: View the startup page in your browser
HDFS cluster access address
http://192.168.52.100:50070/
Yarn Cluster access address
http://192.168.52.100:8088
Jobhistory Address:
http://192.168.52.100:19888
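Besides the web pages, a simple HDFS smoke test from node01 confirms that the cluster is actually usable (the file name and directory here are just examples):
echo "hello hadoop" > /tmp/hello.txt
hdfs dfs -mkdir -p /test
hdfs dfs -put /tmp/hello.txt /test/
hdfs dfs -cat /test/hello.txt    # should print "hello hadoop"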
We can also use jps to check the process names on each machine. To make this easier later on, the following script checks the processes on all machines in one click.
1. Script to view the processes on all machines
Create file xcall in the /home/hadoop/bin directory on the node01 server
[hadoop@node01 bin]$ cd ~/bin/
[hadoop@node01 bin]$ vim xcall
Add the following
#!/bin/bash
params=$@
for (( i = 1; i <= 3; i++ )); do
echo ============= node0$i $params =============
ssh node0$i "source /etc/profile;$params"
done
Then make the script executable and distribute it to the other nodes:
chmod 777 /home/hadoop/bin/xcall
cd /home/hadoop/bin
scp -r xcall node02:$PWD
scp -r xcall node03:$PWD
View the Hadoop process on each node
xcall jps
==Warning: if you want to power off, you must follow the order below, otherwise the cluster may run into problems==
1. Shut down the Hadoop cluster
2. Shut down the virtual machines
3. Shut down the host computer
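A concrete shutdown sequence following this order (the Hadoop commands are run on node01 as the hadoop user; the final command is plain Linux, run on each VM):
mr-jobhistory-daemon.sh stop historyserver
stop-yarn.sh
stop-dfs.sh
sudo shutdown -h now    # then power off each virtual machine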
Conclusion
This is the complete Hadoop cluster deployment procedure. Welcome to follow my official account.