@TOC


Preface

This article walks through preparing the server environment for a big data cluster and then installing a Hadoop cluster on it.


1. Prepare the environment before installing the big data cluster

  1. Disable the firewall on the three VMs

Run the following commands on all three machines as root:

systemctl stop firewalld
systemctl disable firewalld
  1. Disable SELinux on the three machines

Edit the SELinux configuration file on all three machines:

vi /etc/sysconfig/selinux

Set the following value:

SELINUX=disabled
  1. Change the host names of the three machines

Edit the host name file on all three machines:

vi /etc/hostname

On the first machine, set the content to:

node01

On the second machine, set the content to:

node02

On the third machine, set the content to:

node03
  1. Map host names to IP addresses on the three machines

Run the following command on the three machines to change the mapping between host names and IP addresses

vi /etc/hosts
Add the following entries:

192.168.52.100 node01
192.168.52.110 node02
192.168.52.120 node03
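Once the mappings are in place, it is worth confirming that each host name resolves. A quick check from node01 (a sanity check, not part of the original steps, assuming the hosts file above):

ping -c 1 node02
ping -c 1 node03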
  1. Synchronize the clocks of the three machines

If the VMs can reach the Internet, synchronize the time over the network. Make sure ntpdate is installed on all three machines:

yum -y install ntpdate

Synchronize against the Alibaba Cloud time server:

ntpdate ntp4.aliyun.com

Add a scheduled task on all three machines:

crontab -e

Add the following

*/1 * * * * /usr/sbin/ntpdate ntp4.aliyun.com
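As a quick sanity check (not part of the original steps), you can trigger one synchronization by hand and then compare the clocks on the three machines:

/usr/sbin/ntpdate ntp4.aliyun.com   # one-off sync; the cron job keeps the clock in sync afterwards
date                                # the three machines should now show the same time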
  1. Add a normal user on the three machines

Add a normal user named hadoop to the three Linux servers and grant it sudo permission; all big data software will be installed as this user later. Set the password of the hadoop user to ==123456==.

useradd hadoop
passwd hadoop

Enter the password 123456 when prompted.

Grant sudo permission to the hadoop user on all three machines:

visudo
Add the following line:

hadoop ALL=(ALL)    ALL
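To confirm the sudo grant works (a quick check, not part of the original steps), switch to the hadoop user and run a harmless command through sudo:

su - hadoop
sudo whoami    # should print "root" after the hadoop password is entered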
  1. Define a unified directory layout on the three machines

Define a directory for the compressed software packages and a directory for the unpacked installations on the three Linux servers. Run the following commands on all three servers to create the two folders:

mkdir -p /kkb/soft   # Directory for storing compressed software packages
mkdir -p /kkb/install  # Directory for storing the unpacked software
chown -R hadoop:hadoop /kkb  # Change the owner of the folders to the hadoop user

From this point on, operate all three machines as the hadoop user; there is no need to use the root user any more.

Switch to the hadoop user on the three machines with the su hadoop command:

su hadoop
  1. Set up passwordless SSH login for the hadoop user on the three machines

Step 1: As the hadoop user, run the following command on all three machines to generate a public/private key pair

ssh-keygen -t rsa

After executing the command, press Enter three times to accept the defaults and generate the keys.

Step 2: As the hadoop user, run the following command on all three machines to copy the public key to the node01 server

ssh-copy-id node01

Step 3: Copy the collected public keys from node01 to node02 and node03

On node01, run the following commands as the hadoop user to copy authorized_keys to the node02 and node03 servers:

cd /home/hadoop/.ssh/
scp authorized_keys node02:$PWD
scp authorized_keys node03:$PWD

Step 4: Verify that any node can log in to the other nodes without a password; for example, log in from node01 to node02:

ssh node02
  1. Reboot the three machines

Run the following command as the hadoop user to reboot the three machines:

sudo reboot
  1. Install the JDK on the three machines
  • Reconnect to the three machines as the hadoop user and install the JDK as that user

  • Download the JDK 8 package, upload it to /kkb/soft on the first server, decompress it, and configure the environment variables; do the same on all three servers

cd /kkb/soft/
tar -xzvf jdk-8u141-linux-x64.tar.gz -C /kkb/install/
sudo vim /etc/profile
Add the following configuration to configure the JDK environment variables
export JAVA_HOME=/kkb/install/jdk1.8.0_141
export PATH=$PATH:$JAVA_HOME/bin

Let the changes take effect immediately

source /etc/profile
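A quick check (assuming the jdk1.8.0_141 directory above) to confirm the JDK is picked up on each machine:

java -version    # should report java version "1.8.0_141"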

Suggestion: after the three machines are ready, take a snapshot of each VM so you can easily recover if something goes wrong later.

Now that the three machines are fully connected and the JDK has been installed, you can start installing the Hadoop and ZooKeeper clusters.

2. About the Hadoop installation package

A note on Hadoop installation packages that need to be compiled by yourself:

CDH ships installation packages with matching versions for all of its components, so normally there is no need to compile anything. However, the CDH Hadoop package does not come with the native libraries (which provide compression support and let C programs access Hadoop), so problems appear as soon as the native libraries are needed.

Note: compiling the Hadoop source code yourself is not recommended. You can deploy the cluster directly using the provided package hadoop-2.6.0-cdh5.14.2_after_compile.tar.gz, or find and download your own copy.

Compiling is not difficult in itself; success mostly depends on how good your network connection is.

Link: pan.baidu.com/s/1V_Y6uoNX… Password: 3t7m

3. Install the Hadoop cluster

Plan the service deployment across the environment:

| Server IP | 192.168.52.100 (node01) | 192.168.52.110 (node02) | 192.168.52.120 (node03) |
| --- | --- | --- | --- |
| HDFS | NameNode | | |
| HDFS | SecondaryNameNode | | |
| HDFS | DataNode | DataNode | DataNode |
| YARN | ResourceManager | | |
| YARN | NodeManager | NodeManager | NodeManager |
| History log server | JobHistoryServer | | |
Step 1: Upload and decompress the package
  • Upload the recompiled Hadoop package with Snappy compression support to the first server and decompress it. The first machine executes the following commands:
cd /kkb/soft/
tar -xzvf hadoop-2.6.0-cdh5.14.2_after_compile.tar.gz -C /kkb/install/
Step 2: Check the compression codecs and native libraries supported by Hadoop

The first machine executes the following command

cd /kkb/install/hadoop-2.6.0-cdh5.14.2
bin/hadoop checknative

If openssl is reported as false, install openssl-devel online on all VMs. Run the following command on the VMs:

sudo yum -y install openssl-devel
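After the installation, you can re-run the check from the Hadoop directory to confirm that openssl is now reported as true:

cd /kkb/install/hadoop-2.6.0-cdh5.14.2
bin/hadoop checknative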
Step 3: Modify the configuration files
Modify hadoop-env.sh

The first machine executes the following command

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
vim hadoop-env.sh

Add the following

export JAVA_HOME=/kkb/install/jdk1.8.0_141
Modify core-site.xml

The first machine executes the following command

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas</value>
    </property>
    <!-- Buffer size; adjust according to server performance -->
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
    <!-- Enable the HDFS trash; deleted files can be recovered from the trash within this many minutes (10080 minutes = 7 days) -->
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>
</configuration>
Modify hdfs-site.xml

The first machine executes the following command

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
vim hdfs-site.xml
<configuration>
	<!-- Path where the NameNode stores its metadata. In practice, determine the disk mount directory first. -->
	<!-- Whitelist and blacklist for dynamically adding and removing cluster nodes
	<property>
		<name>dfs.hosts</name>
		<value>/kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/accept_host</value>
	</property>
	<property>
		<name>dfs.hosts.exclude</name>
		<value>/kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/deny_host</value>
	</property>
	-->
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>node01:50090</value>
	</property>
	<property>
		<name>dfs.namenode.http-address</name>
		<value>node01:50070</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas</value>
	</property>
	<!-- Directories where the DataNode stores its data. In practice, determine the disk mount directories first, then spread the data across multiple directories. -->
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas</value>
	</property>
	<property>
		<name>dfs.namenode.edits.dir</name>
		<value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits</value>
	</property>
	<property>
		<name>dfs.namenode.checkpoint.dir</name>
		<value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name</value>
	</property>
	<property>
		<name>dfs.namenode.checkpoint.edits.dir</name>
		<value>file:///kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits</value>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
	<property>
		<name>dfs.blocksize</name>
		<value>134217728</value>
	</property>
</configuration>
Modify mapred-site.xml

The first machine executes the following command

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
mv mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node01:19888</value>
    </property>
</configuration>
Modify yarn-site.xml

The first machine executes the following command

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
vim yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Modify the slaves file

The first machine executes the following command

cd /kkb/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
vim slaves

Replace the original content with

node01
node02
node03
Step 4: Create the data storage directories

Create the following directories on node01 (the first machine):

mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name
mkdir -p /kkb/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits
Step 5: Distribute the installation package with scp

scp stands for secure copy.

You can use scp to copy files or folders between different servers.

Use the syntax

scp -r sourceFile  username@host:destpath

On node01, run the following commands to copy the installation directory:

cd /kkb/install/
scp -r hadoop-2.6.0-cdh5.14.2/ node02:$PWD
scp -r hadoop-2.6.0-cdh5.14.2/ node03:$PWD
Step 6: Configure the Hadoop environment variables

Hadoop environment variables need to be configured on all three machines

Three machines execute the following command

sudo vim /etc/profile
export HADOOP_HOME=/kkb/install/hadoop-2.6.0-cdh5.14.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Make the configuration take effect:

source /etc/profile
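To confirm the environment variables are picked up on every machine, a quick check is:

hadoop version    # should print the Hadoop 2.6.0-cdh5.14.2 version banner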
Step 7: Start the cluster
1. Format the cluster

To start the Hadoop cluster, you need to start both the HDFS and YARN clusters.

Note: When starting HDFS for the first time, you must format it. This is essentially a bit of cleanup and preparation, since HDFS does not physically exist at this point. Formatting is only required for the first startup and is never required again

Run the following command once on node01:

hdfs namenode -format

(A screenshot in the original post highlights the output showing that the formatting succeeded.)

You can start the cluster in either of the following ways: ① start each process one by one, or ② use the one-click startup scripts.

2. Start with the one-click scripts

If etc/hadoop/slaves and passwordless SSH login are configured, the cluster scripts can start all the Hadoop-related processes; run them on the master node.

Start the cluster

Run the following commands on node01 (the first machine):

start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver

Stopping a cluster:

stop-dfs.sh
stop-yarn.sh
mr-jobhistory-daemon.sh stop historyserver
  1. Start a single process one by one
# On the master node, start the HDFS NameNode:
hadoop-daemon.sh start namenode
# On each worker node, start the HDFS DataNode:
hadoop-daemon.sh start datanode
# On the master node, start the YARN ResourceManager:
yarn-daemon.sh start resourcemanager
# On each worker node, start the YARN NodeManager:
yarn-daemon.sh start nodemanager

The preceding scripts are located in the $HADOOP_HOME/sbin/ directory. To stop a role on a node, change start to stop.
Step 8: View the startup page in your browser

HDFS cluster access address

http://192.168.52.100:50070/

Yarn Cluster access address

http://192.168.52.100:8088

Jobhistory Address:

http://192.168.52.100:19888

We can also use jps to view the process names on each machine. To make this easier in the future, we can check the processes on all machines with one click using a script.

1. Script to view the processes on all machines

Create file xcall in the /home/hadoop/bin directory on the node01 server

[hadoop@node01 bin]$ cd ~/bin/
[hadoop@node01 bin]$ vim xcall

Add the following

#!/bin/bash

# Run the given command on node01, node02 and node03 in turn
params="$@"
for (( i = 1; i <= 3; i++ )); do
    echo "============= node0$i $params ============="
    ssh node0$i "source /etc/profile; $params"
done

Then make the script executable and distribute it, so the processes can be viewed with one click:

chmod 777  /home/hadoop/bin/xcall
cd /home/hadoop/bin
scp -r xcall node02:$PWD
scp -r xcall node03:$PWD

View the Hadoop processes on each node:

xcall jps
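Based on the deployment plan above, after a full start each machine should show roughly the following processes (plus the Jps process itself):

node01: NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer
node02: DataNode, NodeManager
node03: DataNode, NodeManager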

==Warning: if you want to shut down the machines, be sure to follow the order below, otherwise the cluster may run into problems== (a command sketch follows the list)

  • Shut down the Hadoop cluster

  • Shut down the virtual machines

  • Shut down the host computer
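A minimal sketch of that sequence, assuming the scripts used above and the hadoop user on node01:

# 1. Stop the Hadoop cluster (on node01)
mr-jobhistory-daemon.sh stop historyserver
stop-yarn.sh
stop-dfs.sh

# 2. Shut down each virtual machine
sudo shutdown -h now

# 3. Finally, shut down the host computer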


Conclusion

That is the complete Hadoop cluster deployment procedure. Welcome to follow my official account.