Prerequisites: prepare three VMs, configure the NAT network, and assign static IP addresses.

1. Disable the firewall on the three VMs

Run the following commands as root on all three machines:

systemctl stop firewalld
systemctl disable firewalld

2. Disable SELinux on the three machines

Run the following command on all three machines to disable SELinux:

vi /etc/sysconfig/selinux

SELINUX=disabled

3. Change the host names of the three machines

Run the following command on all three machines to change the host name:

vi /etc/hostname

Set the host names of the three machines to:

node01
node02
node03

4. Map host names to IP addresses on the three hosts

Run the following command on all three machines to add the mappings between host names and IP addresses:

vi /etc/hosts

192.168.51.100 node01
192.168.51.110 node02
192.168.51.120 node03

Note: adjust the IP addresses to match your own environment
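
To confirm that the host name mappings work, you can (optionally) test name resolution from any of the machines:

ping -c 3 node02
ping -c 3 node03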

5. Synchronize the clocks of the three machines

Method 1: clock synchronization over the Internet

To synchronize time from the Internet, make sure the VMs can reach the Internet.

Install ntpdate on all three machines:

yum -y install ntpdate

Synchronize with the Aliyun clock synchronization server:

ntpdate ntp4.aliyun.com

Add a scheduled task on all three machines:

crontab -e

Add the following line:

*/1 * * * * /usr/sbin/ntpdate ntp4.aliyun.com;
Method 2: use a machine on the intranet as the clock synchronization server

Run the su root command to switch to the root user.

The other machines synchronize their clocks against the server 192.168.51.100 (node01).

Step 1: check whether the NTP tools are installed on the three machines

Check whether the ntpdate clock synchronization tool is installed on the three machines:

rpm -qa | grep ntpdate

If it is not installed, run the following command on all three machines to install it online:

yum -y install ntpdate

Install ntp on node01:

yum -y install ntp

Run the following command to set the time zone to Asia/Shanghai (Shanghai, China):

timedatectl set-timezone Asia/Shanghai
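
To confirm the time zone change took effect, you can check the current settings (optional):

timedatectl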
Step 2: start the ntpd service on node01

We need to start the ntpd service on node01 so that it can provide the time synchronization service.

Start the ntpd service:

# Start the ntpd service
systemctl start ntpd

# Enable the ntpd service at boot
systemctl enable ntpd
Step 3: modify the node01 server configuration

Modify the clock synchronization configuration on node01 so that it can serve other machines:

vim /etc/ntp.conf

Add the following two lines (change 192.168.51.0 to your own network segment); they allow all machines on the 192.168.51.0 segment to synchronize time with node01 and make node01 use its local clock as the reference:

restrict 192.168.51.0 mask 255.255.255.0 nomodify notrap
server 127.127.1.0

Comment out the following four lines

#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst

After the modification is complete, restart the NTPD service on Node01

systemctl restart ntpd
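
Optionally, you can confirm that ntpd is up and using the local clock as its reference by listing its peers; the 127.127.1.0 entry should appear in the output:

ntpq -p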

At this point, the NTPD server has been installed and configured. Next, configure the client to synchronize with the server

Step 4: configure node02 and node03 to synchronize their time with node01

On the clients node02 and node03, set the time zone to the same value as node01 (Asia/Shanghai).

Modify the configuration file on node02 and node03 so that the synchronized time is also written to the hardware clock:

vim /etc/sysconfig/ntpdate

SYNC_HWCLOCK=yes

Modify scheduled tasks on node02 and node03 to synchronize time with node01

[root@node03 hadoop]# crontab -e

Add the following

*/1 * * * * /usr/sbin/ntpdate node01
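
To check that the clients can actually reach node01 before relying on the cron job, you can run one synchronization by hand on node02 and node03 and then look at the date:

/usr/sbin/ntpdate node01
date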

6. Add a common user on the three machines

Add a common user named hadoop on all three Linux servers and grant it sudo permission; it will be used to install all big data software later.

Set the password of the common user to ==123456==

useradd hadoop
passwd hadoop

Enter 123456 when prompted for the password.

Grant sudo permission to the hadoop user on all three machines:

visudo

Add the following line:

hadoop ALL=(ALL)    ALL
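
Optionally, verify that the hadoop user really has sudo rights:

su - hadoop
sudo whoami    # should print "root" after the hadoop password is entered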

7. Define a unified directory layout on the three machines

Define a directory for storing the compressed software packages and a directory for the unpacked installations on the three Linux servers. Run the following commands on all three servers to create the two folders:

mkdir -p /kkb/soft      # directory for the compressed software packages
mkdir -p /kkb/install   # directory for the unpacked software
chown -R hadoop:hadoop /kkb    # change the owner of the folders to the hadoop user

Now that the hadoop user has been created, operate all three machines as the hadoop user; there is no need to work as root.

Switch to the hadoop user on the three machines with the su hadoop command:

su hadoop

8. Passwordless SSH login for the hadoop user on the three machines

Reboot the three Linux VMs first so that the new host names take effect.

Step 1: as the hadoop user, run the following command on all three machines to generate a public/private key pair:

ssh-keygen -t rsa

After running the command, press Enter three times to accept the defaults.

Step 2: as the hadoop user, run the following command on all three machines to copy the public key to the node01 server:

ssh-copy-id node01

Step 3: copy the collected public keys from node01 to node02 and node03

On node01, run the following commands as the hadoop user to copy authorized_keys to the node02 and node03 servers:

cd /home/hadoop/.ssh/
scp authorized_keys node02:$PWD
scp authorized_keys node03:$PWD

Step 4: verify that you can log in from any node to the other nodes without a password. For example, log in from node01 to node02:

ssh node02
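
A quick way to check every direction at once is to run the loop below from each of the three nodes; every ssh call should print a host name without asking for a password:

for host in node01 node02 node03; do ssh $host hostname; done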

9. Reboot the three machines

Run the following command as the hadoop user to reboot the three machines:

sudo reboot

3. Install the Hadoop cluster

  • Plan the installation environment service deployment

Server IP            node01              node02        node03
HDFS                 NameNode
HDFS                 SecondaryNameNode
HDFS                 DataNode            DataNode      DataNode
YARN                 ResourceManager
YARN                 NodeManager         NodeManager   NodeManager
History log server   JobHistoryServer
Step 1: Upload and decompress the package
  • Upload our recompiled Hadoop package with Snappy compression support to the first server and decompress it. Run the following commands on the first machine:

cd /kkb/soft/
tar -xzvf hadoop-3.1.4.tar.gz -C /kkb/install
Step 2: check the compression codecs and native libraries supported by Hadoop

Run the following commands on the first machine:

cd /kkb/install/hadoop-3.1.4/
bin/hadoop checknative

If openssl reports false, install the OpenSSL development package online on all machines:

sudo yum -y install openssl-devel
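
After the install finishes, you can re-run the check to confirm that openssl now reports true:

cd /kkb/install/hadoop-3.1.4/
bin/hadoop checknative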
Step 3: modify the configuration files
Modify hadoop-env.sh

Run the following commands on the first machine:

cd /kkb/install/hadoop-3.1.4/etc/hadoop/
vim hadoop-env.sh
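
The tutorial does not show the exact change here; a typical edit is to point JAVA_HOME at your JDK installation. The path below is only an illustration, not the actual location on your machines:

# illustrative path; replace it with the directory where your JDK is actually installed
export JAVA_HOME=/kkb/install/jdk1.8.0_141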
Modify core-site.xml

Run the following command on the first machine:

vim core-site.xml

Add the following:

<configuration>
<property>
     <name>fs.defaultFS</name>
     <value>hdfs://node01:8020</value>
 </property>
 <property>
     <name>hadoop.tmp.dir</name>
     <value>/kkb/install/hadoop-3.1.4/hadoopDatas/tempDatas</value>
 </property>
 <!-- Buffer size; adjust according to server performance in real deployments. Default value 4096 -->
 <property>
     <name>io.file.buffer.size</name>
     <value>4096</value>
 </property>
 <!-- Enable the HDFS trash mechanism; deleted data can be recovered from the trash within the configured number of minutes. Default value 0 -->
 <property>
     <name>fs.trash.interval</name>
     <value>10080</value>
 </property>

<property>
     <name>hadoop.proxyuser.hadoop.hosts</name>
     <value>*</value>
 </property>
 <property>
     <name>hadoop.proxyuser.hadoop.groups</name>
     <value>*</value>
 </property>
 <property>
     <name>hadoop.http.staticuser.user</name>
     <value>hadoop</value>
 </property>
</configuration>
Modify hdfs-site.xml

Run the following command on the first machine:

vim hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node01:9868</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>node01:9870</value>
    </property>
    <!-- Directory where the NameNode stores its metadata -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///kkb/install/hadoop-3.1.4/hadoopDatas/namenodeDatas</value>
    </property>
    <!-- DataNode data directories. In real deployments, determine the disk mount points first, then list multiple directories -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///kkb/install/hadoop-3.1.4/hadoopDatas/datanodeDatas</value>
    </property>
    <!-- Directory for the NameNode edit logs -->
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>file:///kkb/install/hadoop-3.1.4/hadoopDatas/dfs/nn/edits</value>
    </property>
    <!-- Directory where the SecondaryNameNode saves the merged fsimage -->
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///kkb/install/hadoop-3.1.4/hadoopDatas/dfs/snn/name</value>
    </property>
    <!-- Directory where the SecondaryNameNode saves the merged edit logs -->
    <property>
        <name>dfs.namenode.checkpoint.edits.dir</name>
        <value>file:///kkb/install/hadoop-3.1.4/hadoopDatas/dfs/nn/snn/edits</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
</configuration>
Modify mapred-site.xml

Run the following command on the first machine:

vim mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node01:19888</value>
    </property>
        <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>
Modify yarn-site.xml

Run the following command on the first machine:

 vim yarn-site.xml

<configuration>
 
<property>
     <name>yarn.resourcemanager.hostname</name>
      <value>node01</value>
  </property>
  <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
  </property>

   <property>
      <name>yarn.nodemanager.env-whitelist</name>
      <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
  <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>512</value>
  </property>
  <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>4096</value>
  </property>
  <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>4096</value>
  </property>
  <property>
      <name>yarn.nodemanager.pmem-check-enabled</name>
      <value>false</value>
  </property>
  <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
  </property>
<property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
</property>
  <property>
      	<name>yarn.log.server.url</name>
      	<value>http://node01:19888/jobhistory/logs</value>
  </property>
  <property>
      	<name>yarn.log-aggregation.retain-seconds</name>
      	<value>25920000</value>
  </property>

</configuration>

If YARN jobs report classpath errors after startup, run the following command:

hadoop classpath

Then add its output to yarn-site.xml as the value of the yarn.application.classpath property.
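
A sketch of how that property can look; the value below is only a placeholder and must be replaced with the real output of hadoop classpath on your own machine:

<property>
    <name>yarn.application.classpath</name>
    <!-- placeholder value: paste the full output of `hadoop classpath` here -->
    <value>/kkb/install/hadoop-3.1.4/etc/hadoop:/kkb/install/hadoop-3.1.4/share/hadoop/common/lib/*</value>
</property>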

Modify the workers file

Run the following command on the first machine:

vim workers

Replace the original content with:

node01
node02
node03
Step 4: create the directories for storing data

Create the following directories on the node01 machine:

mkdir -p /kkb/install/hadoop-3.1.4/hadoopDatas/tempDatas
mkdir -p /kkb/install/hadoop-3.1.4/hadoopDatas/namenodeDatas
mkdir -p /kkb/install/hadoop-3.1.4/hadoopDatas/datanodeDatas
mkdir -p /kkb/install/hadoop-3.1.4/hadoopDatas/dfs/nn/edits
mkdir -p /kkb/install/hadoop-3.1.4/hadoopDatas/dfs/snn/name
mkdir -p /kkb/install/hadoop-3.1.4/hadoopDatas/dfs/nn/snn/edits
Step 5: distribute the installation package with scp and rsync

On Linux, you can use scp or rsync to copy files or folders to a remote server. The two commands do the same basic job, but scp always performs a full copy while rsync can copy incrementally, so rsync is more efficient than scp.

1. Copy files using scp

scp stands for secure copy.

You can use scp to copy files or folders between different servers.

Syntax:

scp -r sourceFile  username@host:destpath

Usage example:

scp -r hadoop-lzo-0.4.20.jar hadoop@node01:/kkb/

On node01, run the following commands to copy the Hadoop installation to node02 and node03:

cd /kkb/install/
scp -r hadoop-3.1.4/ node02:$PWD
scp -r hadoop-3.1.4/ node03:$PWD
2. Use rsync to implement incremental copy

rsync is a remote synchronization tool.

rsync is mainly used for backup and mirroring. It is fast, avoids copying identical content, and supports symbolic links.

Difference between rsync and scp: copying files with rsync is faster than with scp, because rsync only transfers the files that differ, while scp copies everything.

Run the following command on all three machines to install rsync:

sudo yum -y install rsync

(1) Basic syntax

On node01, run the following command to synchronize the ZooKeeper installation package to node02:

rsync -av /kkb/soft/apache-zookeeper-3.6.2-bin.tar.gz node02:/kkb/soft/

Syntax: rsync [options] <path/name of the file to copy> <destination user>@<host>:<destination path/name>

Option description:

option    function
-a        archive copy
-v        show the copy process

(2) Case practice

(a) Synchronize the /kkb/soft directory on node01 to the /kkb/ directory under the hadoop user on node02:

rsync -av /kkb/soft node02:/kkb/soft
3. Wrap rsync in a distribution script

We can wrap the rsync command in a script so that files can be distributed incrementally to all of the other machines.

(1) Requirement: copy files to the same directory on all nodes

(2) Requirement analysis:

(a) Plain rsync copy:

rsync -av /kkb/soft hadoop@node02:/kkb/soft

(b) Expected script usage:

xsync <name of the file to synchronize>

(c) Note: the hadoop user should be able to run scripts stored in /home/hadoop/bin from anywhere in the system.

(3) Script implementation

(a) Create a bin directory under /home/hadoop and create a file named xsync in that bin directory, with the following content:

[hadoop@node01 ~]$ cd ~
[hadoop@node01 ~]$ mkdir bin
[hadoop@node01 ~]$ cd /home/hadoop/bin
[hadoop@node01 bin]$ touch xsync
[hadoop@node01 bin]$ vim xsync

Write the following code in this file

#!/bin/bash
# 1. Get the number of arguments; exit if there are none
pcount=$#
if ((pcount==0)); then
    echo no args;
    exit;
fi

# 2. Get the file name
p1=$1
fname=`basename $p1`
echo $fname

# 3. Get the absolute path of the parent directory
pdir=`cd -P $(dirname $p1); pwd`

# 4. Get the current user
user=`whoami`

# 5. Loop over node01..node03 and rsync the file to each of them
for((host=1; host<4; host++)); do
    echo ------------------- node0$host --------------
    rsync -av $pdir/$fname $user@node0$host:$pdir
done

(b) Give the xsync script execute permission:

[hadoop@node01 bin]$ cd ~/bin/
[hadoop@node01 bin]$ chmod 777 xsync

(c) Invoke the script as: xsync <file name>

[hadoop@node01 bin]$ xsync /home/hadoop/bin/

Note: If placing xsync in the /home/hadoop/bin directory still does not enable global use, you can move xsync to the /usr/local/bin directory
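
For example, copying it there could look like the following (a sketch; adjust the source path if your script lives elsewhere):

sudo cp /home/hadoop/bin/xsync /usr/local/bin/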

Step 6: Configure the Hadoop environment variables

Hadoop environment variables need to be configured on all three machines

Run the following command on all three machines:

sudo vim /etc/profile

Add the following two lines:

export HADOOP_HOME=/kkb/install/hadoop-3.1.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Make the configuration take effect:

source /etc/profile
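
To confirm the environment variables are picked up, you can check that the hadoop command is found and prints its version:

hadoop version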
Step 7: Format the cluster
  • To start the Hadoop cluster, you need to start HDFS and YARN clusters.
  • Note: When starting HDFS for the first time, you must format it. This is essentially a bit of cleanup and preparation, since HDFS does not physically exist at this point. Formatting is only required for the first startup and is never required again
  • Run the command once on node01:

hdfs namenode -format    (or the older form: hadoop namenode -format)

Step 8: Cluster startup
  • You can start the cluster in either of the following ways:

    • ① Start it with the one-key scripts;
    • ② Start each process one by one
1. Start HDFS, YARN, and Historyserver
  • If etc/hadoop/workers and passwordless SSH are configured, you can use the cluster scripts to start all the Hadoop processes; run them on the machine configured as the primary node.
  • Start the cluster
  • Run the following commands on the active node node01:

start-dfs.sh
start-yarn.sh
# obsolete: mr-jobhistory-daemon.sh start historyserver
mapred --daemon start historyserver

  • Stop the cluster (on the active node node01):

stop-dfs.sh
stop-yarn.sh
# obsolete: mr-jobhistory-daemon.sh stop historyserver
mapred --daemon stop historyserver
2. Start each process one by one
# On the primary node, run the following command to start the HDFS NameNode:
hadoop-daemon.sh start namenode             # obsolete
hdfs --daemon start namenode

# On the primary node, run the following command to start the HDFS SecondaryNameNode:
hadoop-daemon.sh start secondarynamenode    # obsolete
hdfs --daemon start secondarynamenode

# On each worker node, run the following command to start the HDFS DataNode:
hadoop-daemon.sh start datanode             # obsolete
hdfs --daemon start datanode

# On the active node, run the following command to start the YARN ResourceManager:
yarn-daemon.sh start resourcemanager        # obsolete
yarn --daemon start resourcemanager

# On each worker node, run the following command to start the YARN NodeManager:
yarn-daemon.sh start nodemanager            # obsolete
yarn --daemon start nodemanager

# The scripts above are in the $HADOOP_HOME/sbin/ directory.
# To stop a role on a node, change start to stop.
3. One-click script to start the Hadoop cluster
  • To facilitate one-click startup of the Hadoop cluster, we can write shell scripts
  • Create a script in the /home/hadoop/bin directory on the node01 server
[hadoop@node01 bin]$ cd /home/hadoop/bin/
[hadoop@node01 bin]$ vim hadoop.sh
  • The content is as follows:

#!/bin/bash
case $1 in
"start" ){
    source /etc/profile;
    /kkb/install/hadoop-3.1.4/sbin/start-dfs.sh
    /kkb/install/hadoop-3.1.4/sbin/start-yarn.sh
    # /kkb/install/hadoop-3.1.4/sbin/mr-jobhistory-daemon.sh start historyserver
    /kkb/install/hadoop-3.1.4/bin/mapred --daemon start historyserver
};;
"stop" ){
    /kkb/install/hadoop-3.1.4/sbin/stop-dfs.sh
    /kkb/install/hadoop-3.1.4/sbin/stop-yarn.sh
    # /kkb/install/hadoop-3.1.4/sbin/mr-jobhistory-daemon.sh stop historyserver
    /kkb/install/hadoop-3.1.4/bin/mapred --daemon stop historyserver
};;
esac

  • Modify the script permissions and use it:

[hadoop@node01 bin]$ chmod 777 hadoop.sh
[hadoop@node01 bin]$ ./hadoop.sh start    # Start the Hadoop cluster
[hadoop@node01 bin]$ ./hadoop.sh stop     # Stop the Hadoop cluster
Step 9: Verify that the cluster is set up successfully
1. Access the Web UI
  • HDFS cluster access address

http://192.168.51.100:9870/

  • Yarn Cluster access address

http://192.168.51.100:8088

  • Jobhistory Address:

http://192.168.51.100:19888

  • If you add the following contents of the Linux /etc/hosts file to the hosts file on your local host (==change the IP addresses based on your actual situation==), you can also access the web UIs by host name
192.168.51.100 node01.kaikeba.com  node01
192.168.51.110 node02.kaikeba.com  node02
192.168.51.120 node03.kaikeba.com  node03
  • For Windows, the hosts file path is C:\Windows\System32\drivers\etc\hosts

  • The MAC hosts file is /etc/hosts

  • Then the above web UI access addresses can also be written as follows

    • HDFS cluster access address

      http://node01:9870/

    • Yarn Cluster access address

      http://node01:8088

    • Jobhistory Address:

      http://node01:19888

2. Script to view the processes on all machines
  • We can use jps to view the process names on each machine. To make this easier in the future, we can use a script to check the processes on all machines with one command
  • Create file xcall in the /home/hadoop/bin directory on the node01 server
[hadoop@node01 bin]$ cd ~/bin/
[hadoop@node01 bin]$ vim xcall
  • Add the following
#!/bin/bash
params=$@
for (( i=1 ; i <= 3 ; i = $i + 1 )) ; do
    echo ============= node0$i $params =============
    ssh node0$i "source /etc/profile; $params"
done

  • Then make the script executable and distribute it:

chmod 777  /home/hadoop/bin/xcall
xsync /home/hadoop/bin/

  • Run the script to check the Hadoop processes started on each node:

xcall jps
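
  • Based on the deployment plan above, the output should roughly list the following processes on each node (process names only; PIDs will differ), assuming the whole cluster and the history server have been started:

node01: NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer
node02: DataNode, NodeManager
node03: DataNode, NodeManager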

3. Run an MR example
  • Run the pi example on any node:
[hadoop@node01 ~]$ hadoop jar /kkb/install/hadoop-3.1.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar pi 5 5
  • The job ends by printing an approximate value of pi

==Warning: if you want to power off the computer, you must follow the order below, otherwise the cluster may run into problems==

  • Shut down the Hadoop cluster
  • Shut down the VMs
  • Shut down the computer