1 Fully Distributed Mode
Fully distributed mode is more complex than local mode and pseudo-distributed mode: Hadoop is deployed as a planned cluster across multiple Linux hosts, with its components running on different machines. This article builds such a cluster with three virtual machines. The main steps are:
- Prepare VMs: set up the basic virtual machine environment
- ip + Host configuration: manually configure each VM's IP and host name, and make sure the three VMs can ping each other
- ssh configuration: generate key pairs and copy the public keys to all three VMs for password-free login
- Hadoop configuration: core-site.xml + hdfs-site.xml + workers
- YARN configuration: yarn-site.xml
2 VM Installation
Three VMs are needed: one Master node and two Worker nodes. The first step is to install the VMs and configure the environment, and then run a few tests.
2.1 Image Download
To install the VM using VirtualBox, download the image of the latest version from the CentOS official website.
There are three different images:
- boot: network installation version
- dvd1: full version
- minimal: minimal installation version
Select the minimal install version for convenience, that is, the one without the GUI.
2.2 Installation
After downloading, open VirtualBox and click New, then select Expert mode:
Name it CentOSMaster; it will act as the Master node. Allocate 1 GB of memory (2 GB if you have plenty to spare):
A 30 GB disk is enough; the other settings can keep their defaults:
Once created, go to Storage in Settings and select the downloaded image:
After startup, the system prompts you to select a boot disk.
When you are ready, the following screen will appear. Select the first option to start the installation:
After a while, the installation screen is displayed:
To configure the installation location and time zone, select the installation location first:
Since the VM has a single empty virtual disk, select automatic partitioning:
For the time zone, Asia/Shanghai is chosen here:
Select the network and change the host name to master:
Then click Configure:
Add the IP address and DNS server. The IP address can be modeled on the local network; for example, the author's local machine is 192.168.1.7:
- The virtual machine ip can be set to 192.168.1.8
- The subnet mask is usually 255.255.255.0
- The default gateway is 192.168.1.1
- The DNS server is 114.114.114.114 (you can also use other public DNS servers, such as Alibaba's 223.5.5.5 or Baidu's 180.76.76.76)
Click Save, apply the host name, and enable the connection:
If everything looks right, start the installation:
Set the root user password and create a user:
Create a user named hadoopuser; all subsequent operations are performed as this user:
Wait for the installation to complete and then restart.
2.3 Startup
Before booting, remove the installation image from the virtual optical drive:
After startup you are greeted by a plain text console:
Log in to the hadoopuser user created earlier.
3 Connecting to the VM over ssh
By default the VM cannot reach the outside network; select Network from the Devices menu and set the adapter to Bridged Adapter:
Ping test: check whether the local machine can be pinged from the VM:
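For example (assuming the host machine really is at 192.168.1.7, as above):
ping -c 3 192.168.1.7   # send three echo requests to the local machine and stop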
Once the VM is reachable, you can connect to it over SSH from the local terminal, just as you would connect to any server.
Then enter the password to connect:
To avoid typing the password every time, you can set up key-based login from the local machine:
ssh-keygen -t ed25519 -a 100
ssh-copy-id -i ~/.ssh/id_ed25519.pub hadoopuser@192.168.1.8
4 Basic Environment Setup
The basic environment consists of the JDK and Hadoop; the OpenJDK and Hadoop archives are uploaded to the VM with scp.
4.1 JDK
First download OpenJDK, then upload it with scp from the local machine:
scp openjdk-11+28_linux-x64_bin.tar.gz hadoopuser@192.168.1.8:/home/hadoopuser
Then, in the SSH session on the VM:
cd ~
tar -zxvf openjdk-11+28_linux-x64_bin.tar.gz
sudo mv jdk-11 /usr/local/java
Next, edit /etc/profile and add the Java bin directory to PATH at the end of the file:
sudo vim /etc/profile
# if vim is not installed, use vi or install it first:
sudo yum install vim
# add at the end
export PATH=$PATH:/usr/local/java/bin
Then reload the profile:
. /etc/profile
Test:
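A quick check (assuming the PATH change above has been sourced) is to print the Java version:
java -version
# the output should mention OpenJDK 11 (build 11+28)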
4.2 Hadoop
Upload the Hadoop package to the VM with scp, decompress it, and move it to /usr/local:
scp hadoop-3.3.0.tar.gz hadoopuser@192.168.1.8:/home/hadoopuser
In the SSH terminal on the VM:
cd ~
tar -xvf hadoop-3.3.0.tar.gz
sudo mv hadoop-3.3.0 /usr/local/hadoop
Also modify the etc/hadoop/hadoop-env.sh configuration file and set the Java path:
sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# fill in
export JAVA_HOME=/usr/local/java # Change to your Java directory
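As an optional sanity check (not part of the original steps), asking Hadoop for its version also confirms that the JAVA_HOME set in hadoop-env.sh is picked up:
/usr/local/hadoop/bin/hadoop version
# the first line of output should read "Hadoop 3.3.0"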
5 Cloning
Because one Master node and two Worker nodes are required, shut down the Master node, select the configured CentOSMaster, and right-click to clone:
And select full clone:
Clone CentOSWorker1 and CentOSWorker2.
6 Host Name and IP Setup
Boot the Worker1 node first and change its host name to worker1:
sudo vim /etc/hostname
# replace the contents with the single line:
# worker1
For the IP addresses: since the Master node uses 192.168.1.8, set the two Worker nodes to 192.168.1.9 and 192.168.1.10 respectively:
sudo vim /etc/sysconfig/network-scripts/ifcfg-xxxx  # the exact file name varies per machine
# modify IPADDR
IPADDR=192.168.1.9
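For reference, the relevant part of such a file typically looks like the sketch below; the interface name and most values were already written by the installer, so usually only IPADDR needs to change (exact keys can vary per system):
# /etc/sysconfig/network-scripts/ifcfg-enp0s3  (example interface name)
BOOTPROTO=none          # static addressing
ONBOOT=yes
IPADDR=192.168.1.9      # 192.168.1.10 on Worker2
PREFIX=24               # equivalent to netmask 255.255.255.0
GATEWAY=192.168.1.1
DNS1=114.114.114.114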
After the modification, restart Worker1 and perform the same operations to change the host name and IP address of Worker2.
7 Hosts Setup
The hosts file needs to be edited on both the Master and the Worker nodes:
7.1 Master Node
sudo vim /etc/hosts
# add
192.168.1.9 worker1 # correspond to the IP address above
192.168.1.10 worker2
7.2 Worker1 Node
sudo vim /etc/hosts
# add
192.168.1.8 master
192.168.1.10 worker2
7.3 Worker2 Node
sudo vim /etc/hosts
# add
192.168.1.8 master
192.168.1.9 worker1
7.4 Mutual ping Test
On each of the three VMs, ping the IP addresses or host names of the other two. Once this works everywhere you can move on. This section uses the Worker1 node as the example:
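A minimal check from Worker1 might look like this (host names per the hosts files above):
ping -c 3 master
ping -c 3 worker2
ping -c 3 192.168.1.8   # pinging by IP should also work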
8 ssh Configuration
8.1 sshd Service
Password-free (key-based) SSH connections need to be configured among the three nodes, including from each node to itself. First check whether the sshd service is running:
systemctl status sshd
If it is not running, start it:
systemctl start sshd
8.2 Copying a Public Key
Perform the following operations on all three nodes:
ssh-keygen -t ed25519 -a 100
ssh-copy-id master
ssh-copy-id worker1
ssh-copy-id worker2
8.3 Test
SSH from one node to the others; you should now log in without a password. For example, on Master:
ssh master      # the user name is hadoopuser on every node, so it can be omitted
ssh worker1
ssh worker2
9 Hadoop Configuration on the Master Node
On the Master node, modify the following configuration files, where HADOOP stands for the install directory /usr/local/hadoop:
HADOOP/etc/hadoop/core-site.xml
HADOOP/etc/hadoop/hdfs-site.xml
HADOOP/etc/hadoop/workers
9.1 core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/data/tmp</value>
</property>
</configuration>
- fs.defaultFS: the NameNode address
- hadoop.tmp.dir: Hadoop's temporary directory
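To confirm that a value is actually being picked up, hdfs getconf can echo it back (an optional check, not part of the original walkthrough):
/usr/local/hadoop/bin/hdfs getconf -confKey fs.defaultFS
# should print hdfs://master:9000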
9.2 hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/data/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
- dfs.namenode.name.dir: the directory where the NameNode stores its metadata (the FSImage)
- dfs.datanode.data.dir: the directory where the DataNode stores HDFS data blocks
- dfs.replication: the HDFS replication factor; with two Worker nodes, the value is 2
9.3 workers
Finally, edit workers and enter the host names set above:
worker1
worker2
9.4 Copying the Configuration Files
Copy the Master configuration to the Worker:
scp /usr/local/hadoop/etc/hadoop/* worker1:/usr/local/hadoop/etc/hadoop/
scp /usr/local/hadoop/etc/hadoop/* worker2:/usr/local/hadoop/etc/hadoop/
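Optionally, a quick way to confirm the copy worked is to read one of the values back over ssh (a suggestion, not from the original steps):
ssh worker1 "grep -A 1 fs.defaultFS /usr/local/hadoop/etc/hadoop/core-site.xml"
# should show <value>hdfs://master:9000</value>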
10 HDFS Formatting and Startup
10.1 Start
In the Master node:
cd /usr/local/hadoop
bin/hdfs namenode -format
sbin/start-dfs.sh
Running the jps command should show something like the following:
On the Worker nodes:
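A rough sketch of what jps typically reports for this setup (process IDs omitted; the SecondaryNameNode location may vary with configuration):
jps
# on the Master node, roughly:
#   NameNode
#   SecondaryNameNode
#   Jps
# on each Worker node:
#   DataNode
#   Jps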
10.2 Test
Browser input:
master:9870
# if the local hosts file has not been modified, use the IP instead
# 192.168.1.8:9870
But nothing showed up. I had expected to see results after all this work, so I re-checked the local hosts file, the VMs' hosts files, and the Hadoop configuration, and found no problem. In the end I traced the issue to the firewall.
10.3 Firewall
CentOS 8 enables the firewall by default. Check its status with:
systemctl status firewalld
Since port 9870 is used for web access, check whether it is open. On the Master node:
sudo firewall-cmd --query-port=9870/tcp
# or
sudo firewall-cmd --list-ports
If the output is no, the port is not open; open it manually:
sudo firewall-cmd --add-port=9870/tcp --permanent
sudo firewall-cmd --reload # apply the change
Type again in your browser:
master:9870
# or, if the local hosts file has not been modified
# 192.168.1.8:9870
You can now see a friendly page:
However, there is a problem: no Worker nodes are displayed. The number of Live Nodes in the image above is 0, and the Datanodes tab shows nothing:
Yet the DataNode process is clearly running on the Worker nodes:
Checking a Worker node's log (/usr/local/hadoop/logs/hadoop-hadoopuser-datanode-worker1.log) shows that the cause is port 9000 on the Master node not being open:
On the Master node, stop HDFS with stop-dfs.sh, open port 9000, and start it again with start-dfs.sh:
/usr/local/hadoop/sbin/stop-dfs.sh
sudo firewall-cmd --add-port=9000/tcp --permanent
sudo firewall-cmd --reload
/usr/local/hadoop/sbin/start-dfs.sh
Visit again in browser:
master:9870
# or
# 192.168.1.8:9870
Now you can see the Worker node:
11 Configuring YARN
11.1 YARN Configuration
On the two Worker nodes, modify /usr/local/hadoop/etc/hadoop/yarn-site.xml and add the following property inside the <configuration> element:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
11.2 Starting YARN
Enable YARN on the Master node:
cd /usr/local/hadoop
sbin/start-yarn.sh
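As with HDFS, jps can be used to confirm the YARN daemons; typically the ResourceManager appears on the Master and a NodeManager on each Worker:
jps
# on the Master node: ResourceManager, in addition to the HDFS daemons
# on the Worker nodes: NodeManager alongside DataNode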
Also open port 8088 in preparation for the following tests:
sudo firewall-cmd --add-port=8088/tcp --permanent
sudo firewall-cmd --reload
11.3 Test
Browser input:
master:8088
# or
# 192.168.1.8:8088
You should be able to access the following page:
Similarly, no Worker nodes show up. Checking a Worker node's log reveals that the problem is again a blocked port:
Stop YARN on the Master node, open port 8031, and start YARN again:
/usr/local/hadoop/sbin/stop-yarn.sh
sudo firewall-cmd --add-port=8031/tcp --permanent
sudo firewall-cmd --reload
/usr/local/hadoop/sbin/start-yarn.sh
Visit again:
master:8088
# or
# 192.168.1.8:8088
You can now see the Worker node:
At this point, a Hadoop cluster consisting of VMS is set up.
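As a final, optional smoke test (not part of the original write-up), the command-line tools can confirm that both Workers have registered:
cd /usr/local/hadoop
bin/hdfs dfsadmin -report   # should list 2 live datanodes
bin/yarn node -list         # should list 2 running NodeManagers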
12 References
- CSDN GitChat · Big Data | The most detailed Hadoop environment setup guide in history
- How To Set Up a Hadoop 3.2.1 Multi-Node Cluster on Ubuntu 18.04 (2 Nodes)
- How to Install and Set Up a 3-Node Hadoop Cluster
- CSDN: Configuring VirtualBox so that the host and the VM can ping each other, with a static IP address