I. Cluster Environment
- OS and VM: 16 GB memory, dual-core CPU, CentOS 7 64-bit. OS download address: http://124.202.164.6/files/417500000AB646E7/mirrors.163.com/centos/7/isos/x86_64/CentOS-7-x86_64-DVD-1708.iso
- Software versions: hadoop-2.1.1.tar.gz, hbase-1.3.1-bin.tar.gz, zookeeper-3.4.10.tar.gz, jdk-8u144-linux-x64.tar.gz
- Host environment. Physical machine: Windows Server 2012 R2. Virtual machines: VMware Workstation Pro 12
- Login information. Physical machine remote desktop: 124.130.192.24:9007, username: JCNY_ZhuGeLiang, password: ** (ask Xie Yan). Three VMs:
======= hmaster ======= IP: 192.168.0.10, username: hadoop, password: jcny@hadoop
======= hslave-1 ======= IP: 192.168.0.11, username: hadoop, password: jcny@hadoop
======= hslave-2 ======= IP: 192.168.0.12, username: hadoop, password: jcny@hadoop
- Three VMs have been created for the cluster. Their IP addresses are as follows: Master node: 192.168.0.10 hmaster.hcluster; Slave nodes: 192.168.0.11 hslave-1.hcluster, 192.168.0.12 hslave-2.hcluster. ZooKeeper must be installed on all three VMs.
Note: in the commands below, a leading # indicates that the command is run as the root user.
- Revision history: written 2017-10-19; first updated 2017-10-22.
II. Install the Base Cluster Environment
- Basic environment configuration. For simplicity, the firewalls in the cluster are disabled and prevented from starting at boot.
- CentOS 7 uses the firewalld firewall by default rather than iptables. Disable it with:
systemctl stop firewalld
systemctl mask firewalld
To switch to an iptables firewall instead, install and enable it:
yum install iptables-services
systemctl enable iptables
For details, see the basic operations of iptables. Note: Perform the same operations on all nodes in the cluster.
- Edit the /etc/hosts file to map IP addresses to hostnames:
192.168.0.10 hmaster hmaster.hcluster
192.168.0.11 hslave-1 hslave-1.hcluster
192.168.0.12 hslave-2 hslave-2.hcluster
Note: Perform the same operation on all nodes in the cluster.
- Install Java
1) Decompress jdk-8u144-linux-x64.tar.gz to /opt
tar -xzf jdk-8u144-linux-x64.tar.gz -C /opt
2) Add Java environment variables to the end of the /etc/profile file
# ================= JAVA_HOME =================
export JAVA_HOME="/opt/jdk1.8.0_144"
export JAVA_BIN=$JAVA_HOME/bin
export JAVA_LIB=$JAVA_HOME/lib
export CLASSPATH=.:$JAVA_LIB/tools.jar:$JAVA_LIB/dt.jar
export PATH=$JAVA_BIN:$PATH
3) Load the environment variables
source /etc/profile
4) Verify the installation by running java -version.
Note: Perform the same operation on all nodes in the cluster
- Install NTP
In a Hadoop cluster, clocks on all nodes must be synchronized
1) Install the NTP service on all nodes
yum install ntp ntpdate
2) Configure the Master node
On the Master node, edit the /etc/ntp.conf configuration file and add the following configuration to enable the NTP service on the Master node to accept time synchronization requests from the cluster network segment
restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap
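On an isolated network like this one, the master's /etc/ntp.conf often also includes a local clock fallback so it can keep serving time without upstream servers. The two lines below are an assumed addition, not part of the original setup:
# assumed addition: use the local clock as a last-resort time source
server 127.127.1.0
fudge 127.127.1.0 stratum 10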
3) Start the ntpd service on the Master node and enable it at boot:
chkconfig --level 345 ntpd on
service ntpd start
4) Configure the Slave node
On the Slave node, edit the /etc/ntp.conf configuration file and add the following configuration to synchronize it with the Master node clock
server hmaster
Before starting ntpd on the Slave node for the first time, manually synchronize its time with the Master node:
ntpdate hmaster
5) Start the ntpd service on the Slave node and set it to start at boot
chkconfig --level 345 ntpd on
service ntpd start
Note: ntpd usually needs about 5 minutes after starting before it has synchronized, so it cannot serve time immediately after startup; during that window clients report the error "No server suitable for synchronization found".
- Install rsync
With the rsync tool, the Hadoop control scripts can distribute configuration files to the nodes in the cluster. This feature is disabled by default; it can be enabled by setting the HADOOP_MASTER variable in hadoop-env.sh. Once enabled, the directory tree rooted at HADOOP_MASTER is synchronized to the local HADOOP_INSTALL directory whenever a worker node's daemons start (see the sketch after the note below).
yum install rsync
Note: Perform the same operation on all nodes in the cluster
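If you later want to enable this distribution mechanism, a minimal sketch of the hadoop-env.sh setting might look as follows; the host:path value is a hypothetical example, and by default the feature stays disabled:
# hadoop-env.sh (sketch; hypothetical master path)
export HADOOP_MASTER=hmaster:/home/hadoop/hadoop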
- Configure password-free SSH login from the Master node to all nodes. Prerequisite: every node in the cluster uses the same login username; using the root user is not recommended.
Create the user hadoop on all nodes
useradd hadoop
passwd hadoop
New password: Retype new password: passwd: all authentication tokens updated successfully.
1) Generate host keys on the Master node
$ ssh-keygen -t rsa -P ''
SSH generates the private key id_rsa and the public key id_rsa.pub. 2) Append the Master's public key to the authorized_keys file of the node it will log in to. Make sure the .ssh folder exists, then append the key to authorized_keys and set the corresponding permissions as the hadoop user:
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
chmod 600 .ssh/authorized_keys
Next, configure password-free login from the Master to the Slaves by copying the contents of the Master's id_rsa.pub into the ${HOME}/.ssh/authorized_keys file on each Slave. 3) First, upload the public key file to the Slave:
scp .ssh/id_rsa.pub hadoop@hslave-1:/home/hadoop/id_rsa.pub.hmaster
4) On the Slave node, create the .ssh/authorized_keys file:
mkdir .ssh
cat id_rsa.pub.hmaster >> .ssh/authorized_keys
chmod 700 .ssh
chmod 600 .ssh/authorized_keys
rm -f id_rsa.pub.hmaster
5) Perform the same operations on every Slave node in turn, then log in to each Slave from the Master node to verify:
ssh hslave-1
6) If the connection fails, open a terminal as root and run cat /etc/passwd to check the hadoop user's entry; if it is wrong (for example, the login shell), edit /etc/passwd with gedit or vim (e.g. gedit /etc/passwd) to fix it. A sketch of a normal entry follows.
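For reference, the hadoop user's entry in /etc/passwd should end with a valid login shell; something like the following (the UID/GID values here are examples, not from the original setup):
hadoop:x:1000:1000::/home/hadoop:/bin/bash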
- 1) This configuration is specific to HBase (see the HBase official documentation). HBase opens a large number of file handles and processes simultaneously, exceeding the default limits in Linux.
So edit the /etc/security/limits.conf file and add the following two lines:
hadoop - nofile 32768
hadoop - nproc 32000
Also add the line "session required pam_limits.so" to /etc/pam.d/common-session; otherwise the settings in /etc/security/limits.conf will not take effect.
The configuration takes effect only after you log out and back in (logout or exit). After logging back in as the hadoop user, run ulimit -n -u to check whether the maximum numbers of open files and processes have changed.
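A rough sketch of what ulimit -n -u should report after the change (the exact output format may vary slightly by shell):
$ ulimit -n -u
open files                      (-n) 32768
max user processes              (-u) 32000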
Note: Perform the same operations on all nodes in the cluster. Now that the base environment and Hadoop dependencies are configured, install Hadoop.
III. Install Hadoop
- Decompress the Hadoop tar package
1) Upload the Hadoop software package to each node in the cluster and decompress it
In this document, upload the hadoop-2.5.1.tar.gz file to the hadoop user’s home directory /home/hadoop.
tar -xzf hadoop-2.5.1.tar.gz
mv hadoop-2.5.1 hadoop
Note: Perform the same operation on all nodes in the cluster
2) Set permissions on Hadoop files for common users
chown -R hadoop:hadoop /home/hadoop/hadoop
3) Configure hadoop configuration files
The Hadoop configuration files are located in /home/hadoop/hadoop/etc/hadoop
A) Change the JDK path (JAVA_HOME) in hadoop-env.sh
B) core-site.xml defines system-level parameters such as the HDFS URL, the Hadoop temporary directory, and rack-awareness configuration (a sketch appears after item G below)
C) hdfs-site.xml: HDFS settings such as the number of file replicas, the block size, and whether permissions are enforced
D) mapred-site.xml (copied from mapred-site.xml.template): MapReduce settings such as the default number of reduce tasks and the default upper and lower limits on memory that jobs may use. The addresses used in this document are:
http://192.168.0.10:9001
http://192.168.0.10:10020 (JobHistory server address)
http://192.168.0.10:19888 (JobHistory web UI address)
On the Slave hosts, change hmaster to the corresponding node name, for example hslave-1 for the first host. E) yarn-site.xml (see the sketch after item G below):
yarn.nodemanager.aux-services = mapreduce_shuffle
yarn.resourcemanager.address = 192.168.0.10:8032
yarn.resourcemanager.scheduler.address = 192.168.0.10:8030
yarn.resourcemanager.resource-tracker.address = 192.168.0.10:8031
yarn.resourcemanager.admin.address = 192.168.0.10:8033
yarn.resourcemanager.webapp.address = 192.168.0.10:8088
On the Slave hosts, change hmaster to the IP address of the corresponding node, for example 192.168.0.11 for the first host. F) slaves: the slave host list of the Hadoop cluster. When the master starts, it connects to every host in this list over SSH and starts the DataNode and NodeManager daemons on it. Edit hadoop/etc/hadoop/slaves:
hslave-1
hslave-2
G) masters: if the file does not exist, create a new file named masters and write the master host name (hmaster) into it.
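To make items B) and E) above concrete, here is a minimal sketch of core-site.xml and yarn-site.xml. The yarn-site.xml values are exactly those listed in item E; for core-site.xml, fs.defaultFS is assumed to be hdfs://192.168.0.10:9002 so that it matches the hbase.rootdir address used in the HBase section, and hadoop.tmp.dir is a hypothetical example path.
core-site.xml (sketch):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.0.10:9002</value>
  </property>
  <property>
    <!-- hypothetical temporary directory -->
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/tmp</value>
  </property>
</configuration>
yarn-site.xml (values from item E):
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.0.10:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>192.168.0.10:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>192.168.0.10:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>192.168.0.10:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.0.10:8088</value>
  </property>
</configuration>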
The masters file is a Master-host-specific configuration. 4) Configure the environment variables:
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
5) Format the HDFS storage. Go to the bin directory under Hadoop (on the Master host only):
./hdfs namenode -format
Pay attention to the output during formatting. If it succeeds, the output contains a line stating that the storage directory "has been successfully formatted".
6) Start the Hadoop services. The start and stop scripts are in Hadoop's sbin directory. A) Start HDFS
sbin/start-dfs.sh
B) Start YARN
sbin/start-yarn.sh
C) Start the MapReduce JobHistory Server
sbin/mr-jobhistory-daemon.sh start historyserver
D) Alternatively, run sbin/start-all.sh to start everything. E) Check whether the services are running properly: open a terminal on the Master node and run jps to view the result.
Also run jps on each Slave node.
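If everything started, the jps output should look roughly like the following (process IDs omitted; this assumes the NameNode, ResourceManager and JobHistory Server all run on the master, as in this setup):
# on hmaster
NameNode
SecondaryNameNode
ResourceManager
JobHistoryServer
Jps
# on hslave-1 / hslave-2
DataNode
NodeManager
Jps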
7) Stop the Hadoop service. a) Stop HDFS
sbin/stop-dfs.sh
B) Stop YARN
sbin/stop-yarn.sh
C) Stop the MapReduce JobHistory Server
sbin/mr-jobhistory-daemon.sh stop historyserver
D) Stop everything: sbin/stop-all.sh
- The HDFS web UI is served on port 50070 (http://192.168.0.10:50070).
IV. Install ZooKeeper
- Install ZooKeeper on all nodes
A) Decompress ZooKeeper on all nodes
B) The ZooKeeper configuration file zoo.cfg lives in ZooKeeper's conf directory and does not exist by default; generate it from the zoo_sample.cfg file:
cd conf
cp zoo_sample.cfg zoo.cfg
C) zoo.cfg is configured as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hadoop/zookeeper/data
#dataLogDir=/home/hadoop/zookeeper/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# zookeeper.apache.org/doc/current…
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=0.0.0.0:2888:3888
server.2=192.168.0.11:2888:3888
server.3=192.168.0.12:2888:3888
Synchronize zoo.cfg to all ZooKeeper nodes. D) Create the ZooKeeper data directory and the node identification file myid. On the hmaster node, create the file myid with content 1:
mkdir -p /home/hadoop/zookeeperdata/data
echo "1" > /home/hadoop/zookeeperdata/data/myid
Similarly, following the zoo.cfg configuration, create identification files with contents 2 and 3 on hslave-1 and hslave-2 respectively. E) hslave-1
mkdir -p /home/hadoop/zookeeperdata/data
echo "2" > /home/hadoop/zookeeperdata/data/myid
f) hslave-2
mkdir -p /home/hadoop/zookeeperdata/data
echo "3" > /home/hadoop/zookeeperdata/data/myid
- Start the ZooKeeper cluster
The ZooKeeper startup script is in the bin directory under ZooKeeper
Start the ZooKeeper
Run the script to start the ZooKeeper service on each node of the ZooKeeper cluster as follows:
Each node machine needs to execute the following command
bin/zkServer.sh start
The ZooKeeper log zookeeper.out is written to the ZooKeeper home directory by default. You can change the ZOO_LOG_DIR variable in ${zkhome}/bin/zkEnv.sh to change the log path.
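For example, to send the logs somewhere else, set ZOO_LOG_DIR in ${zkhome}/bin/zkEnv.sh or export it before starting ZooKeeper (the path here is a hypothetical example):
export ZOO_LOG_DIR=/home/hadoop/zookeeper/logs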
- A) After startup completes, open a terminal and run jps to check whether the QuorumPeerMain process is present. In the original screenshot it was missing, so the following troubleshooting steps were taken.
B) Inspect the ZooKeeper output (e.g. cat zookeeper.out) to find the cause.
C) It turned out the Java environment variables were not in effect, so run source /etc/profile. D) Make sure the firewall is off: systemctl disable firewalld and service iptables stop. E) If the QuorumPeerMain process now shows up in jps, the fix worked.
F) Check whether startup succeeded: ./zkServer.sh status
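When the quorum is up, the status command reports each node's role, roughly like this (exactly one node reports leader, the others follower):
$ ./zkServer.sh status
Mode: follower   # or "Mode: leader" on exactly one node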
Very important note: if ./zkServer.sh status fails right after the first startup, it usually means ZooKeeper has not been started on the other nodes yet. Start ZooKeeper on the other nodes; once they are up, the status can be viewed on every node.
V. Install HBase
The HBase cluster architecture consists of one HMaster (on hmaster) and two HRegionServers (on hslave-1 and hslave-2).
- Decompress hbase on all nodes
tar -xzf hbase-1.1.5-bin.tar.gz
mv hbase-1.1.5-bin hbase
The hbase home directory is ${HBASE_HOME}.
- Configuring HBase. The HBase configuration files are in the ${HBASE_HOME}/conf directory. 1) hbase-env.sh: mainly change the following two settings:
export JAVA_HOME=/opt/jdk1.8.0_144
export HBASE_MANAGES_ZK=false
2) hbase-site.xml (see the XML sketch after this configuration section):
hbase.cluster.distributed = true
hbase.rootdir = hdfs://192.168.0.10:9002/hbase
hbase.master = hmaster:60000
hbase.zookeeper.quorum = hmaster,hslave-1,hslave-2
hbase.zookeeper.property.dataDir = /home/hadoop/zookeeper
hbase.master.maxclockskew = 150000
hbase.zookeeper.property.clientPort = 2181
Pay special attention: the HDFS address in hbase.rootdir must be identical (IP address or domain name, and port) to the fs.defaultFS setting in Hadoop's core-site.xml. For hbase.zookeeper.property.dataDir, the original author's path contained their operating system user name (a01513); change it to suit your own environment, or use any other directory. hbase.cluster.distributed enables distributed mode and must be true. hbase.zookeeper.quorum specifies the set of cluster IP addresses or domain names, separated by commas. hbase.master sets the HBase master node address and port; the web port is 60010, and you can use the web UI to check whether access succeeds. 3) Configure the RegionServers host list: hslave-1, hslave-2. 4) Set the node host list to hslave-1, hslave-2. Note: Synchronize the preceding configuration to all nodes in the cluster.
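As an illustration, the settings listed above translate into hbase-site.xml roughly as follows (values taken directly from this section; adjust paths and addresses to your own environment):
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://192.168.0.10:9002/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>hmaster:60000</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hmaster,hslave-1,hslave-2</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/zookeeper</value>
  </property>
  <property>
    <name>hbase.master.maxclockskew</name>
    <value>150000</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>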
- Start HBase
The HBase script is stored in ${HBASE_HOME}/bin.
bin/start-hbase.sh
- Stop HBase
bin/stop-hbase.sh
- If an error occurs during startup, locate the offending configuration (the original screenshots showed the problem line and the commented-out fix). The underlying error here was unsynchronized clocks. Note: every node machine must be configured.
- Check whether the hosts started successfully (run jps on node 1 and node 2).
Note: If the HRegionServer process on a node fails because of a time synchronization problem, you can increase the allowed clock skew in hbase-site.xml:
hbase.master.maxclockskew = 150000
- You can log in to the HBase shell and run commands to check the HBase status, for example as sketched below.
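For example, using standard HBase shell commands (the table name 'test' is a hypothetical example):
$ bin/hbase shell
hbase> status
hbase> list
hbase> create 'test', 'cf'
hbase> put 'test', 'row1', 'cf:a', 'value1'
hbase> scan 'test'
hbase> exit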
- The original figure showed that querying the table succeeded.
- The HBase web UI is served on port 16010 (e.g. http://192.168.0.10:16010).
VI. Summary So far: the fully distributed Hadoop + ZooKeeper + HBase test cluster is basically complete and working; any remaining issues will be studied further. There were plenty of pitfalls along the way: firewall and Java path problems, mistakes in configuration file settings, and problems synchronizing configuration files across the node machines!
If you need the original PDF file, please leave your email in the comments section