I. Cluster environment

  1. OS version. VM: 16 GB memory, dual-core CPU. OS: CentOS 7 (64-bit). OS download address: http://124.202.164.6/files/417500000AB646E7/mirrors.163.com/centos/7/isos/x86_64/CentOS-7-x86_64-DVD-1708.iso
  2. Software versions: hadoop-2.1.1.tar.gz, hbase-1.3.1-bin.tar.gz, zookeeper-3.4.10.tar.gz, jdk-8u144-linux-x64.tar.gz
  3. Host environment. Physical machine: Windows Server 2012 R2. Virtual machines: VMware Workstation Pro 12
  4. Access information. Physical machine Remote Desktop: 124.130.192.24:9007, username: JCNY_ZhuGeLiang, password: ** (ask Xie Yan). Three VMs: hmaster (IP 192.168.0.10, username hadoop, password jcny@hadoop), hslave-1 (IP 192.168.0.11, username hadoop, password jcny@hadoop), hslave-2 (IP 192.168.0.12, username hadoop, password jcny@hadoop)

  5. Three VMs have been created for the cluster, with the following IP addresses: Master node: 192.168.0.10 hmaster.hcluster; Slave nodes: 192.168.0.11 hslave-1.hcluster and 192.168.0.12 hslave-2.hcluster. ZooKeeper must be installed on all three VMs

Note: in the commands below, a leading # indicates that the command is run as the root user

  6. Dates. Written: 2017-10-19. First updated: 2017-10-22

II. Install the base cluster environment

  1. Basic environment configuration. For easy configuration, the firewalls in the cluster are disabled and prevented from starting at boot
  2. CentOS 7 uses firewalld by default rather than iptables. Run systemctl stop firewalld and systemctl mask firewalld to disable it. To switch to an iptables firewall instead, install it with yum install iptables-services and enable it with systemctl enable iptables; see the basic iptables operations for details. Note: Perform the same operations on all nodes in the cluster
  3. Edit the /etc/hosts file and add the mappings between IP addresses and hostnames: 192.168.0.10 hmaster hmaster.hcluster, 192.168.0.11 hslave-1 hslave-1.hcluster, 192.168.0.12 hslave-2 hslave-2.hcluster. Note: Perform the same operation on all nodes in the cluster
  4. Install Java

    1) Decompress jdk-8u144-linux-x64.tar.gz to /opt

    tar -xzf jdk-8u144-linux-x64.tar.gz -C /opt

    2) Add Java environment variables to the end of the /etc/profile file

# ================= JAVA_HOME ===================
export JAVA_HOME="/opt/jdk1.8.0_144"
export JAVA_BIN=$JAVA_HOME/bin
export JAVA_LIB=$JAVA_HOME/lib
export CLASSPATH=.:$JAVA_LIB/tools.jar:$JAVA_LIB/dt.jar
export PATH=$JAVA_BIN:$PATH

3) Import the environment variables

source /etc/profile

4) Verify the installation by running java -version.

Note: Perform the same operation on all nodes in the cluster

  5. Install NTP

    In a Hadoop cluster, clocks on all nodes must be synchronized

    1) Install the NTP service on all nodes

    yum install ntp ntpdate

    2) Configure the Master node

    On the Master node, edit the /etc/ntp.conf configuration file and add the following configuration to enable the NTP service on the Master node to accept time synchronization requests from the cluster network segment

    restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap

    3) Start the ntpd service on the Master node and enable it to start at boot

    chkconfig --level 345 ntpd on

    service ntpd start

    4) Configure the Slave node

    On the Slave node, edit the /etc/ntp.conf configuration file and add the following configuration to synchronize it with the Master node clock

    server hmaster

    Before starting NTPD on the Slave node for the first time, manually synchronize time with that on the Master node

    ntpdate hmaster

    5) Start the ntpd service on the Slave nodes and enable it to start at boot

    chkconfig --level 345 ntpd on

    service ntpd start

    Note: ntpd usually needs about 5 minutes to synchronize after it starts, so it cannot provide clock service immediately; during that period the error message "No server suitable for synchronization found" is displayed.

  6. Install rsync

    With the rsync tool, the Hadoop control scripts can distribute configuration files to the nodes in the cluster. This function is disabled by default and can be enabled by setting the HADOOP_MASTER variable in hadoop-env.sh. Once enabled, the directory tree under HADOOP_MASTER is synchronized to the local HADOOP_INSTALL directory when a worker node's daemon starts (see the sketch after the note below).

    yum install rsync

    Note: Perform the same operation on all nodes in the cluster
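    As an illustration only, a minimal hadoop-env.sh entry for this might look like the following; the exact path is an assumption, not taken from the original:

    # hadoop-env.sh on the worker nodes (hypothetical path):
    # each daemon rsyncs the tree under HADOOP_MASTER into its local
    # installation directory when it starts
    export HADOOP_MASTER=hmaster:/home/hadoop/hadoop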

  7. Configure password-free login from the Master node to all nodes. Prerequisite: use the same login username on every node in the cluster; using the root user is not advised.

Create the user hadoop on all nodes

useradd hadoop

passwd hadoop

New password:
Retype new password:
passwd: all authentication tokens updated successfully.

1) Generate a key pair on the Master node as the hadoop user:

$ ssh-keygen -t rsa -P ''

ssh-keygen generates the private key id_rsa and the public key id_rsa.pub in the ~/.ssh directory. 2) Append the Master's public key to the authorized_keys file in ~/.ssh and set the corresponding permissions, all as the hadoop user

cat .ssh/id_rsa.pub >> .ssh/authorized_keys

chmod 600 .ssh/authorized_keys

Then configure password-free login from the Master to the Slaves by copying the contents of the Master's id_rsa.pub into the ${HOME}/.ssh/authorized_keys file on each Slave. 3) First upload the public key file to the Slave

scp .ssh/id_rsa.pub hadoop@hslave-1:/home/hadoop/id_rsa.pub.hmaster

4) On the Slave node, create ~/.ssh/authorized_keys

mkdir .ssh

cat id_rsa.pub.hmaster >> .ssh/authorized_keys

chmod 700 .ssh

chmod 600 .ssh/authorized_keys

rm -f id_rsa.pub.hmaster

5) Perform the same operations on every Slave node in turn, then log in to each Slave from the Master node to verify

ssh hslave-1

6) If the connection fails, open Terminal as user root and run cat /etc/passwd

Check the hadoop user's entry; if it is incorrect, edit /etc/passwd as root with gedit or vim and fix it.

  8. Increase the open-file and process limits. This configuration is specific to HBase; refer to the official HBase documentation. HBase opens a large number of file handles and processes simultaneously, which exceeds the default limits in Linux.

So edit the /etc/security/limits.conf file and add the following two lines:

hadoop - nofile 32768
hadoop - nproc 32000

Also add the line "session required pam_limits.so" to /etc/pam.d/common-session; otherwise the configuration in /etc/security/limits.conf will not take effect.

The configuration takes effect only after you log out (logout or exit) and log back in. After logging in as the hadoop user, run ulimit -n -u to check whether the maximum numbers of open files and processes have changed; if they have, the change succeeded (see the example below).
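For example, the check might look roughly like this after re-login; the values assume the two limits above were applied:

$ ulimit -n -u
open files                      (-n) 32768
max user processes              (-u) 32000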

Note: Perform the same operations on all nodes in the cluster. Now that the cluster and the Hadoop dependencies are configured, install Hadoop.

III. Install Hadoop

  1. Decompress the Hadoop tar package

    1) Upload the Hadoop software package to each node in the cluster and decompress it

    In this document, upload the hadoop-2.5.1.tar.gz file to the hadoop user’s home directory /home/hadoop.

    tar -xzf hadoop-2.5.1.tar.gz

    mv hadoop-2.5.1 hadoop

    Note: Perform the same operation on all nodes in the cluster

    2) Set permissions on Hadoop files for common users

    chown -R hadoop:hadoop /home/hadoop/hadoop

    3) Configure hadoop configuration files

    The Hadoop configuration files are located in /home/hadoop/hadoop/etc/hadoop

    A) Change the JDK path in hadoop-env.sh (see the line below)
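    For example, assuming the JDK location from section II, the line in hadoop-env.sh would be set as follows:

    export JAVA_HOME=/opt/jdk1.8.0_144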

B) core-site.xml defines system-level parameters, such as the HDFS URL, the Hadoop temporary directory, and rack-awareness configuration (a hedged sketch follows)
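The original core-site.xml listing did not survive conversion; a minimal sketch consistent with the rest of this document (the HDFS address matches the hbase.rootdir used later; the temporary-directory path is an assumption) might look like this:

<configuration>
  <property>
    <!-- HDFS address; must match the hbase.rootdir HDFS address used later -->
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.0.10:9002</value>
  </property>
  <property>
    <!-- Base directory for Hadoop temporary files (path is an assumption) -->
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/tmp</value>
  </property>
</configuration>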

C) hdfs-site.xml: HDFS settings, such as the number of file replicas, the block size, and whether permissions are enforced (a hedged sketch follows)
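The original hdfs-site.xml listing also did not survive; a minimal sketch under the assumption of two DataNodes and storage under the hadoop user's home (all paths and values here are assumptions, not taken from the original):

<configuration>
  <property>
    <!-- Two DataNodes in this cluster, so keep two replicas (assumed) -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <!-- NameNode metadata directory (assumed path) -->
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop/dfs/name</value>
  </property>
  <property>
    <!-- DataNode block storage directory (assumed path) -->
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop/dfs/data</value>
  </property>
  <property>
    <!-- Disable permission enforcement for this test cluster (assumed) -->
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>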

D) mapred-site.xml (created from mapred-site.xml.template): MapReduce settings, such as the default number of reduce tasks and the default upper and lower memory limits for jobs. Only three address values survive from the original listing: http://192.168.0.10:9001, http://192.168.0.10:10020, and http://192.168.0.10:19888 (a reconstructed sketch follows).

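A hedged reconstruction of mapred-site.xml from those surviving values, assuming the standard Hadoop 2.x property names (mapreduce.framework.name and the mapping of the 9001 address to the legacy job-tracker property are assumptions):

<configuration>
  <property>
    <!-- Run MapReduce on YARN (assumed; standard for Hadoop 2.x) -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <!-- Legacy job tracker address (surviving 9001 value; property name assumed) -->
    <name>mapred.job.tracker</name>
    <value>http://192.168.0.10:9001</value>
  </property>
  <property>
    <!-- JobHistory server RPC address (surviving 10020 value) -->
    <name>mapreduce.jobhistory.address</name>
    <value>192.168.0.10:10020</value>
  </property>
  <property>
    <!-- JobHistory server web UI address (surviving 19888 value) -->
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>192.168.0.10:19888</value>
  </property>
</configuration>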
On a Slave host, change hmaster to the name of the corresponding node, for example hslave-1 for the first slave host.

E) yarn-site.xml: YARN settings.

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.0.10:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>192.168.0.10:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>192.168.0.10:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>192.168.0.10:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.0.10:8088</value>
  </property>
</configuration>

On a Slave host, change hmaster to the IP address of the corresponding node, for example 192.168.0.11 for the first slave host.

F) slaves: the slave host list of the Hadoop cluster. When the master starts, it connects to every host in this list over SSH and starts the DataNode and NodeManager daemons on them. Edit hadoop/etc/hadoop/slaves:

hslave-1
hslave-2

G) masters: if the file does not exist, create a new file named masters and write hmaster (the master host) into it

This configuration applies to the Master host only.

4) Configure the environment variables:

export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

5) Format the HDFS storage. Go to the bin directory under the Hadoop home directory (on the Master host only)

./hdfs namenode -format

Pay attention to the output during formatting. If formatting succeeds, the output includes a message saying the storage directory has been successfully formatted.

6) Start the Hadoop services. The start and stop scripts are in Hadoop's sbin directory. A) Start HDFS

sbin/start-dfs.sh

B) Start YARN

sbin/start-yarn.sh

C) Start the MapReduce JobHistory Server

sbin/mr-jobhistory-daemon.sh start historyserver

D) Alternatively, start everything with sbin/start-all.sh. E) Check whether the services are running properly: open a terminal on the Master node and run jps to view the result

Also run jps on each Slave node (a rough sketch of the expected output follows)
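As a rough guide only, the expected jps output for this setup is sketched below; the exact set of daemons depends on which services were started, and these are the standard daemon names rather than output copied from the original:

# On the Master node:
NameNode
SecondaryNameNode
ResourceManager
JobHistoryServer
Jps

# On each Slave node:
DataNode
NodeManager
Jps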

7) Stop the Hadoop services. A) Stop HDFS

sbin/stop-dfs.sh

B) Stop YARN

sbin/stop-yarn.sh

C) Stop the MapReduce JobHistory Server

sbin/mr-jobhistory-daemon.sh stop historyserver

D) Stop everything: sbin/stop-all.sh

  2. The HDFS web UI is available on port 50070 (for example, http://192.168.0.10:50070)

IV. Install ZooKeeper on all nodes

  1. Install ZooKeeper on all nodes

    A) Decompress ZooKeeper on all nodes (for example, as sketched below)
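    A minimal sketch, assuming the zookeeper-3.4.10.tar.gz package from section I has been uploaded to the hadoop user's home directory and is unpacked to /home/hadoop/zookeeper:

    tar -xzf zookeeper-3.4.10.tar.gz
    mv zookeeper-3.4.10 zookeeper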

    B) The ZooKeeper configuration file zoo.cfg lives in ZooKeeper's conf directory and does not exist by default; generate it from the zoo_sample.cfg file

    cd conf

    cp zoo_sample.cfg zoo.cfg

    C) zoo.cfg is configured as follows:

    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    # do not use /tmp for storage, /tmp here is just
    # example sakes.
    dataDir=/home/hadoop/zookeeper/data
    #dataLogDir=/home/hadoop/zookeeper/logs
    # the port at which the clients will connect
    clientPort=2181
    # the maximum number of client connections.
    # increase this if you need to handle more clients
    #maxClientCnxns=60
    #
    # Be sure to read the maintenance section of the
    # administrator guide before turning on autopurge.
    #
    # zookeeper.apache.org/doc/current…
    #
    # The number of snapshots to retain in dataDir
    #autopurge.snapRetainCount=3
    # Purge task interval in hours
    # Set to "0" to disable auto purge feature
    #autopurge.purgeInterval=1
    server.1=0.0.0.0:2888:3888
    server.2=192.168.0.11:2888:3888
    server.3=192.168.0.12:2888:3888

Synchronize zoo.cfg to all ZooKeeper nodes. D) Create the ZooKeeper data directory and the node identification file myid. On the hmaster node, create myid with content 1:

mkdir -p /home/hadoop/zookeeper/data

echo "1" > /home/hadoop/zookeeper/data/myid

Similarly, following the zoo.cfg configuration, create identification files with contents 2 and 3 on hslave-1 and hslave-2 respectively. E) On hslave-1:

mkdir -p /home/hadoop/zookeeper/data

echo "2" > /home/hadoop/zookeeper/data/myid

F) On hslave-2:

mkdir -p /home/hadoop/zookeeper/data

echo "3" > /home/hadoop/zookeeper/data/myid

  2. Start the ZooKeeper cluster

    The ZooKeeper startup script is in the bin directory under ZooKeeper

    Start ZooKeeper

    Run the script to start the ZooKeeper service on each node of the ZooKeeper cluster as follows:

    Each node machine needs to execute the following command

    bin/zkServer.sh start

    The ZooKeeper log zookeeper.out is stored in the ZooKeeper home directory by default. To change the log path, set the ZOO_LOG_DIR variable in ${zkhome}/bin/zkEnv.sh.

  3. Verify the installation. A) After the installation is complete, open a terminal and run jps to check whether the QuorumPeerMain process is present (in the original screenshot it was missing, which is troubleshot below).

B) Check the ZooKeeper log: run cat zookeeper.out in the directory from which ZooKeeper was started (for example the bin directory) and look for errors.

C) If the Java environment variables are not in effect, run source /etc/profile. D) Make sure the firewall is not interfering: systemctl disable firewalld and service iptables stop. E) If the QuorumPeerMain process now appears (highlighted in the red box in the original figure), the startup was successful

F) Check whether startup succeeded by running ./zkServer.sh status

Very important note: if this is the first startup and running ./zkServer.sh status reports an error,

it usually means ZooKeeper has not been started on the other nodes yet. Start ZooKeeper on the remaining nodes and then check the status again on each node (a sketch of a healthy status check follows).
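Once a quorum of nodes is running, the status check reports a mode on each node, roughly like this (the config path shown assumes ZooKeeper was unpacked to /home/hadoop/zookeeper):

$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper/bin/../conf/zoo.cfg
Mode: follower   # one node in the cluster reports Mode: leader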

V. Install HBase

The HBase cluster consists of one HMaster (on hmaster) and two HRegionServers (on hslave-1 and hslave-2).

  1. Decompress hbase on all nodes

    tar -xzf hbase-1.1.5-bin.tar.gz

    mv hbase-1.1.5-bin hbase

    The HBase home directory is referred to below as ${HBASE_HOME}.

  2. Configure HBase. The HBase configuration files are in the ${HBASE_HOME}/conf directory.

    1) hbase-env.sh: mainly change the following two settings:

    export JAVA_HOME=/opt/jdk1.8.0_144
    export HBASE_MANAGES_ZK=false

    2) hbase-site.xml:

    <configuration>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://192.168.0.10:9002/hbase</value>
      </property>
      <property>
        <name>hbase.master</name>
        <value>hmaster:60000</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hmaster,hslave-1,hslave-2</value>
      </property>
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/hadoop/zookeeper</value>
      </property>
      <property>
        <name>hbase.master.maxclockskew</name>
        <value>150000</value>
      </property>
      <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
      </property>
    </configuration>

    Pay special attention: the HDFS address in hbase.rootdir must use the same IP address (or domain name) and port as fs.defaultFS in Hadoop's core-site.xml. hbase.zookeeper.property.dataDir should be adapted to your own environment (in the original it contained the author's OS user name, a01513); another directory works too. hbase.cluster.distributed enables distributed mode and must be true. hbase.zookeeper.quorum is the set of ZooKeeper host IP addresses or domain names, separated by commas. hbase.master sets the HBase master port; the master web UI runs on port 16010, so you can check through the web UI whether access succeeds.

    3) Configure the RegionServers host list.

    4) Set the host list in conf/regionservers to:

    hslave-1
    hslave-2

    Note: Synchronize the preceding configuration to all nodes in the cluster.
  3. Start HBase

    The HBase script is stored in ${HBASE_HOME}/bin.

    bin/start-hbase.sh

  4. Stop HBase

    bin/stop-hbase.sh

  5. If a problem occurs during startup, comment out the offending configuration (the original document shows the error location and the commented-out code in screenshots). The error shown there means the clocks are not synchronized; note that NTP must be configured on every node machine.

  6. Check whether the hosts started successfully by running jps on the master and on node 1 and node 2 (shown as screenshots in the original). Note: if the HRegionServer process on a node fails because of a time synchronization problem, you can increase the allowed clock skew:


<property>
  <name>hbase.master.maxclockskew</name>
  <value>150000</value>
</property>

  7. You can log in to the HBase shell and run commands to check the HBase status (see the sketch after this list).

  8. The original screenshot shows that querying a table succeeds, confirming the cluster works.

  9. The HBase web UI is available on port 16010.
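As a quick check, a minimal HBase shell session might look like this (the table and column-family names are purely illustrative, not from the original):

$ bin/hbase shell
hbase> status
hbase> create 'test', 'cf'
hbase> list
hbase> exit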

VI. Summary. At this point, the fully distributed Hadoop + ZooKeeper + HBase test cluster has been essentially completed; any remaining problems will be studied further. Many pitfalls were encountered along the way: firewall and Java path problems, mistakes in the configuration file contents, and configuration files that were not synchronized across the node machines.

If you need the original PDF file, please leave your email in the comments section