Original address: www.awebone.com/posts/440e5…

Original author: Awebone

High-availability HBase construction process

This article describes how to set up a distributed ZooKeeper cluster that detects when servers go online and offline, then build a Hadoop HA cluster on top of ZooKeeper to keep the HDFS file system highly available, and finally deploy a highly available distributed HBase database on that highly available HDFS file system.

Working through this process teaches the data-consistency concepts behind distributed systems (the system guarantees eventual consistency), covers the construction and deployment of highly available ZooKeeper, Hadoop, and HBase clusters, and shows the application scenarios of a database built on the distributed HDFS file system.

System overview

The system builds a Hadoop HA cluster based on ZooKeeper and a high availability distributed HBase database cluster based on the distributed HDFS file system.

The Hadoop distributed cluster adopts a master/slave architecture. In a Hadoop HA cluster, ZooKeeper is used to eliminate the single point of failure (SPOF): if the active NameNode goes down, a new active NameNode is elected from the standby NameNodes and the switchover happens almost instantly, so the distributed cluster can continue to provide service.

Similarly, the distributed HBase database adopts a master/slave architecture. When the active HMaster goes down, a backup HMaster takes over almost instantly, keeping HBase highly available. The distributed HBase database uses the HDFS file system as its underlying storage.

Environment requirements and version selection

(1) Four Linux servers, named Hadoop01, Hadoop02, Hadoop03, and Hadoop04, running CentOS 6.8;

(2) Java: JDK 1.8;

(3) ZooKeeper: 3.4.10;

(4) Hadoop: 2.7.6;

(5) HBase: 1.2.6.

Cluster planning and architecture design

Both Hadoop and HBase work in master/slave mode and handle load balancing themselves. The following table shows the cluster planning.

|                  | Hadoop01 | Hadoop02 | Hadoop03 | Hadoop04     |
| ---------------- | -------- | -------- | -------- | ------------ |
| NameNode         | ✓        | ✓        |          |              |
| DataNode         | ✓        | ✓        | ✓        | ✓            |
| ResourceManager  |          |          | ✓        | ✓            |
| NodeManager      | ✓        | ✓        | ✓        | ✓            |
| JobHistoryServer | ✓        |          |          |              |
| ZooKeeper        | ✓        | ✓        | ✓        | ✓ (observer) |
| JournalNode      | ✓        | ✓        | ✓        |              |
| ZKFC             | ✓        | ✓        |          |              |
| HMaster          | ✓        |          |          | ✓            |
| HRegionServer    | ✓        | ✓        | ✓        | ✓            |

ZooKeeper cluster installation

Configuration arrangement

When installing the ZooKeeper cluster, keep the number of voting nodes odd to simplify leader election. Here Hadoop01, Hadoop02, Hadoop03, and Hadoop04 are all ZooKeeper nodes, but Hadoop04 is fixed in the observer role. An observer is similar to a follower and is used to scale out ZooKeeper without changing the cluster's existing voting membership: it accepts and processes client requests but can neither vote nor be elected leader. The observer-related settings are sketched below.
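
For reference, a minimal sketch of the observer-related settings (the server list appears in every node's zoo.cfg, as shown in the steps that follow; the peerType line is a standard ZooKeeper observer setting that goes only into Hadoop04's own zoo.cfg):

    # in zoo.cfg on every node: mark server.4 as an observer
    server.1=hadoop01:2888:3888
    server.2=hadoop02:2888:3888
    server.3=hadoop03:2888:3888
    server.4=hadoop04:2888:3888:observer

    # additionally, in zoo.cfg on hadoop04 only:
    peerType=observer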

The configuration steps

  1. Obtain the installation package zookeeper-3.4.10.tar.gz

  2. Unpack the installation package

    tar -zxvf zookeeper-3.4.10.tar.gz -C ~/apps/

  3. Modify the configuration file: vim zoo.cfg

    # data storage directory for the znode data on every node
    dataDir=/home/hadoop/data/zkdata/
    server.1=hadoop01:2888:3888
    server.2=hadoop02:2888:3888
    server.3=hadoop03:2888:3888
    server.4=hadoop04:2888:3888:observer
  4. Create a myid file in /home/hadoop/data/zkdata/ on each node, containing only that node's ID value

    hadoop01: echo 1 > myid

    hadoop02: echo 2 > myid

    hadoop03: echo 3 > myid

    hadoop04: echo 4 > myid
  5. Configure environment variables: vim ~/.bashrc

    export ZOOKEEPER_HOME=/home/hadoop/apps/zookeeper-3.4.10
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
  6. Start ZooKeeper

    Run zkServer.sh start on each node

  7. Client connection and shell operation

    Client connection: zkCli.sh -server hostname:2181 (a few example shell operations are sketched below)
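
A few basic shell operations can serve as a sanity check (the /mytest znode below is just a throwaway example):

    zkCli.sh -server hadoop01:2181
    # inside the zkCli shell:
    ls /                       # list the root znodes
    create /mytest "hello"     # create a test znode
    get /mytest                # read its data back
    delete /mytest             # clean up
    quit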

Hadoop HA cluster installation

HA design and configuration arrangements

Hadoop HA is implemented with shared storage and ZooKeeper. Here Hadoop01 is the active NameNode and Hadoop02 is the standby NameNode, acting as a hot standby for Hadoop01. The NameNode metadata is kept in the shared storage provided by the QJournal log system. Hadoop01, Hadoop02, Hadoop03, and Hadoop04 all serve as DataNodes and periodically send block reports and heartbeats to the NameNode. The ZKFC process monitors the NameNodes through ZooKeeper; when the active NameNode stops sending heartbeats, that is, when the node goes down, a failover is triggered automatically and the standby NameNode becomes active, giving the Hadoop cluster high availability. The HA state of the two NameNodes can be checked as sketched below.
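
Once the cluster is running (see the steps below), the HA state can be verified with the hdfs haadmin tool; nn1 and nn2 are the NameNode IDs defined in hdfs-site.xml. A crude but effective way to exercise automatic failover is to kill the active NameNode process and check again:

    hdfs haadmin -getServiceState nn1   # hadoop01, expected: active
    hdfs haadmin -getServiceState nn2   # hadoop02, expected: standby

    # to exercise automatic failover (test environments only):
    # kill the NameNode process on the active node, then re-run the commands above;
    # ZKFC should promote the standby NameNode to active within seconds.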

The configuration steps

  1. Obtain the installation package hadoop-2.7.6.tar.gz

  2. Unpack the installation package

    tar -zxvf hadoop-2.7.6.tar.gz -C ~/apps/

  3. Modify the configuration files

    hadoop-env.sh file:

    export JAVA_HOME=/home/hadoop/apps/jdk1.8.0_73

    core-site.xml file:

    <configuration>
        <!-- Specify the HDFS nameservice -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://myha/</value>
        </property>
        <!-- Hadoop temporary directory -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/hadoop/data/hadoopdata/</value>
        </property>
        <!-- ZooKeeper quorum address -->
        <property>
            <name>ha.zookeeper.quorum</name>
            <value>hadoop01:2181,hadoop02:2181,hadoop03:2181,hadoop04:2181</value>
        </property>
        <!-- Timeout for Hadoop's connection to ZooKeeper -->
        <property>
            <name>ha.zookeeper.session-timeout.ms</name>
            <value>1000</value>
            <description>ms</description>
        </property>
    </configuration>

    hdfs-site.xml file:

    <configuration>
        <!-- Number of block replicas -->
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <!-- Working directories of the NameNode and DataNode -->
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/hadoop/data/hadoopdata/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/hadoop/data/hadoopdata/dfs/data</value>
        </property>
        <!-- Enable webhdfs -->
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>dfs.nameservices</name>
            <value>myha</value>
        </property>
        <!-- The nameservice myha has two NameNodes: nn1 and nn2 -->
        <property>
            <name>dfs.ha.namenodes.myha</name>
            <value>nn1,nn2</value>
        </property>
        <!-- RPC address of nn1 -->
        <property>
            <name>dfs.namenode.rpc-address.myha.nn1</name>
            <value>hadoop01:9000</value>
        </property>
        <!-- HTTP address of nn1 -->
        <property>
            <name>dfs.namenode.http-address.myha.nn1</name>
            <value>hadoop01:50070</value>
        </property>
        <!-- RPC address of nn2 -->
        <property>
            <name>dfs.namenode.rpc-address.myha.nn2</name>
            <value>hadoop02:9000</value>
        </property>
        <!-- HTTP address of nn2 -->
        <property>
            <name>dfs.namenode.http-address.myha.nn2</name>
            <value>hadoop02:50070</value>
        </property>
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/myha</value>
        </property>
        <!-- Location on the local disk where the JournalNode stores its data -->
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/home/hadoop/data/journaldata</value>
        </property>
        <!-- Enable automatic NameNode failover -->
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
        <!-- Implementation used for automatic failover -->
        <property>
            <name>dfs.client.failover.proxy.provider.myha</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <!-- Fencing (isolation) methods; multiple methods are separated by newlines, one per line -->
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>
                sshfence
                shell(/bin/true)
            </value>
        </property>
        <!-- sshfence requires passwordless SSH login -->
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/home/hadoop/.ssh/id_rsa</value>
        </property>
        <!-- sshfence connection timeout -->
        <property>
            <name>dfs.ha.fencing.ssh.connect-timeout</name>
            <value>30000</value>
        </property>
        <property>
            <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
            <value>60000</value>
        </property>
    </configuration>

mapred-site.xml file:

<configuration>
    <!-- Run MapReduce on the YARN framework -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- MapReduce JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop01:10020</value>
    </property>
    <!-- Web address of the job history server -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop01:19888</value>
    </property>
</configuration>

yarn-site.xml file:

<configuration>
    <!-- Enable RM high availability -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- Specify the RM cluster ID -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>
    <!-- Names of the ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <!-- ResourceManager host names -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop03</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop04</value>
    </property>
    <!-- ZooKeeper cluster address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <!-- Store ResourceManager state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
</configuration>

slaves file:

hadoop01

hadoop02

hadoop03

hadoop04
  1. Distribution: configure on one node, then distribute the configured installation to the other nodes so that they share the same configuration

    scp -r hadoop-2.7.6 hadoop02:~/apps/

    scp -r hadoop-2.7.6 hadoop03:~/apps/

    scp -r hadoop-2.7.6 hadoop04:~/apps/

  2. Configure environment variables: vim ~/.bashrc

    export HADOOP_HOME=/home/hadoop/apps/hadoop-2.7.6

    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

  3. First startup

    Before starting the Hadoop HA cluster, start the ZooKeeper cluster. On every node of the QJournal system, run hadoop-daemon.sh start journalnode. Then pick one NameNode as the primary HDFS node and run hadoop namenode -format. Copy all data files in that NameNode's working directory to the corresponding directory on the other NameNode, e.g. scp -r hadoopdata/ hadoop02:~/data/. Finally, on one of the NameNode nodes, run hdfs zkfc -formatZK to initialize ZKFC. (A consolidated command sketch is given after this list.)

  4. Regular startup

    Start HDFS: start-dfs.sh

    Start the YARN cluster: start-yarn.sh

    Run mr-jobhistory-daemon.sh start historyserver to start the MapReduce history server
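
Putting the first-start and regular-start steps together, a consolidated sketch follows (node assignments match the cluster planning above; note that in Hadoop 2.x the standby ResourceManager usually has to be started by hand, which the steps above do not spell out):

    # 1. start ZooKeeper on every ZooKeeper node
    zkServer.sh start

    # 2. start the JournalNodes (hadoop01, hadoop02, hadoop03)
    hadoop-daemon.sh start journalnode

    # 3. format HDFS on the chosen primary NameNode (hadoop01) -- first start only
    hadoop namenode -format

    # 4. copy the NameNode metadata to the standby NameNode (hadoop02) -- first start only
    scp -r hadoopdata/ hadoop02:~/data/

    # 5. initialize the ZKFC znode in ZooKeeper (run once, on a NameNode) -- first start only
    hdfs zkfc -formatZK

    # 6. start HDFS, YARN, and the MapReduce history server
    start-dfs.sh
    start-yarn.sh                                  # on hadoop03 (rm1)
    yarn-daemon.sh start resourcemanager           # on hadoop04 (rm2), if not started automatically
    mr-jobhistory-daemon.sh start historyserver    # on hadoop01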

HBase HA cluster installation

Configuration arrangement

The HDFS distributed file system serves as the underlying data store of the HBase distributed database. ZooKeeper coordinates the HMaster and the backup HMaster: when the active HMaster loses its heartbeat, that is, goes down, the backup HMaster automatically becomes active, making the distributed database highly available.

Hadoop01 and Hadoop04 are used as HMaster nodes, and Hadoop01, Hadoop02, Hadoop03, and Hadoop04 are used as HRegionServer nodes. A quick way to check the active/backup masters is sketched below.
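
For a quick check of the master/backup arrangement once HBase is running (a minimal sketch; the output wording is approximate):

    hbase shell
    # inside the HBase shell:
    status
    # => e.g. "1 active master, 1 backup masters, 4 servers, 0 dead, ..."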

The configuration steps

  1. Obtain the installation package hbase-1.2.6.tar.gz

  2. Unpack the installation package

    tar -zxvf hbase-1.2.6.tar.gz -C ~/apps/

  3. Modify the configuration files

    hbase-env.sh file:

    export JAVA_HOME=/home/hadoop/jdk1.8.0_73

    export HBASE_MANAGES_ZK=false

    hbase-site.xml file:

    <configuration>
        <!-- Path in HDFS where HBase stores its data -->
        <property>
            <name>hbase.rootdir</name>
            <value>hdfs://myha/myhbase</value>
        </property>
        <!-- Run HBase in distributed mode -->
        <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
        </property>
        <!-- ZooKeeper quorum address -->
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
        </property>
    </configuration>

regionservers file:

hadoop01

hadoop02

hadoop03

hadoop04

backup-masters file:

hadoop01

  1. Copy Hadoop's hdfs-site.xml and core-site.xml into HBase's conf directory

    cp /home/hadoop/hadoop-2.7.6/etc/hadoop/core-site.xml .

    cp /home/hadoop/hadoop-2.7.6/etc/hadoop/hdfs-site.xml .

  2. Send the hbase installation package to other nodes

    scp -r hbase-1.2.6 hadoop01:$PWD

    scp -r hbase-1.2.6 hadoop02:$PWD

    scp -r hbase-1.2.6 hadoop03:$PWD

    scp -r hbase-1.2.6 hadoop04:$PWD

  3. Start HBase

    start-hbase.sh

System test

Run the jps command on all four machines to check whether the relevant processes have started; the processes expected on each node are sketched below.
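
Based on the cluster planning table above, jps should report roughly the following processes on each node (QuorumPeerMain is the ZooKeeper server process and DFSZKFailoverController is ZKFC; the exact set depends on which services have been started):

    jps
    # hadoop01: NameNode, DFSZKFailoverController, DataNode, NodeManager, JournalNode,
    #           QuorumPeerMain, JobHistoryServer, HMaster, HRegionServer
    # hadoop02: NameNode, DFSZKFailoverController, DataNode, NodeManager, JournalNode,
    #           QuorumPeerMain, HRegionServer
    # hadoop03: ResourceManager, DataNode, NodeManager, JournalNode, QuorumPeerMain, HRegionServer
    # hadoop04: ResourceManager, DataNode, NodeManager, QuorumPeerMain, HMaster, HRegionServer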

Checking the ZooKeeper status

Run zkServer.sh status on each of the four servers to check the ZooKeeper status, as sketched below.
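
Which node is elected leader depends on the election, but Hadoop04, being configured as an observer, should always report observer mode. For example:

    zkServer.sh status
    # Mode: leader     -- on one of hadoop01-03
    # Mode: follower   -- on the other two of hadoop01-03
    # Mode: observer   -- on hadoop04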

Hadoop HA demonstration

HDFS startup process:

YARN startup process:

Rack awareness status:

Web page view:

HBase HA demonstration

HBase startup process:

Web page view:

HBase underlying storage information:
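
As a final end-to-end check (a minimal sketch; the table name test and its column family cf are just examples), a table can be created from the HBase shell, and its files should then appear under the hbase.rootdir configured above in HDFS:

    hbase shell
    # inside the HBase shell:
    create 'test', 'cf'                    # create table 'test' with column family 'cf'
    put 'test', 'row1', 'cf:a', 'value1'   # write one cell
    scan 'test'                            # read it back
    exit

    # the table's data is stored under the configured HBase root directory
    hdfs dfs -ls /myhbase
    hdfs dfs -ls /myhbase/data/default     # per-table directories (HBase 1.x layout)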

Based on the system requirements, the architecture was designed as one master with multiple slaves. The installation package versions were then chosen according to the system environment and version compatibility. Finally, the cluster was planned across only four servers so that the load on each node is balanced as much as possible.