Original address: www.awebone.com/posts/440e5…
Original author: Awebone
High availability HBase construction process
This article describes how to set up a distributed ZooKeeper cluster that detects when servers go online and offline, then build a Hadoop HA cluster on top of ZooKeeper so that the HDFS file system is highly available, and finally deploy a highly available distributed HBase database on top of that highly available HDFS file system.
Working through this process teaches the data-consistency concepts behind distributed systems; the resulting system guarantees eventual consistency. It also covers the construction and deployment of highly available ZooKeeper, Hadoop, and HBase clusters, as well as the application scenarios of databases built on the HDFS distributed file system.
System overview
The system builds a Hadoop HA cluster based on ZooKeeper and a highly available distributed HBase database cluster on top of the HDFS distributed file system.
The Hadoop distributed cluster uses a master/slave architecture. In the Hadoop HA cluster, ZooKeeper eliminates the NameNode single point of failure (SPOF): if the active NameNode goes down, one of the standby NameNodes is elected as the new active NameNode, and the switchover is nearly instantaneous, so the distributed cluster continues to provide service.
Similarly, the distributed HBase database uses a master/slave architecture. When the active Master server goes down, the cluster switches to the backup Master instantly, making HBase highly available. The distributed HBase database uses the HDFS file system as its underlying storage.
Environment requirements and version selection
(1) Four Linux servers, named Hadoop01, Hadoop02, Hadoop03, and Hadoop04, running CentOS 6.8;
(2) Java: JDK 1.8;
(3) ZooKeeper 3.4.10;
(4) Hadoop 2.7.6;
(5) HBase 1.2.6.
Cluster planning and architecture design
Both Hadoop and HBase work in master/slave mode and perform their own load balancing. The following table shows the cluster plan.
| | Hadoop01 | Hadoop02 | Hadoop03 | Hadoop04 |
|---|---|---|---|---|
| NameNode | ✓ | ✓ | | |
| DataNode | ✓ | ✓ | ✓ | ✓ |
| ResourceManager | | | ✓ | ✓ |
| NodeManager | ✓ | ✓ | ✓ | ✓ |
| JobHistoryServer | ✓ | | | |
| ZooKeeper | ✓ | ✓ | ✓ | observer |
| JournalNode | ✓ | ✓ | ✓ | |
| ZKFC | ✓ | ✓ | | |
| HMaster | ✓ | | | ✓ |
| HRegionServer | ✓ | ✓ | ✓ | ✓ |
ZooKeeper cluster installation
Configuration plan
When installing a ZooKeeper cluster, keep the number of voting nodes odd so that a leader can be elected. Here Hadoop01, Hadoop02, Hadoop03, and Hadoop04 are all ZooKeeper nodes, but the role of the Hadoop04 node is fixed as observer. Observers are similar to followers, but they exist to scale out ZooKeeper without changing the cluster's existing master/slave relationships: they accept and process client requests but have no right to vote and cannot be elected leader.
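For reference, the ZooKeeper Observers guide declares the observer role in two places; a minimal sketch for this cluster plan (not spelled out in the steps below) would be:
# in every node's zoo.cfg, tag server.4 as an observer
server.4=hadoop04:2888:3888:observer
# in Hadoop04's own zoo.cfg only
peerType=observer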
Configuration steps
- Obtain the installation package zookeeper-3.4.10.tar.gz
- Unpack it:
tar -zxvf zookeeper-3.4.10.tar.gz -C ~/apps/
- Modify the configuration file: vim zoo.cfg
# data directory for znode data on all nodes
dataDir=/home/hadoop/data/zkdata/
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
server.4=hadoop04:2888:3888:observer
- Create a myid file in /home/hadoop/data/zkdata/ on each node and write that node's ID into it:
hadoop01: echo 1 > myid
hadoop02: echo 2 > myid
hadoop03: echo 3 > myid
hadoop04: echo 4 > myid
- Configure environment variables: vim ~/.bashrc
export ZOOKEEPER_HOME=/home/hadoop/apps/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin
- Start: run zkServer.sh start on each node
- Client connection and shell operations
Client connection: zkCli.sh -server hostname:2181
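A few basic shell operations from the connected client serve as a quick sanity check (the /test znode and its data are purely illustrative):
ls /                  # list the znodes under the root
create /test "hello"  # create a znode holding some data
get /test             # read the data back
delete /test          # remove the znode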
Hadoop HA cluster installation
HA design and configuration plan
Hadoop HA is implemented using shared storage and ZooKeeper. Here, Hadoop01 is the active NameNode and Hadoop02 is the standby NameNode, acting as a hot standby for Hadoop01. The NameNode metadata is kept in the shared storage provided by the QJournal (JournalNode) log system. Hadoop01, Hadoop02, Hadoop03, and Hadoop04 serve as DataNodes and periodically send block reports and heartbeats to the NameNodes. The ZKFC processes, coordinated through ZooKeeper, monitor the NameNodes; when the active NameNode loses its heartbeat, that is, when the node goes down, failover happens automatically and the standby node is activated, making the Hadoop cluster highly available.
Configuration steps
- Obtain the installation package hadoop-2.7.6.tar.gz
- Unpack it:
tar -zxvf hadoop-2.7.6.tar.gz -C ~/apps/
- Modify the configuration files
hadoop-env.sh file:
export JAVA_HOME=/home/hadoop/apps/jdk1.8.0_73
core-site.xml file:
<configuration>
<!-- Specify the HDFS nameservice -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://myha/</value>
</property>
<!-- Specify the Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/hadoopdata/</value>
</property>
<!-- Specify the ZooKeeper address -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181,hadoop04:2181</value>
</property>
<!-- Timeout for Hadoop's connection to ZooKeeper -->
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>1000</value>
<description>ms</description>
</property>
</configuration>
hdfs-site.xml file:
<configuration>
<!-- Specify the number of replicas -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Configure the working directories of the NameNode and DataNode -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/hadoopdata/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/hadoopdata/dfs/data</value>
</property>
<!-- Enable webhdfs -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<!-- Nameservice ID -->
<property>
<name>dfs.nameservices</name>
<value>myha</value>
</property>
<!-- The two NameNodes under myha: nn1, nn2 -->
<property>
<name>dfs.ha.namenodes.myha</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.myha.nn1</name>
<value>hadoop01:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.myha.nn1</name>
<value>hadoop01:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.myha.nn2</name>
<value>hadoop02:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.myha.nn2</name>
<value>hadoop02:50070</value>
</property>
<!-- Shared edits directory on the JournalNode quorum -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/myha</value>
</property>
<!-- Location where the JournalNode stores data on the local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/data/journaldata</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Failover proxy provider used by clients to find the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.myha</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; multiple mechanisms are separated by line breaks, one mechanism per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence requires passwordless SSH login -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- sshfence connection timeout -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<property>
<name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
<value>60000</value>
</property>
</configuration>
mapred-site.xml file:
<configuration>
<!-- Set the MapReduce framework to YARN mode -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MapReduce JobHistory Server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<!-- Web address of the job history server -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
</configuration>
yarn-site.xml file:
<configuration>
<!-- Enable ResourceManager high availability -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Specify the RM cluster ID -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- Specify the RM names -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Specify the RM hostnames -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop03</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop04</value>
</property>
<!-- Specify the ZooKeeper cluster address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
</property>
<!-- Enable automatic recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- Store ResourceManager state information in the ZooKeeper cluster -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
</configuration>
slaves file:
hadoop01
hadoop02
hadoop03
hadoop04
- Distribution: configure on one node, then copy the same configuration to the other nodes
scp -r hadoop-2.7.6 hadoop02:~/apps/
scp -r hadoop-2.7.6 hadoop03:~/apps/
scp -r hadoop-2.7.6 hadoop04:~/apps/
- Configure environment variables: vim ~/.bashrc
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.7.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
- First startup
Before starting the Hadoop HA cluster, start the ZooKeeper cluster. On every node of the QJournal system, run hadoop-daemon.sh start journalnode. Then pick one HDFS master node and run hadoop namenode -format. Copy all data files in that NameNode's working directory to the corresponding directory on the other NameNode, e.g. scp -r hadoopdata/ hadoop02:~/data/. Finally, on one of the NameNode nodes, run hdfs zkfc -formatZK to initialize ZKFC.
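A condensed sketch of that first-start sequence, with hostnames taken from this article's cluster plan (run each command on the nodes indicated in the comments):
# on hadoop01-hadoop04: start ZooKeeper first
zkServer.sh start
# on hadoop01, hadoop02, hadoop03: start the JournalNodes
hadoop-daemon.sh start journalnode
# on hadoop01 only: format HDFS, then copy the metadata to the standby NameNode
hadoop namenode -format
scp -r hadoopdata/ hadoop02:~/data/    # run from the directory containing hadoopdata/, as in the article
# on one NameNode: initialize the ZKFC znode in ZooKeeper
hdfs zkfc -formatZK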
- Regular startup
Start HDFS: start-dfs.sh
Start the YARN cluster: start-yarn.sh
Start the MapReduce history server: mr-jobhistory-daemon.sh start historyserver
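To confirm that failover is wired up, the state of each NameNode and ResourceManager can be queried using the IDs defined in the configuration above:
hdfs haadmin -getServiceState nn1    # one of nn1/nn2 should report active, the other standby
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1    # likewise for rm1/rm2
yarn rmadmin -getServiceState rm2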
HBase HA cluster installation
Configuration plan
The HDFS distributed file system serves as the underlying storage of the HBase distributed database. ZooKeeper coordinates the HMaster and the backup HMaster: when the active HMaster loses its heartbeat and goes down, the backup HMaster automatically becomes active, making the distributed database highly available.
Hadoop01 and Hadoop04 are used as HMaster nodes, and Hadoop01, Hadoop02, Hadoop03, and Hadoop04 are used as HRegionServer nodes.
Configuration steps
- Obtain the installation package hbase-1.2.6.tar.gz
- Unpack it:
tar -zxvf hbase-1.2.6.tar.gz -C ~/apps/
- Modify the configuration files
hbase-env.sh file:
export JAVA_HOME=/home/hadoop/jdk1.8.0_73
export HBASE_MANAGES_ZK=false    # use the external ZooKeeper cluster instead of one managed by HBase
hbase-site.xml file:
<property>
<!-- Path where HBase stores its data in HDFS -->
<name>hbase.rootdir</name>
<value>hdfs://myha/myhbase</value>
</property>
<property>
<!-- Run HBase in fully distributed mode -->
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<!-- ZooKeeper quorum address -->
<name>hbase.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
regionservers file:
hadoop01
hadoop02
hadoop03
hadoop04
backup-masters file:
hadoop01
- Put Hadoop's hdfs-site.xml and core-site.xml into HBase's conf directory:
cp /home/hadoop/hadoop-2.7.6/etc/hadoop/core-site.xml .
cp /home/hadoop/hadoop-2.7.6/etc/hadoop/hdfs-site.xml .
- Send the HBase installation package to the other nodes:
scp -r hbase-1.2.6 hadoop01:$PWD
scp -r hbase-1.2.6 hadoop02:$PWD
scp -r hbase-1.2.6 hadoop03:$PWD
scp -r hbase-1.2.6 hadoop04:$PWD
- Start:
start-hbase.sh
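Once start-hbase.sh completes, a quick smoke test from the HBase shell might look like this (the table name 'smoke' and column family 'cf' are purely illustrative):
hbase shell
status                               # reports the active master, backup masters, and region servers
create 'smoke', 'cf'                 # create a test table
put 'smoke', 'row1', 'cf:a', 'v1'    # write one cell
scan 'smoke'                         # read it back
disable 'smoke'
drop 'smoke'                         # clean up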
System test
Run the jps command on all four machines to check whether the relevant processes have started.
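Based on the cluster plan above, the processes on Hadoop01, for example, should look roughly as follows (PIDs omitted; the set differs slightly on the other nodes):
jps
NameNode
DataNode
NodeManager
JobHistoryServer
QuorumPeerMain            # the ZooKeeper server process
JournalNode
DFSZKFailoverController   # the ZKFC process
HMaster
HRegionServer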
Checking the ZooKeeper status
Run zkServer.sh status on the four servers to check the ZooKeeper status.
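On a healthy ensemble each node reports its role; the output ends with a line such as the following (leader on exactly one node, follower on the others, observer on Hadoop04):
zkServer.sh status
# ...
# Mode: follower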
Hadoop HA demonstration
HDFS startup process:
YARN startup process:
View rack awareness status:
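One way to view this from the command line is the dfsadmin topology report, which lists each DataNode under its rack (the default rack if no topology script is configured):
hdfs dfsadmin -printTopology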
Web page view:
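The web addresses follow from the configuration above; the YARN port is the ResourceManager's default, since it is not set explicitly here:
http://hadoop01:50070    # NameNode nn1 web UI (shows active or standby)
http://hadoop02:50070    # NameNode nn2 web UI
http://hadoop03:8088     # ResourceManager web UI (default port)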
HBase HA demonstration
HBase startup process:
Web page view:
HBase underlying storage information:
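Since hbase.rootdir points at hdfs://myha/myhbase, the underlying storage can also be inspected directly from HDFS, for example:
hdfs dfs -ls /myhbase    # HBase data, WAL, and metadata directories live under this path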
Based on the system requirements, the architecture was designed as one master with multiple slaves. The system environment and version compatibility then determined the versions of the installation packages used. Finally, the cluster was planned across only four servers so that the load on each node is balanced as far as possible.