Original address: www.awebone.com/posts/440e5…
Original author: Awebone
High availability HBase construction process
This article describes how to set up a distributed ZooKeeper cluster that detects when servers go online and offline, then build a Hadoop HA cluster on top of ZooKeeper so that the HDFS file system is highly available, and finally deploy a highly available distributed HBase database on top of that highly available HDFS file system.
Working through this process teaches the data-consistency concepts behind distributed systems; the resulting system guarantees eventual consistency. It also covers the construction and deployment of highly available ZooKeeper, Hadoop, and HBase clusters, as well as the application scenarios of databases built on the HDFS distributed file system.
System overview
The system builds a Hadoop HA cluster based on ZooKeeper and a highly available distributed HBase database cluster on top of the HDFS distributed file system.
The Hadoop distributed cluster uses a master/slave architecture. In the Hadoop HA cluster, ZooKeeper eliminates the NameNode single point of failure (SPOF): if the active NameNode goes down, one of the standby NameNodes is elected as the new active NameNode, and the switchover is nearly instantaneous, so the distributed cluster continues to provide service.
Similarly, the distributed HBase database uses a master/slave architecture. When the active Master server goes down, the cluster switches to the backup Master instantly, making HBase highly available. The distributed HBase database uses the HDFS file system as its underlying storage.
Environment requirements and version selection
(1) Four Linux servers, named Hadoop01, Hadoop02, Hadoop03, and Hadoop04, running CentOS 6.8;
(2) Java: JDK 1.8;
(3) ZooKeeper 3.4.10;
(4) Hadoop 2.7.6;
(5) HBase 1.2.6.
Cluster planning and architecture design
Both Hadoop and HBase work in master/slave mode and perform their own load balancing. The following table shows the cluster plan.
| | Hadoop01 | Hadoop02 | Hadoop03 | Hadoop04 |
|---|---|---|---|---|
| NameNode | ✓ | ✓ | | |
| DataNode | ✓ | ✓ | ✓ | ✓ |
| ResourceManager | | | ✓ | ✓ |
| NodeManager | ✓ | ✓ | ✓ | ✓ |
| JobHistoryServer | ✓ | | | |
| ZooKeeper | ✓ | ✓ | ✓ | observer |
| JournalNode | ✓ | ✓ | ✓ | |
| ZKFC | ✓ | ✓ | | |
| HMaster | ✓ | | | ✓ |
| HRegionServer | ✓ | ✓ | ✓ | ✓ |
ZooKeeper cluster installation
Configuration plan
When installing a ZooKeeper cluster, keep the number of voting nodes odd so that a leader can be elected. Here Hadoop01, Hadoop02, Hadoop03, and Hadoop04 are all ZooKeeper nodes, but the role of the Hadoop04 node is fixed as observer. Observers are similar to followers, but they exist to scale out ZooKeeper without changing the cluster's existing master/slave relationships: they accept and process client requests but have no right to vote and cannot be elected leader.
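For reference, the ZooKeeper Observers guide declares the observer role in two places; a minimal sketch for this cluster plan (not spelled out in the steps below) would be:
# in every node's zoo.cfg, tag server.4 as an observer
server.4=hadoop04:2888:3888:observer
# in Hadoop04's own zoo.cfg only
peerType=observer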
Configuration steps
- Obtain the installation package zookeeper-3.4.10.tar.gz
- Unpack it:
tar -zxvf zookeeper-3.4.10.tar.gz -C ~/apps/
- Modify the configuration file: vim zoo.cfg
# data directory for znode data on all nodes
dataDir=/home/hadoop/data/zkdata/
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
server.4=hadoop04:2888:3888:observer
- Create a myid file in /home/hadoop/data/zkdata/ on each node and write that node's ID into it:
hadoop01: echo 1 > myid
hadoop02: echo 2 > myid
hadoop03: echo 3 > myid
hadoop04: echo 4 > myid
- Configure environment variables: vim ~/.bashrc
export ZOOKEEPER_HOME=/home/hadoop/apps/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin
- Start: run zkServer.sh start on each node
- Client connection and shell operations
Client connection: zkCli.sh -server hostname:2181
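A few basic shell operations from the connected client serve as a quick sanity check (the /test znode and its data are purely illustrative):
ls /                  # list the znodes under the root
create /test "hello"  # create a znode holding some data
get /test             # read the data back
delete /test          # remove the znode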
Hadoop HA cluster installation
HA design and configuration plan
Hadoop HA is implemented using shared storage and ZooKeeper. Here, Hadoop01 is the active NameNode and Hadoop02 is the standby NameNode, acting as a hot standby for Hadoop01. The NameNode metadata is kept in the shared storage provided by the QJournal (JournalNode) log system. Hadoop01, Hadoop02, Hadoop03, and Hadoop04 serve as DataNodes and periodically send block reports and heartbeats to the NameNodes. The ZKFC processes, coordinated through ZooKeeper, monitor the NameNodes; when the active NameNode loses its heartbeat, that is, when the node goes down, failover happens automatically and the standby node is activated, making the Hadoop cluster highly available.
Configuration steps
- Obtain the installation package hadoop-2.7.6.tar.gz
- Unpack it:
tar -zxvf hadoop-2.7.6.tar.gz -C ~/apps/
- Modify the configuration files
hadoop-env.sh file:
export JAVA_HOME=/home/hadoop/apps/jdk1.8.0_73
core-site.xml file:
<configuration>
<!-- Specify the HDFS nameservice -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://myha/</value>
</property>
<!-- Specify the Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/hadoopdata/</value>
</property>
<!-- Specify the ZooKeeper address -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181,hadoop04:2181</value>
</property>
<!-- Timeout for Hadoop's connection to ZooKeeper -->
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>1000</value>
<description>ms</description>
</property>
</configuration>
hdfs-site.xml file:
<configuration>
<!-- Specify the number of replicas -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Configure the working directories of the NameNode and DataNode -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/hadoopdata/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/hadoopdata/dfs/data</value>
</property>
<!-- Enable webhdfs -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<!-- Nameservice ID -->
<property>
<name>dfs.nameservices</name>
<value>myha</value>
</property>
<!-- The two NameNodes under myha: nn1, nn2 -->
<property>
<name>dfs.ha.namenodes.myha</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.myha.nn1</name>
<value>hadoop01:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.myha.nn1</name>
<value>hadoop01:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.myha.nn2</name>
<value>hadoop02:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.myha.nn2</name>
<value>hadoop02:50070</value>
</property>
<!-- Shared edits directory on the JournalNode quorum -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/myha</value>
</property>
<!-- Location where the JournalNode stores data on the local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/data/journaldata</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Failover proxy provider used by clients to find the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.myha</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; multiple mechanisms are separated by line breaks, one mechanism per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence requires passwordless SSH login -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- sshfence connection timeout -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<property>
<name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
<value>60000</value>
</property>
</configuration>
mapred-site.xml file:
<configuration>
<!-- Set the MapReduce framework to YARN mode -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MapReduce JobHistory Server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<!-- Web address of the job history server -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
</configuration>
yarn-site.xml file:
<configuration>
<!-- Enable ResourceManager high availability -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Specify the RM cluster ID -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- Specify the RM names -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Specify the RM hostnames -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop03</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop04</value>
</property>
<!-- Specify the ZooKeeper cluster address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
</property>
<!-- Enable automatic recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- Store ResourceManager state information in the ZooKeeper cluster -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
</configuration>
slaves file:
hadoop01
hadoop02
hadoop03
hadoop04
- Distribution: configure on one node, then copy the same configuration to the other nodes
scp -r hadoop-2.7.6 hadoop02:~/apps/
scp -r hadoop-2.7.6 hadoop03:~/apps/
scp -r hadoop-2.7.6 hadoop04:~/apps/
- Configure environment variables: vim ~/.bashrc
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.7.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
- First startup
Before starting the Hadoop HA cluster, start the ZooKeeper cluster. On every node of the QJournal system, run hadoop-daemon.sh start journalnode. Then pick one HDFS master node and run hadoop namenode -format. Copy all data files in that NameNode's working directory to the corresponding directory on the other NameNode, e.g. scp -r hadoopdata/ hadoop02:~/data/. Finally, on one of the NameNode nodes, run hdfs zkfc -formatZK to initialize ZKFC.
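A condensed sketch of that first-start sequence, with hostnames taken from this article's cluster plan (run each command on the nodes indicated in the comments):
# on hadoop01-hadoop04: start ZooKeeper first
zkServer.sh start
# on hadoop01, hadoop02, hadoop03: start the JournalNodes
hadoop-daemon.sh start journalnode
# on hadoop01 only: format HDFS, then copy the metadata to the standby NameNode
hadoop namenode -format
scp -r hadoopdata/ hadoop02:~/data/    # run from the directory containing hadoopdata/, as in the article
# on one NameNode: initialize the ZKFC znode in ZooKeeper
hdfs zkfc -formatZK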
- Regular startup
Start HDFS: start-dfs.sh
Start the YARN cluster: start-yarn.sh
Start the MapReduce history server: mr-jobhistory-daemon.sh start historyserver
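To confirm that failover is wired up, the state of each NameNode and ResourceManager can be queried using the IDs defined in the configuration above:
hdfs haadmin -getServiceState nn1    # one of nn1/nn2 should report active, the other standby
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1    # likewise for rm1/rm2
yarn rmadmin -getServiceState rm2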
HBase HA cluster installation
Configuration plan
The HDFS distributed file system serves as the underlying storage of the HBase distributed database. ZooKeeper coordinates the HMaster and the backup HMaster: when the active HMaster loses its heartbeat and goes down, the backup HMaster automatically becomes active, making the distributed database highly available.
Hadoop01 and Hadoop04 are used as HMaster nodes, and Hadoop01, Hadoop02, Hadoop03, and Hadoop04 are used as HRegionServer nodes.
Configuration steps
- Obtain the installation package hbase-1.2.6.tar.gz
- Unpack it:
tar -zxvf hbase-1.2.6.tar.gz -C ~/apps/
- Modify the configuration files
hbase-env.sh file:
export JAVA_HOME=/home/hadoop/jdk1.8.0_73
export HBASE_MANAGES_ZK=false    # use the external ZooKeeper cluster instead of one managed by HBase
hbase-site.xml file:
<property>
<!-- Path where HBase stores its data in HDFS -->
<name>hbase.rootdir</name>
<value>hdfs://myha/myhbase</value>
</property>
<property>
<!-- Run HBase in fully distributed mode -->
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<!-- ZooKeeper quorum address -->
<name>hbase.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
regionservers file:
hadoop01
hadoop02
hadoop03
hadoop04
backup-masters file:
hadoop01
- Put Hadoop's hdfs-site.xml and core-site.xml into HBase's conf directory:
cp /home/hadoop/hadoop-2.7.6/etc/hadoop/core-site.xml .
cp /home/hadoop/hadoop-2.7.6/etc/hadoop/hdfs-site.xml .
- Send the HBase installation package to the other nodes:
scp -r hbase-1.2.6 hadoop01:$PWD
scp -r hbase-1.2.6 hadoop02:$PWD
scp -r hbase-1.2.6 hadoop03:$PWD
scp -r hbase-1.2.6 hadoop04:$PWD
- Start:
start-hbase.sh
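Once start-hbase.sh completes, a quick smoke test from the HBase shell might look like this (the table name 'smoke' and column family 'cf' are purely illustrative):
hbase shell
status                               # reports the active master, backup masters, and region servers
create 'smoke', 'cf'                 # create a test table
put 'smoke', 'row1', 'cf:a', 'v1'    # write one cell
scan 'smoke'                         # read it back
disable 'smoke'
drop 'smoke'                         # clean up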
System test
Run the jps command on all four machines to check whether the relevant processes have started.
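Based on the cluster plan above, the processes on Hadoop01, for example, should look roughly as follows (PIDs omitted; the set differs slightly on the other nodes):
jps
NameNode
DataNode
NodeManager
JobHistoryServer
QuorumPeerMain            # the ZooKeeper server process
JournalNode
DFSZKFailoverController   # the ZKFC process
HMaster
HRegionServer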
Checking the ZooKeeper status
Run zkServer.sh status on the four servers to check the ZooKeeper status.
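On a healthy ensemble each node reports its role; the output ends with a line such as the following (leader on exactly one node, follower on the others, observer on Hadoop04):
zkServer.sh status
# ...
# Mode: follower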
Hadoop HA demonstration
HDFS startup process:
YARN startup process:
View rack awareness status:
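One way to view this from the command line is the dfsadmin topology report, which lists each DataNode under its rack (the default rack if no topology script is configured):
hdfs dfsadmin -printTopology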
Web page view:
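The web addresses follow from the configuration above; the YARN port is the ResourceManager's default, since it is not set explicitly here:
http://hadoop01:50070    # NameNode nn1 web UI (shows active or standby)
http://hadoop02:50070    # NameNode nn2 web UI
http://hadoop03:8088     # ResourceManager web UI (default port)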
HBase HA demonstration
HBase startup process:
Web page view:
HBase underlying storage information:
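Since hbase.rootdir points at hdfs://myha/myhbase, the underlying storage can also be inspected directly from HDFS, for example:
hdfs dfs -ls /myhbase    # HBase data, WAL, and metadata directories live under this path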
Based on the system requirements, the architecture was designed as one master with multiple slaves. The system environment and version compatibility then determined the versions of the installation packages used. Finally, the cluster was planned across only four servers so that the load on each node is balanced as far as possible.