CentOS 7: HBase cluster environment setup
What is HBase
HBase is a distributed, column-oriented database built on top of the Hadoop file system (HDFS). It is an open-source project that scales horizontally. Its data model is similar to Google's Bigtable design and provides fast, random access to massive amounts of structured data. It takes advantage of the fault tolerance provided by HDFS. As part of the Hadoop ecosystem, HBase provides random, real-time read/write access to data in the Hadoop file system: you can store data in HDFS directly or through HBase, and you can read or randomly access data in HDFS through HBase. In short, HBase sits on top of HDFS and provides read and write access to it.
HBase and HDFS
| HDFS | HBase |
| --- | --- |
| HDFS is a distributed file system suitable for storing large files. | HBase is a database built on top of HDFS. |
| HDFS does not support fast lookups of individual records. | HBase provides fast lookups in large tables. |
| It provides high-latency batch processing; there is no concept of batch processing. | It provides low-latency access to single rows out of billions of records (random access). |
| The data it stores can only be accessed sequentially. | HBase internally uses hash tables, provides random access, and stores data in indexed HDFS files for faster lookups. |
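To make the contrast concrete, here is a rough sketch; the file path and the table/row names are hypothetical examples, not something created earlier in this article:

```bash
# HDFS: data in a file is read sequentially; there is no record-level lookup
hdfs dfs -cat /data/users.txt | head

# HBase: fetch a single record directly by its row key (random access)
echo "get 'user', 'row1'" | hbase shell
```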
HBase data model
Row Key: the primary key of a table. Records in a table are sorted by row key.
Timestamp: the timestamp attached to each data operation; it can be regarded as the version number of the data.
Column Family: a table is made up of one or more column families in the horizontal direction. A column family can contain any number of columns, which means column families support dynamic expansion without pre-defining the number or type of columns. All column values are stored in binary form, so you need to convert them yourself.
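To make the model concrete, here is a minimal HBase shell sketch (run it once the cluster built below is up); the table name user, the column family info, and the values are purely illustrative:

```bash
hbase shell
create 'user', 'info'                    # a table with a single column family 'info'
put 'user', 'row1', 'info:name', 'Tom'   # row key 'row1', column 'info:name'
put 'user', 'row1', 'info:age', '20'     # columns under a family can be added freely
get 'user', 'row1'                       # read one row directly by its row key
scan 'user'                              # rows come back sorted by row key; every cell carries a timestamp
```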
If these concepts still feel abstract, don't worry: I will walk you straight through building an environment and running some simple operations so you can experience HBase for yourself.
Environment setup
You can download the HBase package yourself; I provide the hbase-1.3.1 version here (click download).
Next, put the downloaded package into your own directory on CentOS 7. Mine is under /home/mmcc. Then go to that directory and extract the archive:
```bash
cd /home/mmcc
tar -zxvf hbase-1.3.1-bin.tar.gz
```
Then go to the HBase root directory and configure the hbase-env.sh file:
```bash
vi conf/hbase-env.sh                                     # edit the environment configuration script

export JAVA_HOME=/home/mmcc/jdk1.8                       # uncomment this line and point it at your own JDK path
export HBASE_MANAGES_ZK=true                             # use the built-in ZooKeeper
export HBASE_PID_DIR=/home/mmcc/hbase-1.3.1/hbase_tmp    # change the directory where pid files are saved
```
Then configure the hbase-site.xml file:
```xml
<property>
  <!-- Set the HBase root directory on HDFS -->
  <name>hbase.rootdir</name>
  <value>hdfs://master:9000/hbase</value>
</property>
<property>
  <!-- Whether to run in distributed (cluster) mode -->
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <!-- Set the ZooKeeper cluster nodes -->
  <name>hbase.zookeeper.quorum</name>
  <value>master,slave1,slave2</value>
</property>
<property>
  <!-- Set the temporary file directory -->
  <name>hbase.tmp.dir</name>
  <value>/home/mmcc/hbase-1.3.1/data/tmp</value>
</property>
<property>
  <!-- Set the web management UI port -->
  <name>hbase.master.info.port</name>
  <value>60010</value>
</property>
```
When configuring the ZooKeeper quorum, write master:port if your ZooKeeper uses a non-default port. The default port is 2181, so master on its own is equivalent to master:2181 and the port can be omitted.
Then configure the nodes that will run the HBase region server service:
```bash
vi conf/regionservers
# add one node hostname per line:
slave1
slave2
```
This file lists the nodes that will run the HRegionServer service (here slave1 and slave2).
At this point the HBase environment on the master node is configured, but it is still a standalone setup. To build the cluster, copy the HBase directory to the other nodes as follows:
```bash
scp -r /home/mmcc/hbase-1.3.1 slave1:/home/mmcc   # slave1 can be replaced with the slave1 node's IP address
scp -r /home/mmcc/hbase-1.3.1 slave2:/home/mmcc   # slave2 can be replaced with the slave2 node's IP address
```
This copies everything under hbase-1.3.1 to /home/mmcc on slave1 and slave2. Since the VMs were created by cloning, the files and directory layout are the same on every node.
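Optionally, assuming passwordless SSH between the nodes is already set up (the scp step above relies on it as well), you can confirm the copy landed in the right place:

```bash
ssh slave1 "ls /home/mmcc/hbase-1.3.1"   # should list bin, conf, lib and so on
ssh slave2 "ls /home/mmcc/hbase-1.3.1"
```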
Add the HBase bin directory to the PATH environment variable on each node:
```bash
vi /etc/profile
export HBASE_HOME=/home/mmcc/hbase-1.3.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HBASE_HOME/bin:$PATH:.

source /etc/profile   # make the environment variables take effect
```
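As a quick, optional check that the PATH change took effect (hbase version is a standard HBase command):

```bash
hbase version   # should print HBase 1.3.1 along with build information
```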
Start the cluster
```bash
start-hbase.sh   # the Hadoop cluster services must already be running
```
This starts the built-in ZooKeeper quorum along with the HMaster and HRegionServer services. Once everything is up, check the running Java processes with jps to confirm that the cluster started successfully.
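As a rough sketch of that check (the exact process list depends on your Hadoop setup, so treat the expected names as a guideline rather than exact output):

```bash
jps                          # on master: expect HMaster and HQuorumPeer, plus the Hadoop processes (NameNode, ...)
ssh slave1 jps               # on slaves: expect HRegionServer and HQuorumPeer, plus DataNode, ...
curl -I http://master:60010  # the HBase web UI on the port configured in hbase-site.xml above
```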
Since we are currently using HBase's built-in ZooKeeper as the coordination service, the startup steps are as follows:
- Start the Hadoop cluster: `start-all.sh`
- Start the HBase cluster: `start-hbase.sh`
The stop steps are:
- Stop the HBase cluster: `stop-hbase.sh`
- Stop the Hadoop cluster: `stop-all.sh`
Custom ZooKeeper
To use a custom ZooKeeper cluster instead, set export HBASE_MANAGES_ZK=false in hbase-env.sh. If your ZooKeeper cluster uses a port other than the default 2181, the ports must be included in hbase.zookeeper.quorum in hbase-site.xml:
```xml
<property>
  <!-- Set the ZooKeeper cluster nodes, including their ports -->
  <name>hbase.zookeeper.quorum</name>
  <value>master:port,slave1:port,slave2:port</value>
</property>
```
With a custom ZooKeeper cluster, the startup sequence is as follows (a command sketch for both starting and stopping is given after the lists):
- Start the Hadoop cluster
- Start the ZooKeeper (ZK) cluster
- Start the HBase cluster
And the stop sequence:
- Stop the HBase cluster
- Stop the ZooKeeper (ZK) cluster
- Stop the Hadoop cluster
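A minimal command sketch of the two sequences above, assuming a standard ZooKeeper installation whose zkServer.sh script is on the PATH of each quorum node (master, slave1, slave2 here); adjust hostnames and paths to your own setup:

```bash
# Start, in order
start-all.sh                             # Hadoop cluster
for node in master slave1 slave2; do
  ssh $node "zkServer.sh start"          # ZooKeeper on each quorum node
done
start-hbase.sh                           # HBase cluster

# Stop, in reverse order
stop-hbase.sh
for node in master slave1 slave2; do
  ssh $node "zkServer.sh stop"
done
stop-all.sh
```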
The setup and use of a ZK cluster will be covered in a later section