Introduction
Data model | Related databases | Typical applications | Advantages | Disadvantages |
---|---|---|---|---|
Key-value | Redis | Caching | Fast lookups | Stored data is unstructured |
Column family | Cassandra, HBase | Distributed file systems, large-scale data storage | Easy to distribute | Limited functionality |
Document | MongoDB, CouchDB | | Easy to use | Poor scalability |
Graph | Neo4j | Social networks | Can apply graph-structure algorithms | Hard to scale |
In terms of NoSQL classification, HBase and Cassandra are the same kind of database: both are column-family stores.
The comparison between HBase and Cassandra, including why HBase is popular in China while Cassandra is more widely used abroad, has been covered elsewhere and will not be repeated here.
Terminology
Tables and rows mean the same thing as in a relational database.
Column family
As the name implies, a column family is a group of columns. The wide-column data model follows BigTable, which is a sparse, multidimensional map. On disk, the data of one column family is stored together, rather than row by row as in a relational database. This is why column families need to be defined in advance.
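To make the "sparse, multidimensional map" idea concrete, here is a minimal sketch in Python. It is only a toy model, not HBase code: the class name `ToyWideColumnTable`, the family names `info` and `stats`, and the integer timestamps are all made up for illustration. It shows two points from the paragraph above: each column family has its own store, and a family must exist before data can be written to it.

```python
class ToyWideColumnTable:
    """Toy model of a BigTable-style table: a sparse, nested map."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        # One separate store per family:
        # family -> row key -> qualifier -> {timestamp: value}
        self.stores = {cf: {} for cf in self.column_families}

    def put(self, row_key, family, qualifier, value, timestamp):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        row = self.stores[family].setdefault(row_key, {})
        versions = row.setdefault(qualifier, {})
        versions[timestamp] = value

    def get(self, row_key, family, qualifier):
        """Return the newest value of a cell, or None if it was never written."""
        versions = self.stores[family].get(row_key, {}).get(qualifier)
        if not versions:
            return None
        return versions[max(versions)]


table = ToyWideColumnTable(["info", "stats"])
table.put("user1", "info", "name", "alice", timestamp=1)
table.put("user1", "info", "name", "alicia", timestamp=2)
table.put("user2", "stats", "logins", 7, timestamp=1)
```

In this sketch `user2` has no `info` cells at all, so nothing is stored for them; that is what "sparse" means here.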
NoSQL – Mongo and Cassandra talk about NoSQL
Region
A region is a range partition: a contiguous group of row keys. Regions are split automatically. A region is generally 1GB to 2GB in size; when it grows past the configured size, it is split.
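A rough sketch of range partitioning with automatic splitting, in Python. This is an illustration under assumptions, not HBase's actual split logic: `ToyRegion`, `maybe_split`, and the choice of the middle row key as the split point are all hypothetical, and sizes are plain integers rather than bytes on disk.

```python
class ToyRegion:
    """A contiguous range of row keys: [start_key, end_key)."""

    def __init__(self, start_key, end_key):
        self.start_key = start_key  # inclusive
        self.end_key = end_key      # exclusive; None means "no upper bound"
        self.rows = {}              # row key -> size of the row

    def size(self):
        return sum(self.rows.values())


def maybe_split(region, max_region_size):
    """Return [region] unchanged, or two halves if it grew past the limit."""
    if region.size() <= max_region_size or len(region.rows) < 2:
        return [region]
    keys = sorted(region.rows)
    mid = keys[len(keys) // 2]  # assumed split point: the middle row key
    left = ToyRegion(region.start_key, mid)
    right = ToyRegion(mid, region.end_key)
    for key, row_size in region.rows.items():
        target = left if key < mid else right
        target.rows[key] = row_size
    return [left, right]


region = ToyRegion("", None)
region.rows = {"a": 1, "b": 1, "c": 1, "d": 1}
halves = maybe_split(region, max_region_size=3)
```

After the split, each half still covers a contiguous key range, so together they cover exactly the range of the original region.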
Deployment architecture
The HBase deployment architecture is complex. In a distributed database, the cluster typically has three roles: routing nodes, configuration nodes, and shard data nodes. Some databases put all of these functions on the same node, which makes capacity expansion easier and leaves fewer single points of failure. Splitting the roles across different nodes makes deployment more troublesome, and expansion too, since each part may need to be expanded separately. The advantage is isolation of responsibilities: a fault in one role does not bring down the whole node through coupling. The following describes the HBase cluster deployment architecture.
HBase Master
HBase is a CP distributed database with a master-slave architecture. The Master manages all Region Servers and plays the configuration-node role described above: it records which Region Server each data block (HRegion) belongs to. When a Region Server is added or goes offline, the affected HRegions must be reassigned. For availability, run more than one Master node to avoid a single point of failure.
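The reassignment step can be sketched as follows. This is a simplified illustration, not the HBase balancer: the function name `reassign_regions`, the region and server names, and the "move to the least loaded live server" policy are assumptions made for the example.

```python
def reassign_regions(assignment, live_servers):
    """Return a new region -> server map that uses only live servers.

    assignment:   dict of region name -> server name (current state)
    live_servers: iterable of server names that are still online
    """
    load = {server: 0 for server in live_servers}
    if not load:
        raise RuntimeError("no live Region Server left")

    new_assignment = {}
    orphaned = []
    for region, server in sorted(assignment.items()):
        if server in load:
            # Server is still alive: keep the region where it is.
            new_assignment[region] = server
            load[server] += 1
        else:
            orphaned.append(region)

    for region in orphaned:
        # Assumed policy: pick the live server with the fewest regions.
        target = min(load, key=lambda s: (load[s], s))
        new_assignment[region] = target
        load[target] += 1
    return new_assignment


before = {"r1": "rs1", "r2": "rs1", "r3": "rs2", "r4": "rs3"}
after = reassign_regions(before, live_servers=["rs1", "rs2"])  # rs3 went offline
```

Only the mapping changes here; no row data is copied in this sketch.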
Region Server
The Region Server handles data reads and writes. Data is held in memory, and the Region Server interacts with the HDFS file system for I/O. HBase is a column-family database, so the data of a column family is stored together. Rows are distributed by row key and stored on different Region Servers.
Because the Region Server is the component that reads and writes data, capacity is generally expanded by adding Region Servers.
ZooKeeper
ZooKeeper manages HMaster information.
HDFS DataNode
DataNodes handle data storage and backup. One obvious benefit of storing data in HDFS is that no data needs to be copied between nodes when Region Servers are changed, added, or removed. This greatly reduces node offline time and I/O consumption.
Sharding
The HBase sharding policy is simple: data is sharded by row key, and each Region Server is responsible for a group of row keys.
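Routing a row key to its Region Server can be sketched with a sorted list of range start keys and a binary search. The region table `REGIONS`, the server names, and the function `locate_server` below are hypothetical; this only illustrates range-based sharding in general.

```python
import bisect

# Hypothetical region table: (start row key, Region Server), sorted by start key.
# Each region covers the keys from its start key up to the next region's start key.
REGIONS = [("", "rs1"), ("g", "rs2"), ("p", "rs3")]


def locate_server(row_key, regions=REGIONS):
    """Return the server whose key range contains row_key."""
    starts = [start for start, _ in regions]
    # Index of the last region whose start key is <= row_key.
    index = bisect.bisect_right(starts, row_key) - 1
    return regions[index][1]
```

Because the ranges are sorted, rows with nearby keys land on the same server, which keeps range scans local to one or a few servers.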
Data storage and maintenance
Data storage is similar to Cassandra: a write goes to the log and to memory first. The in-memory MemStore is part of an LSM-tree design, and its contents are later flushed to disk.
When an HFile exceeds a certain size, the data is split.
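The write path described above (log first, then memory, then flush to disk) can be sketched like this. It is a toy model under assumptions: `ToyStore` is a made-up class, a plain dict stands in for the sorted in-memory structure, Python lists stand in for the log and the on-disk files, and the flush threshold counts entries rather than bytes.

```python
class ToyStore:
    """Toy LSM-style store: write-ahead log + in-memory buffer + sorted files."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold
        self.wal = []        # append-only log, written before anything else
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # immutable sorted files, oldest first

    def put(self, key, value):
        self.wal.append((key, value))   # 1. log the write
        self.memstore[key] = value      # 2. apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                # 3. spill to "disk" when full

    def flush(self):
        # Write the buffer out as one file sorted by key, then start fresh.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, key):
        # Newest data wins: check memory first, then files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for file_key, value in hfile:
                if file_key == key:
                    return value
        return None


store = ToyStore(flush_threshold=2)
store.put("b", 1)
store.put("a", 2)   # second entry reaches the threshold and triggers a flush
store.put("b", 3)   # newer value for "b" lives only in memory for now
```

Writes never modify an existing file in this sketch; an updated key simply shadows the older copy until files are merged.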
Read and write analysis
Read operation
A read in HBase is described as taking 3 hops, and it involves three roles in the HBase cluster.
Meta table: stores HRegion metadata. The information in the META table changes as regions are added or removed.
Root table: records META table information; its location is stored in ZK (ZooKeeper).
Reading from HBase therefore takes three hops: ZK, then the root table, then the META table.
Routing information is cached on the client, which reduces interaction between the client and HBase nodes.
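The three-hop lookup with client-side caching can be sketched as below. Everything here is a simplified assumption for illustration: the dictionaries `ZK` and `SERVERS`, the server names, and the class `ToyClient` are hypothetical, and the meta table maps individual row keys to servers, whereas a real meta table would map key ranges.

```python
# Hypothetical cluster state.
ZK = {"root_location": "rs1"}                 # ZK knows where the root table is
SERVERS = {
    "rs1": {"meta_location": "rs2"},          # root table: where the META table is
    "rs2": {"user1": "rs3", "user2": "rs4"},  # META table: row key -> Region Server
}


class ToyClient:
    def __init__(self):
        self.cache = {}   # row key -> Region Server, filled after a lookup
        self.hops = 0     # how many routing lookups this client has made

    def locate(self, row_key):
        if row_key in self.cache:
            return self.cache[row_key]        # cached: no routing hops needed

        root_server = ZK["root_location"]     # hop 1: ask ZK for the root table
        self.hops += 1
        meta_server = SERVERS[root_server]["meta_location"]   # hop 2: root table
        self.hops += 1
        region_server = SERVERS[meta_server][row_key]         # hop 3: META table
        self.hops += 1

        self.cache[row_key] = region_server
        return region_server


client = ToyClient()
first = client.locate("user1")    # three hops
second = client.locate("user1")   # served from the client cache
```

The second call returns the same server without touching ZK or the catalog tables, which is the saving the cache is meant to provide.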
Write operation
There is nothing complicated here: writes work much like Cassandra's, so they are not covered again.
Conclusion
The HBase cluster deployment model is similar to MongoDB's, so the 3 hops needed to read data are similar as well. Single-node writes are similar to Cassandra's.
Reference
www.iteblog.com/archives/25…