Introduction to HBase

HBase is a column-oriented database management system that runs on top of the Hadoop Distributed File System (HDFS). Note that HBase does not store data column by column: it stores data by column family, and each column family is persisted as HFiles.

  • Supports random CRUD, with second-level response times over billions of records.
  • A database for distributed storage
  • A NoSQL database
  • Data can be retrieved only by row key or by a range of row keys
  • Massive data storage: a table can have billions of rows and millions of columns. In contrast, a relational table generally has no more than about 30 columns and no more than about 5 million rows

Typical use cases: transportation (GPS data), finance (withdrawal and spending records, etc.), e-commerce (browsing logs)

Basic concepts


The row key must be unique across the table. Writing to an existing row key overwrites the data already stored under it. To insert data, you must specify a row key.

Extension:

Row keys are sorted lexicographically, so rows with similar row keys are always stored close together. If a large number of reads and writes are concentrated in one row key range, a region hot spot occurs: that data is stored and queried on a single HRegionServer while the other HRegionServers sit idle. To spread row keys out, use a salt, a hash, or reverse a fixed-format value.
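The three ways of scattering row keys mentioned above (salt, hash, reversing a fixed-format value) can be sketched roughly as follows. This is only an illustration in plain Python, not HBase client code; the bucket count, the key layout with an underscore separator, and the function names are all assumptions made for the example.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical number of salt buckets, e.g. one per RegionServer


def salted_key(row_key: str, buckets: int = NUM_BUCKETS) -> str:
    """Salt: prefix the key with a bucket number derived from the key itself,
    so the same key always lands in the same bucket and can be read back."""
    first_byte = hashlib.md5(row_key.encode("utf-8")).digest()[0]
    return f"{first_byte % buckets:02d}_{row_key}"


def hashed_key(row_key: str, prefix_len: int = 4) -> str:
    """Hash: prefix the key with the first few hex digits of its MD5 digest."""
    prefix = hashlib.md5(row_key.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{prefix}_{row_key}"


def reversed_key(row_key: str) -> str:
    """Reverse: flip a fixed-format value (such as a phone number) so the
    fast-changing digits come first."""
    return row_key[::-1]


# Sequential keys would normally sort next to each other; after the
# transformation they are spread over different key ranges.
for n in range(3):
    key = f"1380013{n:04d}"
    print(key, salted_key(key), hashed_key(key), reversed_key(key))
```

The trade-off is that scattered keys no longer sort in their natural order, so a range scan over the original keys has to be issued once per bucket or prefix.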


Each column family corresponds to a directory. Multiple columns form a column family. Mysql > select * from ‘mysql’;

A tall table is cut horizontally into segments, each called a region (a slice of the table). An HBase table has only one region when it is first created.
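Because row keys are sorted and each region covers a contiguous row key range, finding the region for a row key amounts to a search over the region start keys. A minimal sketch, assuming four regions with made-up start keys and names (a real table gets its boundaries from splits):

```python
import bisect

# Hypothetical region boundaries: each region covers [start_key, next_start_key).
# The first region starts at the empty key, the last one is open-ended.
START_KEYS = ["", "g", "n", "t"]
REGION_NAMES = ["region-0", "region-1", "region-2", "region-3"]


def find_region(row_key: str) -> str:
    """Return the name of the region whose key range contains row_key."""
    idx = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGION_NAMES[idx]


print(find_region("apple"))  # falls in ["", "g")
print(find_region("g"))      # a start key belongs to the region it opens
print(find_region("zebra"))  # falls in the last, open-ended region
```

A newly created table corresponds to the case where `START_KEYS` contains only the empty key, so every row key maps to the single region.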

A store holds the data of one column family within a region; it is where the data content is actually stored.

If the clocks of the Linux and Windows machines are not in sync, a delete may appear not to take effect, and newly inserted data may be invisible, which looks like a "supernatural event". (This is because reads are resolved based on the latest timestamp.)

Data model


1)Namespace: similar to a database in a relational system. HBase has two built-in namespaces, hbase and default. The hbase namespace stores HBase's built-in tables. The default namespace is the one used for user tables by default.

2)Region: similar to a table in a relational system (strictly speaking, a region is a slice of a table). The difference is that HBase only needs column families to be declared when a table is defined, not individual columns.

3)Row: Each row of data consists of a RowKey and multiple columns. Data is stored in lexicographic order of the RowKeys and can only be retrieved by RowKey.

4)Column: Each Column in HBase is qualified by Column Family and Column Qualifier.

5)Time Stamp: used to identify different versions of data.

6)Cell: the unit uniquely identified by {RowKey, Column Family:Column Qualifier, Timestamp}. Cell data has no type; it is stored as a byte array.

Underlying storage model:

  • Different column families are stored in different files (the two tables above represent different HFiles);
  • The entire data set is sorted lexicographically by RowKey;
  • Each cell is stored in key-value (KV) form in the underlying HFile;
  • Within the same row, data belonging to the same column family is placed together sequentially.
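The storage rules above can be illustrated with a small in-memory model: cells are split by column family (one "file" per family) and sorted by row key, then qualifier, with newer timestamps first. This is a simplified sketch, not the real HFile format; the `KeyValue` tuple and `group_by_family` helper are names invented for the example.

```python
from typing import Dict, List, NamedTuple


class KeyValue(NamedTuple):
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


def sort_key(kv: KeyValue):
    # Lexicographic by row, family, qualifier; newest timestamp first.
    return (kv.row, kv.family, kv.qualifier, -kv.timestamp)


def group_by_family(cells: List[KeyValue]) -> Dict[bytes, List[KeyValue]]:
    """Split sorted cells into one list per column family,
    mimicking 'different column families live in different files'."""
    files: Dict[bytes, List[KeyValue]] = {}
    for kv in sorted(cells, key=sort_key):
        files.setdefault(kv.family, []).append(kv)
    return files


cells = [
    KeyValue(b"row2", b"info", b"name", 1, b"bob"),
    KeyValue(b"row1", b"stats", b"visits", 1, b"7"),
    KeyValue(b"row1", b"info", b"name", 1, b"alice"),
    KeyValue(b"row1", b"info", b"name", 2, b"alicia"),
]
for family, kvs in group_by_family(cells).items():
    print(family, [(kv.row, kv.qualifier, kv.timestamp) for kv in kvs])
```

Every value is kept as bytes, matching the point that a cell has no type of its own.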

Principle of CRUD

An update is actually a put of a new version of the data with a newer timestamp.

The query operation returns the data with the largest timestamp based on the rowkey.

A delete actually puts a piece of data whose operation type is DEL (a delete marker). If the entry with the largest timestamp for a row key has the DEL type, no data is returned, so from the user's point of view the row key has been deleted.

RowKey, CF (column family), and CN/CQ (column name/column qualifier) together form the unique key.
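The put/get/delete behaviour described above can be mimicked with a toy append-only table. This is a deliberately simplified sketch (the `ToyTable` class, its logical clock, and the string values are assumptions for illustration); real HBase has more delete types and cleans up old versions during compaction.

```python
import itertools
from typing import Dict, List, Optional, Tuple

PUT, DELETE = "Put", "Delete"
CellKey = Tuple[str, str, str]             # (row key, column family, qualifier)
Version = Tuple[int, str, Optional[str]]   # (timestamp, operation type, value)


class ToyTable:
    def __init__(self) -> None:
        self._cells: Dict[CellKey, List[Version]] = {}
        self._clock = itertools.count(1)   # logical timestamps 1, 2, 3, ...

    def _append(self, key: CellKey, op: str, value: Optional[str],
                ts: Optional[int]) -> int:
        ts = next(self._clock) if ts is None else ts
        self._cells.setdefault(key, []).append((ts, op, value))
        return ts

    def put(self, row: str, family: str, qualifier: str, value: str,
            ts: Optional[int] = None) -> int:
        # Insert and update are the same thing: append a new version.
        return self._append((row, family, qualifier), PUT, value, ts)

    def delete(self, row: str, family: str, qualifier: str,
               ts: Optional[int] = None) -> int:
        # Delete appends a marker instead of removing anything.
        return self._append((row, family, qualifier), DELETE, None, ts)

    def get(self, row: str, family: str, qualifier: str) -> Optional[str]:
        versions = self._cells.get((row, family, qualifier))
        if not versions:
            return None
        _, op, value = max(versions, key=lambda v: v[0])  # largest timestamp wins
        return None if op == DELETE else value


table = ToyTable()
table.put("row1", "info", "name", "alice")
table.put("row1", "info", "name", "alicia")   # "update" = newer version
print(table.get("row1", "info", "name"))
table.delete("row1", "info", "name")          # newest entry is now a marker
print(table.get("row1", "info", "name"))
```

Passing explicit `ts` values also reproduces the clock problem: a delete written with an older timestamp than an existing put does not hide it, and a put written with an older timestamp than an existing delete marker stays invisible.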

HBase architecture

1)Region Server

The RegionServer is the manager of regions; its implementation class is HRegionServer. It provides the following functions: Data operations: get, put, and delete. It handles read/write requests from clients; when reading or writing data, the client interacts with the RegionServer directly.

Operations on regions include splitRegion and compactRegion.

2)Master

The Master is the administrator of all RegionServers; its implementation class is HMaster. It provides the following functions: Table operations (DDL): create, delete, and alter.

RegionServer operations: Allocate regions to each RegionServer, monitor the status of each RegionServer, load balancing, and failover.

3)Zookeeper

HBase uses ZooKeeper for Master high availability, RegionServer monitoring, the entry point to metadata, and cluster configuration maintenance. In other words, ZooKeeper maintains the cluster state for HBase.

4)HDFS

HDFS provides the underlying data storage service for HBase and supports its high availability.

How to choose a database?


I'm Xiao Hao Ge, studying programming. In the 2021 campus recruitment season I received SP offers from several tech companies. Follow me, and I'll help you avoid detours in the Internet industry!