Summary of HBase

Hadoop Database (HBase) is a highly reliable, high-performance, column-oriented, scalable, distributed key-value database. HBase can be used to build large-scale structured storage clusters on cheap PCs.

HBase uses HDFS (similar to GFS) as its underlying file storage system. MapReduce can be run on HDFS to process data in batches, and ZooKeeper serves as the coordination service component.

Development history

Google's “three papers” — GFS, MapReduce, and BigTable:

  • GFS (The Google File System): reveals how to store huge amounts of data on a large number of cheap machines.

  • MapReduce (MapReduce: Simplified Data Processing on Large Clusters): discusses how to reliably perform very large-scale parallel data processing on top of a large number of cheap machines.

  • BigTable (Bigtable: A Distributed Storage System for Structured Data): describes how Google stores massive structured data and reads and writes it efficiently.

Apache HBase was originally developed by Powerset to process the massive data generated by natural-language search, and was initiated by Chad Walters and Jim Kellerman. After two years of development, it was promoted to a top-level project of the Apache Foundation and has since become a very active and influential project.

In February 2015, the community released version 1.0.0, which standardized HBase's version numbers. Since then, all version numbers have followed semantic versioning, in the form MAJOR.MINOR.PATCH.

In other words, for the same MAJOR and MINOR, the larger the PATCH version, the more reliable the release.
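
The MAJOR.MINOR.PATCH comparison rule above can be sketched in a few lines of Java (an illustrative helper, not HBase code):

```java
// Compares two version strings of the form MAJOR.MINOR.PATCH.
public class SemVer {
    // Returns a negative number, zero, or a positive number if a is
    // older than, equal to, or newer than b.
    static int compare(String a, String b) {
        String[] x = a.split("\\."), y = b.split("\\.");
        for (int i = 0; i < 3; i++) {
            int d = Integer.compare(Integer.parseInt(x[i]), Integer.parseInt(y[i]));
            if (d != 0) return d;
        }
        return 0;
    }
}
```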

HBase pros and cons

Advantages

  • Large capacity: a single HBase table can hold hundreds of billions of rows or millions of columns, with data volumes reaching the TB or even PB level. In traditional relational databases such as Oracle and MySQL, read/write performance deteriorates rapidly once a single table exceeds about 100 million rows; HBase does not have this problem.

  • Good scalability: HBase clusters can easily expand their capacity, both for data storage nodes and for read/write service nodes. HBase's underlying data store depends on the Hadoop Distributed File System (HDFS), which can be expanded by adding DataNodes; the read/write service layer can be expanded by adding RegionServer nodes.

  • Sparsity: HBase supports sparse storage, meaning a large number of column values may be empty without occupying any storage space. This differs from traditional databases, where null values still take up some storage space and therefore waste it. As a result, an HBase table can store up to millions of columns, and no additional space is required even if the table contains many null values.

  • High performance: HBase specializes in OLTP scenarios and provides strong write performance. It also performs well for random single-point reads and small-range scans. For large-range scans, the APIs provided by MapReduce can be used to achieve more efficient parallel scanning.

  • Multiple versions: HBase supports multiple versions, so one KV can retain several versions. Users can read the latest version or a historical version as required.

  • Native Hadoop support: HBase is a core member of the Hadoop ecosystem, and many ecosystem components can integrate with it directly.

Disadvantages

  • HBase does not support complex aggregation operations such as Join and GroupBy. If aggregation is required, deploy Phoenix or Spark on top of HBase; the former suits small-scale OLTP, the latter large-scale OLAP.

  • HBase does not natively provide secondary indexes, so it cannot search on columns other than the RowKey. Fortunately, various third-party secondary-index schemes exist; for example, Phoenix is widely used to provide secondary indexing.

  • HBase natively supports only single-row transactions, not global cross-row transactions. Again, the global transaction component provided by Phoenix can compensate for this deficiency.

The data model

Logical view

Before looking at the logical view, it is necessary to understand some basic HBase concepts.

  • Table: a table contains multiple rows of data.

  • RowKey: the primary key of each record in a table.

  • Column Family: a column family, abbreviated CF. Column families cut the table vertically; each column belongs to exactly one family.

  • Column: a column belongs to a column family; columns can be added dynamically.

  • Version Number: of type long; defaults to the system timestamp and can be customized by users.

  • Value: the real data.

  • Region: a horizontal shard of a table, holding a contiguous range of rows.
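
The logical view built from these concepts can be sketched as a nested sorted map (plain Java, purely illustrative — real HBase stores bytes, not strings): Table → RowKey → Column Family → Column → Version → Value.

```java
import java.util.*;

// Toy model of HBase's logical view as a sorted, sparse, multi-dimensional map.
public class LogicalModelSketch {
    // rowKey -> columnFamily -> column -> (version -> value), versions newest-first
    static final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> table = new TreeMap<>();

    static void put(String row, String cf, String col, long version, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(cf, f -> new TreeMap<>())
             // versions sorted descending so the newest comes first
             .computeIfAbsent(col, c -> new TreeMap<>(Comparator.<Long>reverseOrder()))
             .put(version, value);
    }

    // Reading without specifying a version returns the latest value.
    static String getLatest(String row, String cf, String col) {
        return table.get(row).get(cf).get(col).firstEntry().getValue();
    }
}
```

Rows are sorted lexicographically by RowKey, columns are dynamic, and absent columns simply have no map entry — which is exactly the sparsity property described earlier.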

Physical view

Principle of HBase

LSM

TODO

The overall architecture

HBase client

  1. The HBase client provides a Shell command-line interface (CLI), a native Java API, Thrift/REST APIs, and a MapReduce API.
  2. The HBase client supports all common DML and DDL operations, such as adding, deleting, modifying, and querying data and maintaining tables. The Thrift/REST APIs serve non-Java upper-layer services, and the MapReduce interface is used to import and read data in batches.

Before accessing a data row, the HBase client uses the metadata table to locate the RegionServer hosting the target data and then sends its request to that RegionServer. The metadata is also cached locally on the client for future accesses. If a RegionServer in the cluster goes down or load balancing causes data shards to migrate, the cached metadata becomes stale; the client then fetches the latest metadata again and stores it locally.

ZooKeeper

  1. Master high availability: usually only one Master is active in the system. If the active Master fails, ZooKeeper detects the event and elects a new Master through a certain mechanism, ensuring the system keeps running.

  2. Manage core system metadata: for example, maintain the set of RegionServers currently working in the system, and store the address of the RegionServer hosting the system metadata table hbase:meta.

  3. RegionServer crash recovery: ZooKeeper detects via heartbeats whether a RegionServer is down and notifies the Master when one fails.

  4. Distributed table lock: in HBase, a table lock must be taken before management operations (such as ALTER) are performed on a table, to prevent concurrent management operations on the same table from leaving its state inconsistent. Unlike tables in an RDBMS, HBase tables are distributed across many machines, so ZooKeeper is used to implement the distributed table lock.

Master

  1. Handle various management requests from users, including creating tables, modifying tables, managing permissions, splitting tables, merging data shards, and Compaction.
  2. Manage all RegionServers in the cluster, including Region load balancing, RegionServer failure recovery, and Region migration.
  3. Clean up expired logs and files: at intervals, the Master checks whether HLogs and HFiles in HDFS have expired and deletes any that have.

RegionServer

RegionServer responds to user I/O requests and is the core module of HBase. It consists of the WAL (HLog), the BlockCache, and multiple Regions.

  • WAL (HLog): the HLog serves two core functions in HBase. First, it provides high data durability: a random write in HBase is first appended sequentially to the HLog and only then written to the cache, from which it is later flushed asynchronously into HFiles. Even if cached data is lost, it can still be recovered by replaying the HLog. Second, it enables primary/secondary replication between HBase clusters: the secondary cluster replays HLog entries pushed by the primary cluster.

  • BlockCache: the read cache of the HBase system. After the client reads data from disk, the data is usually cached in system memory, so subsequent accesses to the same rows can be served directly from memory without touching the disk.

  • Region: a shard of a data table. When a data table exceeds a certain size threshold, it is split horizontally into two Regions. The Region is the basic unit of cluster load balancing: the Regions of one table are generally distributed across multiple RegionServers, and one RegionServer manages multiple Regions from different tables.

  • Store: a Region consists of one or more Stores; the number of Stores equals the number of column families in the table. In HBase, the data of each column family is stored together, forming a storage unit called a Store. Therefore, it is advisable to place data with the same I/O characteristics in the same column family.

  • MemStore & HFile: the MemStore is the write cache. User writes go to the MemStore first; when it fills up (cached data exceeds a threshold, 128 MB by default), the system asynchronously flushes the data into an HFile. As data keeps being written, the number of HFiles grows; once it exceeds a certain threshold, the system performs a Compact operation, merging these small files into one or more larger files according to a certain policy.
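
The MemStore → HFile flow above can be sketched roughly as follows. The thresholds are tiny stand-ins for the real 128 MB flush size and HFile-count settings, and none of this is HBase internal code:

```java
import java.util.*;

// Toy model of a Store: writes accumulate in a sorted in-memory buffer
// (MemStore) and are flushed to immutable sorted "files" (HFiles) once a
// size threshold is exceeded; too many files trigger a compaction.
public class StoreSketch {
    static final int FLUSH_THRESHOLD = 3;    // stand-in for the 128 MB default
    static final int COMPACT_FILE_COUNT = 2; // stand-in for the HFile-count trigger

    final NavigableMap<String, String> memStore = new TreeMap<>();
    final List<NavigableMap<String, String>> hFiles = new ArrayList<>();

    void put(String key, String value) {
        memStore.put(key, value);
        if (memStore.size() >= FLUSH_THRESHOLD) flush();
    }

    void flush() {
        hFiles.add(new TreeMap<>(memStore)); // "HFiles" are immutable snapshots
        memStore.clear();
        if (hFiles.size() > COMPACT_FILE_COUNT) compact();
    }

    void compact() { // merge all small files into one large sorted file
        NavigableMap<String, String> merged = new TreeMap<>();
        for (var f : hFiles) merged.putAll(f);
        hFiles.clear();
        hFiles.add(merged);
    }
}
```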

HDFS

HBase relies on the HDFS component to store actual data, including user data files and HLog logs. HDFS is one of the most mature components in the Hadoop ecosystem. The default three-copy data storage policy ensures high data reliability. An HDFS client component named DFSClient is encapsulated in HBase to read and write HDFS data.

HBase Read/Write Process

HBase Region lookup

  • The client first looks up the target table and RowKey in its local metadata cache. If it can find the RegionServer and Region holding the RowKey, it sends the write request (carrying the Region information) directly to the target RegionServer.

  • If the RowKey's location is not in the client cache, the client reads the /hbase-root/meta-region-server node on ZooKeeper to find the RegionServer hosting the hbase:meta metadata table. It then sends a query request to that RegionServer to look up the RowKey's RegionServer and Region in the metadata table, and caches the returned result locally for future use.

  • The client sends the write request to the target RegionServer based on the RowKey's metadata. After receiving the request, the RegionServer parses the Region information, finds the Region object, and writes the data to the target Region's MemStore.
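
The lookup steps above can be sketched as follows; the class and field names are invented for illustration and are not the real HBase client API:

```java
import java.util.*;

// Toy client-side region location: check the local metadata cache first,
// fall back to a meta-table lookup, then cache the result for reuse.
public class RegionLocator {
    // regions keyed by start RowKey; floorEntry finds the region containing a key
    final NavigableMap<String, String> metaTable = new TreeMap<>(); // startKey -> regionServer
    final Map<String, String> localCache = new HashMap<>();         // rowKey -> regionServer

    String locate(String rowKey) {
        String cached = localCache.get(rowKey);
        if (cached != null) return cached;             // step 1: cache hit
        // step 2: consult the meta table (assumes a "" start key always exists)
        String server = metaTable.floorEntry(rowKey).getValue();
        localCache.put(rowKey, server);                // step 3: cache for future use
        return server;
    }
}
```

A real client also invalidates stale cache entries when a request fails because the Region has moved, as described in the client section above.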

Writing process

1) Client processing phase: the client preprocesses the user's write request, locates the RegionServer holding the target data based on the cluster metadata, and sends the request to that RegionServer.

2) Region write phase: the RegionServer parses the data after receiving the write request, writes it first to the WAL, and then to the MemStore of the corresponding Region.

3) MemStore flush phase: when the MemStore capacity in a Region exceeds a certain threshold, the system asynchronously flushes the MemStore data into an HFile.

Reading process

The HBase data reading process is more complex than the write process. There are two main reasons:

  • An HBase range query may involve multiple regions, cache blocks, or even data store files.

  • Update and delete operations in HBase are implemented simply. An update does not modify the original data but writes a new version distinguished by timestamp. A delete does not actually remove the original data; it only inserts a marker labeled “deleted”, and the data is physically removed only later, when the system performs a Major Compaction asynchronously.

Obviously, this approach greatly simplifies updating and deleting data, but it constrains reads: the read path must filter cells by version and skip the data marked for deletion.
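
The version-filtering constraint can be illustrated with a toy read function (assumed names, not HBase internals): each cell carries several timestamped versions plus possibly a delete marker, and the reader must return the newest visible value.

```java
import java.util.*;

// Toy read path: pick the newest version of a cell, honoring delete markers.
public class VersionedRead {
    record Cell(long timestamp, String value, boolean deleteMarker) {}

    // Returns the latest value, or null if the newest entry is a delete marker.
    static String read(List<Cell> versions) {
        return versions.stream()
                .sorted(Comparator.comparingLong(Cell::timestamp).reversed())
                .findFirst()
                .map(c -> c.deleteMarker() ? null : c.value())
                .orElse(null);
    }
}
```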

HBase Compaction

HBase compactions are classified into two types by scale: Minor Compaction and Major Compaction.

  • A Minor Compaction selects some small, adjacent HFiles and merges them into a larger HFile.

  • A Major Compaction merges all HFiles in a Store into one HFile. This process also removes data marked as deleted, data whose TTL has expired, and versions beyond the configured maximum.

A Major Compaction takes a long time, consumes a large amount of system resources, and affects upper-layer workloads. For tables with large amounts of data, it is therefore recommended to trigger Major Compactions manually during off-peak hours.

Compaction serves the following purposes:

  1. Merge small files to reduce the number of files and stabilize random read latency.

  2. Improve data localization rate.

  3. Clear invalid data to reduce data storage capacity.
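
The cleanup work a Major Compaction performs — dropping delete-marked data, TTL-expired cells, and excess versions — can be sketched as follows. All names and thresholds here are illustrative, not HBase internals:

```java
import java.util.*;

// Toy major compaction: merge a row's versions, apply delete markers,
// TTL expiry, and a maximum-version limit.
public class MajorCompactSketch {
    record Cell(String row, long ts, String value, boolean deleted) {}

    static List<Cell> compact(List<Cell> cells, long now, long ttlMs, int maxVersions) {
        Map<String, List<Cell>> byRow = new TreeMap<>();
        for (Cell c : cells) byRow.computeIfAbsent(c.row(), r -> new ArrayList<>()).add(c);

        List<Cell> out = new ArrayList<>();
        for (List<Cell> versions : byRow.values()) {
            versions.sort(Comparator.comparingLong(Cell::ts).reversed());
            // a delete marker masks every cell at or below its timestamp
            long deleteTs = versions.stream().filter(Cell::deleted)
                    .mapToLong(Cell::ts).max().orElse(Long.MIN_VALUE);
            versions.stream()
                    .filter(c -> !c.deleted() && c.ts() > deleteTs) // drop deleted data + markers
                    .filter(c -> now - c.ts() <= ttlMs)             // drop TTL-expired cells
                    .limit(maxVersions)                             // drop excess old versions
                    .forEach(out::add);
        }
        return out;
    }
}
```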

A Compaction can be triggered in three ways: by a MemStore flush, by the periodic checks of background threads, or manually.

HBase Hot Issues

Hot spots occur when a large number of clients directly access one node or a small number of nodes in the cluster (whether for reads, writes, or other operations). The heavy traffic can overwhelm the machine hosting the hotspot Region, degrading performance or even making the Region unavailable, and can also affect the other Regions on the same RegionServer.

Causes

  1. By default, an HBase table has only one partition
  2. The RowKey is improperly designed

Solutions

  1. Specify partitions when creating tables in HBase
  2. Design the RowKey properly

pre-split

By default, only one region is created when an HBase table is created, and all clients write to that single region until it grows large enough to split. One way to speed up batch writes is therefore to create several empty regions in advance, so that when data is written to HBase, the load is balanced across the cluster according to the region partitions.
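
The idea can be illustrated by computing evenly spaced split keys up front (a hypothetical helper, not the HBase admin API; in practice the split points are passed when creating the table):

```java
import java.util.*;

// Toy pre-split helper: choose N-1 split keys so that writes spread
// across N regions instead of hammering one initial region.
public class PreSplit {
    // Evenly spaced two-digit hex prefixes, e.g. 4 regions -> ["40", "80", "c0"].
    static List<String> hexSplitKeys(int numRegions) {
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            keys.add(String.format("%02x", i * 256 / numRegions));
        }
        return keys;
    }
}
```

This only balances load if the RowKeys themselves are spread over the key space, which is exactly what the RowKey design section below addresses.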

RowKey design

Design principles

  1. Uniqueness principle

    A RowKey must be designed to be unique. RowKeys are stored in lexicographical order, so when designing a RowKey, take full advantage of this sorting: store frequently read data together in one block and recently accessed data together in one block.

  2. Length principle

    A RowKey is a binary stream. Many developers recommend a RowKey length between 10 and 100 bytes, but it should be as short as possible, preferably no more than 16 bytes.

    Here’s why:

    • The persistent data file HFile is stored as KeyValues. If the RowKey is too long, say 100 bytes, then 10 million rows of data will spend 100 × 10 million = 1 billion bytes, nearly 1 GB, on RowKeys alone, which greatly hurts the storage efficiency of HFiles.

    • The MemStore caches part of the data in memory. If the RowKey field is too long, memory utilization drops, the system can cache less data, and retrieval efficiency suffers. Therefore, the shorter the RowKey, the better.

    • Current operating systems are 64-bit and align memory on 8-byte boundaries. Keeping the RowKey at 16 bytes, an integer multiple of 8 bytes, makes the best use of the operating system's alignment.

  3. Hash principle

    If the RowKey increases monotonically with a timestamp, do not place the time at the front of the RowKey. Instead, use a program-generated random hash as the high-order part of the RowKey and place the time field in the low-order part. This spreads the data more evenly across the RegionServers and balances the load. If there is no hash field and the first field is the time, all new data is concentrated on one RegionServer, so load concentrates on individual RegionServers during writes and reads, producing hot spots and reducing query efficiency.

Hash common design methods

Salting

Hashing

Reversing

Reversing puts the frequently changing part of the RowKey (the least significant part) first, which effectively randomizes the RowKey but sacrifices its ordering. A typical example is using a mobile phone number as the RowKey: use the reversed phone-number string instead, which avoids the hot spots caused by phone numbers sharing a fixed prefix.
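
The three designs named above might be sketched as follows (illustrative implementations, not HBase APIs):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Toy implementations of the three RowKey designs: salting, hashing, reversing.
public class RowKeyDesign {
    // Salting: prepend a bucket prefix derived from the key so that
    // monotonically increasing keys spread across N buckets/regions.
    static String salt(String rowKey, int buckets) {
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        return bucket + "_" + rowKey;
    }

    // Hashing: put a deterministic hash in the high-order part, keeping
    // the original key recoverable from the low-order part.
    static String hashPrefix(String rowKey) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return String.format("%02x%02x", d[0], d[1]) + "_" + rowKey;
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reversing: e.g. phone numbers share prefixes, so reverse the string
    // to make the frequently changing digits come first.
    static String reverse(String rowKey) {
        return new StringBuilder(rowKey).reverse().toString();
    }
}
```

Salting scatters writes but forces reads to fan out over all buckets; hashing keeps point lookups cheap (the prefix is recomputable from the key) but also destroys range-scan locality; reversing preserves neither ordering nor prefix scans but needs no extra bytes.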