HBase is built on top of the Hadoop Distributed File System (HDFS), which it uses to store its data files reliably. Its internal implementation covers region location, read/write process management, and file management.
1. Region location
HBase supports basic operations such as Put, Get, Delete, and Scan, and all of them depend on region location: given a rowkey or a rowkey range, how does the client find the address of the RegionServer that serves it?
The basic procedure for locating a region is as follows:

1. The client interacts with ZooKeeper to find the RegionServer hosting the `hbase:meta` system table. The `hbase:meta` table maintains the mapping between the rowkey ranges of every user table and their region locations, roughly:
   - Rowkey: table name, start key, region ID
   - Value: RegionServer object (holding the RegionServer's location information)
2. The client interacts with the RegionServer hosting `hbase:meta` to obtain the RegionServer where the target rowkey resides.
3. The client interacts with the RegionServer where the rowkey resides and performs the requested operations on it.

Note that the client only needs to locate the `hbase:meta` table the first time it reads or writes; afterwards it caches the `hbase:meta` contents locally. Only when a region moves and the cache becomes invalid does the client read the location from `hbase:meta` again and update its cache.
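The meta lookup in step 1 is essentially a "floor" search over sorted region start keys. Here is a minimal sketch in Python; the table contents and server names are made up for illustration:

```python
from bisect import bisect_right

# Simplified model of hbase:meta: (table, region start key) -> RegionServer.
# The three regions of the hypothetical "user" table cover ["", "m"),
# ["m", "t"), and ["t", ∞).
meta = {
    ("user", b""):  "rs1.example:16020",
    ("user", b"m"): "rs2.example:16020",
    ("user", b"t"): "rs3.example:16020",
}

def locate_region(table, rowkey):
    """Return the RegionServer serving `rowkey`: the region with the
    largest start key <= rowkey."""
    starts = sorted(k[1] for k in meta if k[0] == table)
    idx = bisect_right(starts, rowkey) - 1
    return meta[(table, starts[idx])]
```

A rowkey such as `b"nina"` falls between start keys `b"m"` and `b"t"`, so the floor lookup resolves it to the second region's server.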
2. Key internal components of RegionServer
| Component | Meaning | Main function |
---|---|---|
BlockCache | Read cache | Caches frequently read data, evicted with an LRU replacement policy. |
MemStore | Write cache | Buffers data that has not yet been written to disk and sorts it before flushing. Each column family in each region has its own MemStore. |
HFile | Data storage format with multi-level indexes | Stores the data of HBase tables; all HFiles are kept in HDFS. |
WAL | Write-Ahead Log, a log file stored in HDFS | Saves HBase data that has not yet been persisted to HDFS so it can be restored after a RegionServer crash. |
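To make the BlockCache's LRU policy concrete, here is a minimal sketch; it is an illustration only, not HBase's actual implementation, which sizes entries by bytes and uses multiple priority tiers:

```python
from collections import OrderedDict

class LruBlockCache:
    """Toy LRU cache: the least recently used block is evicted first."""
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, key):
        if key not in self.blocks:
            return None
        self.blocks.move_to_end(key)        # mark as recently used
        return self.blocks[key]

    def put(self, key, block):
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
```

With capacity 2, inserting a third block evicts whichever of the first two was touched least recently.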
3. RegionServer read and write operations
The two most important operations in HBase are write operations and read operations.
Writing process
To improve write efficiency and avoid the poor performance of random writes, a RegionServer buffers all incoming writes in memory and later flushes them to disk sequentially, turning random writes into sequential writes. The process is as follows:
- After receiving a write request, the RegionServer appends the data to a log file in HDFS called the Write-Ahead Log (WAL). The WAL is used to recover data that would otherwise be lost if the RegionServer crashes.
- The RegionServer writes the data to the MemStore and then tells the client that the write succeeded.
- When the memory used by the MemStore reaches a threshold, the RegionServer flushes the data sequentially to HDFS, saving it in the HFile format (a file format with multi-level indexes).
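The three steps above can be sketched as a toy model; a real RegionServer tracks MemStore size in bytes (not entry count) and writes the WAL and HFiles to HDFS rather than in-process lists:

```python
class RegionWriter:
    """Toy write path: 1) WAL append, 2) MemStore, 3) flush to an 'HFile'."""
    def __init__(self, flush_threshold=3):
        self.wal = []            # stands in for the HDFS log file
        self.memstore = {}       # sorted on flush, like a real MemStore
        self.hfiles = []         # each flush produces one immutable file
        self.flush_threshold = flush_threshold

    def put(self, rowkey, value):
        self.wal.append((rowkey, value))   # durability first
        self.memstore[rowkey] = value      # then the write cache
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Writes leave memory sorted by rowkey, as one sequential pass.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}
```

Note that rows arrive in arbitrary order but each flushed file is sorted, which is exactly the random-to-sequential conversion described above.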
Reading process
Since the requested data may live in memory or on disk, a read has to merge data from several places: the read cache (BlockCache), the write cache (MemStore), and the HFiles on disk (of which there may be several), before returning the result to the user. The process is as follows:
- The scanner first checks the read cache, the BlockCache, which holds recently read data.
- The scanner then checks the write cache, the MemStore, which holds recently written data.
- If the target data is found in neither the BlockCache nor the MemStore, HBase reads the HFiles to obtain it.
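The lookup order above can be sketched as follows; this is a simplification, since real HBase caches fixed-size blocks rather than individual values and merges scanners over all files:

```python
def read(rowkey, block_cache, memstore, hfiles):
    """Toy read path: BlockCache, then MemStore, then HFiles."""
    if rowkey in block_cache:                # 1. read cache
        return block_cache[rowkey]
    if rowkey in memstore:                   # 2. write cache
        return memstore[rowkey]
    for hfile in reversed(hfiles):           # 3. on-disk files, newest first
        for key, value in hfile:
            if key == rowkey:
                block_cache[rowkey] = value  # warm the read cache
                return value
    return None
```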
MemStore and HFile
MemStore
The MemStore caches recently written data in memory as an ordered key/value store. Each column family has its own MemStore.
When a RegionServer receives a write request, it forwards the request to the corresponding region. Each region stores a set of rows, and each column's data is stored in its column family. The data of different column families is stored in separate HStores, each consisting of one MemStore and a series of HFiles. The MemStore lives in the RegionServer's main memory, while HFiles are written to HDFS. When the RegionServer processes a write, the data is first written to the MemStore; when a threshold is reached, the MemStore's data is flushed to an HFile.
The main reasons to use MemStore are:
- Data stored in HDFS must be sorted by rowkey, but HDFS itself is designed for sequential reads and writes and does not allow files to be modified in place.
- Without a buffer, HBase could not write efficiently: data arrives unsorted, so writing it directly would not be optimized for later retrieval.
- To solve this, HBase buffers recently received data in memory (in the MemStore), sorts it, and then writes it to HDFS in one fast sequential pass.
Note that an actual HFile contains more than a simple sorted list of cells; its structure is described in the HFile section below.
Besides solving the "out-of-order" problem, the MemStore has other benefits:

- It acts as an in-memory cache of recently added data. This helps in the common case where newly inserted data is read more often than old data.
- Rows/cells can be optimized in memory before they are persisted. For example, if a column family is configured to keep only one version, the MemStore may accumulate several updates to the same cell; when flushing to an HFile, only the latest version needs to be written and the rest can be discarded.

Note that every MemStore flush creates one new HFile per column family. Reads are comparatively simple: HBase first checks whether the requested data is in the MemStore; if not, it looks in the HFiles, and then returns the merged result to the user.
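The version-discarding optimization can be sketched like this, modeling MemStore cells as `(rowkey, timestamp) -> value` and assuming the column family keeps only one version (VERSIONS = 1):

```python
def flush_latest_only(memstore_cells):
    """Keep only the newest version of each rowkey when flushing.

    memstore_cells: dict mapping (rowkey, timestamp) -> value.
    Returns the flushed content as a rowkey-sorted list of (rowkey, value).
    """
    latest = {}
    for (rowkey, ts), value in memstore_cells.items():
        if rowkey not in latest or ts > latest[rowkey][0]:
            latest[rowkey] = (ts, value)
    return sorted((k, v) for k, (ts, v) in latest.items())
```

Two updates to the same cell thus cost only one entry in the resulting HFile.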
Why pay attention to the HBase MemStore:

HBase users and administrators should understand the MemStore and how it is used, for several reasons:

- The MemStore has many settings that can be tuned for good performance and to avoid problems. HBase does not adjust these settings based on your usage patterns; you need to tune them yourself.
- Frequent MemStore flushes can severely affect cluster read performance and add extra load.
- The way MemStore flushes work may influence your HBase schema design.
HFile
When the amount of data in the MemStore reaches a threshold, it is flushed to an HDFS file stored in the HFile format.

HFile is an open-source implementation of Google's SSTable (Sorted String Table, the storage format used in Google Bigtable): an ordered key/value on-disk format with multiple levels of indexes to make data easy to locate. The multi-level index in an HFile is similar to a B+ tree.
The HFile format consists of the following sections:
| Section | Description | Optional |
---|---|---|
Data Block | Holds the table data; this part can be compressed. | No |
Meta Block | Holds user-defined key/value data; can be compressed. | Yes |
FileInfo | Holds the HFile's meta information and is not compressed; users can add their own metadata here. | No |
Data Block Index | The index of the Data Blocks; evicted from memory by the LRU mechanism. | Yes |
Trailer | Read first when an HFile is opened. It stores the starting offset of each section (each section's magic number serves as a sanity check), after which the Data Block Index is loaded into memory. Retrieving a key then does not require scanning the whole HFile: the block containing the key is located in memory, that one block is read from disk in a single I/O, and the key is found inside it. | No |
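The trailer-then-index read path means a Get needs only one in-memory search plus one block read. A sketch of that index search follows; the block first-keys are made up, and a real HFile index also stores block offsets and sizes:

```python
from bisect import bisect_right

# First key of each Data Block, as held by the in-memory Data Block Index.
block_index = [b"apple", b"mango", b"tiger"]

def block_for(key):
    """Return the index of the Data Block that may contain `key`:
    the last block whose first key is <= key."""
    return max(bisect_right(block_index, key) - 1, 0)
```

Only the single block returned by `block_for` needs to be fetched from disk and scanned for the key.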
The Data Blocks and Meta Blocks of an HFile are usually compressed, which greatly reduces network I/O and disk I/O; the trade-off is the CPU time spent on compression and decompression. HFiles can be compressed with codecs such as GZIP and LZO.
References

- Big Data Technology System: Principles, Architecture and Practice, by Dong Xicheng
- "HBase storage structure and HBase concepts" (blog post)
- "In-depth understanding of HBase MemStore" (blog post)