The Flink series will be on hold for a while; the parallelism mechanism, the data exchange mechanism between TaskManagers, and the Flink standalone and YARN remote startup processes will be covered later.

Because the storage component cannot be skipped, the HBase series walks through the official documentation, the basic architecture, and the principles behind frequently asked questions (FAQs).

Refer to the article: developer.ibm.com/zh/technolo…

HBase Basic Architecture

Personal architecture diagram: www.processon.com/view/link/6…

1. Each RegionServer has one or more HLogs (one by default; version 1.1 introduces MultiWAL, which allows multiple HLogs).
2. Each HLog is shared by multiple Regions.
3. The log unit WALEntry represents the minimum append unit of a row-level update and consists of an HLogKey and a WALEdit.
4. A WALEdit represents the collection of updates belonging to one transaction. Before version 0.94, if a transaction modified three columns c1, c2, and c3 of row R, three separate HLog entries were written:

<logseq1-for-edit1>:<keyvalue-for-edit-c1>
<logseq2-for-edit2>:<keyvalue-for-edit-c2>
<logseq3-for-edit3>:<keyvalue-for-edit-c3>

This layout cannot guarantee row-level transaction atomicity: if the RegionServer crashed right after the c2 entry was written, only part of the row's update would survive. Since 0.94, a row-level transaction is written as a single record:

<logseq#-for-entire-txn>:<WALEdit-for-entire-txn>

in which the WALEdit is serialized as <-1, # of edits, <KeyValue>, <KeyValue>, <KeyValue>>, for example <-1, 3, ..., ..., ...>; the leading -1 is an identifier marking the new log structure.
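To make the difference concrete, here is a minimal, hypothetical Java sketch (HLogKey, WalEdit, WalEntry and KeyValuePair below are simplified stand-ins, not the real HBase classes) in which all column edits of one row go into a single WALEdit, so a crash-replay either sees the whole row update or none of it:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for HBase's HLogKey / WALEdit / KeyValue; for illustration only.
class KeyValuePair {
    final String row, column, value;
    KeyValuePair(String row, String column, String value) {
        this.row = row; this.column = column; this.value = value;
    }
}

class HLogKey {                      // identifies the region and the sequence id of the entry
    final String regionName;
    final long sequenceId;
    HLogKey(String regionName, long sequenceId) {
        this.regionName = regionName; this.sequenceId = sequenceId;
    }
}

class WalEdit {                      // all cells of ONE transaction are kept together
    final List<KeyValuePair> cells = new ArrayList<>();
    WalEdit add(KeyValuePair kv) { cells.add(kv); return this; }
}

class WalEntry {                     // minimum append unit: key + edit
    final HLogKey key;
    final WalEdit edit;
    WalEntry(HLogKey key, WalEdit edit) { this.key = key; this.edit = edit; }
}

public class RowAtomicityDemo {
    public static void main(String[] args) {
        // Post-0.94 style: the updates to c1, c2, c3 of row R form ONE WALEntry,
        // so a crash can never persist only part of the row.
        WalEdit edit = new WalEdit()
                .add(new KeyValuePair("R", "c1", "v1"))
                .add(new KeyValuePair("R", "c2", "v2"))
                .add(new KeyValuePair("R", "c3", "v3"));
        WalEntry entry = new WalEntry(new HLogKey("region-1", 42L), edit);
        System.out.println("cells in one entry: " + entry.edit.cells.size()); // 3
    }
}
```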

HLog life cycle

Personal chart www.processon.com/view/link/6…

MemStore
MemStore flush generates memory fragments
  • Data from different Regions is mixed together in the JVM heap.
  • When Region1 flushes, its memory is released and becomes free space; that space is then handed out again for new MemStore writes, splitting the memory into smaller and smaller pieces.
  • As data keeps being written to and flushed from MemStores, the whole JVM heap accumulates a large number of ever-smaller pieces of memory, known as memory fragments.
  • Eventually the fragments become so small that they cannot even hold a newly written object, at which point the JVM is forced to run a Full GC to merge the memory fragments.

Personal structure: www.processon.com/view/link/6…

MSLAB (MemStore-Local Allocation Buffer)
  • The RegionServer JVM startup parameter -XX:PrintFLSStatistics=1 prints free-list space statistics for the old generation, including:
  • Free Space – the total amount of memory currently free in the old generation
  • Max Chunk Size (the key metric) – the size of the largest contiguous free chunk in the old generation
  • Num Chunks – the total number of free memory chunks (fragments) in the old generation
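MSLAB mitigates this fragmentation by carving cell data out of large, fixed-size chunks owned by a single MemStore: when that MemStore flushes, whole chunks are freed, so the old generation fragments at chunk granularity rather than cell granularity. Below is a minimal, hypothetical Java sketch of that allocation pattern; the 2MB chunk size mirrors the default of hbase.hregion.memstore.mslab.chunksize, but the Chunk and SimpleMslab classes are simplified stand-ins, not HBase's MemStoreLAB API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified MSLAB-style allocator: one MemStore hands out slices of big chunks.
class Chunk {
    static final int CHUNK_SIZE = 2 * 1024 * 1024;    // 2MB, same order as the MSLAB default chunk size
    final byte[] data = new byte[CHUNK_SIZE];
    final AtomicInteger nextFree = new AtomicInteger(0);

    /** Returns the offset of an allocated slice, or -1 if the chunk is full. */
    int alloc(int size) {
        while (true) {
            int old = nextFree.get();
            if (old + size > CHUNK_SIZE) {
                return -1;                             // caller must retire this chunk and grab a new one
            }
            if (nextFree.compareAndSet(old, old + size)) {
                return old;
            }
        }
    }
}

class SimpleMslab {
    private Chunk current = new Chunk();

    /** Copies a serialized cell into chunk-backed memory (oversized cells are ignored here for brevity). */
    Chunk copyCellInto(byte[] cellBytes) {
        int offset = current.alloc(cellBytes.length);
        if (offset == -1) {                            // current chunk exhausted: start a new one
            current = new Chunk();
            offset = current.alloc(cellBytes.length);
        }
        System.arraycopy(cellBytes, 0, current.data, offset, cellBytes.length);
        return current;
    }

    public static void main(String[] args) {
        SimpleMslab lab = new SimpleMslab();
        Chunk c = lab.copyCellInto("row1/cf:q1/v1".getBytes());
        System.out.println("cell copied into chunk, bytes used: " + c.nextFree.get());
    }
}
```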

Personal structure: www.processon.com/view/link/6…

HFile

HFile logic structure diagram: www.processon.com/view/link/6…

HFile V2 physical structure diagram

HFile Block

Trailer Block

Data Block

Bloom Index Block


A GET request is filtered through the Bloom filter as follows (a minimal sketch follows the list):

  • Binary-search the requested Key against the BlockKeys of all index entries in the Bloom Index Block to locate the corresponding Bloom Index Entry.
  • Load the corresponding bit array according to the BlockOffset and BlockOndiskSize recorded in that Bloom Index Entry.
  • Hash the Key and check whether the corresponding positions in the bit array are 1. If any position is not 1, the Key definitely does not exist; otherwise, the Key may exist.
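A minimal Java sketch of these three steps, assuming the Bloom Index entries are already in memory and using a deliberately simplified seeded hash; the class and method names are illustrative, not HBase's actual Bloom filter API:

```java
import java.util.Arrays;
import java.util.BitSet;

class BloomIndexEntry {
    final byte[] blockKey;        // first key covered by this Bloom block
    final long blockOffset;       // where the bit array lives in the HFile
    final int blockOnDiskSize;
    BloomIndexEntry(byte[] blockKey, long blockOffset, int blockOnDiskSize) {
        this.blockKey = blockKey; this.blockOffset = blockOffset; this.blockOnDiskSize = blockOnDiskSize;
    }
}

public class BloomGetSketch {

    /** Step 1: binary-search the entries by BlockKey, returning the last entry whose key <= search key. */
    static BloomIndexEntry locateEntry(BloomIndexEntry[] entries, byte[] key) {
        int lo = 0, hi = entries.length - 1, found = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compare(entries[mid].blockKey, key) <= 0) { found = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        return entries[found];
    }

    // Step 2, loading the bit array via entry.blockOffset / entry.blockOnDiskSize, is elided here.

    /** Step 3: hash the key; if any probed bit is 0 the key definitely does not exist. */
    static boolean mightContain(BitSet bits, int numBits, byte[] key, int numHashes) {
        for (int i = 0; i < numHashes; i++) {
            int h = Arrays.hashCode(key) * 31 + i * 0x9E3779B9;   // simplified seeded hash, not HBase's
            int pos = Math.floorMod(h, numBits);
            if (!bits.get(pos)) {
                return false;     // definitely absent
            }
        }
        return true;              // possibly present
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(64);
        byte[] key = "row1".getBytes();
        for (int i = 0; i < 2; i++) {                             // simulate the bits set when "row1" was added
            bits.set(Math.floorMod(Arrays.hashCode(key) * 31 + i * 0x9E3779B9, 64));
        }
        System.out.println(mightContain(bits, 64, key, 2));                 // true: possibly present
        System.out.println(mightContain(bits, 64, "other".getBytes(), 2));  // most likely false: definitely absent
    }
}
```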

HFile File Index

Root Index Block & NonRoot Index Block

View the HFile metadata

hbase hfile -m -f /hbase/data/default/mytable/c606b6c02a3f6b51192988432afe38bc/colfam1/e40c31de97ff47aeb27d84a4ca59f7c8
2020-05-20 14:20:59:865 INFO [main] hfile.CacheConfig: Created CacheConfig: CacheConfig:disabled
Block index size as per heapsize: 392
reader=/hbase/data/default/mytable/c606b6c02a3f6b51192988432afe38bc/colfam1/e40c31de97ff47aeb27d84a4ca59f7c8,
    compression=none,
    cacheConf=CacheConfig:disabled,
    firstKey=r/colfam1:q/1565781611071/Put,    # firstKey/lastKey are used during a scan to filter files by startkey and stopkey
    lastKey=r2/colfam1:q2/1565781667630/Put,
    avgKeyLen=22,    # average Key length of all KeyValues in the file; if a Block would hold too few KeyValues, the Block size can be tuned (for example, 32KB) to keep the intra-block latency of locating KeyValues during sequential scans from growing
    avgValueLen=1,
    entries=3,
    length=4975
Trailer:
    fileinfoOffset=289,
    loadOnOpenDataOffset=181,
    dataIndexCount=1,
    metaIndexCount=0,
    totalUncomressedBytes=4884,
    entryCount=3,
    compressionCodec=NONE,
    uncompressedDataIndexSize=34,
    numDataIndexLevels=1,
    firstDataBlockOffset=0,
    lastDataBlockOffset=0,
    comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
    encryptionKey=NONE,
    majorVersion=3,
    minorVersion=0
Fileinfo:
    BLOOM_FILTER_TYPE = ROW
    DELETE_FAMILY_COUNT = 0
    EARLIEST_PUT_TS = 1565781611071
    KEY_VALUE_VERSION = 1
    LAST_BLOOM_KEY = r2
    MAJOR_COMPACTION_KEY = false
    MAX_MEMSTORE_TS_KEY = 8
    MAX_SEQ_ID_KEY = 10
    TIMERANGE = 1565781603186....1565781667630
    hfile.AVG_KEY_LEN = 22
    hfile.AVG_VALUE_LEN = 1
    hfile.CREATE_TIME_TS = 1565785277850
    hfile.LASTKEY = r2/colfam1:q2/1565781667630/Put/vlen=0/mvcc=0
    MAX_TAGS_LEN = 22    # HFile V3 stores cell tag data; this is the maximum number of bytes of tags stored in a single cell
    TAGS_COMPRESSED
Mid key: \x00\x01r\x07colfam1q\x00\x00\x01L\x8F\xDBR?\x04
Bloom filter:
    BloomSize: 8
    No of Keys in bloom: 3
    Max Keys for bloom: 6
    Percentage filled: 50%
    Number of chunks: 1
    Comparator: RawBytesComparator
Delete Family Bloom filter:
    Not present
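As a side note on the firstKey/lastKey fields above: during a scan they let HBase skip HFiles whose key range cannot overlap the scan range. A rough sketch of that overlap check, treating keys as plain row bytes for simplicity (the real comparison works on full cell keys, and this is not HBase's actual code path):

```java
import java.util.Arrays;

public class HFileRangeFilter {
    /**
     * An HFile whose [firstKey, lastKey] range does not overlap the scan's
     * [startRow, stopRow) range can be skipped entirely. Empty startRow/stopRow
     * means the scan is unbounded on that side.
     */
    static boolean mayOverlapScan(byte[] firstKey, byte[] lastKey, byte[] startRow, byte[] stopRow) {
        boolean scanEndsBeforeFile = stopRow.length > 0 && Arrays.compare(stopRow, firstKey) <= 0;
        boolean scanStartsAfterFile = startRow.length > 0 && Arrays.compare(startRow, lastKey) > 0;
        return !(scanEndsBeforeFile || scanStartsAfterFile);
    }

    public static void main(String[] args) {
        byte[] first = "r".getBytes(), last = "r2".getBytes();
        System.out.println(mayOverlapScan(first, last, "r1".getBytes(), "r3".getBytes())); // true: must read the file
        System.out.println(mayOverlapScan(first, last, "s".getBytes(), new byte[0]));      // false: scan starts after lastKey
    }
}
```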

BucketCache

BucketCache memory structure

BucketCache label
Because Blocks are not strictly fixed in size (a "64KB" Block usually ends up slightly larger than 64KB), each BucketSize label is 1KB larger than the Block size it is meant to hold. The default labels are (4+1)K, (8+1)K, (16+1)K, ..., (48+1)K, (56+1)K, (64+1)K, (96+1)K, ..., (512+1)K. When HBase starts, it first walks the size labels from small to large and allocates one Bucket for each label; all remaining Buckets are then allocated with the largest label, (512+1)K by default. The size label of a Bucket can also be adjusted dynamically: for example, if most Blocks are around 64KB and the 65KB Buckets are used up, completely free Buckets of other sizes can be converted into 65KB Buckets, but at least one Bucket of the original size is always reserved.
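A minimal sketch of the matching rule described above: a Block is cached into the smallest size label that can hold it. The label list is an abridged copy of the defaults quoted above; this illustrates the rule only and is not BucketAllocator's actual code.

```java
public class BucketSizePicker {
    // Abridged list of the default size labels: each is 1KB larger than a common Block size.
    static final int[] SIZE_LABELS_BYTES = {
        (4 + 1) * 1024, (8 + 1) * 1024, (16 + 1) * 1024, (48 + 1) * 1024,
        (56 + 1) * 1024, (64 + 1) * 1024, (96 + 1) * 1024, (512 + 1) * 1024
    };

    /** Returns the smallest size label that can hold a block of the given size, or -1 if none fits. */
    static int pickLabel(int blockSizeBytes) {
        for (int label : SIZE_LABELS_BYTES) {          // labels are sorted ascending
            if (label >= blockSizeBytes) {
                return label;
            }
        }
        return -1;                                     // larger than the biggest label: cannot be cached here
    }

    public static void main(String[] args) {
        // A 64KB data block that serialized to slightly more than 64KB still fits the (64+1)K label.
        System.out.println(pickLabel(64 * 1024 + 37));  // prints 66560 == (64+1)K
    }
}
```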

Block cache write and read processes in BucketCache

HDFS file retrieves blocks

The HDFS Block size is 128 MB: HDFS is built to store large files. With a large enough data volume, a small Block size would produce a huge amount of Block metadata (which DataNodes each Block lives on, plus the file-to-Block mapping). That metadata is held on the NameNode, and too much of it can turn the NameNode into the bottleneck of the whole cluster; this is why the HDFS Block size was raised from the original 64 MB to 128 MB. The HBase Block size is 64 KB: HBase caches data at whole-Block granularity, so if Blocks were too large the cache would be exhausted quickly. Especially for random-read workloads, an overly large Block size leads to poor cache efficiency.
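As a rough, back-of-the-envelope illustration (assuming the commonly cited figure of roughly 150 bytes of NameNode heap per block object): 1 PB of data split into 128 MB blocks yields about 8.4 million blocks, i.e. on the order of 1.2 GB of block metadata on the NameNode; halving the block size to 64 MB would double that, before even counting file and replica metadata. The 64 KB HBase Block works in the opposite direction: it keeps each cache-load unit small so that a random read does not drag large amounts of unrelated data into the cache.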