- ZooKeeper: provides distributed coordination. Each RegionServer also registers its own information in ZooKeeper.
- HDFS: the underlying file system that HBase runs on.
- RegionServer: the data node that stores and serves data.
- Master: each RegionServer reports its status to the Master in real time, so the Master knows the running state of every RegionServer in the cluster and can control RegionServer failover and Region splitting.
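The Master's failover role can be illustrated with a toy model: servers report in, and any server that stays silent longer than a timeout is treated as dead so its Regions could be reassigned. This is a minimal sketch under stated assumptions. `ToyMaster`, `heartbeat`, and `detectFailedServers` are hypothetical names, and the explicit timeout stands in for what real HBase gets from ZooKeeper, where each RegionServer holds an ephemeral znode that disappears when its session expires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of failure detection; NOT HBase's Master or ZooKeeper code.
class ToyMaster {
    private final long timeoutMs;
    // Last report time per RegionServer (TreeMap keeps the output order stable).
    private final Map<String, Long> lastHeartbeat = new TreeMap<>();

    ToyMaster(long timeoutMs) { this.timeoutMs = timeoutMs; }

    // A RegionServer reports in (in real HBase this goes through ZooKeeper).
    void heartbeat(String server, long nowMs) {
        lastHeartbeat.put(server, nowMs);
    }

    // Servers silent for longer than the timeout are declared dead and dropped,
    // which is the point where their Regions would be reassigned (failover).
    List<String> detectFailedServers(long nowMs) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                failed.add(e.getKey());
            }
        }
        failed.forEach(lastHeartbeat::remove);
        return failed;
    }

    int liveServerCount() { return lastHeartbeat.size(); }
}

class MasterDemo {
    public static void main(String[] args) {
        ToyMaster master = new ToyMaster(3_000);
        master.heartbeat("rs1", 0);
        master.heartbeat("rs2", 0);
        master.heartbeat("rs1", 2_500);                         // rs2 goes silent
        System.out.println(master.detectFailedServers(4_000)); // [rs2]
        System.out.println(master.liveServerCount());          // 1
    }
}
```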
Architecture in detail
- HMaster is the implementation of the Master server. It monitors all RegionServer instances in the cluster and is the interface for all metadata changes. In a distributed cluster, the Master typically runs on the NameNode host. HMaster in more detail:
- HMasterInterface exposes mainly metadata-oriented operations: Table (createTable, modifyTable, removeTable, enable, disable), ColumnFamily (addColumn, modifyColumn, removeColumn), and Region (move, assign, unassign).
- The Master runs background threads: the LoadBalancer, which periodically moves Regions around to balance the cluster load, and the CatalogJanitor, which periodically checks and cleans up the hbase:meta table.
- HRegionServer is the implementation of the RegionServer; it serves and manages Regions. In a distributed cluster, RegionServers run on the DataNodes.
- HRegionInterface exposes both data-oriented and region-maintenance methods: Data (get, put, delete, next, etc.) and Region (splitRegion, compactRegion, etc.).
- RegionServer background threads: CompactSplitThread, MajorCompactionChecker, MemStoreFlusher, and LogRoller.
- Regions: a Region holds a contiguous range of a table's rows and contains one Store per column family. Each Store has one MemStore and multiple StoreFiles (HFiles), and each StoreFile is made up of Blocks.
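To make the LoadBalancer thread's job concrete, here is a deliberately simplified balancer that only evens out Region counts. It is a sketch under stated assumptions, not HBase's algorithm: HBase's default balancer is a cost-based stochastic balancer that also weighs factors such as locality and load. `ToyBalancer.balance` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, greatly simplified balancer; NOT HBase's LoadBalancer.
// It only evens out region counts, ignoring locality, request load and other costs.
class ToyBalancer {
    // Repeatedly moves one region from the most-loaded to the least-loaded server
    // until region counts differ by at most one. Returns the moves it made.
    static List<String> balance(Map<String, List<String>> assignment) {
        List<String> moves = new ArrayList<>();
        if (assignment.size() < 2) return moves;
        while (true) {
            String most = null, least = null;
            for (Map.Entry<String, List<String>> e : assignment.entrySet()) {
                int n = e.getValue().size();
                if (most == null || n > assignment.get(most).size()) most = e.getKey();
                if (least == null || n < assignment.get(least).size()) least = e.getKey();
            }
            List<String> from = assignment.get(most);
            List<String> to = assignment.get(least);
            if (from.size() - to.size() <= 1) return moves;   // balanced enough
            String region = from.remove(from.size() - 1);
            to.add(region);
            moves.add(region + ": " + most + " -> " + least);
        }
    }
}

class BalancerDemo {
    public static void main(String[] args) {
        Map<String, List<String>> assignment = new TreeMap<>();
        assignment.put("rs1", new ArrayList<>(List.of("r1", "r2", "r3", "r4")));
        assignment.put("rs2", new ArrayList<>());
        System.out.println(ToyBalancer.balance(assignment)); // [r4: rs1 -> rs2, r3: rs1 -> rs2]
        System.out.println(assignment);                      // {rs1=[r1, r2], rs2=[r4, r3]}
    }
}
```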
Storage design
In HBase, a table is split into smaller chunks, called Regions, that are stored on different servers. A server that hosts Regions is called a RegionServer, and the Master process distributes Regions among the RegionServers. In the code, the HRegionServer and HRegion classes represent a RegionServer and a Region respectively. Besides hosting a number of HRegions, an HRegionServer manages two types of files for data storage:
- HLog: the write-ahead log file, also known as the WAL (write-ahead log).
- HFile: the file that actually stores the data.
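The split of a table into Regions, and the Region > Store > StoreFile hierarchy, can be modeled in a few lines. This is a minimal sketch under stated assumptions: the records and `ToyTable` are hypothetical, not HBase classes; row keys are Strings here, whereas HBase compares them as byte arrays; and in real HBase, clients find the Region-to-server mapping in the hbase:meta table. The key idea is that Regions cover sorted, contiguous key ranges, so the Region for a row is the one with the greatest start key not exceeding that row key.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the storage hierarchy; these are NOT HBase's classes.
record StoreFile(int blockCount) {}                                          // an HFile, made of Blocks
record Store(String family, TreeMap<String, String> memStore, List<StoreFile> files) {}
record Region(String startKey, List<Store> stores) {}                        // one Store per column family

class ToyTable {
    // Regions sorted by start key; each covers [startKey, next Region's startKey).
    private final TreeMap<String, Region> regions = new TreeMap<>();
    // Which (hypothetical) RegionServer hosts each Region, keyed by start key.
    private final Map<String, String> hostOf = new TreeMap<>();

    void addRegion(Region region, String server) {
        regions.put(region.startKey(), region);
        hostOf.put(region.startKey(), server);
    }

    // The Region holding a row is the one with the greatest start key <= row key.
    Region regionFor(String rowKey) {
        Map.Entry<String, Region> e = regions.floorEntry(rowKey);
        if (e == null) throw new IllegalStateException("no region covers " + rowKey);
        return e.getValue();
    }

    String serverFor(String rowKey) {
        return hostOf.get(regionFor(rowKey).startKey());
    }
}

class RegionDemo {
    public static void main(String[] args) {
        Store cf = new Store("cf", new TreeMap<>(), List.of(new StoreFile(3)));
        ToyTable table = new ToyTable();
        table.addRegion(new Region("", List.of(cf)), "rs1");  // first Region starts at the empty key
        table.addRegion(new Region("g", List.of()), "rs2");
        table.addRegion(new Region("p", List.of()), "rs3");
        System.out.println(table.serverFor("apple")); // rs1
        System.out.println(table.serverFor("grape")); // rs2
        System.out.println(table.serverFor("zebra")); // rs3
    }
}
```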
HLog
- MasterProcWAL: HMaster records administrative operations and their running state, such as handling a crashed server, creating a table, and other DDL operations, in its own WAL. These WALs are stored under the MasterProcWALs directory. Unlike RegionServer WALs, the Master WAL lets these operations survive a Master failure: if the active Master dies in the middle of an operation, the Master that takes over continues it from where the previous one left off.
- The WAL records every data change in HBase. If a RegionServer crashes before its MemStore is flushed, the WAL ensures that the changes can be replayed. If writing to the WAL fails, the entire data-modification operation fails.
- Usually there is only one WAL instance per RegionServer. Prior to HBase 2.0, the WAL interface was named HLog.
- The WAL resides in HDFS under the /hbase/WALs/ directory.
- MultiWAL: with a single WAL per RegionServer, the RegionServer must write to the WAL serially, because HDFS files must be written sequentially, and this makes the WAL a performance bottleneck. MultiWAL lets a RegionServer write several WAL streams in parallel over multiple HDFS pipelines, which raises total write throughput. Because incoming edits are partitioned by Region, it does not increase the throughput of a single Region.
- WAL configuration: to enable MultiWAL, set hbase.wal.provider to multiwal in hbase-site.xml:

    <property>
      <name>hbase.wal.provider</name>
      <value>multiwal</value>
    </property>
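The WAL rules in the list above (log the change first, reject the whole write if the log append fails, replay the log after a crash) can be sketched with a toy RegionServer. This is a minimal sketch under stated assumptions: `ToyRegionServer`, `Edit`, and `recover` are hypothetical names, the WAL is an in-memory list standing in for a file in HDFS, and real recovery (splitting the WAL per Region and replaying it when Regions reopen) is reduced to a single loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy write path; NOT HBase's WAL or HRegion code.
record Edit(String row, String value) {}

class ToyRegionServer {
    final List<Edit> wal;                                      // durable log (lives in HDFS in real HBase)
    final TreeMap<String, String> memStore = new TreeMap<>();  // in memory, lost on a crash
    boolean walWritable = true;                                // flip to simulate a WAL failure

    ToyRegionServer(List<Edit> wal) { this.wal = wal; }

    void put(String row, String value) {
        if (!walWritable) {
            // If the WAL append fails, the whole mutation fails; the MemStore is untouched.
            throw new IllegalStateException("WAL append failed, put rejected");
        }
        wal.add(new Edit(row, value));   // 1. log the change first
        memStore.put(row, value);        // 2. then apply it in memory
    }

    // After a crash, replaying the surviving WAL rebuilds edits that were never flushed.
    static ToyRegionServer recover(List<Edit> wal) {
        ToyRegionServer rs = new ToyRegionServer(wal);
        for (Edit e : wal) rs.memStore.put(e.row(), e.value());
        return rs;
    }
}

class WalDemo {
    public static void main(String[] args) {
        List<Edit> wal = new ArrayList<>();
        ToyRegionServer rs = new ToyRegionServer(wal);
        rs.put("row1", "a");
        rs.put("row2", "b");
        // Simulate a crash: the old MemStore is gone, only the WAL survives.
        ToyRegionServer recovered = ToyRegionServer.recover(wal);
        System.out.println(recovered.memStore); // {row1=a, row2=b}
    }
}
```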
For background on the technique, see the Wikipedia article on write-ahead logging.
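The MultiWAL item above says edits are partitioned by Region, which is why a single Region gains nothing. A toy version makes that visible: every edit of a given Region always lands in the same stream. This is a sketch under stated assumptions. The hash-based assignment and the names `ToyMultiWal`, `streamFor`, and `entries` are hypothetical, not HBase's actual region-grouping strategy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the MultiWAL idea; the hash-based grouping is an assumption,
// not HBase's actual region-grouping implementation.
class ToyMultiWal {
    private final List<List<String>> streams = new ArrayList<>();

    ToyMultiWal(int numStreams) {
        for (int i = 0; i < numStreams; i++) streams.add(new ArrayList<>());
    }

    // Every edit of a given Region maps to the same stream: that keeps the Region's
    // edits in order, but also caps one Region at the speed of one stream.
    int streamFor(String region) {
        return Math.floorMod(region.hashCode(), streams.size());
    }

    void append(String region, String edit) {
        streams.get(streamFor(region)).add(region + ":" + edit);
    }

    List<String> entries(int stream) { return streams.get(stream); }

    int streamCount() { return streams.size(); }
}

class MultiWalDemo {
    public static void main(String[] args) {
        ToyMultiWal wal = new ToyMultiWal(2);
        wal.append("regionA", "put r1");
        wal.append("regionB", "put r9");
        wal.append("regionA", "put r2");
        for (int i = 0; i < wal.streamCount(); i++) {
            System.out.println("stream " + i + ": " + wal.entries(i));
        }
    }
}
```

Different Regions can spread over different streams (written in parallel in real HBase), while all of one Region's edits stay in a single ordered stream.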
HFile
HFile is the file format HBase uses to store data in the Hadoop Distributed File System (HDFS). It contains a multi-level index, so HBase can locate data without loading the entire file. The size of the index depends on the block size, the size of the keys, and the amount of data stored; with large data sets it is not unusual for the indexes to take up around 1 GB per RegionServer.
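The benefit of a multi-level index can be shown with a two-level lookup: a binary search over the root index picks one leaf index block, and a second binary search over that leaf picks the only data block that can contain the key. This is a sketch under stated assumptions: `ToyBlockIndex` is hypothetical and only in the spirit of HFile's index, keys are Strings rather than byte arrays, and a block number is derived where a real index entry would store the block's file offset.

```java
import java.util.Arrays;

// Hypothetical two-level block index in the spirit of HFile's multi-level index.
// Keys are Strings here; real HBase compares keys as byte arrays.
class ToyBlockIndex {
    private final String[] rootFirstKeys;   // first key of each leaf index block
    private final String[][] leafFirstKeys; // per leaf: first key of each data block it covers

    ToyBlockIndex(String[] rootFirstKeys, String[][] leafFirstKeys) {
        this.rootFirstKeys = rootFirstKeys;
        this.leafFirstKeys = leafFirstKeys;
    }

    // Position of the last entry whose first key is <= key, or -1 if key sorts first.
    static int floor(String[] firstKeys, String key) {
        int pos = Arrays.binarySearch(firstKeys, key);
        return pos >= 0 ? pos : -pos - 2;   // binarySearch returns -(insertionPoint) - 1
    }

    // Number of the only data block that can contain key. Only one root entry and
    // one leaf block are consulted; a real entry would hold the block's file offset.
    int dataBlockFor(String key) {
        int leaf = floor(rootFirstKeys, key);
        if (leaf < 0) return -1;
        int inLeaf = floor(leafFirstKeys[leaf], key);
        if (inLeaf < 0) return -1;
        int blocksBefore = 0;
        for (int i = 0; i < leaf; i++) blocksBefore += leafFirstKeys[i].length;
        return blocksBefore + inLeaf;
    }
}

class IndexDemo {
    public static void main(String[] args) {
        ToyBlockIndex idx = new ToyBlockIndex(
            new String[] {"a", "m"},
            new String[][] {{"a", "d", "h"}, {"m", "r", "w"}});
        System.out.println(idx.dataBlockFor("k")); // 2
        System.out.println(idx.dataBlockFor("s")); // 4
    }
}
```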
Discussing a database's storage format really means discussing how the data is organized efficiently on disk, because what we usually care about is how efficiently the data can be read and consumed, not the storage itself.
How an HFile is generated
Initially, the HFile has no blocks yet; the data is still in the MemStore.
When a flush starts, an HFile Writer is created and the first, empty Data Block is initialized. Space is reserved for the Data Block's Header, which stores metadata about that block.
The KeyValues in the MemStore are then appended, one by one, to this first Data Block in memory.
Note: if Data Block Encoding is configured, each KeyValue is encoded as it is appended, so the encoded data is no longer stored as plain KeyValues. Data Block Encoding is an internal encoding mechanism HBase provides to reduce the structural bloat of KeyValues.
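The flush steps above (create a writer, start a Data Block with its header space reserved, append sorted KeyValues, cut a new block once the current one is full) can be sketched as follows. This is a minimal sketch under stated assumptions: `ToyHFileWriter` is hypothetical and not HBase's HFile writer; it omits encoding, compression, checksums, Bloom filters, and the file trailer; sizes are counted as characters; and the 8-byte block size is a toy value (HBase's default block size is 64 KB).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy writer following the flush steps above; NOT HBase's HFile writer.
class ToyHFileWriter {
    private final int blockSize;                          // payload size that triggers a new block
    final List<List<String>> blocks = new ArrayList<>();  // data blocks; slot 0 holds the header
    final List<String> blockIndex = new ArrayList<>();    // first key of every data block
    private List<String> current;                         // block currently being filled
    private int currentBytes;
    private String lastKey;

    ToyHFileWriter(int blockSize) { this.blockSize = blockSize; }

    // KeyValues arrive from the MemStore already sorted, one at a time.
    void append(String key, String value) {
        if (lastKey != null && key.compareTo(lastKey) <= 0) {
            throw new IllegalArgumentException("keys must be appended in sorted order");
        }
        if (current == null || currentBytes >= blockSize) {
            finishBlock();
            current = new ArrayList<>();
            current.add("<header reserved>");  // space kept for the header, filled in later
            blocks.add(current);
            blockIndex.add(key);
            currentBytes = 0;
        }
        current.add(key + "=" + value);
        currentBytes += key.length() + value.length();
        lastKey = key;
    }

    // Ends the flush: the last block's header is written once its contents are known.
    void close() { finishBlock(); }

    // Fills the reserved header slot with metadata about the finished block.
    private void finishBlock() {
        if (current != null) {
            current.set(0, "<header entries=" + (current.size() - 1) + ">");
        }
    }
}

class HFileDemo {
    public static void main(String[] args) {
        ToyHFileWriter writer = new ToyHFileWriter(8);
        writer.append("a", "111");
        writer.append("b", "222");
        writer.append("c", "333");
        writer.append("d", "444");
        writer.close();
        System.out.println(writer.blocks);
        // [[<header entries=2>, a=111, b=222], [<header entries=2>, c=333, d=444]]
        System.out.println(writer.blockIndex); // [a, c]
    }
}
```

Recording each block's first key as it is cut is what later lets a reader binary-search the index instead of scanning the file.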
Read and write process
Author: YiHe 521. Link: https://www.imooc.com/article/275991. Source: imooc. This article was originally published on imooc; please credit the source when reposting.