1. Architecture principles
Detailed architecture diagram
- StoreFile
StoreFiles are the physical files that hold the actual data. They are stored on HDFS in the form of HFiles. Each Store has one or more StoreFiles (HFiles).
- MemStore
Write cache. Because the data in an HFile must be ordered, incoming data is first held in the MemStore and sorted there. When a flush is triggered, the sorted data is written to an HFile. Each flush produces a new HFile.
- WAL
Data can be written to an HFile only after it has been sorted in the MemStore, but data held only in memory is lost if the server fails. To solve this problem, each edit is written to a file called the Write-Ahead Log (WAL) before it is written to the MemStore. After a system failure, the data can be reconstructed from this log file.
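How the three components cooperate can be illustrated with a toy model. This is a minimal sketch, not HBase code: the class `ToyStore` and its methods are hypothetical names, the WAL is a plain Python list rather than a file on HDFS, and the log is simply cleared on flush, which is a simplification of how real WAL files are retired.

```python
class ToyStore:
    """Toy model of one Store: a WAL, a MemStore, and flushed HFiles (all hypothetical)."""

    def __init__(self):
        self.wal = []        # append-only log of (rowkey, value) edits
        self.memstore = {}   # in-memory write cache
        self.hfiles = []     # every flush appends one sorted, immutable file

    def put(self, rowkey, value):
        self.wal.append((rowkey, value))   # log the edit first
        self.memstore[rowkey] = value      # then apply it in memory

    def flush(self):
        if not self.memstore:
            return
        # Each flush produces a new HFile whose contents are sorted by rowkey.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()
        self.wal.clear()   # simplification: flushed edits no longer need the log

    def crash_and_recover(self):
        self.memstore = {}                 # everything held in memory is lost
        for rowkey, value in self.wal:     # replay the log in write order
            self.memstore[rowkey] = value


store = ToyStore()
store.put("row2", "b")
store.put("row1", "a")
store.flush()                 # row1 and row2 are now in an HFile
store.put("row3", "c")        # row3 exists only in the WAL and the MemStore
store.crash_and_recover()
print(store.hfiles)           # [[('row1', 'a'), ('row2', 'b')]]
print(store.memstore)         # {'row3': 'c'}
```

The unflushed edit survives the simulated crash only because it was logged before it was applied to memory.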
2. The writing process
- 1) The client looks in its meta cache for the region information of the table and the location of the hbase:meta table. If both are found, go to step 4.
- 2) Otherwise, the client accesses ZooKeeper to find out which Region Server hosts the hbase:meta table.
- 3) The client accesses that Region Server, reads the hbase:meta table, and uses the namespace:table/rowkey of the write request to find the region that holds the target data. The table's region information and the location of the meta table are cached in the client's meta cache for the next access.
- 4) The client communicates with the target Region Server.
- 5) The data is written (appended) sequentially to the WAL.
- 6) The data is written to the corresponding MemStore, where it is sorted.
- 7) An ACK is sent to the client.
- 8) When a MemStore flush condition is met, the data is written to an HFile.
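The region lookup at the start of the write path can be sketched as follows. This is an illustration under stated assumptions, not the HBase client API: `ZOOKEEPER`, `REGION_SERVERS`, `ToyClient`, and the server names are hypothetical, and regions are modeled simply as sorted start keys.

```python
import bisect

# Hypothetical stand-ins for cluster state.
ZOOKEEPER = {"meta-region-server": "rs1"}
REGION_SERVERS = {
    "rs1": {
        "hbase:meta": {
            # table -> sorted list of (region start key, hosting Region Server)
            "ns:orders": [("", "rs2"), ("m", "rs3")],
        }
    }
}


class ToyClient:
    def __init__(self):
        self.meta_cache = {}         # table -> cached region list
        self.zookeeper_lookups = 0   # counts how often the cache was missed

    def locate(self, table, rowkey):
        regions = self.meta_cache.get(table)
        if regions is None:
            # Cache miss: ask ZooKeeper where hbase:meta lives, then read it.
            self.zookeeper_lookups += 1
            meta_server = ZOOKEEPER["meta-region-server"]
            regions = REGION_SERVERS[meta_server]["hbase:meta"][table]
            self.meta_cache[table] = regions
        # The target region is the last one whose start key is <= rowkey.
        start_keys = [start for start, _ in regions]
        index = bisect.bisect_right(start_keys, rowkey) - 1
        return regions[index][1]


client = ToyClient()
print(client.locate("ns:orders", "apple"))   # rs2
print(client.locate("ns:orders", "zebra"))   # rs3
print(client.zookeeper_lookups)              # 1
```

The second call is served from the client's meta cache, so ZooKeeper is contacted only once.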
3. MemStore Flush
Flush triggers:
- 1) When the size of a single MemStore reaches hbase.hregion.memstore.flush.size (default 128 MB), all MemStores of its region are flushed. (This is the normal case in a healthy HBase cluster.)
When the size of a MemStore reaches hbase.hregion.memstore.flush.size (default 128 MB) * hbase.hregion.memstore.block.multiplier (default 4), that is, 512 MB with the defaults, further writes to that MemStore are blocked.
- 2) When the total size of all MemStores on a Region Server reaches
java_heapsize * hbase.regionserver.global.memstore.size (default 0.4) * hbase.regionserver.global.memstore.size.lower.limit (default 0.95) = heapSize * 0.4 * 0.95,
regions are flushed in order of their total MemStore size (largest first) until the total MemStore size on the Region Server drops below that value. (A cluster that reaches this trigger is often not in a very healthy state.)
When the total MemStore size on the Region Server reaches java_heapsize * hbase.regionserver.global.memstore.size (default 0.4), writes to all MemStores are blocked.
- 3) A MemStore flush is also triggered when the automatic flush interval elapses. The interval is configured with hbase.regionserver.optionalcacheflushinterval (default 1 hour).
- 4) When the number of WAL files exceeds hbase.regionserver.maxlogs, regions are flushed in chronological order until the number of WAL files drops below hbase.regionserver.maxlogs. (This property name has been deprecated and no longer needs to be set manually; the maximum value is 32.)
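The size-based thresholds can be worked through with concrete numbers. The sketch below assumes a 10 GB Region Server heap, which is an illustrative value and not a recommendation; the function name `flush_thresholds` and the dictionary keys are hypothetical, while the default values are the ones quoted in the text.

```python
MB = 1024 * 1024
GB = 1024 * MB


def flush_thresholds(heap_bytes,
                     flush_size=128 * MB,    # hbase.hregion.memstore.flush.size
                     block_multiplier=4,     # hbase.hregion.memstore.block.multiplier
                     global_size=0.4,        # hbase.regionserver.global.memstore.size
                     lower_limit=0.95):      # ...global.memstore.size.lower.limit
    """Return the size thresholds, in bytes, for the flush triggers described above."""
    return {
        "flush_one_memstore": flush_size,
        "block_one_memstore": flush_size * block_multiplier,
        "flush_region_server": heap_bytes * global_size * lower_limit,
        "block_region_server": heap_bytes * global_size,
    }


# Assumed heap size of 10 GB, chosen only for illustration.
thresholds = flush_thresholds(heap_bytes=10 * GB)
for name, size in thresholds.items():
    print(f"{name}: {size / MB:.1f} MB")
```

With these inputs a single MemStore flushes at 128 MB and blocks writes at 512 MB, while the Region Server starts forced flushes at about 3891 MB of total MemStore size and blocks all writes at 4096 MB.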
4. Reading process
- 1) The client accesses ZooKeeper to find out which Region Server hosts the hbase:meta table.
- 2) The client accesses that Region Server, reads the hbase:meta table, and uses the namespace:table/rowkey of the read request to find the region that holds the target data. The table's region information and the location of the meta table are cached in the client's meta cache for the next access.
- 3) The client communicates with the target Region Server.
- 4) The target data is looked up in the Block Cache (read cache), the MemStore, and the StoreFiles (HFiles), and everything that is found is merged. "Everything" here means the different versions (timestamps) and types (Put/Delete) of the same data.
- 5) Data blocks read from files (the block is the HFile storage unit, 64 KB by default) are cached in the Block Cache.
- 6) The merged result is returned to the client.
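The merge in step 4 can be sketched as below. This is a deliberately simplified model: cells are hypothetical `(rowkey, timestamp, type, value)` tuples, only the newest cell per rowkey is considered, and a newest cell of type Delete hides the row. Real HBase delete handling has more cases than this.

```python
def read_cell(rowkey, sources):
    """Merge cells from several sources and return the newest live value, or None.

    sources: iterable of lists of (rowkey, timestamp, type, value) tuples,
    for example the MemStore contents followed by each HFile.
    """
    candidates = [cell for source in sources for cell in source
                  if cell[0] == rowkey]
    if not candidates:
        return None
    latest = max(candidates, key=lambda cell: cell[1])   # newest timestamp wins
    return None if latest[2] == "Delete" else latest[3]


memstore = [("row1", 30, "Put", "v3")]
hfile_a = [("row1", 10, "Put", "v1"), ("row2", 10, "Put", "x")]
hfile_b = [("row1", 20, "Put", "v2"), ("row2", 25, "Delete", None)]
sources = [memstore, hfile_a, hfile_b]

print(read_cell("row1", sources))   # v3
print(read_cell("row2", sources))   # None
```

row1 has versions in all three sources and the newest one, from the MemStore, is returned. row2 was written in one HFile and deleted in a later one, so nothing is returned.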
5. StoreFile Compaction
Because each MemStore flush produces a new HFile, and different versions (timestamps) and types (Put/Delete) of the same field may be spread across different HFiles, a query has to traverse all of them. To reduce the number of HFiles and to clean up expired and deleted data, StoreFile compaction is performed.
There are two types of compaction: Minor Compaction and Major Compaction. A Minor Compaction merges several adjacent small HFiles into one larger HFile but does not clean up expired or deleted data. A Major Compaction merges all HFiles of a Store into a single HFile and removes expired and deleted data.
- Minor Compaction: merges files but does not remove expired or deleted data. (By default it is triggered once there are three small files.)
- Major Compaction: merges files and removes expired and deleted data. (It can be triggered manually; by default it also runs every 7 days.)
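The difference between the two compaction types can be illustrated with a small sketch. It reuses the hypothetical `(rowkey, timestamp, type, value)` cell model, assumes that only one version per rowkey is retained, and ignores time-based expiry; it does not reflect how HBase selects files for compaction.

```python
def minor_compact(hfiles):
    """Merge several HFiles into one sorted file, keeping every cell."""
    merged = [cell for hfile in hfiles for cell in hfile]
    # Sort by rowkey, newest timestamp first within a rowkey.
    return sorted(merged, key=lambda cell: (cell[0], -cell[1]))


def major_compact(hfiles):
    """Merge all HFiles of a Store, dropping old versions and deleted rows."""
    result = []
    seen = set()
    for cell in minor_compact(hfiles):
        rowkey, _timestamp, kind, _value = cell
        if rowkey in seen:
            continue              # an older version of a rowkey already handled
        seen.add(rowkey)
        if kind != "Delete":      # a newest cell of type Delete removes the row
            result.append(cell)
    return result


hfiles = [
    [("row1", 10, "Put", "v1"), ("row2", 10, "Put", "x")],
    [("row1", 20, "Put", "v2")],
    [("row2", 25, "Delete", None)],
]

print(len(minor_compact(hfiles)))   # 4
print(major_compact(hfiles))        # [('row1', 20, 'Put', 'v2')]
```

The minor compaction still carries all four cells in one file, while the major compaction keeps only the newest live cell.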
6. Region Split
By default, each table starts with a single region. As data is written, the region is split automatically. For load balancing, the HMaster may move a region to another Region Server.
Region split triggers:
- 1) When the total size of all StoreFiles in one Store of a region exceeds hbase.hregion.max.filesize, the region is split (before version 0.94).
- 2) When the total size of all StoreFiles in one Store of a region exceeds min(R^2 * hbase.hregion.memstore.flush.size, hbase.hregion.max.filesize), the region is split, where R is the number of regions of the same table on the current Region Server (version 0.94 and later).
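A worked example of the second rule shows how the split threshold grows with R. The function name `split_threshold` is hypothetical, the flush size is the 128 MB default quoted earlier, and the 10 GB value for hbase.hregion.max.filesize is an assumption made for this example, since the text does not state its default.

```python
MB = 1024 * 1024
GB = 1024 * MB


def split_threshold(region_count,
                    flush_size=128 * MB,      # hbase.hregion.memstore.flush.size
                    max_filesize=10 * GB):    # hbase.hregion.max.filesize (assumed value)
    """Store size, in bytes, above which a region is split under the R^2 rule."""
    return min(region_count ** 2 * flush_size, max_filesize)


for r in (1, 2, 3, 9):
    print(r, split_threshold(r) // MB, "MB")
# 1 128 MB
# 2 512 MB
# 3 1152 MB
# 9 10240 MB
```

Under these assumptions the first split happens at only 128 MB, and from R = 9 onward the threshold is capped by hbase.hregion.max.filesize.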