Author | Walker, big data operations engineer at GeTui



Upgrade background


As a professional data intelligence service provider, GeTui has faced massive data storage and query requirements as its business has grown. To meet them, it chose HBase, a highly reliable, high-performance, column-oriented, scalable distributed data storage system.


However, after the old HBase 1.0 cluster had been running for many years, two major problems emerged: the base environment of the nodes had become inconsistent, and the cluster's servers had been out of warranty for years. As push volume grew, performance bottlenecks also began to appear. After a comprehensive evaluation, GeTui decided to upgrade and migrate the old cluster to a new HBase 2.0 cluster to resolve these problems.


Upgrade steps

The following is the complete upgrade and migration process. Since the whole process involves multiple departments and takes a long time, it is advisable to have each department designate a contact person for the duration of the operation.

Preparation 1: Take inventory of the HBase tables and identify the applications and owning teams that read or write each table.

Preparation 2: Deploy the HBase 2.0 cluster and open connectivity to all reading and writing application servers.

Debugging 3: Debug the applications in a test environment and confirm that they work correctly against the HBase 2.0 cluster.

Debugging 4: Develop a data verification tool to check the integrity of the data in the old and new clusters after migration.

Migration 5: Bring the double-write change online for all tables and ensure that the data written to the two clusters is consistent.

Migration 6: Switch all reading applications over to the new cluster and confirm that reads are normal.

End 7: Stop writes to the old cluster, disable its tables for half a month, and take the old cluster offline once no exceptions occur.
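The double-write phase in steps 5–7 can be sketched as below. This is a minimal illustration under assumed names (`TableClient`, `DualWriter` are hypothetical abstractions, not the actual GeTui migration code or the HBase client API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal abstraction over an HBase table client.
interface TableClient {
    void put(String rowKey, String value);
    String get(String rowKey);
}

// Writes every mutation to both the old and the new cluster, so reads
// can be switched over once the data is verified to be consistent.
class DualWriter {
    private final TableClient oldCluster;
    private final TableClient newCluster;

    DualWriter(TableClient oldCluster, TableClient newCluster) {
        this.oldCluster = oldCluster;
        this.newCluster = newCluster;
    }

    void put(String rowKey, String value) {
        // The old cluster remains the source of truth during migration:
        // write it first, then mirror the write to the new cluster.
        oldCluster.put(rowKey, value);
        newCluster.put(rowKey, value);
    }

    // Spot check usable by a verification tool: compare the same row
    // on both clusters.
    boolean rowsMatch(String rowKey) {
        String a = oldCluster.get(rowKey);
        String b = newCluster.get(rowKey);
        return a == null ? b == null : a.equals(b);
    }
}
```

A real implementation would issue `Table.put()` calls against two separate `Connection` objects and handle partial failures (for example, queueing retries toward the new cluster so the old cluster's write path is never blocked).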


New features in HBase 2.0

On April 29, 2018, HBase 2.0 was released, resolving 4,551 issues in total. HBase 2.0 has many new features; this article introduces only the main ones. For more information, see the official release notes:

[https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12327188]


Feature 1: AssignmentManager V2


Problems in AMv1 and their causes


The main problem with AMv1 is Regions in Transition (RIT). Heavy users of HBase are often plagued by RIT, with regions sometimes stuck in transition for maddeningly long periods. Some RITs are actually caused by RegionServer open failures, but most are caused by problems in the AM itself.

The main causes of RIT are as follows:

1. Region state transitions are complicated

Opening a Region involves seven components and more than 20 steps, and more complex logic means more bugs.


2. Region state is cached in multiple places

The Master's memory, the meta table, and Zookeeper all store region state, and HBase 1.0 requires the three to be kept fully synchronized.


Both the Master and the RegionServers modify the state in the meta table and in Zookeeper, which easily leads to region state disorder.


And when they are inconsistent, which state should prevail?


3. Heavy reliance on Zookeeper for state notification

Region state changes are propagated through Zookeeper, which bottlenecks the speed at which regions can be brought online or offline. When there are many regions, Zookeeper notifications lag badly.


Improvements in AMv2


The main improvements are as follows:

1. Each region state change is recorded first in the ProcedureWAL and then in the meta table;

2. Region state is stored only in the meta table and in HMaster memory, no longer in Zookeeper;

3. Only the HMaster may update the meta table;

4. The HMaster exchanges state information directly with the RegionServers, removing the Zookeeper dependency.

Overall, AMv2 removes the Zookeeper dependency and has a clear region-transition mechanism, which makes the code more readable and effectively eliminates RIT problems.
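The "persist first, then apply" ordering in point 1 above can be illustrated with a toy sketch (hypothetical names, not the actual Procedure V2 code): every state change is appended to a log before the authoritative state is updated, so a crashed master can replay the log and arrive at the same state without a third copy in Zookeeper to reconcile.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of AMv2's ordering: a state change is appended to a
// write-ahead log first, and only then applied to the authoritative
// state store (standing in for the meta table / HMaster memory).
class RegionStateStore {
    final List<String> procedureWal = new ArrayList<>();   // stand-in for ProcedureWAL
    final Map<String, String> metaTable = new HashMap<>(); // stand-in for hbase:meta

    void transition(String region, String newState) {
        procedureWal.add(region + "=" + newState); // 1. persist the intent first
        metaTable.put(region, newState);           // 2. then apply it
    }

    // After a crash, replaying the WAL rebuilds exactly the same state,
    // so there is no second cache that can drift out of sync.
    Map<String, String> recover() {
        Map<String, String> rebuilt = new HashMap<>();
        for (String entry : procedureWal) {
            String[] kv = entry.split("=", 2);
            rebuilt.put(kv[0], kv[1]);
        }
        return rebuilt;
    }
}
```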


Feature 2: In-memory Flush & Compaction


In the HBase write path, data is first written to the MemStore (in memory). When a threshold is reached, a flush is triggered and the data is written to disk as HFiles. Note that the smallest flush unit is the HRegion, not a single MemStore; if an HRegion contains many MemStores, the I/O cost of each flush is high.


Problems in HBase 1.x

A MemStore flush can be triggered in many ways, most of which have little impact on the business and need not worry developers. However, if a RegionServer-level flush is triggered, the whole RegionServer executes the flush and blocks all update operations on it for a long time, possibly minutes, which seriously affects services.


Improvements in HBase 2.0


In version 2.0, MemStore data is first turned into an Immutable Segment, and multiple Immutable Segments can be compacted in memory while new writes continue. Only when a larger threshold is reached is the in-memory data persisted to HFile files in HDFS. This is the new In-memory Flush and Compaction feature in 2.0, and it is enabled by default (except for system tables).


Benefit 1: It reduces the amount of data and disk I/O. Many tables keep only one version per column family, and in-memory compaction can discard the older versions before they ever reach disk.


Benefit 2: Segments replace the ConcurrentSkipListMap data structure for storing the index, which saves space, so the same MemStore can hold more data.
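The pipeline described above can be sketched as follows. This is a simplified model, not HBase's actual CompactingMemStore: writes go to an active segment; at a small threshold the segment becomes immutable; immutable segments are merged in memory, keeping only the newest version per key; only at a larger threshold is data "flushed" to disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of an in-memory flush & compaction pipeline.
class CompactingMemStoreSketch {
    static final int ACTIVE_LIMIT = 2; // in-memory flush threshold (cells)
    static final int FLUSH_LIMIT = 8;  // disk flush threshold (cells)

    Map<String, String> active = new TreeMap<>();
    List<Map<String, String>> immutableSegments = new ArrayList<>();
    List<Map<String, String>> hfiles = new ArrayList<>(); // stand-in for HDFS

    void put(String key, String value) {
        active.put(key, value);
        if (active.size() >= ACTIVE_LIMIT) {
            // In-memory flush: the active segment becomes immutable.
            immutableSegments.add(active);
            active = new TreeMap<>();
            compactInMemory();
        }
    }

    // In-memory compaction: merge immutable segments, keeping only the
    // newest value per key, which shrinks the data before any disk I/O.
    void compactInMemory() {
        Map<String, String> merged = new TreeMap<>();
        for (Map<String, String> seg : immutableSegments) {
            merged.putAll(seg); // later segments overwrite older versions
        }
        immutableSegments.clear();
        immutableSegments.add(merged);
        if (merged.size() >= FLUSH_LIMIT) {
            hfiles.add(merged); // disk flush: persist as an "HFile"
            immutableSegments.clear();
        }
    }
}
```

With small thresholds like these, repeated updates to the same key are collapsed in memory and never cost disk I/O, which is exactly the benefit described above.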


Feature 3: Off-heaping of the Read/Write Path


HBase mostly uses on-heap memory to read and write data. Because the JVM uses stop-the-world garbage collection, GC can pause the JVM process for a long time. HBase is a system with low-latency, high-responsiveness requirements, so GC pauses easily cause jitter and latency spikes in HBase services.


The HBase community's approach to GC latency is to minimize the use of JVM heap memory: as heap usage decreases, GC decreases. To this end, the community has added off-heap support to the read and write paths.



The off-heap read path mainly includes the following optimizations:

1. Reference-count BucketCache blocks to avoid copying;

2. Use ByteBuffer as the server-side KeyValue implementation, so that KeyValues can be stored in off-heap memory;

3. A series of performance optimizations for BucketCache.


The off-heap write path includes the following optimizations:

1. At the RPC layer, read KeyValues from the network stream directly into off-heap ByteBuffers;

2. Use an off-heap MSLAB pool;

3. Use a Protobuf version (3.0+) that supports operating on off-heap buffers.
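The core idea behind both lists — keeping cell data outside the Java heap so the garbage collector never has to scan it — can be illustrated with a direct `ByteBuffer`. This is a toy sketch; HBase's actual ByteBufferKeyValue and MSLAB pool are considerably more involved:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy sketch: store a key/value pair in off-heap memory via a direct
// ByteBuffer. The bytes live outside the Java heap and add no GC
// pressure; only the small ByteBuffer wrapper object is on-heap.
class OffHeapCell {
    private final ByteBuffer buf;
    private final int keyLen;

    OffHeapCell(String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        this.keyLen = k.length;
        this.buf = ByteBuffer.allocateDirect(k.length + v.length); // off-heap
        buf.put(k).put(v);
    }

    String key() {
        ByteBuffer ro = buf.duplicate(); // independent cursor, shared memory
        ro.position(0);
        byte[] k = new byte[keyLen];
        ro.get(k);
        return new String(k, StandardCharsets.UTF_8);
    }

    String value() {
        ByteBuffer ro = buf.duplicate();
        ro.position(keyLen);
        byte[] v = new byte[ro.remaining()];
        ro.get(v);
        return new String(v, StandardCharsets.UTF_8);
    }
}
```

The trade-off is that off-heap memory must be sized and managed explicitly (as BucketCache and the MSLAB pool do), since it is invisible to the JVM heap accounting.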


HBase 2.0 pitfalls

Versions before 2.0.3 / 2.1.1 do not support HBCK2


<pre>

HBCK2 versions should be able to work across multiple hbase-2 releases. It will fail with a complaint if it is unable to run. There is no HbckService in versions of hbase before 2.0.3 and 2.1.1. HBCK2 will not work against these versions.

</pre>


It is therefore advisable to upgrade HBase to 2.0.3 / 2.1.1 or later. For details, see the HBCK2 documentation:

[https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2]


Heavy dependence on Procedure V2


An important reason AMv2 is simple and efficient is that it relies heavily on Procedure V2, to which much of the complex logic is delegated. The downside is that if the ProcedureWAL is corrupted, the consequences can be catastrophic. That said, the author believes that after a period of bug fixing and hardening, these problems will disappear.


As an important basic service of GeTui's big data platform, HBase has a significant impact on overall performance. After the upgrade from HBase 1.0 to HBase 2.0, reliability and security have improved substantially, and the various problems of the 1.0 cluster have been effectively resolved. Going forward, GeTui will continue to follow HBase 2.0 and explore how best to use it in production environments.