The Didi HBase team recently completed a rolling upgrade from version 0.98 to version 1.4.8, completely transparent to our users. The new release brings a wealth of new features along with significant improvements in performance, stability, and ease of use. This post summarizes the challenges we faced, how we reasoned about them, and the problems we solved along the way, in the hope that it will be useful to others.

1. Background

Our company currently runs 11 HBase clusters at home and abroad, with an aggregate throughput of more than 10 million requests per second, serving almost all departments and business lines, such as maps, inclusive finance, car services, engine, and finance.

However, one problem has kept plaguing us: our release lags far behind the community. The online HBase clusters run version 0.98, while the community is already at 2.3. This imposes many additional constraints and burdens on our work, including the following:

  • New features are expensive to introduce: version 0.98 is a classic stable HBase release, but it is now so old that the community no longer maintains it, and backporting new features becomes harder and harder.

  • In-house patches are costly to maintain: on top of version 0.98 we carry dozens of self-developed patches, large and small, ranging from major features such as label-based grouping and ACL authentication, through improvements such as the monitoring system and audit-log optimization, to assorted bug fixes. These patches are either supported in newer versions with an implementation different from ours, or impossible to contribute back to the community because of the large version gap — and the longer the timeline stretches, the worse this problem gets.
  • Upper-layer components have their own requirements on HBase: thanks to the active HBase ecosystem, our users run a wide range of scenarios on top of it, such as OLAP (Kylin), spatio-temporal indexing (GeoMesa), time series (OpenTSDB), and graph storage (JanusGraph). The newer releases of these upper-layer engines, however, no longer target HBase 0.98.

An HBase upgrade was therefore urgently needed.

2. The challenges

Let us first review the HBase architecture. HBase has a typical master/slave structure: the Master manages the cluster, RegionServers serve user read and write requests, and ZooKeeper ensures cluster consistency. Nodes communicate with one another via RPC, and the underlying data files (HFiles) are stored on HDFS.
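To make this concrete, here is a minimal client sketch in Java against the 1.x API (the ZooKeeper quorum and table name are placeholders of our own): the client locates the owning RegionServer through ZooKeeper and hbase:meta, then sends its Put/Get RPCs directly to that RegionServer, while the data ultimately lands in HFiles on HDFS.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // placeholder quorum

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("demo_table"))) { // placeholder table
            // Region location is resolved via ZooKeeper + hbase:meta; the actual
            // read/write RPCs go straight to the owning RegionServer.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
            table.put(put);

            Result r = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"))));
        }
    }
}
```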

In addition, HBase has a rich upstream and downstream ecosystem, ranging from real-time tasks represented by StreamingSQL to batch tasks such as Spark and MR.

With this review of the cluster architecture and the upstream and downstream ecosystem in mind, the main challenges we faced during the upgrade are clear:

  • RPC interface compatibility: the upgrade cannot happen overnight, so every RPC interface must stay fully compatible between the old and new versions while they coexist.
  • HFile compatibility: the on-disk file format differs between versions. Version 1.4.8 writes HFile v3 by default, while 0.98 writes HFile v2 but can already read v3, so we did not need to backport any format-related patches to 0.98 (see the configuration sketch after this list). For the same reason, the official documentation requires going through a 1.x release when moving from 0.98 to 2.x, which is why we chose 1.4.8 for this upgrade.
  • Compatibility of our in-house patches: as mentioned above, we had to comb through all of our dozens of self-developed patches one by one — does the newer version offer an equivalent, is that equivalent compatible with our usage, does the patch need to be ported, and how should the port be tested for functionality, performance, and compatibility.
  • Upstream and downstream compatibility: every engine and application in the ecosystem described above must remain fully compatible.
  • Potential new issues: the release line we were moving to is still very actively developed, which also means it may carry latent bugs (we did in fact hit some during the upgrade, fixed them, and contributed the fixes back to the community). There is no shortcut here other than following the community closely and finding and fixing problems as soon as they surface.
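As a small illustration of the HFile point above, the sketch below only reads the setting that controls which HFile version a RegionServer writes; the configuration key is real, but the check itself is just our own sanity-check example, not part of the official upgrade procedure.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HFileFormatCheck {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // "hfile.format.version" controls the version of newly written HFiles:
        // 0.98 writes v2 by default, 1.4.8 writes v3 by default. Because 0.98 can
        // already read v3, mixed-version RegionServers can share the same store
        // files on HDFS during the rolling upgrade.
        int hfileVersion = conf.getInt("hfile.format.version", 3);
        System.out.println("hfile.format.version = " + hfileVersion);
    }
}
```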

From the challenges above, the conclusion is not hard to draw: we needed to design and carry out a large amount of preparatory work to make the upgrade reliable. That is not bad news, though — the more thorough the preparation, the stronger our assurance and confidence that the upgrade would go smoothly. The same idea applies to our future work as well.

Here’s a quick list of the preparations we’ve done:

  • Release Note review
  • Porting and testing of our in-house patches
  • Basic functional and performance testing
  • Advanced functional tests (Bulkload, Snapshot, Replication, Coprocessor, etc.; one snapshot check from this set is sketched after the list)
  • Review and tracking of follow-up community patches (more than 100 patches merged so far)
  • Cross-version compatibility testing and a full audit of RPC interface compatibility
  • Design and implementation of a complete test suite covering all application scenarios: HBase, Phoenix, GeoMesa, OpenTSDB, JanusGraph, etc.
  • Preparation of the release packages and configuration files
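As one concrete example from the advanced functional tests, the fragment below exercises the snapshot path on the upgraded cluster: take a snapshot, clone it, and verify the clone is scannable. The table and snapshot names are placeholders, and this is only an illustrative slice of such a test suite, not our suite itself.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class SnapshotCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableName source = TableName.valueOf("t_upgrade_check");       // placeholder table
            TableName clone = TableName.valueOf("t_upgrade_check_clone");  // placeholder clone
            String snapshot = "t_upgrade_check_snap";                      // placeholder snapshot

            admin.snapshot(snapshot, source);      // take a snapshot of the source table
            admin.cloneSnapshot(snapshot, clone);  // restore it into a new table

            // The clone must come online and be fully scannable after the upgrade.
            try (Table t = conn.getTable(clone);
                 ResultScanner scanner = t.getScanner(new Scan())) {
                int rows = 0;
                for (Result ignored : scanner) {
                    rows++;
                }
                System.out.println("rows readable from clone: " + rows);
            }
            admin.deleteSnapshot(snapshot);        // clean up the snapshot afterwards
        }
    }
}
```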

3. Upgrade scheme

Two upgrade approaches were on the table: build a new cluster and migrate the data, or perform a rolling upgrade in place.

The rolling upgrade was clearly the better option; our only hesitation was the concern that "the new release might still not be stable enough." In the end we went with the rolling upgrade, on the strength of the confidence that thorough preparation and testing had given us.

The general steps of the rolling upgrade were as follows:

  1. Resolve compatibility issues up front, mainly by creating the RSGroup metadata table and rewriting its data, mounting the new coprocessors, and so on.
  2. Upgrade the master nodes.
  3. Upgrade the meta group.
  4. Upgrade the business groups one by one (the sketch after this list shows the kind of health check run between groups).
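Between groups, the cluster should be quiet before the next batch of RegionServers is touched. The sketch below shows the kind of guard we mean, using the 1.4 Admin API; it is an illustration rather than our actual tooling: keep the balancer off for the duration of the rolling restart and only proceed when there are no dead servers and no regions in transition.

```java
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RollingUpgradeGuard {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Keep the balancer off while RegionServers are restarted group by group,
            // so regions are not shuffled onto servers that are about to be upgraded.
            admin.setBalancerRunning(false, true);

            ClusterStatus status = admin.getClusterStatus();
            boolean quiet = status.getDeadServerNames().isEmpty()
                    && status.getRegionsInTransition().isEmpty();
            System.out.println("safe to upgrade the next group: " + quiet);

            // Re-enable the balancer only after the whole rolling upgrade is finished:
            // admin.setBalancerRunning(true, true);
        }
    }
}
```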

4. Rollout and problems encountered

We began the research and preparation for this rolling upgrade in the second half of 2019 and started the online rollout in late March of this year. By early May we had upgraded 9 domestic and overseas clusters, with no user-perceived impact.

Along the way we also ran into a number of previously unresolved problems. Here is a brief look at one critical issue:

Data loss when a RegionServer crashes during a region split:

A region split is a fairly complex transactional process, which can be broken down into the following steps:

  • The RegionServer starts a split transaction and creates a node for the region under ZooKeeper's region-in-transition path, marking it for split.
  • The Master watches for the new split node, performs some initialization, and updates its in-memory state; when that is done, it sets the node state to SPLITTING.
  • Once the RegionServer sees the node state change to SPLITTING, it performs the actual split: it closes the parent region, creates the daughter-region directories with reference files, updates the meta table to record the two daughter regions, and takes the parent region offline.
  • The daughter regions are brought online.

If the RegionServer crashes after the parent region has been taken offline, the ServerCrashProcedure thread on the Master begins to roll the split back, and the daughter regions are deleted. Meanwhile, the Master also runs a CatalogJanitor thread that cleans up meta data. During a normal split this thread is blocked by the presence of the corresponding node on ZooKeeper; but when the RegionServer crashes, the ephemeral node disappears, the thread proceeds as usual, and it deletes the parent region from the meta table. The net effect is that both the parent region and the daughter regions are deleted, causing data loss. Our fix has been contributed to the community as HBASE-23693.
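To make the ordering problem easier to see, here is a deliberately simplified model in plain Java. It replaces the real ZooKeeper node, meta table, ServerCrashProcedure, and CatalogJanitor with a flag and an in-memory map, so it only illustrates the race described above; it is not HBase code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SplitCrashRaceModel {
    public static void main(String[] args) {
        // Stand-in for the meta table: which regions are still recorded.
        ConcurrentMap<String, Boolean> meta = new ConcurrentHashMap<>();
        meta.put("parent", true);     // parent is offline but still present in meta
        meta.put("daughterA", true);  // daughters were written to meta by the split
        meta.put("daughterB", true);

        // The RegionServer crashes, so its ephemeral split node in ZK disappears.
        boolean splitZnodeExists = false;

        // ServerCrashProcedure rolls the unfinished split back: the daughters are removed.
        meta.remove("daughterA");
        meta.remove("daughterB");

        // CatalogJanitor is normally blocked by the split znode; with the znode gone it
        // runs, sees a split parent with no surviving daughters, and removes the parent too.
        if (!splitZnodeExists) {
            meta.remove("parent");
        }

        // Nothing is left in meta, so both copies of the data are unreachable: data loss.
        System.out.println("regions left in meta: " + meta.keySet());
    }
}
```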

Other patches fixed or backported along the way:

  • HBASE-22620: fix the replication znode backlog.
  • HBASE-21964: support disabling quota settings by throttle type.
  • HBASE-24401: fix Append operations failing when hbase.server.keyvalue.maxsize = 0.
  • HBASE-24184: fix authentication problems with snapshot operations when only simple ACL is used.
  • HBASE-24485 (backport): optimize off-heap memory initialization time.
  • HBASE-24501 (backport): remove unnecessary locks in ByteBufferArray.
  • HBASE-24453 (backport): fix missing validation logic when moving tables between groups.

5. Summary

This upgrade took nearly a year to complete, and we are delighted to have finished it successfully. On the one hand, it greatly narrowed the gap between our internal version and the community release, laying a solid foundation for healthier version iteration and community collaboration; on the other hand, the new version brings many new features, giving us much more room to improve stability and ease of use. More importantly, along the way we built up a systematic way of working and a methodology that we expect will help us better empower the business and create value for the company in the future.


About the author

A senior "cat slave" at Didi, focused on HBase kernel development and on the construction and maintenance of Didi's HBase service and its upstream and downstream ecosystem.

Welcome to follow the official account of Didi Technology!

Produced by Didi Technology