In the previous article (Talking about our practice in the Feed stream scenario), we focused on how we implemented the Feed stream on top of MySQL, driven by the business flow and its characteristics, along with the problems we ran into and what our analysis showed.

This time I want to talk about our final solution: HBase.

HBase

HBase is an open-source, non-relational, distributed database with scalability and data versioning. It is deployed as a cluster, with an architecture that separates compute from storage, and is designed for random, real-time reads and writes of big data.

Its design references Google’s BigTable, a high-performance, highly available, column-oriented distributed database that can be used to process unstructured or semi-structured data.

Looking at HBase from different perspectives

Data storage form

HBase stores data in tables, and a RowKey uniquely identifies a row of data.

Here’s a comparison to see what the data store looks like.

The data mapping between MySQL and HBase is as follows:

(For reference and intuition only; in practice the two differ considerably.)

HBase              <--->  MySQL
Table              <--->  Table
Row                <--->  Data row
ColumnFamily       <--->  (no direct equivalent)
Column Qualifier   <--->  Table field (column)
Cell               <--->  The value of a field in a data row
Timestamp (TS)     <--->  Data create/update time

Logical view

Logically, an HBase data table consists of data rows. Each row has a RowKey and one or more ColumnFamilies, and each ColumnFamily contains one or more column qualifiers.

Let's look at another example.

There is a single ColumnFamily, Info, which contains three column qualifiers: name, major, and email.
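To make the logical view concrete, here is a minimal sketch using the HBase Java client. Only the Info family and its name/major/email qualifiers come from the example above; the table name, RowKey, and values are my own illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class LogicalViewExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("student"))) {   // hypothetical table
            // One data row: RowKey + ColumnFamily "Info" + three column qualifiers
            Put put = new Put(Bytes.toBytes("student_001"));                 // RowKey
            put.addColumn(Bytes.toBytes("Info"), Bytes.toBytes("name"),  Bytes.toBytes("Alice"));
            put.addColumn(Bytes.toBytes("Info"), Bytes.toBytes("major"), Bytes.toBytes("CS"));
            put.addColumn(Bytes.toBytes("Info"), Bytes.toBytes("email"), Bytes.toBytes("alice@example.com"));
            table.put(put);

            // Read one cell back; every cell also carries a timestamp (version)
            Result result = table.get(new Get(Bytes.toBytes("student_001")));
            byte[] name = result.getValue(Bytes.toBytes("Info"), Bytes.toBytes("name"));
            System.out.println("name = " + Bytes.toString(name));
        }
    }
}
```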

Multiple data rows make up a Region, the smallest unit of storage and load balancing in HBase; rows within a Region are stored in ASCII order of their RowKeys. The overall structure is as follows:

As data rows accumulate and a Region grows beyond hbase.hregion.max.filesize, the Region splits. (In the default configuration file hbase-default.xml in the source, hbase.hregion.max.filesize defaults to 10 GB; other versions may differ.)
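As a side note, besides the cluster-wide hbase.hregion.max.filesize setting, the HBase 2.x client API also lets you override the split threshold per table. A hedged sketch, with a hypothetical table and family name:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;

public class RegionSplitSizeExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableDescriptor desc = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("feed"))                       // hypothetical table
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("Info"))
                    // per-table override of hbase.hregion.max.filesize (default ~10 GB)
                    .setMaxFileSize(10L * 1024 * 1024 * 1024)
                    .build();
            admin.createTable(desc);
        }
    }
}
```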

Physical view

From the logical structure, the relationship between data tables and data rows is straightforward, and the Region design also supports data sharding, migration, and backup in distributed storage scenarios. Next, let's go one level deeper with the physical view, starting from the Region.

  • Region

Inside a Region, data is organized into Stores by ColumnFamily rather than by row: each Store corresponds to exactly one ColumnFamily.

The internal structure of a Store is based on the LSM Tree and consists of a MemStore and StoreFiles.

When data is written or updated, it is first written quickly to the in-memory MemStore; once the MemStore exceeds a size threshold, it is flushed to disk as a StoreFile (stored in the HFile format, a Hadoop binary file format).

In-memory writes plus sequential I/O when flushing to disk give HBase very good write performance, and reads of recently updated data are fast as well. (The LSM Tree deserves a detailed analysis of its own; we won't go into it here.)
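Still, to give a feel for the idea, here is a toy, purely illustrative LSM-style sketch in Java (not HBase code): writes land in a sorted in-memory map standing in for the MemStore, which is flushed to an immutable sorted segment standing in for a StoreFile once it passes a threshold; reads check memory first and then segments from newest to oldest.

```java
import java.util.*;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy LSM-style store: illustrative only, not how HBase is implemented internally.
public class TinyLsmStore {
    private final NavigableMap<String, String> memStore = new ConcurrentSkipListMap<>();
    private final Deque<NavigableMap<String, String>> storeFiles = new ArrayDeque<>(); // newest first
    private final int flushThreshold;

    public TinyLsmStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    public void put(String key, String value) {
        memStore.put(key, value);                 // fast in-memory write
        if (memStore.size() >= flushThreshold) {
            flush();                              // sequential write of a sorted, immutable segment
        }
    }

    public String get(String key) {
        String v = memStore.get(key);             // recently written data is served from memory
        if (v != null) return v;
        for (NavigableMap<String, String> segment : storeFiles) {   // newest segment wins
            v = segment.get(key);
            if (v != null) return v;
        }
        return null;
    }

    private void flush() {
        storeFiles.addFirst(new TreeMap<>(memStore));   // stand-in for writing an HFile to disk
        memStore.clear();
    }

    public static void main(String[] args) {
        TinyLsmStore store = new TinyLsmStore(2);
        store.put("user_1", "feed A");
        store.put("user_2", "feed B");             // triggers a flush
        store.put("user_1", "feed A v2");          // newer version lives in the MemStore
        System.out.println(store.get("user_1"));   // -> feed A v2
        System.out.println(store.get("user_2"));   // -> feed B
    }
}
```

A real LSM engine adds much more on top of this, including the background compactions and write-ahead log discussed elsewhere in this article.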

  • RegionServer

A table consists of multiple regions distributed on different RegionServers. The overall structure is as follows:

Doesn't a Region keep replicas of its data?

By default, a Region has only one replica, which avoids the inconsistency that replication lag between multiple replicas would introduce and guarantees strong consistency (the C in CP). The replica count can be changed; when there are multiple replicas, the secondary replicas are read-only.
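For illustration, a hedged sketch of how this might look with the HBase 2.x client API: raise the table's region replication above 1, and let reads that can tolerate possibly stale data opt into TIMELINE consistency so they may be served by a read-only secondary replica (table and family names are hypothetical).

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionReplicaExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Raise the replica count from the default of 1 to 3 (hypothetical table)
            TableDescriptor desc = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("feed"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("f"))
                    .setRegionReplication(3)
                    .build();
            admin.createTable(desc);

            try (Table table = conn.getTable(TableName.valueOf("feed"))) {
                Get get = new Get(Bytes.toBytes("some_row"));
                // TIMELINE allows the read to be served by a (possibly lagging) secondary replica
                get.setConsistency(Consistency.TIMELINE);
                Result result = table.get(get);
                System.out.println("stale? " + result.isStale());
            }
        }
    }
}
```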

  • HLog

When HBase writes data, the MemStore is written first. This raises a problem: if the data node crashes suddenly, what happens to the data that was only in memory?

To solve this, HBase introduces the HLog (a write-ahead log, WAL). When a RegionServer receives a write or update request, it first writes the change to the HLog synchronously and then applies it to the MemStore. When a failure occurs and recovery is needed, the HLog can be replayed.

What does HLog record?

The write sequence number, time, Region, Table, the update itself, and whatever else is needed for replay.

Each RegionServer has a single HLog that records the changes of all of its Regions.
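As a small aside (not from the article), the client can also declare per mutation how it interacts with the WAL described above; a sketch with hypothetical row, family, and qualifier names:

```java
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalDurabilityExample {
    public static void main(String[] args) {
        Put put = new Put(Bytes.toBytes("row_1"));                        // hypothetical row
        put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));

        // Sync the HLog (WAL) entry before the write is acknowledged (crash-safe)
        put.setDurability(Durability.SYNC_WAL);

        // Or, for bulk loads where losing recent writes on a crash is acceptable:
        // put.setDurability(Durability.SKIP_WAL);
    }
}
```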

  • Master

RegionServers organize and manage data through Regions, but something is still needed to manage that data from a global, cluster-wide perspective.

Master is responsible for Region allocation, RegionServer load balancing, and metadata management. It is usually deployed in multi-Master cluster mode to achieve high availability.

  • ZooKeeper

In a distributed architecture, managing cluster node status and metadata is essential for concurrent data updates, consistency, and high availability. HBase clusters therefore introduce ZooKeeper to manage node status and the metadata of Regions and RegionServers.

The overall architecture

The business scenario

Let’s start with a quick comparison to see how HBase differs from the commonly used MySQL.

  • Deployment architecture: HBase — Master + DataNode + ZK; MySQL — master/slave, multi-source, three-node clusters
  • Engine structure: HBase — LSM Tree; MySQL — B+ Tree
  • Advantages: HBase — MemStore writes, column-oriented storage, multi-version data, sparse data storage, distributed storage; MySQL — balanced read/write performance, multiple structured indexes, low latency
  • Disadvantages: HBase — periodic compactions, read I/O amplification from multi-level LSM lookups, GC latency spikes, only a single index, etc.; MySQL — data volume bottlenecks, performance degrades as data grows
  • Operational cost: HBase — high; MySQL — low
  • Focus scenarios: HBase — big data and some online business scenarios (Feed); MySQL — general business

HBase is not sensitive to the underlying storage media and compresses its data, so storage costs are relatively controllable. It also supports dynamic columns, and its storage structure makes it well suited to sparse data. It can play to these strengths in big data and certain specialized scenarios, such as user profiling, real-time recommendation, real-time risk control, and Feed streams.

Business Application – Feed streams

In the previous article (Talking about our practice in the Feed stream scenario, Part 1), we walked through the initial MySQL-based solution we designed for the Feed stream and the problems it had: the scheme was complex and expensive to maintain, and as the user base grew, MySQL could not cope with that volume of data.

So we ended up using push mode + HBase.

  • RowKey

{user_id}_{time}

With the user ID as the prefix, we can quickly locate a single user's Feed stream, and the prefix also addresses data routing and hotspot concerns. Appending the Feed time then gives a per-user Feed stream in time order.

This raises a question: how do we get descending time order?

By default, HBase sorts RowKeys in ascending ASCII order, but the business needs the Feed in descending time order. The solution is to use Long.MAX_VALUE - time as the time component of the RowKey.
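A minimal sketch of this RowKey scheme with the Java client (the zero-padding, table name, and column names are my own assumptions, not from the article): reversing the timestamp with Long.MAX_VALUE - time makes ascending ASCII order correspond to newest-first, and a prefix scan on {user_id}_ then returns one user's Feed newest first.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class FeedRowKeyExample {

    // {user_id}_{Long.MAX_VALUE - time}; fixed-width zero-padding keeps ASCII order == numeric order
    static String rowKey(String userId, long feedTimeMillis) {
        return userId + "_" + String.format("%019d", Long.MAX_VALUE - feedTimeMillis);
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("feed"))) {      // hypothetical table
            Put put = new Put(Bytes.toBytes(rowKey("u1001", System.currentTimeMillis())));
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("content"), Bytes.toBytes("hello feed"));
            table.put(put);

            // Prefix scan: all rows for user u1001, newest feed first
            Scan scan = new Scan().setRowPrefixFilter(Bytes.toBytes("u1001_"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```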

  • pre-split

Sixteen Regions were created in advance (pre-split) to avoid the performance and hotspot problems caused by initial Region splits.
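A hedged sketch of creating a pre-split table with the HBase 2.x Admin API; the 15 boundary keys below (yielding 16 initial Regions) are purely illustrative, since real boundaries should follow the actual distribution of the user_id prefixes in the RowKeys.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableDescriptor desc = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("feed"))                       // hypothetical table
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("f"))
                    .build();
            // 15 boundary keys -> 16 initial Regions (illustrative hex boundaries)
            byte[][] splitKeys = new byte[15][];
            for (int i = 1; i <= 15; i++) {
                splitKeys[i - 1] = Bytes.toBytes(Integer.toHexString(i));
            }
            admin.createTable(desc, splitKeys);
        }
    }
}
```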

  • Compression

Snappy, the default compression algorithm, is used; in our tests the data compressed from 143 GB to 46 GB, i.e. to roughly 30% of its original size.
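For illustration, enabling Snappy on a column family with the HBase 2.x Admin API might look like the sketch below (table and family names are hypothetical; the same setting could also be applied when the table is created).

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class SnappyCompressionExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
                    .newBuilder(Bytes.toBytes("f"))                      // hypothetical family name
                    .setCompressionType(Compression.Algorithm.SNAPPY)    // compress flushed StoreFiles with Snappy
                    .build();
            // Apply to an existing (hypothetical) table
            admin.modifyColumnFamily(TableName.valueOf("feed"), cf);
        }
    }
}
```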

  • Test case

These were the test results at the time; data volume and read/write traffic were fairly evenly distributed across the 16 Regions.

  • Online results

After the feature went live, some time was spent on online data migration and the traffic switch. The final performance change can be seen in the figure below:

After the traffic switch was turned on and requests began hitting HBase, the p99 latency was about 95 milliseconds, compared with up to 4.5 seconds at the p99 in extreme cases before. The overall improvement is dramatic.