Disclaimer: This article is based on slides shared by the HBase technical community. Conference slides like these are a good way to learn other teams' technical solutions and their hands-on experience. I hope you find it helpful.

Background

Kuaishou generates tens of billions of user feature records every day. Analysts need to select arbitrary multi-dimensional combinations (e.g. city = Beijing & gender = male) over hundreds of billions of user feature records spanning 30-90 days and analyze user behavior within seconds. To meet this requirement, Kuaishou built BitBase in-house: an HBase-based analysis service that supports bitmap conversion, storage, indexing, and fast computation. It has been applied successfully to retention analysis, user growth, advertising and marketing, and A/B testing.

Business needs and challenges

Kuaishou's actual business requirement: select any combination of dimensions over logs containing hundreds of billions of records, compute user retention over 7 to 90 days, and return the result within seconds.

Technology selection

To this end, Kuaishou investigated various technology solutions including Hive, ES, and ClickHouse.

Technical solution

In the end, Kuaishou settled on BitBase, a solution built on bitmaps and HBase.

Readers unfamiliar with bitmaps can start here: https://www.jianshu.com/p/bf9dbbc147ed

In a bitmap, each element is represented by a bit position, and the value of that bit (1 or 0) marks whether the element is present. Because each element occupies only one bit, storage space is greatly reduced.
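To make the space saving concrete, here is a minimal Java sketch of a bitmap backed by a `long[]` array, where element i is stored as bit i. The class name `SimpleBitmap` and its methods are illustrative only; they are not part of BitBase.

```java
// Minimal bitmap sketch: element i is represented by bit i of a long[] array.
// A set bit means "element present", so each element costs exactly one bit.
class SimpleBitmap {
    private final long[] words;

    SimpleBitmap(int numBits) {
        // 64 bits per long; round up so every index in [0, numBits) fits
        this.words = new long[(numBits + 63) / 64];
    }

    void set(int index) {
        // index >>> 6 selects the word (index / 64); Java masks a long shift
        // distance to its low 6 bits, so 1L << index sets bit (index % 64)
        words[index >>> 6] |= 1L << index;
    }

    boolean get(int index) {
        return (words[index >>> 6] & (1L << index)) != 0;
    }

    // Number of elements present = number of set bits
    int count() {
        int total = 0;
        for (long w : words) {
            total += Long.bitCount(w);
        }
        return total;
    }

    int sizeInBytes() {
        return words.length * Long.BYTES;
    }

    public static void main(String[] args) {
        SimpleBitmap active = new SimpleBitmap(1_000_000);
        active.set(3);
        active.set(70);
        active.set(999_999);
        System.out.println(active.get(70));        // true
        System.out.println(active.get(71));        // false
        System.out.println(active.count());        // 3
        System.out.println(active.sizeInBytes());  // 125000
    }
}
```

With one bit per element, a bitmap covering one million elements takes 125,000 bytes no matter how many of those elements are actually set.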

Multidimensional computation ultimately comes down to AND, OR, NOT, XOR, count, and list operations between bitmaps.
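The sketch below shows these operations with `java.util.BitSet` on made-up data: AND answers the city = Beijing & gender = male filter, and retention is computed by intersecting that cohort with the active-user bitmaps of two days. It assumes retention means the fraction of a day-0 cohort that is active again on day N; the helper names and data are hypothetical, not BitBase's API.

```java
import java.util.Arrays;
import java.util.BitSet;

// Sketch of the bitmap operations behind multi-dimensional queries.
// Bit i represents the user with (hypothetical) index i.
class BitmapOps {
    static BitSet of(int... indexes) {
        BitSet bs = new BitSet();
        for (int i : indexes) {
            bs.set(i);
        }
        return bs;
    }

    // Each operation works on a copy so the input bitmaps stay unchanged.
    static BitSet and(BitSet a, BitSet b) { BitSet r = (BitSet) a.clone(); r.and(b); return r; }
    static BitSet or(BitSet a, BitSet b)  { BitSet r = (BitSet) a.clone(); r.or(b);  return r; }
    static BitSet xor(BitSet a, BitSet b) { BitSet r = (BitSet) a.clone(); r.xor(b); return r; }

    // NOT is relative to a universe of indexes 0..universeSize-1.
    static BitSet not(BitSet a, int universeSize) {
        BitSet r = (BitSet) a.clone();
        r.flip(0, universeSize);
        return r;
    }

    static int count(BitSet a) { return a.cardinality(); }

    // "list": the indexes of all set bits, in ascending order
    static int[] list(BitSet a) { return a.stream().toArray(); }

    public static void main(String[] args) {
        BitSet cityBeijing = of(0, 1, 2, 5, 7);
        BitSet genderMale  = of(1, 2, 3, 7, 9);

        // city = Beijing & gender = male
        BitSet target = and(cityBeijing, genderMale);
        System.out.println(Arrays.toString(list(target)));        // [1, 2, 7]

        // Day-7 retention of the target users who were active on day 0
        BitSet activeDay0 = of(1, 2, 4, 7, 8);
        BitSet activeDay7 = of(2, 7, 8);
        BitSet cohort = and(target, activeDay0);                  // {1, 2, 7}
        BitSet retained = and(cohort, activeDay7);                // {2, 7}
        System.out.printf("retention = %d/%d%n", count(retained), count(cohort)); // retention = 2/3

        System.out.println(count(or(cityBeijing, genderMale)));  // 7
        System.out.println(count(xor(cityBeijing, genderMale))); // 4
        System.out.println(count(not(cityBeijing, 10)));         // 5
    }
}
```

NOT needs an explicit universe size (here, indexes 0 to 9), because a bitmap on its own does not know how many users exist.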

BitBase overview

Overall structure:

Storage module:

The metadata of all tables is stored in a bitmap, while the actual data is stored in separate bitmaps. The number of bits in each bitmap is determined by the table's data volume.
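As a rough illustration of how data volume drives bitmap size: if indexes are continuous, a table with n distinct IDs needs exactly n bits. The sketch also splits the bitmap into fixed-size chunks, which is one plausible way to keep individual HBase cells small; the chunk size `CHUNK_BITS`, the chunking itself, and the class name are assumptions, not BitBase's documented layout.

```java
// Hypothetical sizing helper. With continuous indexes 0..n-1, a bitmap needs
// exactly n bits. CHUNK_BITS is an assumed split size (not from BitBase).
class BitmapSizing {
    static final long CHUNK_BITS = 1L << 23;   // assumption: 8,388,608 bits = 1 MiB per chunk

    static long bytesFor(long distinctIds) {
        return (distinctIds + 7) / 8;          // round up to whole bytes
    }

    static long chunksFor(long distinctIds) {
        return (distinctIds + CHUNK_BITS - 1) / CHUNK_BITS;  // ceiling division
    }

    // Which chunk holds a given index, and at which bit offset inside that chunk
    static long chunkOf(long index)  { return index / CHUNK_BITS; }
    static long offsetIn(long index) { return index % CHUNK_BITS; }

    public static void main(String[] args) {
        long n = 1_000_000_000L;               // hypothetical: 1 billion device indexes
        System.out.println(bytesFor(n));       // 125000000 (about 119 MiB)
        System.out.println(chunksFor(n));      // 120
        System.out.println(chunkOf(20_000_000L) + " " + offsetIn(20_000_000L)); // 2 3222784
    }
}
```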

Computing module:

DeviceId problem

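In practice, each complex deviceId string must be converted to an index (a long value) that can serve as a bit position. The mapping needs four properties: it must be continuous (indexes are dense, with no gaps, so bitmaps stay compact), consistent (the same deviceId always maps to the same index), reversible (an index can be mapped back to its deviceId), and fast to compute.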
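A minimal single-process sketch of a dictionary with these properties follows. It is hypothetical: a hash map gives consistent, fast deviceId-to-index lookups, handing out indexes in order keeps them continuous, and a list indexed by position makes the mapping reversible. A real service would have to persist and share this dictionary across machines, which the sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory deviceId dictionary (not BitBase's implementation).
class DeviceIdDictionary {
    private final Map<String, Long> idToIndex = new HashMap<>();  // fast forward lookup
    private final List<String> indexToId = new ArrayList<>();     // reverse lookup by position

    // Consistent + continuous: a known id keeps its index; a new id gets the next free index.
    synchronized long toIndex(String deviceId) {
        Long existing = idToIndex.get(deviceId);
        if (existing != null) {
            return existing;
        }
        long next = indexToId.size();          // indexes are handed out 0, 1, 2, ... with no gaps
        idToIndex.put(deviceId, next);
        indexToId.add(deviceId);
        return next;
    }

    // Reversible: map an index (a bit position) back to its deviceId.
    synchronized String toDeviceId(long index) {
        return indexToId.get((int) index);
    }

    public static void main(String[] args) {
        DeviceIdDictionary dict = new DeviceIdDictionary();
        System.out.println(dict.toIndex("device-aaa"));   // 0
        System.out.println(dict.toIndex("device-bbb"));   // 1
        System.out.println(dict.toIndex("device-aaa"));   // 0 (same id, same index)
        System.out.println(dict.toDeviceId(1));           // device-bbb
    }
}
```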

Achieving continuity, consistency, and reversibility

Achieving fast conversion

Business results

In terms of latency, even a 90-day retention query returns within 10 seconds.

Service status:

Future work

Future plans include:

  • Import offline bitmaps within 5 minutes
  • SQL support
  • Open source

