Disclaimer: This article is based on a slide deck (PPT) shared by the HBase technical community. Conference slides like these are a good way to learn about technical solutions and other teams' practical experience. I hope it's helpful.
Background
Kuaishou generates tens of billions of user characteristic records every day. Analysts need to select arbitrary combinations of dimensions (e.g. city = Beijing & gender = male) across hundreds of billions of such records spanning 30-90 days, and get user-behavior analysis results within seconds. To meet this requirement, Kuaishou developed BitBase in-house: an HBase-based analysis service that supports bitmap conversion, storage, indexing, and fast computation. It has been applied successfully to retention analysis, user growth, advertising and marketing, and A/B testing.
Business needs and challenges
Kuaishou's actual business requirement: select any combination of dimensions over logs at the 100-billion-record scale, calculate 7-90 day user retention, and return the result within seconds.
Technology selection
To this end, Kuaishou investigated various technology solutions including Hive, ES, and ClickHouse.
Technical solution
In the end, Kuaishou settled on BitBase, a solution built on bitmaps and HBase.
For readers unfamiliar with bitmaps, see: https://www.jianshu.com/p/bf9dbbc147ed
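In a bitmap, each element is represented by a single bit: the element itself (as an integer) is the key that determines the bit's position, and the bit's value marks whether the element is present. Because each element costs only one bit, storage space is greatly reduced.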
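To make the idea concrete, here is a minimal sketch of a fixed-size, uncompressed bitmap. The `Bitmap` class, its method names, and the sample user indexes are assumptions made for illustration; the source does not describe BitBase's actual bitmap layout.

```python
class Bitmap:
    """Fixed-size bitmap: bit i is 1 if element i is present (illustrative only)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.bits = bytearray((size + 7) // 8)  # 8 elements per byte

    def add(self, index: int) -> None:
        # Byte index is index // 8, bit offset inside the byte is index % 8.
        self.bits[index >> 3] |= 1 << (index & 7)

    def contains(self, index: int) -> bool:
        return bool(self.bits[index >> 3] & (1 << (index & 7)))

    def count(self) -> int:
        # Number of elements present = number of bits set to 1.
        return sum(bin(byte).count("1") for byte in self.bits)


# Hypothetical data: user indexes 3, 5 and 999_999 match "city = Beijing".
beijing = Bitmap(1_000_000)
for user_index in (3, 5, 999_999):
    beijing.add(user_index)
```

In this sketch, membership for one million users fits in 125,000 bytes, regardless of how many of them are actually present.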
Multidimensional computation then reduces to AND, OR, NOT, XOR, count, and list operations between bitmaps.
BitBase overview
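The operations listed above can be sketched with Python integers standing in for bitmaps. The helper names, the 8-user universe, and the sample data are all hypothetical; this illustrates the bitmap algebra only, not how BitBase implements it.

```python
def from_indexes(indexes):
    """Build a bitmap (as a Python int) with the given bit positions set."""
    bitmap = 0
    for i in indexes:
        bitmap |= 1 << i
    return bitmap


def to_list(bitmap):
    """The "list" operation: return the set bit positions in ascending order."""
    result = []
    position = 0
    while bitmap:
        if bitmap & 1:
            result.append(position)
        bitmap >>= 1
        position += 1
    return result


def count(bitmap):
    """The "count" operation: number of bits set to 1."""
    return bin(bitmap).count("1")


def bitmap_not(bitmap, universe_size):
    """NOT, restricted to a universe of `universe_size` user indexes."""
    return ~bitmap & ((1 << universe_size) - 1)


UNIVERSE = 8  # hypothetical: user indexes 0..7
beijing = from_indexes([0, 2, 3, 6])  # city = Beijing
male = from_indexes([2, 3, 4, 7])     # gender = male

beijing_and_male = beijing & male     # city = Beijing & gender = male
beijing_or_male = beijing | male
beijing_xor_male = beijing ^ male
not_beijing = bitmap_not(beijing, UNIVERSE)

# Retention as a bitmap query: users active on day 0 who are also active on day 7.
day0 = from_indexes([0, 1, 2, 3])
day7 = from_indexes([1, 3, 5])
retention_rate = count(day0 & day7) / count(day0)
```

Here `to_list(beijing_and_male)` yields `[2, 3]`, and `retention_rate` is `0.5` because two of the four day-0 users reappear on day 7.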
Overall structure:
Storage module:
The original information of all tables is stored in a bitmap, and the actual data is stored across different bitmaps. The number of bits in each bitmap is determined by the table's data volume.
Computing module:
DeviceId problem
In practice, each complex deviceId is converted to an index (a long value). The mapping needs the following characteristics: continuous, consistent, reversible, and fast to convert.
Technical solution for continuity, consistency, and reversibility
How to achieve fast conversion
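As a rough illustration of the first three properties, the sketch below assigns indexes from a simple in-memory dictionary. The `DeviceIdDictionary` class and the sample deviceIds are assumptions; the source does not say how BitBase performs this mapping, and a production system would need a persistent, distributed equivalent.

```python
class DeviceIdDictionary:
    """In-memory deviceId <-> index mapping (illustrative only)."""

    def __init__(self) -> None:
        self._index_by_id = {}   # deviceId -> index
        self._id_by_index = []   # index -> deviceId

    def encode(self, device_id: str) -> int:
        """Return the index for device_id, assigning the next free one if new."""
        index = self._index_by_id.get(device_id)
        if index is None:
            index = len(self._id_by_index)  # continuous: 0, 1, 2, ...
            self._index_by_id[device_id] = index
            self._id_by_index.append(device_id)
        return index

    def decode(self, index: int) -> str:
        """Reverse lookup: recover the original deviceId from its index."""
        return self._id_by_index[index]


# Hypothetical deviceIds.
dictionary = DeviceIdDictionary()
first = dictionary.encode("a1b2-c3d4")
second = dictionary.encode("ffff-0000")
again = dictionary.encode("a1b2-c3d4")  # consistent: same id, same index
```

Keeping the indexes dense matters because the index is the bit position: gaps in the index space would become wasted bits in every bitmap.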
Business results
In terms of latency in practice, even 90-day retention results are returned within 10 seconds.
Service Status:
Future plans
Future plans include:
- Import offline bitmaps within 5 minutes
- SQL support
- Open source