Write articles carefully and share with your heart.
Personal website: Yasinshaw.com
Public number: XY’s technical circle
The previous article introduced the five most commonly used Redis objects and their underlying data structures. This article focuses on a less commonly used object that is well suited for some specific scenarios: Bitmaps.
Problems faced
Consider a few common scenarios:
- Query the number of users logged in today
- Query which users are logged in today
- Query whether a user has liked an article
- Query whether a user logs in for two consecutive days
- Select * from users who liked article A and liked article B
These requirements can be implemented using a database, using log tables, and then SQL queries. But this wastes a lot of disk space, and performance is poor. In the case of large amount of data, it is very slow to use database for query analysis.
Thanks to our computer scientists, we’ve developed a data structure that can solve these problems. It’s called a BitMap. This article focuses on the underlying data structure used by Redis Bitmaps.
The first article in Programming Abecz talks about using Bitmaps to sort data in large files. Similar questions may be asked in some interviews.
Interview question: A 10G file with all the natural numbers, one in a row, out of order, sort them. This is done on a 32-bit machine with a memory limit of 2 gigabytes.
There is an algorithm problem in LeetCode that can also be solved using BitMap:
Algorithm title: take n different numbers from 0 to n, find the missing one. Note: Your algorithm should have linear time complexity. Can you implement an algorithm that only takes up extra space complexity for constants?
BitMap principle
A BitMap is actually a data structure that uses the high performance of bit computing to store and calculate information. Let’s take a look at how bitmaps work.
Bitmaps record information based on the position of bits. Let’s say we now have an 8-bit BitMap. You start off with zeros in all the bits.
0, 0, 0, 0, 0, 0, 0, 0, 0
Then there is a requirement: suppose that four users with IDS 1,2,4, and 7 log in to our system today, and we want to record the login information, we just need to mark 1 in the corresponding position:
1, 1, 0, 1, 0, 0, 1, 0
In this case, if you want to know whether the user with ID 7 has logged in to the system today, you just need to look at the BitMap to see if the seventh digit is 1.
As you can see, if you are using a list or set, if an ID is a byte, it is 8 bits (in the real case, it might be a long, it is 64 bits), then the 8 ids are 64 bits, whereas with a BitMap it is 8 bits. Similarly, if our ID field takes up 64 bits, we can save 64 times as much space. In addition, the characteristics of bit operation can be used to quickly realize statistics, union, intersection and other operations.
1, 2, 4, 7:
1, 1, 0, 1, 0, 0, 1, 0
For those who have read article B: 1, 3, 4, 8:
1, 0, 1, 1, 0, 0, 0, 1
Those who have seen article A and article B are:
1, 1, 0, 1, 0, 0, 1, 0
&
1, 0, 1, 1, 0, 0, 0, 1
=
1, 0, 0, 1, 0, 0, 0, 0, 0
The people who have seen article A and seen article B are: 1, 4
The above example can only hold 8 bits, what if my ID is 9?
A BitMap is made up of these data pieces one by one. You can do this with an array of integers that can be expanded, such as the long array. So you can scale infinitely.
What if ids are sparse?
For example, THE ID I want to store may be 1,101,1001, which is relatively sparse. If I use BitMap, I will waste some space. Although some current open source implementations can solve this problem by logging offsets, they can also affect performance due to frequent splitting. In this scenario, BitMap is not recommended.
For those interested, check out Google’s open source EWAHCompressedBitMap, which solves the problem of sparse input.
BitMap implementation
Since I am mainly familiar with the Java language, I will introduce the implementation of BitMap in the Java language.
The JDK provides an implementation of a BitMap called BitSet, which is located under the java.util package. Underneath it is an array of type long, with a long representing a word. But BitSet doesn’t solve the problem of sparse input mentioned above. Google’s open source EWAHCompressedBitMap solves the problem of sparse input.
Redis provides bitmaps objects. In fact, the string object is used, the underlying use of SDS mentioned in our previous article. Therefore, there is a maximum limit of 512MB, that is, 2^32 data can be stored. If your ID is a long, 64-bit id, you can use two bitmaps.
As the underlying implementation of SDS shows, it can be scaled up and down, but can still waste a lot of memory if the input is sparse. So using Redis bitmaps is not recommended if the input is sparse.
Redis Bitmaps basic operation
conclusion
Redis provides bitmaps objects. It can save space in some scenarios and significantly improve performance. But if the input is sparse (for example, a site has 100 million registered users but only 100,000 users log in every day), you might as well use set.
Input, on the other hand, can only be plastic. If it’s a string, you can’t use this data structure.
Concern public number: XY’s technical circle