01 background

Hi, I’m Amu! Your harvest is my love, and your thumbs up is my approval.

When I just graduated from college many years ago, the use of cache was rarely involved. When I did courseware design or small projects in college, I did not use cache. Moreover, I did embedded assembly language and C language in college.

The interviewer asked if there was a difference between Redis and memcached. Well, I did some research on the basics and applications of Redis, but I wasn’t familiar with it. Just a vague answer. Then asked me, if the company now do an app, app has check-in function, what should you do?

“Is that hard? “, directly to the interviewer and say mysql storage is ok.

“We currently have over 200,000 active users. Are you sure you only use mysql for storage? Aren’t you afraid to blow up our database?” “The interviewer looked at me incomprehensibly, as if to say to me, what a newbie. Not so simple?

Just recently, a small partner asked this knowledge point about signing in, here is to say!

What is bitmap 02?

The bitmap website says: a bitmap is not an actual data type, but rather a collection of bit-oriented operations defined on string types. The minimum unit of a bitmap is bit, and the value of each bit can only be 0 or 1. Redis has a maximum string limit of 512 megabytes, so a bitmap can hold up to 2^32(4.2 billion) different bits.

You can think of a bitmap as an array of bits, and the index of the array is the offset

Its advantages: small memory overhead, high efficiency and simple operation.

What can we do with bitmaps?

  • Count users’ daily check-ins (most frequently used)
  • Count daily active/Inactive active users (Extension: Accurate data: usehiveorsparkStatistics; For inexact data, useHyperLogLog)
  • Real-time statistics of user online status (100 million users: 12MB storage space required)
  • Data is doubly written
  • Read or unread status of a video, article, etc

What instructions can be used in bitmap 04?

Select * from ‘select’;

Getbit instruction: getbit key offset to obtain the specified offset bit (bit); The time complexity is O(1). Note: If the key does not exist or if offset is greater than the length of the string value, 0 is returned. Bitcount instruction: bitcount key [start] [end] Gets the number of bits in the specified range; Time complexity O(n). Note: If the key does not exist, it is treated as an empty string, so the return value is 0. Bitpos instruction: bittops key bit [start] [end] Gets the position of the first bit in the bitmap; Time complexity: O(n), where n is the number of bits contained in the bitmap.Copy the code

2, add the insert instruction operation:

Setbit key offset value Sets the string value stored by the key, or clears bits at the specified offset. Time complexity: O(1). Note: 1, bits are set or cleared depending on the value parameter, 0 or 1. 2. If the key does not exist, a new string value is automatically generated. The value of offet must be greater than or equal to 0 and less than 2^32(the maximum value of a string is 512 MB).Copy the code

Here is the time complexity of my hash type commands. You can refer to this table:

instruction Time complexity
getbit key offset O(1)
bitcount key [start] [end] Order n, where n is the number of bits
setbit key offset value O(1)
bittops key bit [start] [end] O(n), where n is the number of bits

Bitmap Practice series

The function we want to implement is the most commonly used check-in, which is as follows:

  • 1. Sign in and clock in
  • 2. Check whether you have clocked in on a certain day (because most apps only have a check in button on that day)
  • 3. Obtain the card punching record list of the user in a certain month
  • 4. Count the total number of users punching in a certain month
  • 5. Obtain the consecutive punch times of the user in a certain month
  • 6. User re-sign

Let’s start by creating a relational table of user clock-in information to store the user’s clock-in information. Here’s an important point:

I have seen many people on the network only use Redis to store user punched information, and do not actually deal with it. The most strange thing is that people ask, if Redis fails or what, the operation or product wants to obtain data analysis, what should we do?

One blogger replied: “Redis high availability, Redis persistence, and writing a query cache interface in the background, don’t you?” Is the blogger answering seriously? To play with us? I also smiled…..

Let’s talk about the reasons for creating table records:

1. In the age of big data, any valuable information needs to be collected, and it’s important to have something to do with user activity and DAU

2. Related to users, products and operations will definitely need these data to analyze user behavior and consider the revenue brought by check-in gifts

3. Do not rely too much on the cache. Once the cache has problems or crashes, data loss is a big problem

4. When the user has doubts, the problem can be quickly checked through the landing data

5. When there is a problem with the cache, data can be returned to the source through the records of the database to ensure the consistency of data

CREATE TABLE 'mumu_sign_202105' (' id 'int(11) NOT NULL AUTO_INCREMENT COMMENT' mumu_sign_202105 ', 'user_id' varchar(255) NOT NULL DEFAULT "COMMENT", 'sign_date' date NOT NULL DEFAULT '0000-00-00' COMMENT '0', 'create_at' int(10) NOT NULL DEFAULT '0' COMMENT 'creation time ', PRIMARY KEY (`id`), UNIQUE KEY 'uniq_uid_date' (' user_id ', 'sign_date')) ENGINE=InnoDB DEFAULT CHARSET= utF8 COMMENT=' user_id ';Copy the code

We see this table name is not very strange why to bring _202105, mainly is here used by the principle of the monthly table, because of the large number of users, as far as possible to ensure that a table of data in 500W, of course, some performance is better than more than 1000 optimization speed is also leverage.

After the end of each month, data can be synchronized to Hive, ES, and Sphinx. We only need to ensure that the table data within six months is used up, and the rest can be deleted.

So whenever we design a table or do a feature, it’s important to think about estimates and why we’re doing it, and what are the benefits? So you will become more and more excellent, thinking logic more and more careful.

insert into `mumu_user` (`user_id`, `sign_date`,`create_at`) VALUES
('10001','2021-04-22', unix_timestamp()),
('10001','2021-04-24', unix_timestamp()),
('10001','2021-04-25', unix_timestamp()),
('10001','2021-04-26', unix_timestamp()),
('10001','2021-04-30', unix_timestamp());
Copy the code

So let’s start the code logic comb:

① Sign in and punch in series
/** * @desc @param string $date * @return int */ public function signIn($date = ") {$key =" $this->getKey($date); Return $this->redis->setBit($key, $this->getCurrentDay($date), 1); } require_once './Bitmap.php'; $user_id = 1001; $date = "2021-04-24"; $bitmap = new bitmap ($user_id); Echo $bitmap->signIn($date); Localhost :6379> SETBIT user:sign:1001:202104 23 1 (INTEGER) 0Copy the code

Doesn’t it seem super simple to use the setbit instruction to store the user’s check-in status as 1? Remember that you have to insert the table before you write to the cache to make sure that the database is on the ground. If you’re smart, you might see that the unique key is set in the table to ensure that a user only checks in once a day.

② Check whether a certain day has clocked in
/** * @desc * @param string $date * @return int */ public function judgeUserSign($date = ") {$key =" $this->getKey($date); return $this->redis->getBit($key, $this->getCurrentDay($date)); } require_once './Bitmap.php'; $user_id = 1001; $date = "2021-04-24"; $bitmap = new Bitmap($user_id); echo $bitmap->judgeUserSign($date); Localhost :6379> GETBIT user:sign:1001:202104 Localhost :6379> GETBIT user:sign:1001:202104 24 // The actual query here is whether to check in on April 25 (integer) 0Copy the code
③ Obtain the card punching record list of a certain month
@param string $date * @return mixed */ public function getUserAllSign($date = ") {// $this->getKey($date); $this->getKey($date); $result = $this->redis->bitField($this->redis->bitField($key)); $list = []; $list = []; $date = $this->getMonthDays($date); // Iterate from low to high, where 0 indicates no check-in; For ($I = $days; $I = $days; $i > 0; $I --) {if ($I < 0) break; / / define what is the current date $local_date = date (' Y -m '). '-'. $I; $flag = ($result >> 1 << 1) $flag = ($result >> 1 << 1); = $result ? true : false; $list[$local_date] = $flag? $list[$local_date] = $flag? 1:0; $result >>= 1; $result >>= 1; } return $list; } require_once './Bitmap.php'; $user_id = 1001; $date = "2021-04-24"; $bitmap = new Bitmap($user_id); $result = $bitmap->getUserAllSign($date); var_dump($result); / / execution result set array (30) {(" 2021-04-30 ") = > int (0) (" 2021-04-29 ") = > int (0) (" 2021-04-28 ") = > int (0) (" 2021-04-27 ") = > int (0) ["2021-04-26"]=> int(0) ["2021-04-25"]=> int(0) ["2021-04-24"]=> int(1) ["2021-04-23"]=> int(0) ["2021-04-22"]=> int(0) ["2021-04-21"]=> int(0) ["2021-04-20"]=> int(0) ["2021-04-19"]=> int(0) ["2021-04-18"]=> int(0) ["2021-04-17"]=> int(0) ["2021-04-16"]=> int(0) ["2021-04-15"]=> int(0) ["2021-04-14"]=> int(0) ["2021-04-13"]=> int(0) ["2021-04-12"]=> int(0) ["2021-04-11"]=> int(0) ["2021-04-10"]=> int(0) ["2021-04-9"]=> int(0) ["2021-04-8"]=> int(0) ["2021-04-7"]=> int(0) ["2021-04-6"]=> int(0) ["2021-04-5"]=> int(0) ["2021-04-4"]=> int(0) ["2021-04-3"]=> int(0) ["2021-04-2"]=> int(0) ["2021-04-1"]=> int(0)} localhost:6379> BITFIELD User :sign:1001:202104 get u30 01) (INTEGER) 64Copy the code

Note that the date 2021-04-24 is set as the check-in date above, and the returned 64 is binary data: 0100 0000, so it is clear, indicating the check-in date 24

If you’re wondering why you don’t have support for the bitField directive, redis has added a powerful new bitField directive since version 3.2. R If this instruction does not appear, it is estimated that the above code will cache through the pipeline command batch to obtain data for dozens of days. But it’s not a concept at all. It does all the fetching in a single command. Popularize the bitfield command:

Official documents: The BITFIELD key [GET type offset] [the SET type offset value] [INCRBY type offset increment] [OVERFLOW WRAP | | SAT FAIL] means: The parentheses refer to the subcommands that support them; Get, set, incrby: O(1) Bitfield key get u8 0 1, key refers to the cache key that we need to operate 2, get is a subcommand of bitfield used to select 3, u8 means the unsigned number +30 bits of integer (i8 means the signed number) 4, 0 means to return the specified bit offsetCopy the code

An unsigned number is a non-negative number. It has no sign position, and all the bits in the array are values. An unsigned number is a negative number. The first part of the obtained value is the sign bit and the rest is the usable value. And if you want to understand that, you can look at how computers are made and you can look at that at Bibi.

④ Obtain the total number of punched cards in the current month
/** * @desc * @param string $date * @return int */ public function getSumSignCount($date = ") {$key =" $this->getKey($date); return $this->redis->bitCount($key); } $bitmap = new Bitmap($user_id); $result = $bitmap->getSumSignCount($date); var_dump($result); Localhost :6379> BITCOUNT user:sign:1001:202104 (INTEGER) 1Copy the code
⑤ Obtain the consecutive check-in times of users
/** * @desc * @param string $date * @return int */ public function getContinuousSignCount($date = ") {$key  = $this->getKey($date); $date = $this->getCurrentDay($date); $result = $this->redis->bitField($key, 'u'. $days, 0); $this->redis->bitField($key, 'u'. $days, 0); $signCount = 0; $signCount = 0; $value = isset($result[0]) ? $result[0] : 0; For ($I = $days; $I = $days; $i > 0; $I --) {if ($I < 0) break; If ($value >> 1 << 1 == $value) {if ($value >> 1 << 1 == $value); If ($I! = $days) break; $signCount++; $signCount++; $signCount++; } $value >>= 1; $value >>= 1; } return $signCount; } $bitmap = new Bitmap($user_id); $result = $bitmap->getContinuousSignCount($date); var_dump($result); Localhost :6379> BITFIELD user:sign:1001:202104 get u25 01) (INTEGER) 22Copy the code

The number of consecutive check-in is mainly a test of your proficiency in binary bit operations, knowing how to perform bit operations, so that you can better understand how to use the check-in.

Popular science moves left and right:

A <

A >> A >> A >>b (shift right) Moves a bit to the right a bit to the right a bit to the right a bit to the right a bit to the right b times (each move represents “divide by 2”); When moving right, the left side is filled with a sign bit, meaning that the positive and negative signs are retained.

⑥ User renewal
/ * * * * @ @ desc user retroactive param string $@ the return date * bool | int * / public function rebuildSign ($date = ' ') {$key = $this->getKey($date); If ($this->judgeUserSign($date)) return false; return $this->signIn($date); } // This is very simple, you can operate by yourselfCopy the code

Bitmap actual code warehouse address: github.com/woshiamu/am…

The final summary

This article mainly explains the use of Redis bitmap through the actual application scenarios. How to apply it, how to practice it, you can get a better idea of what bitmaps are all about by doing code case by case.

A bitmap is a string that occupies memory and can store a large amount of data. Give you a little advice, when reading articles or books, we must start to practice after reading, because practice is the only standard to test the truth; If we are still using set hash SiMember for our check-in function, we can try to change it and compare the performance to improve our technical level and speed of interface access.

After reading the article, do you have a further understanding of the use of bitmaps? If Amu’s article was helpful or lacking, please leave a comment below.

Finally, you are welcome to follow my personal public account “I am Amu”, which will update the back-end knowledge and study notes from time to time. Also welcome direct public account private message or email contact me, we can study together, progress together.

All right, I’m Amu, a worker who doesn’t want to be eliminated at 30 ⛽️ ⛽️ ⛽.