1. Introduction to bitmap and Zset data structures

Bitmap and Zset data structures in Redis are very useful in business scenarios, and clever use of them can often solve complex problems perfectly. Let’s start with a brief introduction to these two data structures.

The basic unit of information stored on a computer is the bit. Bits can only store zeros or ones. All data information, such as strings and numbers, is represented by a combination of zeros and ones in a computer. 8 bits are 1 byte (1B),1024 bytes are 1KB, and 1024KB is 1 MB. In other words, there are 8388,608 bits in just one Megabyte. If we use the zeros and ones in each bit to represent a piece of information, then just one Megabyte can store a lot of content.

All programming languages provide a set of standard “bit” operators to operate the bits at the bottom of the computer. The execution efficiency of bit operations is much higher than that of operation instructions such as addition, subtraction, multiplication, division and modulus. The following are the six operators provided by C language:

An operator instructions
& Bitwise and
| Bitwise or
^ The bitwise exclusive or
~ The not
<< Shift to the left
>> Moves to the right

Redis also provides a series of “bit” operations via bitmap. Let’s take a look at some commonly used commands:

  • SETBIT: Sets the bits at the specified offset to 0 and 1 for the string value stored in the key. Automatically generates a new string value when key does not exist. The string is grown to ensure that it can store value at the specified offset. When the string value is stretched, the blank space is filled with 0. The offset argument must be greater than or equal to 0 and less than 2^32 (the bit mapping is limited to 512 MB).

Syntax: SETBIT key offset value, for SETBIT operations using large offset, memory allocation may cause the Redis server to block. It is a good practice to allocate a sufficient amount of contiguous bit space to avoid frequent memory allocation during the use of SETBIT

  • GETBIT: Gets the bit at the specified offset for the string value stored in the key. Returns 0 if offset is greater than the length of the string value, or the key does not exist.

GETBIT key offset

  • BITCOUNT: Counts the number of bits set to 1 in the given string. Nonexistent keys are treated as empty strings, so BITCOUNT on a nonexistent key results in 0.

Syntax: BITCOUNT key start

  • GET: Retrieves the contents of a bitmap. This method is often used to persistently store the contents of a bitmap to a disk database, subject to data structure transformation, as described below.

Redis is written in ANSI C language, and most of the operation instructions in bitmap are also based on the basic operation of THE C counterpoint. C language has many useful skills for the operation of the binary bit, for example, SETBIT sets the specified bit to 0 and 1. C language operates like this:

n | (1 << (m-1)); // Set n to 1 n & ~(1 << (m-1)); // Set the MTH bit of n to 0 from low to highCopy the code

We emphasize going from low to high because cpus on different computers have different sequences of bytes, big endian and small endian.

1. Little endian: Stores low-order bytes at the start address

2. Big endian: Stores high-order bytes at the start address

There are many ways to determine whether a machine is in small-endian mode or big-endian mode. Here is one of the ways to determine whether a machine is in small-endian mode or big-endian mode. :

//return 1 : little-endian
//       0 : big-endian
int checkCPUendian()
{
    union {
        unsigned int a;
        unsigned char b; 
    } c;
  
    c.a = 1;
    return (c.b == 1); 
}
Copy the code

It is important to determine whether the machine is using the big or the small end. This determines whether the program needs to reorder the bitmap data obtained from Redis using the GET command.

Let’s look at the zset, which is called an ordered set. It allows each element in the set to set a score as the weight, the score value is very important, the author studied Laravel using Redis to achieve delayed task source Lumen framework “asynchronous queue task” source analysis, the score value is set to the task to execute the timestamp, A daemon retrieves a task through a time-sliding window and executes it.

There are many common commands for zset. Here are some of the most common:

  • ZADD key score1 member1 [score2 member2] : Adds one or more members to an ordered collection, or updates the scores of existing members
  • ZCOUNT key min Max: Counts the number of members in an ordered set with a specified interval score
  • ZRANGEBYSCORE Key min Max [WITHSCORES] [LIMIT] : Returns the members of the ordered set within the specified range by the score

2. Massive data statistics artifact: Bitmap

Bitmaps use zeros and ones in bits to represent information. Since bits are the basic unit of computing in a computer, using bits can greatly save storage space and provide very fast computing efficiency compared to other data structures. Since 0 and 1 can only represent yes-or-no information, it can only be used in scenarios where there is a large amount of data and only yes or no is concerned, and even then, it is powerful. Let’s take a few examples and talk about persisting bit information to a database.

21,000 client heartbeat status records

The scenario is as follows: Each client sends a ping message to the server every 1s. After receiving the ping message, the server records the ping message, indicating that the client has a heartbeat.

There are 86,400 seconds in a day, and there are 100,000 devices. If we store the heartbeat information of devices every second in a single record of Mysql, Mysql would not be able to handle the amount of data in a single day. Considering that the heartbeat information is a yes/no property, and since the timestamp is continuous, we can use the 86400 bits in bitmap to record the heartbeat information of a device. Assuming that the device number is bit 001, we can initialize it in Redis like this:

127.0.0.1:6379> setbit 001 86399 0
Copy the code

Why 86399? Because the bit subscripts for the key start at 0, a maximum bit is initialized to avoid frequent memory allocation each time the client heartbeat is recorded.

After the client reports the data, the server uses the client device number key to report the offset bit of the time stamp relative to the early morning of the same day to set the heartbeat (the following is an example of Go language):

    currentTime := time.Now()
	startTime := time.Date(currentTime.Year(), currentTime.Month(), currentTime.Day(), 0, 0, 0, 0, currentTime.Location())
	offsetSecond := currentTime.Unix() - startTime.Unix()
	redisConn := cache.NewPool().Get()
	defer redisConn.Close()
	result, _ := redisConn.Do("SETBIT"."001", offsetSecond, 1)
	fmt.Println(result)
Copy the code

2.2 Million Users online status statistics

Many apps and websites claim to have millions of active users, and the server often needs to count the online status of users, or find some recent active users (such as users who have been online continuously for the last month). Because of the huge amount of data, how to use faster speed, smaller space to query out the results we want? The answer is bitmap. As mentioned earlier, 8388,608 bits are available in 1M space. We can map user IDS to offset bitMP, which means that millions of users’ online information can be stored in less than 1M space. Using BITCOUNT is a handy way to get the current number of online users.

23 million consumer data deduplication

In our production/consumption model, data de-duplication to avoid repeated consumption has always been a very troublesome topic. When the amount of data is small, you can store the full amount of data. You can check whether the data has been consumed each time you retrieve the data by creating an index or hash. When the amount of data is large, it is obvious that this method is very unsuitable, and the cost of data search and storage is huge. We can map each piece of consumption data to a unique int ID, which can be a millisecond timestamp * device ID, or some other combination, as long as it is unique. Assuming we pull 10 million data from the consumption queue per day, we can use less than 10 meters of storage space to record consumption information (the corresponding bit is set to 1), and the bit operation code is far more efficient than database indexes and hashes in determining whether the information is consumed.

2.4 How to synchronize Bitmap data to Mysql Database

We know that the underlying implementation of bitmap defaults to bits of the operation, which is a kind of 010101…… This is a string representation, but it is not a string (strings are stored in much more space than binary bits). It is a good practice to break the bitmap store into program ints and store them in a database. In programming languages such as PHP, the maximum number of ints can only be 63 bits (PHP uses zend_long for int values). Eight bytes represent an int value and the highest bit is a sign bit.), THE Go value is expressed in sectors, indicating a maximum of 64 binary bits at a time. We set the key of the bitmap to testBit, and the first bit of the bitmap to 1:

127.0.0.1:6379 > setbittestBit 0 1
(integer) 0
Copy the code

The Go language fetches testBit code

redisConn := cache.NewPool().Get()
defer redisConn.Close()

cfg, err := redisConn.Do("get"."testBit")
iferr ! = nil { fmt.Println(err) } fmt.Println(cfg)Copy the code

The result is:

[128]
Copy the code

1 uint8 slice = 128; 1 uint8 slice = 1; 1 uint8 slice = 1 Since the author’s computer is small end, after taking out the values, we also need to carry out bit symmetrical transformation of the values represented by each uint8 (swap the 1st and 8th bits, and swap the 7th and 2nd bits…..).

Program implementation is also very simple:

/** ** / uint8Slice []uint8 {var j uint8fork, i := range uint8Slice { j ^= (j & (1 << 0)) ^ ((i >> 7 & 1) << 0) j ^= (j & (1 << 1)) ^ ((i >> 6 & 1) << 1) j ^= (j & (1  << 2)) ^ ((i >> 5 & 1) << 2) j ^= (j & (1 << 3)) ^ ((i >> 4 & 1) << 3) j ^= (j & (1 << 4)) ^ ((i >> 3 & 1) << 4) j ^= (j & (1 << 5)) ^ ((i >> 2 & 1) << 5) j ^= (j & (1 << 6)) ^ ((i >> 1 & 1) << 6) j ^= (j & (1 << 7)) ^ ((i >> 0 & 1) << 7)  uint8Slice[k] = j } }Copy the code

We set the first bit of testBit to 1 and use the swapBit function to process the fetched value.

127.0.0.1:6379 > setbittestBit 1 1
(integer) 0
Copy the code

In theory we should get 3, and that’s what it is:

redisConn := cache.NewPool().Get()
	defer redisConn.Close()

	cfg, err := redisConn.Do("get"."testBit")
	iferr ! // Uint8slice := CFG.([]uint8) swapBit(int8Slice) fmt.Println(int8Slice)Copy the code

We mentioned previously, no matter what kind of programming language at best, but it can store 64 int value of contiguous bits, we might as well change a train of thought, the continuous 60 stored in an int64 variable (at the time said, 1 minute is 60 seconds, an hour is 60 minutes, so an int value could be expressed in a more specific meaning), The actual conversion process is more complicated, that is, every 7.5 bytes is converted to an INT64 value (but only 60 bits are used). Here is a code example for interested readers to explore:

// Convert uint8 to uint64, use only 60 bits of storage: 1 minute => 60 seconds func fillInt64(int8Slice []uint8, int64Slice []uint64) {int64Len := len(int64Slice)for i := 0; i < int64Len; i++ {
		odd := i % 2
		if odd == 0 {
			offset := uint64(float64(I) * 7.5) int64Slice[I] = uint64(int8Slice[offset]) + uint64(int8Slice[offset + 1]) << 8 + uint64(int8Slice[offset +) 2]) << 16 + uint64(int8Slice[offset + 3]) << 24 + uint64(int8Slice[offset + 4]) << 32 + uint64(int8Slice[offset + 5]) <<  40 + uint64(int8Slice[offset + 6]) << 48 + uint64(int8Slice[offset + 7] & 31) << 56 }else {
			offset := uint64(math.Floor(float64(i) * 7.5))
			int64Slice[i] = uint64(int8Slice[offset] & 230) >> 4 + uint64(int8Slice[offset + 1]) << 4 + uint64(int8Slice[offset + 2]) << 12 + uint64(int8Slice[offset + 3]) << 20 + uint64(int8Slice[offset + 4]) << 28 + uint64(int8Slice[offset + 5]) << 36 + uint64(int8Slice[offset + 6]) << 44 + uint64(int8Slice[offset + 7]) << 52
		}
	}
}
Copy the code

We can convert bitmap values into int64 slice values, which can be stored in the database using the ‘,’ partition. Mysql query statistics only need to load the internal data int according to the ‘,’ partition, and calculate the offset.

3. Zset: Time sliding window, the first choice for scheduled tasks

As we said earlier, setting score in ZSet as a timestamp can form a time sliding window, which can be used to achieve very rich functions in business logic. Let’s take a look briefly:

3.1 Delay task Timer

When it comes to timers, developers often think of javascript timers. In fact, there are timers on the server, which control the application to perform tasks at a specified time. PHP uses the PCNTL library to implement a timer.

<? php pcntl_signal(SIGALRM,function () {
    echo 'Received an alarm signal ! ' . PHP_EOL;
}, false);

pcntl_alarm(5);

while (true) {
    pcntl_signal_dispatch();
    sleep(1);
}
Copy the code

In this example, a task is set to be executed after 5 seconds by sending signals. This is rarely done in the production environment. The Laravel program can set a task delay by running the following command:

$job = (new ExampleJob())->delay(Carbon::now()->addMinute(1));
dispatch($job);
Copy the code

The principle is to package tasks as payload, compose a message body, add it to Redis, set its score as the timestamp of the task to be executed, and extract the task to be executed by a daemon every certain time (for example, scan zset in 3s).

Go language is friendly to timer support, interested readers can study, Go 1.12 version uses 4-fork heap, to 1.13 version uses 64-fork heap, can maintain millisecond error in tens of thousands of high concurrency, has a wide range of applications in production environment.

3.2 Task planning list

The art of war goes like this: “Before soldiers and horses arrive, food and herbs come first”. In many micro-services, the server often arranges a large number of resources and tasks to be delivered to the server first, and then uses cron scripts or daemons to deliver the tasks at the specified time. This is where the zset structure comes in handy again. You can store the timestamp of the task to be executed as score, thus forming a scheduled task planner, which is similar to the principle of the delayed task mentioned above.

3.3 How to Realize real-time monitoring of online Status of 100,000 Clients

Previously we talked about how to use bitmap to store client heartbeat information. Now let’s look at how to use Zset to implement real-time monitoring of client state. We only care about the final state of the client. We set the number of the device as key, and the last time stamp of the heartbeat reported by the device as SCORE. When the client reports the heartbeat information, we only need to update the score of the device.

Through a daemon process, you can use the ZRANGEBYSCORE command of Redis to retrieve the heartbeat information of the devices in the previous 3 seconds to determine which devices have no heartbeat during this period (offline). Get 1575880725->1575880728 heartbeat device:

func main() {
	redisConn := cache.NewPool().Get()
	defer redisConn.Close()

	result, err := redis.Strings(redisConn.Do("ZRANGEBYSCORE"."client_heart_bit", 1575880725, 1575880728, "WITHSCORES"))
	iferr ! = nil { fmt.Println(err) } fmt.Println(result) }Copy the code

4. The conclusion

This article doesn’t go into much code logic and architectural design, but shows you how to use Redis Bitmap and Zset to solve problems in the right scenarios from a business perspective. At the same time, the author has extended a lot of expansion content, which is just to throw a brick to attract jade. Readers who are interested can explore further and study it. The summary is as follows:

  • 1. When the data information is large and the information expressed is either yes or no, bitmap can be used if the information can be mapped to an int value
  • 2. Zset structure is very useful in implementing delayed tasks, scheduled tasks and simple monitoring statistics services on the server side