preface

RedisThe fourteenth in the series aboutBitmaps. I don’t think people use this very often, but there’s something called bloom filters and there’s a way to implement it based onRedistheBitmaps. For those of you unfamiliar with bloom filters, check out my second postThis bloem filter is enough. After reading good brothers know is what principle, like the title, one is enough (not the title party). Look at that. That’s another lesson. Invisible in the force of the force gushing.

An overview of the

Take a literal split between bit and map. First, a bit is short for binary unit or binary digit. It is the smallest unit in computer memory. In a binary computer system, each bit can represent either a zero or a one (binary) digital signal (either black or white or zero or one). The HashMap of Java(the best language in the world, no one else). In Redis, Bitmaps are not themselves a data structure. You can think of Bitmaps as an array of bits, each of which can only store zeros and ones. The subscripts of the array are called offsets in Bitmaps. It’s actually a string, but it can operate on the bits of the string.

Binary representation string

Good brother will ask, how does that workbitTo represent a string? Well, I’m going to give you an examplebitTo represent theDawn. First of all,DawnalphabeticASCIICode respectively is68,97,119,110. thebitCan only save01So let’s take the correspondingASCIIThe code is converted to binary01000100,01100001,01110111,01101110(One byteByteOf the eightbit). The following figure

ASCIICode table (so close to the good brothers to code table are ready, not sureLike and follow?)

Bitmaps store structure

We already know thatBitmapsGeneral idea, what does it look like in Redis? In fact, it is similar. The corresponding binary string is stored as value, so that the string can be calculated in bitwise, as shown in the figure

The command

Background: Store Bitmaps for each individual user whether they have visited the site or not, mark the users who visited as 1 and those who did not visit as 0, and use the offset as the user ID.

Set the value

Set the first keyoffsetIn the ones place, let’s say I have 20 users,Userid = 0,5,11,15,19The current Bitmaps initialization result is shown below.

## format key, offset, value: 0 or 1
setbit key offset value
## Specific operation process
127.0.0.1:6379> setbit unique:users:2020- 12- 13 0 1
(integer) 0
127.0.0.1:6379> setbit unique:users:2020- 12- 13 5 1
(integer) 0
127.0.0.1:6379> setbit unique:users:2020- 12- 13 11 1
(integer) 0
127.0.0.1:6379> setbit unique:users:2020- 12- 13 15 1
(integer) 0
127.0.0.1:6379> setbit unique:users:2020- 12- 13 19 1
(integer) 0
Copy the code

If I have one right nowuserid=50The structure of Bitmaps becomes the following, with bits 20 to 49 being 0.

A lot of users of our appsidStarts with a specified number (such as 10000 or generated using the Snowflake algorithm) and directly sends the useridandBitmapsThe offset corresponding to is bound to cause a certain amount of waste, the usual practice is to do each timesetbitSubtract the specified number from the user ID when you operate. When initializing Bitmaps for the first time, if the offset is very large, the entire initialization process will be slow and may cause Redis to block.

The values

If we need to find out if the user with id 11 visited the site at 2020-12-13, we can use the following command

## format key, offset, subscript offset
gitbit key offset
Return 0 or 1 if offset is not present in the key
127.0.0.1:6379> getbit unique:users:2020- 12- 13 11
(integer) 1
Copy the code

Gets the number of values 1 in the range specified by Bitmaps

Count the number of users who visit the site by day

127.0.0.1:6379> bitcount unique:users:2020- 12- 13
(integer) 5
Copy the code

If you want to calculate the number of unique users between the first byte (a byte equals 8 bits, 0-17offset) and the third byte, the corresponding user ids are 11, 15 and 19.

The format [start] and [end] represent the number of start and end bytes
bitcount [start] [end]
## Return the statistics result
127.0.0.1:6379> bitcount unique:users:2020- 12- 13 1 3
(integer) 3
Copy the code

Calculations between Bitmaps

Bitop is a composite operation that can do and, OR, NOT, xOR of multiple Bitmaps and store the results in destkey. For example, assume that userids 1, 2, 5, 9 are used to access the web site in 2020-12-12. Intersection: calculate the number of users who visited the site on 2020-12-12 and 2020-12-13

## format op: specific operation (and or),destkey: result set
bitop op destkey key[key....]
# # intersection
127.0.0.1:6379> bitop and unique:users:and:2020- 12- 12 _13 unique:users:2020- 12- 12
unique:users:2020- 12- 13
(integer) 2
127.0.0.1:6379> bitcount unique:users:and:2020- 12- 12 _13
(integer) 2
Copy the code

Union: calculate the number of users who visited the site on any given day (e.g. monthly active)

# # and set
127.0.0.1:6379> bitop or unique:users:or:2020- 12- 12 _13 unique:users:2020- 12- 12 unique:users:2020- 12- 13
(integer) 2
127.0.0.1:6379> bitcount unique:users:or:2020- 12- 12 _13
(integer) 6
Copy the code

Calculates the offset in Bitmaps where the first value is targetBit

For example, we want to calculate the minimum user ID for accessing a web site between the 0th byte and the first byte in 2020-12-13

## format targetBit: 0 or 1, [start] and [end] represent the start and end bytes, as above
bitpos key targetBit [start] [end]
# # chestnuts
127.0.0.1:6379> bitpos unique:users:2020- 12- 13 0 0 1
(integer) 0
Copy the code

contrast

In the scenario of large data volume and high daily activity. Let’s say the site has 100 million users and 50 million unique visitors per day. If you store a graph of active user results per day using collection types and Bitmaps, respectively



Good brother, see the difference. Obviously, using Bitmaps in this situation can save a lot of memory, especially over time



But if the site has a small number of users, say 100,000, then using Bitmaps (too large a base) is not a good idea, with a large number of zeros.

Usage scenarios

  1. Statistics website daily/monthly activity
  2. Users check in by the day
  3. Collect statistics about user login and online status
  4. Implement Bloom filter, unfamiliar can see bloom filter this article is enough (there will be implementation Demo later)

conclusion

Proper use of Bitmaps can significantly reduce memory footprint, and in some statistics with only two states, performance is very good. But in terms of value store state, Bitmaps can actually only store bits, which means 0 or 1. This means that we need to count or mark no more than two states, such as male or female, and cannot count or mark when some mysterious force has a third gender. This is one of the limitations of using Bitmaps. In addition, if used improperly, there will be a waste of memory space, inefficient situation, the above mentioned. Good brother discretion use oh.

That’s the end of this issue. Welcome to leave your comments in the comments sectionAsk for attention, ask for likes

Next article: Fantastic Redis HyperLogLog last article :Redis long swastika Lua details