The interview scene

Interviewer: What types of data does Redis have?

Me: String, List, Set, ZSet, Hash

Interviewer: Nothing else?

Me: Oh, right, and HyperLogLog, Bitmap, GeoHash, BloomFilter

Interviewer: That’s it? Go home and wait for word.



Preface

I’m sure all of you could give the first answer, and not many of you could give the second, but that isn’t what I want to talk about today.

Starting from my own train of thought when I answered, and from the answer an interviewer actually wants to hear, I’m writing a series of articles on the basic Redis types. I have some experience to share here. I wonder how many of you were like me at the beginning: the interviewer asked about the types, you listed those five, and that was the end of it. If that’s you, feel free to leave a comment so I can see how many of you feel the same way.

However, an interview runs for at least half an hour, and you have just covered five important topics in a single sentence. Is that the result you want? Is it what the interviewer wants?

Let me ask a few more questions that might stump you: how is a String stored inside Redis? How are the other data types stored? Is Redis fast only because it is single-threaded and memory-based?

Did that hit a blind spot? Don’t panic, I used to be the same. I thought reciting the five types was enough, and then an interviewer took me apart. After a lot of hard practice I finally got better, and now I know caching very well. If you’re not there yet, that’s fine, I’ve got you.

Main text

Redis is written in C, which has its own string representation: a null-terminated character array. However, Redis does not use C strings directly. It builds its own abstract type instead, the simple dynamic string (SDS).

Take a command that sets the key aobing to the value cool. It actually creates two SDS objects in Redis: a key SDS holding aobing and a value SDS holding cool. Even a List of strings is made up of many keys and values stored as SDS.

SDS is not only used as a string in Redis, but also as a buffer. What does an SDS look like? What are the advantages?

So I went and looked at the Redis source code. The SDS structure looks roughly like this. The source is open on GitHub, so you can search for it yourself.

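    /* Layout from older Redis releases, as described in the referenced book;
       newer releases use several size-specific headers instead. */
    struct sdshdr {
        int len;     /* number of bytes used in buf */
        int free;    /* number of unused bytes in buf */
        char buf[];  /* the bytes themselves */
    };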

Back to the original question: why does Redis use its own SDS instead of C strings? Let’s look at the differences.

SDS and C strings

  1. Different ways of getting the length

C computes the length of a string by walking it from start to end and stopping at the null character, so getting the length costs O(n).
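To make the difference concrete, here is a minimal sketch of both ways of getting a length. The names c_string_length, demo_sds and demo_sds_length are made up for this illustration and are not Redis APIs; the fixed-size buf is only there to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Walks the string until the terminating '\0', the way strlen() does: O(n). */
size_t c_string_length(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Illustrative stand-in for an SDS header (hypothetical, not the Redis type). */
struct demo_sds {
    int len;       /* number of bytes in use */
    int free;      /* number of spare bytes */
    char buf[16];  /* fixed size only to keep the sketch short */
};

/* Reads the stored length: O(1), no scan of buf needed. */
int demo_sds_length(const struct demo_sds *s) {
    assert(s != NULL);
    return s->len;
}
```

The first function gets slower as the string gets longer. The second one costs the same no matter how long the string is.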


However, counting this way leaves a trap, which is one reason Redis doesn’t use C strings. I’ll come back to it later.

As the structure above shows, Redis stores the length in the SDS itself, so getting the length costs O(1). See how Redis is already a little faster? And that’s not all.

  2. Preventing buffer overflow

String concatenation is a common operation in both C and Redis. The problem is that C does not record the length of a string, so if we call a concatenation function without allocating enough memory in advance, we get a buffer overflow.

Let’s say a string sits in memory with other data stored right behind it.

You now need to append to it, but if you didn’t calculate the memory properly, the extra bytes spill past the end of the buffer and overwrite whatever comes next.

Is that what you want? Obviously not: data you never meant to touch has been modified by accident, and if this were a production system, wouldn’t that be a disaster? So how does Redis avoid this?

As we saw, the SDS structure stores both the current length (len) and the unused space (free). That makes it easy: before an append, Redis checks whether free is large enough to hold the new data. If it is not, Redis expands the buffer first and only then appends.
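The check can be sketched like this. It is a simplified illustration under my own assumptions, not the Redis source: cat_sds, cat_sds_init and cat_sds_cat are hypothetical names, the buffer is a separate heap pointer rather than a flexible array member, and this version grows by exactly the amount needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative string header; the field names follow the struct shown earlier. */
struct cat_sds {
    int len;    /* bytes in use, excluding the trailing '\0' */
    int free;   /* spare bytes available after len */
    char *buf;  /* heap buffer of len + free + 1 bytes */
};

/* Creates a string with no spare space. Returns 0 on success, -1 on failure. */
int cat_sds_init(struct cat_sds *s, const char *init) {
    assert(s != NULL && init != NULL);
    size_t n = strlen(init);
    s->buf = malloc(n + 1);
    if (s->buf == NULL) return -1;
    memcpy(s->buf, init, n + 1);
    s->len = (int)n;
    s->free = 0;
    return 0;
}

/* Appends t, growing the buffer first when the spare space is too small.
   t must not point into s->buf. Returns 0 on success, -1 on failure. */
int cat_sds_cat(struct cat_sds *s, const char *t) {
    assert(s != NULL && t != NULL);
    int add = (int)strlen(t);
    if (s->free < add) {
        /* Not enough room: expand before writing anything. */
        char *p = realloc(s->buf, (size_t)s->len + (size_t)add + 1);
        if (p == NULL) return -1;
        s->buf = p;
        s->free = add;
    }
    memcpy(s->buf + s->len, t, (size_t)add + 1);  /* copies the '\0' too */
    s->len += add;
    s->free -= add;
    return 0;
}
```

Because the size check happens inside the append function, the caller cannot forget it, which is the whole point compared with a bare strcat.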

The corresponding APIs are all in the Redis source code. From here on I won’t paste the source piece by piece. If you’re interested, go take a look, though you’ll need some C background.

  3. Fewer memory reallocations when modifying strings

A C string is just a character array: a string of N characters always occupies N+1 bytes, with the extra byte holding the null character. That is another pitfall, but it is not the subject of this section.

Redis is a cache database, and it concatenates and truncates strings frequently. With C strings, every such change needs a memory reallocation: forget it before an append and you get a buffer overflow, forget it after a truncation and you get a memory leak.

Memory allocation is also time-consuming. Even if you never forget a reallocation, it is overhead that a cache database should avoid.

To avoid these C string shortcomings, Redis uses two strategies to balance performance and space usage:

  • Space preallocation: When an SDS has to grow, Redis allocates not only the space it needs but also some extra free space, sized by a fixed formula, plus 1 extra byte for the null character. This avoids a memory allocation on every consecutive append.

    For example, take a string with no spare space.

When we call the concatenation function and the string gets longer, Redis also sets aside some free space for it.

As we continue to concatenate, the spare free space is used first, which saves a memory reallocation.

  • Lazy space release: When we shorten a string, Redis does not reclaim the freed space right away, in case you append to it again soon. This reduces the cost of allocating space, and the space can still be reclaimed when it is really no longer needed, so memory is not wasted.

    Take the same string.

When we call the truncation function, the freed space is not released immediately. It is recorded in free instead.

If we need more space later, we can reuse it and skip a memory reallocation. If the space is really no longer needed, we can call a function to release it.
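Both strategies can be sketched together. This is a simplified illustration, not the Redis source: lazy_sds and its functions are hypothetical names. The growth rule used here (spare space equal to the new length, capped at 1 MB) is the one described in Redis Design and Implementation, and I am assuming it for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_PREALLOC (1024 * 1024)  /* assumed 1 MB cap on spare space */

struct lazy_sds {
    int len;    /* bytes in use, excluding the trailing '\0' */
    int free;   /* spare bytes available after len */
    char *buf;  /* heap buffer of len + free + 1 bytes */
};

/* Creates a string with no spare space. Returns 0 on success, -1 on failure. */
int lazy_sds_init(struct lazy_sds *s, const char *init) {
    assert(s != NULL && init != NULL);
    size_t n = strlen(init);
    s->buf = malloc(n + 1);
    if (s->buf == NULL) return -1;
    memcpy(s->buf, init, n + 1);
    s->len = (int)n;
    s->free = 0;
    return 0;
}

/* Space preallocation: when growing, also reserve spare room for later appends. */
int lazy_sds_cat(struct lazy_sds *s, const char *t) {
    assert(s != NULL && t != NULL);
    int add = (int)strlen(t);
    if (s->free < add) {
        int newlen = s->len + add;
        int spare = newlen < DEMO_MAX_PREALLOC ? newlen : DEMO_MAX_PREALLOC;
        char *p = realloc(s->buf, (size_t)newlen + (size_t)spare + 1);
        if (p == NULL) return -1;
        s->buf = p;
        s->free = add + spare;  /* 'add' is consumed just below */
    }
    memcpy(s->buf + s->len, t, (size_t)add + 1);
    s->len += add;
    s->free -= add;
    return 0;
}

/* Lazy release: keep the first n bytes, keep the allocation, grow free. */
void lazy_sds_truncate(struct lazy_sds *s, int n) {
    assert(s != NULL && n >= 0 && n <= s->len);
    s->free += s->len - n;
    s->len = n;
    s->buf[n] = '\0';
}

/* Explicit release: give the spare bytes back when they are no longer needed. */
int lazy_sds_shrink(struct lazy_sds *s) {
    assert(s != NULL);
    char *p = realloc(s->buf, (size_t)s->len + 1);
    if (p == NULL) return -1;
    s->buf = p;
    s->free = 0;
    return 0;
}
```

Starting from "Redis", appending " Cluster" leaves len at 13 with 13 bytes spare, so appending " Tutorial" afterwards fits without touching the allocator. Truncating back to 5 bytes only moves the numbers around until lazy_sds_shrink is called.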

  4. Binary safety

If you’ve been reading carefully, you’ve seen me mention the null character '\0' more than once. C uses the null character to decide where a string ends, but a lot of data contains null characters in the middle: images, audio, video, compressed files and other binary data. With a C string, only the part before the first null character is recognized and everything after it is ignored, which is obviously not what we want as developers, right?

Redis does not have this problem. It does not look for the null character to find the end of a string; it uses the stored length instead. That is why Redis is often used to store all kinds of binary data. I’m quite happy with it, anyway, and often use it to store small files in binary form.
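Here is a small sketch of what binary safety means in practice. bin_sds and bin_sds_init are hypothetical names for this illustration, not Redis APIs. The idea is simply to copy a given number of bytes and remember that number, instead of trusting the first '\0'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct bin_sds {
    int len;    /* number of payload bytes, '\0' bytes included */
    int free;   /* spare bytes available after len */
    char *buf;  /* heap buffer of len + free + 1 bytes */
};

/* Copies exactly n bytes from data, embedded '\0' bytes included.
   Returns 0 on success, -1 on failure. */
int bin_sds_init(struct bin_sds *s, const void *data, int n) {
    assert(s != NULL && data != NULL && n >= 0);
    s->buf = malloc((size_t)n + 1);
    if (s->buf == NULL) return -1;
    memcpy(s->buf, data, (size_t)n);
    s->buf[n] = '\0';  /* trailing terminator kept for C-string compatibility */
    s->len = n;
    s->free = 0;
    return 0;
}
```

If you store the 13 bytes of "Redis\0Cluster" this way, strlen on the buffer still stops at 5, but len says 13 and all 13 bytes are intact when you read them back with the stored length.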

Reference: Redis Design and Implementation

Conclusion

Did you expect one little SDS to have this much behind it?

Before, when asked why Redis is fast, I could only say that it is single-threaded, uses I/O multiplexing, and works in memory, and that was it. Now can you expand on that answer too?

This is the first article in a series, and more will follow. I don’t know whether you like this kind of article, so leave a comment and give me some feedback.

We go to the same interviews and get the same questions, yet some people pass and some don’t. People often blame their education or their past experience, but ask yourself: have you really dug into the underlying details? Details are often the most important thing and also what the fewest people know. I think it is exactly these details that set you apart from the other candidates and win you the offer. After all, anyone can recite the basics, right?