In this paper, the brain figure

preface

Redis is an open source non-relational memory database written based on C language, which can be used as a database, cache, message middleware, so excellent things customers must understand it bit by bit.

I have written two articles about Redis before, and both the reading amount and the reader’s reaction are ok. The first one is about the three major problems of Redis cache [].

The second is Redis’ memory management and obsolescence strategy [].

This is the third article about Redis. It mainly explains the detailed explanation of five data structures of Redis, including the implementation of the underlying principles of these five data structures.

Theory must be applied to practice, so the most important part is the practical part, which is to explain the five data structure application scenarios.

Without further comment, let’s jump right into the subject matter. Many people know that the five data structures of Redis include the following five:

  1. String: String type
  2. List: List type
  3. Set: Unordered collection type
  4. ZSet: Ordered collection type
  5. Hash: Indicates the hash table type

But a good programmer might not be able to stop at doing CRUD work with only five types, and still need to understand the underlying principles of these five data structures.

Redis core object

In Redis, there is a “core object” called redisObject, which is used to represent all key and value. The structure of redisObject is used to represent five data types: String, Hash, List, Set and ZSet.

RedisObject source code in redis.h, written in C language, interested in can self view, about redisObject I draw a picture here, indicating the structure of redisObject as follows:

A colorful picture of a blinding man

In redisObject, “type indicates the data type and encoding indicates how the data is stored.” This is the underlying data structure that implements the data type. So this article is about encoding.

What do storage types in encoding mean, respectively? The meanings of specific data types are shown in the following figure:

Screenshot from Redis Design & Implementation 2nd Edition

Maybe after looking at this picture, I still feel confused. Don’t panic. We’ll go through the five data structures in detail, but this graph just gives you an idea of what type of storage each data structure corresponds to.

For example, if you set a string key 234 in Redis, and then look at the storage type of the string, you will see that the storage type of the string is int, and the storage type of the string is embstr.

Type String

String is the basic data type of Redis. Redis was developed in C. However, there is a significant difference between Redis and C string types.

Int, raw, and embstr can be used to store String data structures. So what’s the difference between these three types of storage?

int

Redis specifies that if an “integer value” is stored, such as set num 123, it will be stored as an int. This value will be stored in the “PTR property” of the redisObject.

SDS

If “string is a string value and is longer than 32 bytes” is stored using SDS (Simple Dynamic String) and encoding is set to RAW. If “string length is less than or equal to 32 bytes”, the encoding is changed to embstr to hold the string.

SDS is called “simple dynamic string”, for SDS definition there are three attributes in Redis source code int len, int free, char buf[].

Len holds the length of the string, and free represents the number of unused bytes in the BUf array, which holds each character element of the string.

Therefore, when you store a string Hello in Redsi, you can draw the structure of the redisObject in the form of SDS according to the description of Redis source code:

SDS and C language string comparison

Redis must have its own advantages when using SDS as the type of string storage. Compared with the strings of C language, SDS has made its own design and optimization for the strings of C language. The specific advantages are as follows:

(1) The string in C language does not record its length, so “every time the length of the string is obtained, the time complexity is O(n)”, while in Redis, the time complexity becomes O(1) as long as len is read to obtain the string.

(2) “C language” two string concatenation, if not allocated enough memory space “will appear buffer overflow”; The SDS first determines whether the space meets the requirements according to the len attribute. If the space is insufficient, the SDS expands the space accordingly, so “buffer overflow will not occur”.

(3) SDS also provides two strategies: space preallocation and lazy space release. When you allocate space for a string, you allocate more space than you actually do to “reduce the number of memory reallocations resulting from continuous string growth.”

When a string is shortened, SDS does not immediately reclaim the unused space. Instead, SDS records the unused space through the free attribute and releases it later.

The specific space allocation principle is as follows: “if len is less than 1MB, the space with the same length as len is allocated, that is, len=free; if len is greater than 1MB, the space allocated for free is 1MB.”

(4) SDS is binary security, which can store binary files (such as binary data of pictures, audio, video and other files) in addition to strings; In c, the string terminates with an empty string, and some images contain terminators, so they are not binary safe.

In order to make it easy to understand, a comparison table of C language strings and SDS is made, as follows:

C language string SDS
The time complexity of getting the length is O(n) The time complexity of getting the length is O(1)
It’s not binary safe It’s binary safe
You can only hold strings You can also save binary data
Increasing the string n times necessarily results in n memory allocations Number of times to grow string memory allocation <=n

String Application

At this point, I’m sure many people can say that they are already proficient in Redis String type, but pure theory, theory still need to be applied to practice, the above mentioned String can be used to store images, now let’s use image storage as an example to implement.

(1) First to upload the image encoding, here write a utility class to process the image into Base64 encoding form, the specific implementation code is as follows:

@param file @return */ public static String encodeImg(MultipartFile file) {byte[] imgBytes  = null; try { imgBytes = file.getBytes(); } catch (IOException e) { e.printStackTrace(); } BASE64Encoder encoder = new BASE64Encoder(); return imgBytes==null? null:encoder.encode(imgBytes ); }Copy the code

(2) The second step is to store the processed image string format in Redis, and the implementation code is as follows:

/** * Redis store image * @param file * @return */ public void uploadImageServiceImpl(MultipartFile image) {String imgId = UUID.randomUUID().toString(); String imgStr= ImageUtils.encodeImg(image); redisUtils.set(imgId , imgStr); // The imgId can be stored in the corresponding field of the database. If you need to get the imgId from the redis, just get the imgId from the Redis. }Copy the code

This is to achieve the binary image storage, of course, the String type of data structure application also has a regular count: “statistics microblogging, statistics fans” and so on.

Hash type

Hash objects can be implemented in ziplist and Hashtable. The key of hashtable is a String, and the value is also stored in the form of key value.

The bottom of the dictionary type is the implementation of hashtable, understand the bottom of the implementation principle of the dictionary is to understand the implementation principle of hashtable, the implementation principle of hashtable can be compared to the bottom of the HashMap principle.

The dictionary

In both cases, the index is calculated using the key. The difference is that the calculation method is different. In HashMap, the index is calculated using the hash function, while in Hashtable, the index is calculated using the Sizemask property and the hash value.

As we know, the biggest problem of hashtable is hash conflict. In order to solve hash conflict, if different keys in hashtable are calculated to obtain the same index, a unidirectional linked list (” chained address method “) will be formed, as shown in the figure below:

rehash

In the underlying implementation of the dictionary, the value object is stored as a dictEntry object, and the hash table needs to be expanded or contracted as the number of key-value pairs stored in the hash table increases or decreases.

It’s going to do the same thing as a HashMap and it’s going to rehash, it’s going to rehash. Ht [0] and HT [1] are both objects.

There are four properties in the hash structure definition: dictEntry **table, unsigned long size, unsigned long Sizemask, unsigned long Used, Hash table array, hash table size, index used to calculate, always equal to size-1, and the number of existing nodes in the hash table.

The size of HT [0] determines the size of HT [1], and all key pairs in HT [0] are rehashed into HT [1] when it is expanded or shrunk.

Extension operation: ht[1] the size of the extension is the first integer power of 2 twice the current value of HT [0].used; Shrink operation: ht[0]. Used first integer power of 2 greater than or equal to.

After data migration, ht[0] is released. Ht [1] is changed to HT [0] and new HT [1] is created to prepare for the next expansion and contraction.

Progressive rehash

If there is a large amount of data in the rehash process, Redis does not successfully rehash all the data at one time, which will cause the Redis external services to stop. To deal with this situation, Redis adopts “progressive rehash” internally.

Redis performs all rehash operations in multiple steps until rehash is completed. The specific implementation is related to the rehashIndex attribute in the object. “If rehashIndex is represented as -1, there is no rehash operation”.

When the rehash operation starts, the value is changed to 0. In the progressive rehash process, “updates, deletes, and queries are performed in both HT [0] and HT [1].” For example, when updating a value, ht[0] is updated first, and then HT [1] is updated.

The new operation is directly added to the HT [1] table, and ht[0] does not add any data. In this way, ht[0] is only reduced but not increased until the table becomes empty at a certain point. In this way, the rehash operation is complete.

So that’s the basic implementation of the dictionary hashtable, so let’s take a look at the ziplist.

ziplist

Ziplist (ziplist) is a sequential data structure composed of contiguous memory blocks. Ziplist can save space by using multiple nodes to store data.

Compressed list is one of the principles of the bottom implementation of list keys and hash keys. “Compressed list is not a compression algorithm to compress and store data, but it represents the use of a group of continuous memory space to save space.” The memory structure of compressed list is as follows:

The meaning of each node in the compressed list is as follows:

  1. zlbytes: The size of 4 bytes, which records the number of bytes of memory occupied by the compressed list.
  2. zltail: The value is 4 bytes. It records the offset from the end node to the start address and is used to quickly locate the end node address.
  3. zllen: The size of 2 bytes, which records the number of nodes in the compressed list.
  4. entry: indicates each node in the list.
  5. zlend: indicates a special ending symbol for a compressed list'0xFF'.

Each entry node in the list consists of three parts: previous_entrY_ENGth, Encoding, and content.

  1. previous_entry_engthRepresents the length of the previous node entry, which can be used to calculate the actual address of the previous node, since their addresses are contiguous.
  2. Encoding: This holds the content type and length of content.
  3. Content: Content stores the content of each node.

I believe that you have a good understanding of the data structure of Hash. If it is the first time to contact the underlying implementation of the five basic data structures of Redis, I suggest you to watch it several times. The following is the application scenario of hash.

Application scenarios

Hash table is more intuitive than String type to store information, and it is more convenient to erase the European total. It is often used to manage user data and store user information.

Hash can also be used to generate unique ids using Redis in high concurrency scenarios. Let’s use these two scenarios as case code implementations.

Storing user Data

In the first scenario, if we want to store user information, we usually use the user’S ID as the key value to keep it unique, and the user’s other information (address, age, birthday, phone number, etc.) as the value.

If the traditional implementation is to encapsulate the user information into an object and store the data through serialization, when the user information needs to be obtained, the user information will be obtained through deserialization.

However, this will inevitably lead to the performance overhead of serialization and deserialization, and if only one of the property values is changed, the entire object needs to be serialized, which is too much action, resulting in unnecessary performance overhead.

If Redis hash is used to store user data, the original value is treated as a container in the form of K and V, so there is no serialization performance overhead.

Distributed generation of unique IDS

The second scenario is to generate distributed unique IDS. In this scenario, Redis is encapsulated into a tool class for implementation. The implementation code is as follows:

// Offset indicates the increment gradient of the ID. Public Long getId(String key,String hashKey,Long offset) throws BusinessException{try {if (null == offset) { offset=1L; } return redisUtil. Increment (key, hashKey, offset); } catch (Exception e) {int randNo= uuid.randomuuid ().toString().hashcode (); int randNo= uuid.randomuuid ().toString().hashcode (); if (randNo < 0) { randNo=-randNo; } return Long.valueOf(String.format("%16d", randNo)); }}Copy the code

List the type

Lists in Redis were implemented in versions prior to 3.2 using Ziplist and linkedList. Quicklist was introduced after 3.2.

With the Ziplist list covered above, let’s take a look at the structure of the LinkedList and QuickList.

A linkedlist is a two-way linkedlist, which, like a normal linkedlist, consists of Pointers to front and back nodes. The time complexity of insert, modify, update is O(1), but the time complexity of query is O(n).

Linkedlist and quicklist is the underlying implementation of the linkedlist implementation, in c language there is no built-in linkedlist data structure, Redis implemented its own linkedlist structure.

Redis linked list features:

  1. Each node has a pointer to the previous node and the next node.
  2. The prev and next Pointers of the head and tail nodes point to null, so the list is acyclic.
  3. The list has its own length of information, and the time to obtain the length is O(1).

The implementation of List in Redis is relatively simple, so let’s take a look at its application scenarios.

Application scenarios

Lists in Redis allow for “blocking queues”, which can be achieved in combination with the lpush and brpop commands. Producers use lupsh to insert elements from the left side of the list, and consumers use the brpop command to get elements from the right side of the queue for consumption.

(1) First configure the redis configuration. For convenience, I will directly put the redis configuration file in the application.yml configuration file. In practice, you can put the redis configuration file in a separate redis.

Spring redis: host: 127.0.0.1 port: 6379 Password: user timeout: 0 Database: 2 pool: max-active: 100 max-idle: 10 min-idle: 0 max-wait: 100000Copy the code

(2) The second step is to create a redis Configuration class called RedisConfig and annotate it with @configuration annotation to indicate that it is a Configuration class.

@Configuration
public class RedisConfiguration {

@Value("{spring.redis.port}") private int port; @Value("{spring.redis.pool.max-active}") private int maxActive; @Value("{spring.redis.pool.min-idle}") private int minIdle; @Value("{spring.redis.database}") private int database; @Value("${spring.redis.timeout}") private int timeout;

@Bean public JedisPoolConfig getRedisConfiguration(){ JedisPoolConfig jedisPoolConfig= new JedisPoolConfig(); jedisPoolConfig.setMaxTotal(maxActive); jedisPoolConfig.setMaxIdle(maxIdle); jedisPoolConfig.setMinIdle(minIdle); jedisPoolConfig.setMaxWaitMillis(maxWait); return jedisPoolConfig; }

@Bean public JedisConnectionFactory getConnectionFactory() { JedisConnectionFactory factory = new JedisConnectionFactory(); factory.setHostName(host); factory.setPort(port); factory.setPassword(password); factory.setDatabase(database); JedisPoolConfig jedisPoolConfig= getRedisConfiguration(); factory.setPoolConfig(jedisPoolConfig); return factory; }

Copy the code

@Bean public RedisTemplate<? ,? > getRedisTemplate() { JedisConnectionFactory factory = getConnectionFactory(); RedisTemplate<? ,? > redisTemplate = new StringRedisTemplate(factory); return redisTemplate; }}

(3) The third step is to create Redis tool class RedisUtil, since learning object-oriented, I like to dismantle some common things into tool classes, like a part, when needed, put it together.

@Component
public class RedisUtil {

@Autowired private RedisTemplate<String, Object> redisTemplate; / * *

  • Store messages to message queues
  • @ param key button
  • @ param value value
  • @return */ public boolean lPushMessage(String key, Object value) { try { redisTemplate.opsForList().leftPush(key, value); return true; } catch (Exception e) { e.printStackTrace(); return false; }}

/ * *

  • Eject a message from a message queue -
  • @ param key button
  • @return */ public Object rPopMessage(String key) { try { return redisTemplate.opsForList().rightPop(key); } catch (Exception e) { e.printStackTrace(); return null; }}

/ * *

Copy the code

  • To view the message
  • @ param key button
  • @ param start beginning
  • 0 to -1 represents all values
  • @return */ public List<Object> getMessage(String key, long start, long end) { try { return redisTemplate.opsForList().range(key, start, end); } catch (Exception e) { e.printStackTrace(); return null; }}

This completes the creation of the Redis message queue utility class, which can be used directly in the following code.

The Set collection

Both lists and collections can be used to store strings in Redis, but “a Set is an unrepeatable Set, and a List can store the same string.” A Set is unordered, as opposed to an ordered ZSet.

The underlying implementation of a Set is “HT and intSET”. Ht (hash table) has already been looked at in detail. Now let’s look at the storage structure of the Inset type.

Inset is a data structure type used to hold integer values, such as int16_t, int32_t, or INT64_t.

In the integer set, there are three attribute values encoding, length, contents[], which respectively represent the encoding mode, the length of the integer set, and the contents of the element. Length is the size of contents recorded.

When an element is added to the integer set, if it exceeds the size of the original set, the set will be upgraded. The specific upgrade process is as follows:

  1. First expand the size of the underlying array, and the array’s type is the new element’s type.
  2. Then convert the elements from the original array to the types of the new elements and place them in the corresponding locations of the extended array.
  3. The integer set will no longer degrade after being upgraded, and the encoding will remain in the upgraded state.

Application scenarios

The application scenarios of the Set can be deduplication, lottery, mutual friends, and second-degree friends. Next, simulate a case implementation of adding friends:

@RequestMapping(value = "/addFriend", method = RequestMethod.POST) public Long addFriend(User user, String friend) { String currentKey = null; / / judge whether the current user's friends if (AppContext. GetCurrentUser (). The getId () equals (user. GetId)) {currentKey = user. GetId. ToString (); } currentKey==null? 0l:setOperations.add(currentKey, friend); }Copy the code

If user A and user B have added A number of friends to each other using this interface, then there is A requirement to obtain the mutual friends of user A and user B. The following operations can be performed:

public Set intersectFriend(User userA, User userB) {
    return setOperations.intersect(userA.getId.toString(), userB.getId.toString());
}
Copy the code

By analogy, user A’s own friends, or user B’s own friends, can be implemented.

ZSet collection

ZSet is an ordered set. From the figure above, you can see that the underlying implementation of ZSet is implemented by Ziplist and Skiplist. Ziplist has been described in detail above, and here is to explain the structural implementation of Skiplist.

Skiplist is also called “skiplist”, skiplist is an ordered data structure, it through each node to maintain multiple Pointers to other nodes, so as to achieve the purpose of fast access.

Skiplist has the following features:

  1. It consists of many layers, and the number of nodes is gradually dense from top to bottom. The nodes at the top are the most sparse and have the largest span.
  2. Each layer is an ordered list with only two nodes, the head node and the tail node.
  3. Every node at every level contains Pointers to the next node at the same level and the node at the same position at the next level.
  4. If a node is present at one level, it will be present at the same location in all the lists below.

The concrete structure diagram is shown as follows:

In the structure of the skip list, head and tail indicate the pointer pointing to the head node and the tail node, which can be quickly realized after positioning. Level indicates the number of layers, len indicates the length of the skip list, and BW indicates the backward pointer, which is used when traversing from the tail forward.

Two other values below BW represent the score and the member object (the member object held by each node).

In the implementation of skip list, except that the bottom layer holds the complete data of the original linked list, the number of nodes at the top layer will be less and less, and the span will be larger and larger.

The upper layer of the skip table is equivalent to the index layer, which serves to find the final data. The larger the data volume is, the higher the query efficiency reflected by the bar table will be, which is similar to the query efficiency of the balanced tree.

Application scenarios

Because ZSet is an ordered set, it is common for ZSet to implement the sort type of business, such as recommending the 10 most popular posts on the home page, that is, the implementation of the ranking, etc.

The following is the selection of the top 10 players to obtain the leaderboard as a case, the implementation of the code is as follows:

@Autowired private RedisTemplate redisTemplate; Public static List<levelVO > getZset(String key, long baseNum, long baseNum) LevelService levelService){ ZSetOperations<Serializable, Object> operations = redisTemplate.opsForZSet(); / / according to the score score value for data Set of the top 10 < ZSetOperations. TypedTuple < Object > > Set = operations. ReverseRangeWithScores (key, 0, 9); List<LevelVO> list= new ArrayList<LevelVO>(); int i=1; for (ZSetOperations.TypedTuple<Object> o:set){ int uid = (int) o.getValue(); LevelCache levelCache = levelService.getLevelCache(uid); LevelVO levelVO = levelCache.getLevelVO(); long score = (o.getScore().longValue() - baseNum + levelVO .getCtime())/CommonUtil.multiplier; levelVO .setScore(score); levelVO .setRank(i); list.add( levelVO ); i++; } return list; }Copy the code

The basic logic of the above code is to get the top 10 data based on the score value and return it as a list of lawyerVO objects.

Now that you’ve mastered the five basic data types of Redis, you’re ready to fight your interviewer, run away, or read this article a few more times.