Redis is an open-source key-value database written in C. It keeps its data in memory and can optionally persist it to disk, and it can serve as a database, cache, or message broker; today it is the most popular caching middleware. Its core operations are executed by a single thread, so why is it so efficient? This article briefly covers Redis's data structures, its process and thread model, and related topics, and touches on common Redis problems and algorithms along the way.
The data structure
Redis supports five basic data types:
Data type | Definition | Usage scenarios |
---|---|---|
String | The most basic Redis data type; each key maps to one value | Caching, counters, distributed locks, etc. |
Hash | A mapping between string keys and string values, well suited to storing objects | User profiles, hash tables |
List | A simple list of strings, sorted by insertion order; elements can be pushed onto the head (left) or tail (right), much like a linked list | Linked lists, queues, timelines, follower lists, etc. |
Set | An unordered set of strings whose elements must be unique; backed by a hash table, so add, remove, and lookup are all O(1) | Likes, dislikes, mutual friends, etc. |
Sorted set (Zset) | Like a Set, a collection of unique string members, except that each member is associated with a double score, and members are ordered from smallest to largest score | Rankings by visit count, click count, etc. |
Redis also supports range queries, Bitmaps, HyperLogLogs, and geospatial index radius queries.
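To make the sorted-set semantics from the table concrete, here is a minimal pure-Python sketch (not Redis code — real Redis implements Zsets with a skip list plus a hash table). The `MiniZSet` class and the page names are illustrative assumptions.

```python
# Minimal sketch of Redis sorted-set (Zset) semantics: members are
# unique, each carries a double score, and range queries return
# members ordered by ascending score.
class MiniZSet:
    def __init__(self):
        self.scores = {}                  # member -> score

    def zadd(self, member, score):
        self.scores[member] = score       # re-adding updates the score

    def zrange(self, start, stop):
        # sort by (score, member), mirroring Redis's tie-breaking
        ordered = sorted(self.scores, key=lambda m: (self.scores[m], m))
        return ordered[start:stop + 1]

z = MiniZSet()
z.zadd("page_a", 30)
z.zadd("page_b", 10)
z.zadd("page_c", 20)
print(z.zrange(0, 2))   # ['page_b', 'page_c', 'page_a']
```

The equivalent Redis commands would be `ZADD` and `ZRANGE`.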
Why is Redis fast?
Redis is a memory-based KV database that uses a single-process, single-thread model, yet official figures report 100,000+ QPS (queries per second).
Reasons:
- It is completely memory based: the vast majority of requests are pure in-memory operations, which are very fast. Data is stored in memory in structures similar to a HashMap, so both lookups and updates have O(1) time complexity.
- Its data structures are simple, and so are the operations on them. Redis's structures are purpose-built, such as SDS (simple dynamic strings) and skip lists.
- It uses a single thread, which avoids unnecessary context switches and race conditions. There is no CPU cost from switching between processes or threads, no locks to acquire and release, and no performance loss from potential deadlocks.
- It uses a multiplexed, non-blocking I/O model.
- Its underlying model is different: its low-level implementation and the application protocol it uses to talk to clients differ from other systems. Redis built its own VM mechanism, since going through general system calls would waste time on moving data and servicing requests.
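The "multiplexed, non-blocking I/O" point can be illustrated with Python's `selectors` module: one thread watches many file descriptors and only touches the ones that are ready. This is the same idea as (but not the actual code of) Redis's event loop built on epoll/kqueue/select; the socketpair below is an assumed stand-in for client connections so the sketch needs no network setup.

```python
# Single-threaded I/O multiplexing sketch using the selectors module.
import selectors
import socket

sel = selectors.DefaultSelector()

# A socketpair stands in for a client connection.
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)
sel.register(b, selectors.EVENT_READ)

a.sendall(b"PING")

events = sel.select(timeout=1)      # blocks until some fd is readable
replies = []
for key, _mask in events:
    data = key.fileobj.recv(16)     # safe: the fd is known to be ready
    replies.append(data)

print(replies)                      # [b'PING']
sel.close()
a.close()
b.close()
```

The point is that one thread can serve many connections by reacting only to ready descriptors, instead of blocking on each one in turn.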
How Redis differs from Memcached
- Redis supports server-side data manipulation: it has more data structures and supports richer operations on them than Memcached. With Memcached you usually have to fetch the data to the client, modify it, and set it back, which greatly increases the number of network I/Os and the volume of data transferred. In Redis these complex operations are usually as efficient as a plain GET/SET, so if you need a cache that supports richer structures and operations, Redis is a good choice.
- Memory usage: for simple key-value storage, Memcached is more memory efficient. Redis, however, uses compact compressed encodings (such as ziplists) for its hash structures, so for structured values its memory efficiency can exceed Memcached's.
- Performance: because Redis uses only one core while Memcached can use several, Redis performs better per core on average when storing small values. For values over 100 KB, Memcached outperforms Redis; Redis has recently been optimized for storing large values but still lags behind Memcached there.
- Cluster mode: Memcached has no native cluster mode; you must rely on the client to shard data across the cluster. Redis, by contrast, supports clustering natively through the official Redis Cluster mode, which is an advantage over Memcached.
Cache avalanche
A large number of cached keys expire in Redis at the same time, so a flood of requests falls through to the database and puts it under excessive pressure. Solutions:
- Set cache expiration times sensibly, and add a random jitter
- Never expire hot data, and refresh cached data periodically
- Use a mutex, so that only the lock holder queries the database (at some cost to performance)
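The first solution — "expiration time plus a random value" — can be sketched in a few lines. The base TTL and jitter window below are assumed values, not anything prescribed by Redis.

```python
# Instead of giving every cached key the same TTL, add a random
# offset so keys do not all expire in the same instant.
import random

BASE_TTL = 3600                      # one-hour base expiry (assumed)

def ttl_with_jitter(base=BASE_TTL, jitter=300):
    # final TTL lands in [base, base + jitter)
    return base + random.randrange(jitter)

ttls = [ttl_with_jitter() for _ in range(5)]
print(ttls)                          # five TTLs spread across 5 minutes
```

The returned value would then be passed as the expiry argument of the cache-set call (e.g. `SET key value EX ttl` in Redis terms).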
Cache penetration
A large number of requests arrive for keys that do not exist in Redis at all, so they fall through to the database; this may be an attack or simply invalid parameters. Solutions:
- Use a mutex, so that only the lock holder queries the database (at some cost to performance)
- Return NULL immediately and update the cache asynchronously
- Validate parameters up front, e.g. with a Bloom filter
- Also cache objects that the database query found to be empty, but with a shorter expiration time
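The Bloom-filter idea from the list above can be sketched in pure Python: requests for keys the filter has never seen are rejected before they ever touch Redis or the database. This is a simplified toy (the `k` hash functions are derived from one SHA-256 hash, and the sizes are assumed), not a production filter.

```python
# Minimal Bloom filter sketch for cache penetration defence.
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = 0                 # a big int used as a bit array

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present"
        return all(self.bits >> pos & 1 for pos in self._positions(key))

bf = BloomFilter()
for user_id in ("u1", "u2", "u3"):    # known-valid ids (assumed data)
    bf.add(user_id)

print(bf.might_contain("u1"))         # True
print(bf.might_contain("no-such-user"))   # almost certainly False
```

A Bloom filter can return false positives but never false negatives, which is exactly the property needed here: a `False` answer lets the request be rejected without querying anything.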
Cache breakdown
A hot key expires, so a large number of requests reach the database directly and its load spikes. Solutions:
- Use a mutex (a distributed lock): only one thread acquires the lock, queries the database, and puts the data into the cache; the other threads then read the data from the cache
- Never expire the hot key, and refresh the cache periodically
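The mutex approach can be sketched with a local `threading.Lock` standing in for a distributed lock (the dict-as-cache and `load_from_db` are assumed stand-ins for Redis and the real database):

```python
# When a hot key has expired, only one thread rebuilds it from the
# (simulated) database; the others wait on the lock and then read
# the rebuilt value from the cache.
import threading

cache = {}                      # stands in for Redis
lock = threading.Lock()         # stands in for a distributed lock
db_hits = []                    # records how many threads hit the DB

def load_from_db(key):
    db_hits.append(key)         # an expensive query in real life
    return f"value-of-{key}"

def get(key):
    val = cache.get(key)
    if val is not None:
        return val
    with lock:                  # only one rebuilder at a time
        # re-check: another thread may have filled the cache already
        val = cache.get(key)
        if val is None:
            val = load_from_db(key)
            cache[key] = val
    return val

threads = [threading.Thread(target=get, args=("hot",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(db_hits))             # 1: the database was queried only once
```

The double-check inside the lock is what keeps the database at a single query even when ten threads race on the same expired key.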
Persistence
RDB
- BGSAVE can be triggered manually, but that is cumbersome; the persistence policy is usually set in the RDB section of the configuration file
- Data written since the last snapshot can still be lost
AOF
- Enabled with appendonly yes; every write command is logged in real time, so the file grows large over time
- Can be used together with RDB
Master-slave synchronization
The master handles writes while replicas serve reads (read/write splitting)
Works on a publish/subscribe model
Sentinel mode
When the master goes down, a new master is elected to handle reads and writes; once the old master recovers, it can still serve as a node of the cluster
Key expiration mechanism
Although Redis lets you set an expiration time on cached keys, the data is not necessarily discarded at the exact moment it expires
Periodic deletion
By default Redis scans 10 times per second, i.e. once every 100 ms. An expiration scan does not examine all keys (scanning everything would perform too poorly); instead it uses a very simple greedy strategy:
- Randomly select 20 keys from the expired dictionary
- Delete the expired keys among them
- If more than a quarter of the sampled keys were expired, repeat steps 1 and 2
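The greedy scan above can be simulated in a few lines of Python. The sample size, ratio, and the `max_rounds` safety cap mirror the description but are otherwise assumed parameters, as is the test data.

```python
# Simulation of the greedy expired-key scan: sample up to 20 keys
# from the expire dictionary, delete the expired ones, and repeat
# while more than a quarter of the sample was expired.
import random
import time

def scan_expired(expires, now, sample=20, ratio=0.25, max_rounds=100):
    rounds = 0
    while expires and rounds < max_rounds:   # max_rounds caps the loop
        keys = random.sample(list(expires), min(sample, len(expires)))
        dead = [k for k in keys if expires[k] <= now]
        for k in dead:
            del expires[k]
        rounds += 1
        if len(dead) <= len(keys) * ratio:   # few expired: stop early
            break
    return rounds

now = time.time()
# 80 already-expired keys plus 20 live ones (assumed test data)
expires = {f"old{i}": now - 1 for i in range(80)}
expires.update({f"new{i}": now + 60 for i in range(20)})
scan_expired(expires, now)
print(len(expires))   # all live keys survive; most expired ones are gone
```

Note that the early-exit condition means a few expired keys may survive a scan; that residue is exactly why lazy deletion and eviction strategies are still needed.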
Lazy deletion
Whenever a key is queried, Redis checks whether it has expired; if so, it deletes the data and returns nothing. This compensates for the shortcomings of periodic deletion, which may leave many expired keys undeleted. Periodic deletion is centralized, batch processing; lazy deletion is scattered, on-access processing.
Memory eviction strategies
Neither periodic nor lazy deletion is completely accurate; some expired keys will still linger, so memory eviction strategies are also needed:
- noeviction: return an error when memory use exceeds the configured limit; never evict any key
- allkeys-lru: when adding a key would exceed the limit, evict the least recently used key among all keys
- volatile-lru: when the limit would be exceeded, evict the least recently used key among keys with an expiration set
- allkeys-random: when the limit would be exceeded, evict a random key among all keys
- volatile-random: when the limit would be exceeded, evict a random key among keys with an expiration set
- volatile-ttl: among keys with an expiration set, evict the one closest to expiring
- volatile-lfu: among keys with an expiration set, evict the least frequently used key
- allkeys-lfu: evict the least frequently used key among all keys
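To make the allkeys-lru semantics concrete, here is a small capacity-bounded cache sketch built on `OrderedDict`. It is only a model of the policy: real Redis approximates LRU by sampling a few keys rather than tracking exact recency.

```python
# Sketch of allkeys-lru semantics: when adding a key would exceed
# the capacity, the least recently used key is evicted first.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()     # order = recency, oldest first

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)    # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            evicted, _ = self.data.popitem(last=False)  # drop the LRU key
            return evicted

c = LRUCache(2)
c.set("a", 1)
c.set("b", 2)
c.get("a")                  # touch "a", so "b" becomes least recent
print(c.set("c", 3))        # evicts "b"
```

Swapping the eviction rule (random choice, lowest access count, nearest TTL) inside `set` would model the random, lfu, and ttl policies respectively.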