Is Redis single-threaded or multi-threaded

Prior to Redis 4.0, Redis was completely single-threaded. Version 4.0 introduced multithreading for background tasks (such as asynchronous deletion via UNLINK), but the core process remained single-threaded. The core process covers receiving commands, parsing commands, executing commands, and returning results.

Redis 6.0 introduced multithreading again, and unlike 4.0, this time it touches the core process described above. In Redis 6.0, multithreading is used mainly in the network I/O stages, that is, receiving commands and writing back results, while command execution is still performed serially by a single thread. Since execution remains serial, there is no need to worry about concurrency safety.

Redis 6.0 implementation process introduction:

  1. When a read event arrives, the main thread places the client connection on a global wait-read queue
  2. Read the data
    1. The main thread distributes the client connections in the wait-read queue to the I/O threads using a round-robin scheduling algorithm.
    2. The main thread also handles the read event of one client connection itself;
    3. After the main thread finishes processing its own read event, it spins (busy-waits) until all I/O threads finish processing
  3. Command execution: the main thread executes the client commands serially, in the order in which the events were added to the wait-read queue (this keeps the execution order correct), and then places each client connection on a global wait-write queue
  4. Write back results: similar to the wait-read queue. The main thread distributes the client connections in the wait-write queue to the I/O threads round-robin, also handling one itself. When the main thread finishes, it spins until all I/O threads finish processing, and finally empties the queue.
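The round-robin distribution in steps 2 and 4 can be sketched as follows. This is a minimal illustration of the scheduling idea only, not Redis source code; the names are invented for the example.

```python
# Sketch: distribute queued client connections across I/O workers
# round-robin, the way the main thread hands out the wait-read queue.

def assign_round_robin(clients, n_workers):
    """Return one work list per worker (worker 0 plays the main thread)."""
    queues = [[] for _ in range(n_workers)]
    for i, client in enumerate(clients):
        queues[i % n_workers].append(client)
    return queues

# Six pending connections spread across the main thread + two I/O threads:
queues = assign_round_robin(["c1", "c2", "c3", "c4", "c5", "c6"], 3)
# queues == [['c1', 'c4'], ['c2', 'c5'], ['c3', 'c6']]
```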

Why is Redis single threaded

Redis operates entirely in memory, so the CPU is rarely the bottleneck; the bottleneck is usually memory or network bandwidth. Version 6.0 introduced multithreading into the core process primarily to address Redis's performance bottleneck on network I/O. The core command-execution phase is still single-threaded.

Why is single-process, single-threaded Redis so fast

  1. Memory-based operations
  2. I/O multiplexing (select, epoll, etc.) with a self-developed network event handler based on the Reactor pattern
  3. Single threading avoids unnecessary context switches and race conditions, reducing overhead.
  4. Optimized data structures: simple dynamic strings (SDS), ziplists, skip lists, etc.
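Point 2, I/O multiplexing, means one thread asks the OS to watch many sockets at once and then handles whichever are ready, serially. A minimal sketch using Python's selectors module (which wraps select/epoll, the same primitives Redis builds on):

```python
import selectors
import socket

# One thread, one selector, many sockets: the essence of the Reactor model.
sel = selectors.DefaultSelector()
a, b = socket.socketpair()            # a connected pair standing in for clients
sel.register(b, selectors.EVENT_READ)

a.send(b"PING")                       # make b readable
events = sel.select(timeout=1)        # a single syscall watches every socket
for key, _mask in events:
    data = key.fileobj.recv(16)       # handle each ready socket serially

sel.unregister(b)
a.close()
b.close()
```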

Redis data structure

Basic 5:

  • String: a string, the most basic data type.
  • List: a list of strings, ordered by insertion.
  • Hash: a hash object (field-value pairs).
  • Set: an unordered collection of unique strings.
  • Sorted Set: an ordered set, where each member carries a score used for sorting.

Advanced 4:

  • HyperLogLog: usually used for cardinality statistics. It uses a small, fixed amount of memory to count the number of unique elements in a collection. The result is not exact but an approximation with a standard error of 0.81%, so HyperLogLog suits scenarios where perfect accuracy is not required, such as website UV statistics.
  • Geo: new in Redis 3.2. Stores and manipulates user-supplied geographic coordinates: you can compute the distance between two locations, or fetch the set of locations within a given radius of a coordinate.
  • Bitmap: a bit array.
    • User sign-in/status statistics: the key encodes the day, and the user ID is the bit offset
    • Building block for Bloom filters
  • Stream: mainly used for message queues, similar to Kafka; it can be considered an improved version of Pub/Sub. It provides message persistence and master-slave replication, allows any client to access data at any time, remembers each client's access position, and ensures that messages are not lost.
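The Bitmap sign-in idea above can be sketched with a plain bytearray standing in for a Redis bitmap. SETBIT/GETBIT/BITCOUNT are the real Redis commands; the helper functions here are illustrative stand-ins, not a Redis client:

```python
# A bytearray emulating a Redis bitmap for per-day sign-in statistics.

def setbit(bits, offset):
    byte, bit = divmod(offset, 8)
    if byte >= len(bits):                      # grow like Redis does on SETBIT
        bits.extend(b"\x00" * (byte - len(bits) + 1))
    bits[byte] |= 0x80 >> bit                  # Redis numbers bits from the MSB

def getbit(bits, offset):
    byte, bit = divmod(offset, 8)
    return (bits[byte] >> (7 - bit)) & 1 if byte < len(bits) else 0

def bitcount(bits):
    return sum(bin(x).count("1") for x in bits)

day = bytearray()      # the Redis key would be e.g. "signin:2021-01-01"
setbit(day, 7)         # user id 7 signed in today
setbit(day, 100)       # user id 100 signed in today
# getbit(day, 7) == 1, getbit(day, 8) == 0, bitcount(day) == 2
```

100 million users fit in about 12 MB per day this way, which is why bitmaps are the go-to structure for this kind of statistic.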

The underlying implementation of sorted sets

  1. ZipList: compressed list
  2. SkipList: skip list

Ziplist: if every stored element is shorter than 64 bytes and there are fewer than 128 elements, the compressed-list encoding is used; otherwise skiplist is used. These two thresholds can be customized with zset-max-ziplist-entries and zset-max-ziplist-value.
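The two thresholds mentioned above can be set in redis.conf (the values shown are the defaults):

```
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
```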

Skiplist: in this encoding a zset contains both a dict (for O(1) member-to-score lookup) and a skip list (for ordered access).

Why not use red-black trees

  1. They have the same time complexity: search, insertion, and deletion are all O(log n)
  2. The red-black tree is more complex to implement, while a skip list is simpler and handles range operations (e.g. ZRANGE) naturally by walking forward along the bottom level

The underlying data structures of the Hash type

  1. ZipList: compressed list
  2. Hash table (dict, comparable to Java's HashMap)

Expansion flow of Hash objects

A “progressive rehash” method is used to expand the hash object. The steps are as follows:

  1. Calculate the size of the new table and allocate space

  2. Setting the value of the rehash index counter variable rehashidx to 0 indicates that rehash has officially started.

  3. During rehash, every add, delete, find, or update on the dictionary performs, in addition to the requested operation, one extra rehash step (the _dictRehashStep function in the source).

_dictRehashStep rehashes one bucket: it scans forward from rehashidx in ht[0], finds the first non-empty bucket, and rehashes all of that bucket's nodes into ht[1]. When that is done, it empties the bucket in ht[0] and increments rehashidx.

Spreading the rehash over each operation is nice, but what if the server is idle and no commands arrive? Would Redis keep two hash tables around forever?

Of course not. Redis has time events as well as file events; it periodically fires time events that perform background operations, including rehashing: when Redis finds a dictionary mid-rehash, it spends 1 millisecond helping the rehash along.

As operations continue, at some point all key-value pairs of ht[0] have been rehashed into ht[1]. The rehash is then complete and the final cleanup runs: free ht[0], point ht[0] at ht[1], reset ht[1], and set rehashidx back to -1.
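The whole progressive-rehash flow can be sketched as a toy in Python. This is an illustration of the mechanism only, not the real dict.c: two tables, a rehashidx cursor, and one bucket migrated per step.

```python
# Toy progressive rehash: ht0 is the old (full) table, ht1 the doubled
# new table, rehashidx the cursor; each step migrates one bucket.

class Dict:
    def __init__(self):
        self.ht0 = [[("a", 1)], [], [("b", 2), ("c", 3)], []]  # bucket lists
        self.ht1 = [[] for _ in range(8)]                      # doubled table
        self.rehashidx = 0                                     # rehash started

    def _rehash_step(self):
        # Skip empty buckets, then move one bucket's nodes to ht1.
        while self.rehashidx < len(self.ht0) and not self.ht0[self.rehashidx]:
            self.rehashidx += 1
        if self.rehashidx >= len(self.ht0):
            # Done: free ht0, point ht0 at ht1, reset ht1 and rehashidx.
            self.ht0, self.ht1, self.rehashidx = self.ht1, None, -1
            return
        for key, val in self.ht0[self.rehashidx]:
            self.ht1[hash(key) % len(self.ht1)].append((key, val))
        self.ht0[self.rehashidx] = []
        self.rehashidx += 1

d = Dict()
while d.rehashidx != -1:   # in Redis these steps ride along with commands
    d._rehash_step()
# every entry now lives in the new table and rehashidx is back to -1
```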

What is the downside of rehash in the case of large amounts of data

  1. At the start of an expansion, space is allocated for ht[1] first, so during the expansion both ht[0] and ht[1] exist, occupying extra memory.

  2. Because ht[0] and ht[1] coexist during the expansion, lookups, deletions, and updates may need to touch both tables, which adds time.

  3. When Redis's memory usage is close to maxmemory and an eviction policy is configured, the rehash can push usage over maxmemory and trigger evictions, causing a large number of keys to be evicted on both master and slave and potentially leaving them inconsistent.

Redis Network Event Handler (Reactor model)

Redis policy for deleting expired keys (cache invalidation policy, data expiration policy)

  1. Timed deletion: when setting a key's expiration time, create a timer that deletes the key the moment it expires. Most memory-friendly, least CPU-friendly.

  2. Lazy deletion: let keys expire silently; each time a key is accessed, check whether it has expired, delete it if so, and return it otherwise. Most CPU-friendly, least memory-friendly.

  3. Periodic deletion: at intervals (100 ms by default) the program scans the databases and deletes expired keys it finds; an algorithm decides how many keys to check and how many databases to scan. This is a compromise between the first two strategies in CPU time and memory friendliness.

Redis uses lazy deletion and periodic deletion.
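The combination Redis uses can be sketched over a plain dict. This is purely illustrative (Redis implements this in C with its own sampling rules); the function names are invented for the example:

```python
import random
import time

# Lazy + periodic deletion over an in-memory store.
store, expires = {}, {}

def set_with_ttl(key, value, ttl):
    store[key] = value
    expires[key] = time.monotonic() + ttl

def get(key):
    # Lazy deletion: expiry is checked only when the key is touched.
    if key in expires and time.monotonic() >= expires[key]:
        del store[key], expires[key]
        return None
    return store.get(key)

def periodic_sweep(sample_size=20):
    # Periodic deletion: check a random sample of keys that carry a TTL.
    for key in random.sample(list(expires), min(sample_size, len(expires))):
        if time.monotonic() >= expires[key]:
            del store[key], expires[key]

set_with_ttl("k", "v", ttl=0.01)
assert get("k") == "v"
time.sleep(0.02)
assert get("k") is None    # expired, and lazily removed on access
```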

Redis’ memory flushing strategy

How is the LRU algorithm of Redis implemented?

What are the persistence mechanisms of Redis, their implementation principles and advantages and disadvantages?

How should I choose Redis persistence

In general, if you want to maximize data security, you should use both RDB and AOF persistence, and you can also enable mixed persistence.

If you care about your data but can still tolerate losing a few minutes of it, you can use RDB persistence alone.

If your data can be lost, you can turn off persistence, in which case Redis performs best.

If you must guarantee that no data is ever lost by setting appendfsync to always, the performance cost is high; in that case a database such as MySQL may be a better fit.

How does Redis ensure high availability

  1. Master-slave replication
  2. Sentinel mode
  3. Cluster mode

Master-slave mode

Sentinel mode

Cluster mode

There are 100 million keys in Redis, 10 of which contain Java. How do I find all of them?

  1. KEYS *java*: simple, but KEYS blocks the single-threaded server and can cause performance problems with large datasets
  2. SCAN 0 MATCH *java* COUNT 1000: iterates incrementally without blocking the server for long
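The SCAN approach is a cursor loop: call SCAN repeatedly, passing back the returned cursor, until it returns 0. A minimal sketch, with fake_scan standing in for a real client's SCAN call (with redis-py the call is roughly r.scan(cursor, match="*java*", count=1000)):

```python
import fnmatch

# Stand-in data and a stand-in SCAN: cursor-based, one page per call,
# cursor 0 signalling the iteration is complete.
KEYS = [f"key:{i}" for i in range(250)] + ["java:1", "my-java-key"]

def fake_scan(cursor, match="*", count=100):
    page = KEYS[cursor:cursor + count]
    next_cursor = cursor + count if cursor + count < len(KEYS) else 0
    return next_cursor, [k for k in page if fnmatch.fnmatch(k, match)]

cursor, found = 0, []
while True:                  # keep calling until the cursor wraps to 0
    cursor, keys = fake_scan(cursor, match="*java*", count=100)
    found.extend(keys)
    if cursor == 0:
        break
# found == ['java:1', 'my-java-key']
```

Note that real SCAN pages may repeat keys across a rehash, so callers often deduplicate the results.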

Implementation Principle of SCAN

Redis vs. Memcached

Redis implements distributed locking as well as Redlock

When using caching, do you write to the cache or to the database first

Either order can leave dirty data in one of the two

Update the cache or invalidate it?

Invalidate the cache, because invalidation is idempotent

How to ensure data consistency between database and cache

Because the database and the cache are two different data sources, keeping their data consistent is essentially a distributed-transaction problem. Distributed transactions could be introduced to solve it; common options are 2PC, TCC, and MQ transactional messages.

Therefore, in practice, strong consistency between the cache and the database is usually not guaranteed; instead, some guarantees are sacrificed to achieve eventual consistency between the two.

If dirty data is unacceptable, it is reasonable to abandon the cache and go directly to the database.

1) Update the database. The database generates a binlog.

2) Listen to and consume binlog, perform invalidation cache operation.

3) If step 2 fails to invalidate the cache, introduce a retry mechanism: retry the failed data through MQ, and consider whether an idempotency mechanism is needed.

Bottom line: when unknown problems occur, raise an alarm promptly and handle them with human intervention.

Human intervention is the last resort: behind many bright and shiny applications is a group of weary programmers endlessly fixing dirty data and bugs.

Cache penetration description and solution

Description: a request accesses a key that exists in neither the cache nor the database. It goes straight to the database, finds nothing, and therefore cannot populate the cache, so the next request hits the database again. The cache is useless here: every request reaches the database, which may go down under heavy traffic. The cache has been "penetrated" and does nothing.

Solution:

  1. Interface validation. Normal business flows may include a small number of accesses to nonexistent keys, but not a large number, so this pattern usually indicates an illegal attack. Validate at the outermost layer: user authentication, data-validity checks, and so on. For example, if product IDs are positive integers, reject anything that is not one.

  2. Cache null values. If neither the cache nor the DB has a value, write a null marker into the cache with a shorter expiration time, set according to the product's business characteristics.

  3. Bloom filter. A Bloom filter stores all keys that may legitimately be accessed; nonexistent keys are filtered out directly, and existing keys go on to query the cache and database.
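Solution 2, caching null values, can be sketched with in-memory dicts standing in for Redis and the database (illustrative only; names are invented for the example):

```python
import time

NULL = object()                      # sentinel meaning "known to be absent"
cache, cache_exp = {}, {}
db = {"item:1": "widget"}

def get_item(key, null_ttl=5, ttl=300):
    now = time.monotonic()
    if key in cache and now < cache_exp[key]:
        v = cache[key]
        return None if v is NULL else v
    v = db.get(key)
    # Cache even a miss, but with a much shorter expiry.
    cache[key] = v if v is not None else NULL
    cache_exp[key] = now + (ttl if v is not None else null_ttl)
    return v

assert get_item("item:1") == "widget"
assert get_item("item:999") is None   # the miss hits the db once...
assert cache["item:999"] is NULL      # ...then the null marker absorbs repeats
```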

Bloom filter implementation scheme
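One simple scheme (illustrative; production systems often use the RedisBloom module or a ready-made library): derive k bit positions per item from salted hashes and set/test them in a fixed-size bit array. "No" answers are always correct; "yes" answers can be false positives.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=8192, n_hashes=4):
        self.size = size_bits
        self.n = n_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # k independent positions via salted SHA-1 digests.
        for i in range(self.n):
            h = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = BloomFilter()
bf.add("user:42")
# bf.might_contain("user:42") is True; unknown keys are (almost always) False
```

Sizing matters: the false-positive rate depends on the bit-array size, the number of hash functions, and the number of inserted items.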

Cache breakdown description and solution

Description: a hot key receives a large number of requests at the moment its cache entry expires. Because the cache has just expired, those requests all fall through to the database, causing an instantaneous spike in database load that may even bring the database down.

Solution:

  1. Mutex: a local or distributed lock, so that only one request rebuilds the value.
  2. Hotspot data never expires: the cache is set to never expire, and a scheduled task asynchronously reloads the data and updates the cache.
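The mutex solution can be sketched with a per-process lock (illustrative; across processes you would use a distributed lock, e.g. Redis SET with NX and PX). The double-check after acquiring the lock is what keeps all but one thread off the database:

```python
import threading

cache = {}
lock = threading.Lock()
db_hits = 0

def load_from_db(key):
    global db_hits
    db_hits += 1                      # count how often the database is hit
    return f"value-of-{key}"

def get_hot(key):
    if key in cache:
        return cache[key]
    with lock:                        # only one rebuilder at a time
        if key in cache:              # double-check: someone may have filled it
            return cache[key]
        cache[key] = load_from_db(key)
        return cache[key]

threads = [threading.Thread(target=get_hot, args=("hot",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# db_hits == 1: eight concurrent readers caused a single database load
```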

Cache avalanche description and solution

Description: a large number of hot keys are given the same expiration time, so the cache expires all at once. Requests flood the database instantaneously, pressure surges like an avalanche, and the database may even go down.

Solution:

  1. Spread out the expiration times. Since the problem is caches expiring all at once, the most direct fix is to make sure they do not. Add a random value to each key's expiration time so that expirations are spread out rather than concentrated at one moment.

  2. Hotspot data never expires. Same as for cache breakdown; pay attention to the refresh interval and to how data exceptions are handled.

  3. Add a mutex. Same as for cache breakdown: lock per key, so that for a given key only one thread computes the value.
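Solution 1 amounts to adding jitter when writing TTLs; the numbers below (a one-hour base with up to five minutes of jitter) are arbitrary example values:

```python
import random

def ttl_with_jitter(base_seconds=3600, jitter_seconds=300):
    # Spread expirations across a window instead of a single instant.
    return base_seconds + random.randint(0, jitter_seconds)

ttls = [ttl_with_jitter() for _ in range(1000)]
assert all(3600 <= t <= 3900 for t in ttls)
# a thousand keys written together now expire across a 5-minute window
```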

What are the events of Redis

  1. File events
  2. Time events
