Differences and similarities between Redis and Memcached

Thing in common:

  1. Both are memory-based databases and are generally used as caches.
  2. They all have expiration policies.
  3. Both have very high performance.

The difference between:

  1. Redis supports richer data types (supporting more complex application scenarios). Redis not only supports simple K/V type data, but also provides the storage of list, set, zset, hash and other data structures. Memcached supports only the simplest K/V data types.
  2. Redis supports persistent data and can keep data in the memory in the disk, which can be reloaded for use upon restart. Therefore, there is a Dr Mechanism. Memecache stores all the data in memory. Once restarted, all data will be lost.
  3. Memcached is a multithreaded, non-blocking I/O reuse network model. Redis uses a single-threaded multiplex I/O multiplexing model.
  4. Redis supports publish-subscribe models, Lua scripting, transactions, and more, while Memcached does not. Redis also supports more programming languages.
  5. The deletion strategy for expired Memcached data uses only lazy deletion, whereas Redis uses both lazy deletion and periodic deletion.
  6. Memcached does not have a native clustering mode. It relies on the client to write data in fragments to the cluster. But Redis currently supports the cluster model natively.

Redis – File event handler

Redis designed and developed its own set of efficient event processing model based on Reactor model, called file event handler, and this file event handler is single-threaded, so we generally say Redis is single-threaded.

Redis uses IO multiplexing mechanism to listen on multiple sockets at the same time, and puts the socket generating the event into a queue, and then sends one socket to the file event dispatcher each time. The file event dispatcher calls the corresponding event handler according to the event type to process, and returns after processing. The I/O multiplexer then proceeds to deliver the next socket to the file event dispatcher.

Why is the Redis single-threaded model so efficient?

  • Pure memory operation.
  • The use of I/O multiplexing eliminates the need for Redis to create extra threads to listen for a large number of client connections, reducing resource consumption. Because the performance bottleneck for the vast majority of operations running in Redis services is not CPU, but memory and network.
  • Instead, the single thread avoids the frequent context switching problems of multiple threads, which can cause problems such as deadlocks and even affect performance.

The cache to penetrate

What is cache penetration? If the data cannot be queried in Redis and then queried in mysql, this phenomenon is called cache penetration. Cache penetration is very common. Generally speaking, we are afraid of high frequency cache penetration rather than low frequency cache penetration. For example, if a hacker maliciously accesses an ID that does not exist in the database, high frequency cache penetration may occur.

How to solve it? Add a Bloom filter between Redis and mysql. Bloom filter is a data identification algorithm that trades space for error rate. The principle is to calculate multiple hash values of ids in mysql using multiple Hash algorithms of Bloom filter, convert them into subscripts, and identify them as 1 in Bloom filter. If we want to query a key at this time, we also use the same hash algorithms to calculate whether the table under the corresponding index is 1. If one of the tables is not 1, the key is not in mysql and return directly. It takes only a small amount of memory to build such a Bloom filter to solve the cache penetration problem.


Cache breakdown

What is cache breakdown? Cache breakdown is essentially a special form of cache penetration. For example, there is a hot key in Redis that has been carrying a large number of requests. If the key fails at a certain moment, all the requests to access the hot key will be sent to mysql in a flash, which may cause mysql to break down. This phenomenon is called cache breakdown.

How to solve it? (Generally small and medium sized ones do not need to be solved)

  • No expiration time is set for the hotspot Key.
  • Another option is to add distributed locks so that only one request can access the Key at a time. But the larger the concurrency, the more obvious the efficiency problem.
  • Cache the data queried in mysql to Redis.

Cache avalanche

What is cache avalanche?

Cache avalanche is essentially a special form of cache penetration. For example, if we set the same expiration time for a large number of hot data, that is, when the expiration time expires and they are still hot data, a large number of requests will be sent through Redis to mysql. This phenomenon is called cache avalanche.

How to solve it? The easiest way to do this is to give caches a random factor when they expire so they don’t all expire at the same time.

SetRedis (Key, value, time + math.random () * 10000);


Redis expiration deletion policy

Periodically delete: Periodically, the system randomly selects keys with expiration time to check whether they are expired and deletes them.

Lazy delete: When the user queries the key, it checks to see if the key has expired. If it has expired, nothing is returned and the key is deleted.


Memory flushing mechanism?

  • Noeviction: New data written when out of memory will bug, no keys will be eliminated

  • Volatile – TTL: Weeding out keys that are about to expire from keys that have been set to expire

  • Allkeys-lru: Removes the least recently used key (this is the most commonly used) when memory is insufficient to accommodate new writes

  • Volatile -lru: Selects the least recently used key from keys that have expired

  • Allkeys-random: selects keys randomly from allkeys

  • Volatile -random: Selects a random key from a key whose expiration date has been set

  • Allkeys-lfu: Removes the least-used key when the memory is insufficient to hold new written data

  • Volatile – lFU: Selects the least frequently used key from the set expiration key

    The last two are new in version 4.0


Five basic data types

String

How to achieve:

  • The bottom layer is implemented by ** simple dynamic string (SDS) **, because we often change the values in Redis, so we need a dynamically variable string data structure, so Redis itself built an abstract type called SDS, used as the default redis string representation.
  • SDS follows the null – character – ending convention of C language in order to directly reuse some functions in C string function library.
  • The difference between SDS and C strings:
    • SDS obtains the length of the string in constant time complexity: the LEN attribute of SDS holds the length of the string.
    • Buffer overflows can be prevented: changes to C strings may cause buffer overflows. If changes are made to SDS, SDS will automatically expand to the required size, so buffer overflows will not be caused.
    • You can reduce the number of memory reallocations when modifying strings:
      • C string the memory must be reallocated when the string is added or shrunk. Otherwise, buffer overflow or memory leak will occur.
      • SDS has two optimization strategies:
        • Space preallocation: The operation used to optimize the string growth of SDS. When modifying the string, it will not only meet the space required by SDS allocation modification, but also have an unused space (when the length of SDS is less than 1MB, the length of free and len is the same; If the value is greater than 1MB, 1MB of unused space is allocated. Note: Before expanding the space, check whether the unused space is sufficient. When the unused space is insufficient, the memory will be reallocated.
        • Lazy space free: Operations used to optimize string shortening for SDS. When shortening SDS, the extra bytes are not immediately recovered, and free is used to record the number of bytes recovered, which is convenient for next use. It also provides corresponding API to help us release when we need it.
    • SDS is binary safe: the C string must conform to the encoding specification, and there must be no ‘\0’ in it, otherwise it will be mistakenly identified as the end, so only text data can be saved, but not images, videos and other binary files. SDS, on the other hand, processes data stored in BUF in binary form, ensuring that it is read as it is written.
C string SDS
Time complexity O(N) for obtaining string length The O (1)
The API is insecure and can cause buffer overflows or memory leaks The API is secure and does not cause these problems
Every time a string is modified, memory is reallocated Two optimization strategies can reduce the number of memory allocations
Only text data can be saved Can save text and binary data, binary security

What it is used for:

  • Store the key and value of the string as a cache
  • Numerical calculation can be performed directly: used for current limiting and SEC killing scenarios
  • Because it can store binary, it can be used for statistics, such as counting the number of login days of users in any time window and counting the number of active users in time window.

List

How to implement this: ZipList or LinkedList, the list object holds all string elements less than 64 bytes in length and less than 512 elements, using ZipList, the rest of the LinkedList. What it is used for:

  • Instead of some data structures, such as stacks, queues, arrays, etc
  • Implementation for posts and comments

Hash

How to implement: A Hash is also called a dictionary, and is implemented at the bottom through a Hash table. A dictionary has two hash tables HT [0] and HT [1]. Ht [0] is normally used and HT [1] is only used for rehash. To add a new key-value pair to a dictionary, the program first calculates the hash and index values based on the keys of the key-value pair. Then, based on the index values, the hash table node containing the new key-value pair is placed on the specified index of the hash table array.

  • Expansion operation: ht[1] is the first integer power of 2 that is greater than or equal to HT [0]. Used *2.
  • Shrink operation: ht[1] size is the first integer power of 2 greater than or equal to HT [0]. Used.

The rehash operation is not done once; it is done repeatedly and increments. Hash conflicts: hash conflicts with chain addresses

  • Details page, aggregation

Set

How to implement: IntSet or HashTable

  • Draw (SRandMember Key 3, SRandMember Key -4, no repeat draw positive number; Duplicate draw negative number)
  • Random events
  • Recommendation system (Collection operation)
    • Difference set: Recommend people you might know
    • Intersection: Mutual friends
    • Union: Everyone’s circle of friends

ZSet

How to implement it: The underlying implementation is ziplist or Skiplist, ziplist for low data, skiplist for high data or skiplist for large nodes. Most of the time, skip lists are about as efficient as balanced trees, and they are simpler to implement than balanced trees.

Skip lists: For a single linked list, even if the list is ordered, if we want to find some data in it, we have to go through the list from beginning to end, which would be inefficient and skip lists would be different. A skip list is an extension of an ordered linked list, a multi-tiered chain structure for quick lookup. Is a space for time algorithm implementation. It’s kind of like a balanced tree. Both can do quick lookups of elements. One important difference, however, is that the insertion or deletion of the balanced tree is likely to result in a global adjustment of the balanced tree. However, the insertion and deletion of the hop table only need to operate on a part of the entire data structure. The advantage of this is that in high concurrency cases, you will need a global lock to keep the entire balanced tree thread-safe. For jumpers, you only need local locks. This way, you can have better performance in high-concurrency environments. In terms of query performance, the time complexity of a hop table is also O(logn), so in a concurrent data structure. The essence of a hopping list is that it maintains multiple linked lists at the same time, and the linked list is layered. The lowest linked list maintains all the elements in the hopping list, and each upper linked list is a subset of the lower. All linked list elements in a jumper are sorted. You can start at the top of the list. Once the element is found to be larger than the value in the current list, it moves to the next level of the list to continue the search. That is to say, in the process of searching, it searches by leaps and bounds. When the linked list length is relatively large, the efficiency of building index lookup will be significantly improved.

What it is used for:

  • Leaderboard (itself is from small to large, use should use reverse instruction)
  • paging

Skiplist and balance trees

  • Skiplist is a little more flexible than balanced trees in terms of memory footprint and is not very memory intensive. In general, a balanced tree contains 2 Pointers per node (one to the left and right subtrees), whereas skiplist contains an average of 1/(1-p) per node, depending on the size of the parameter p. If p=1/4, as in Redis, then on average each node contains 1.33 Pointers, which is better than a balanced tree.
  • Balanced trees are more complex than skiplist operations when it comes to range lookups. In the equilibrium tree, after we find the low value in the specified range, we need to continue to find other nodes that do not exceed the high value in a middle-order traversal order. The middle-order traversal here is not easy to achieve without some modification of the equilibrium tree. Range lookups on Skiplist are as simple as a few steps through the level 1 list after finding small values.
  • For performance reasons, balancing tree inserts and deletions can cause complex logic adjustments to subtrees, whereas Skiplist inserts and deletions are simple and fast, requiring only pointer changes to adjacent nodes.
  • Skiplist is much easier to implement than a balanced tree.

Persistence of Redis

RDB

It is a way to persist Redis, like a snapshot. It has two ways to generate RDB files, SAVE (which blocks the server process) and BGSAVE (which forks a child process to process it). In addition to manually entering commands to generate RDB files, you can configure the server to execute BGSAVE commands at regular intervals. For example:

  • save 900 1
  • save 300 10
  • save 60 10000

This is checked primarily with two parameters: the dirty counter (how many changes have been made since the lastsave or BGSAVE) and the lastsave attribute (which records the timestamp of the BGSAVE since the lastsave). The RDB file is then automatically loaded and executed when the server starts.

AOF

It persists data by appending files. The way it works is simple: every time a write operation that modifies memory data is performed, the operation is logged. Assuming that the AOF log records all the sequence of modified instructions since the Redis instance was created, it is possible to restore the state of the in-memory data structure of the current instance of Redis by executing all instructions sequentially, known as “replay,” on an empty Instance of Redis. AOF rewrite: The AOF log gets longer and longer over the long run of Redis. If the instance goes down and restarts, replaying the entire AOF log takes time, causing Redis to be unavailable for a long time. So you need to rewrite the AOF log. Redis provides the bgreWriteAOF directive to rewrite AOF logs. The idea is to fork a child process that iterates memory into a series of Redis instructions that are serialized into a new AOF log file. After serialization, incremental AOF logs that occurred during the operation are appended to the new AOF log file, which immediately replaces the old AOF log file. In order to solve the problem that the server process continues to process requests while the child thread overwrites the AOF, the command will cause the AOF file to be inconsistent with the state of the database. So the Redis server sets up an AOF rewrite buffer, and when the child thread overwrites, new commands are added to the AOF buffer and the AOF rewrite buffer, and then the contents of the AOF rewrite buffer are written into the overwritten AOF file. Note: When a program writes to an AOF log file, it actually writes to the kernel’s cache, and the kernel asynchronously flushes the dirty data back to disk. So we need the fsync function to force the AOF log content from the kernel cache to disk. But fsync is a very resource-intensive process! In general, Redis performs fsync every 1s or so. There are really three strategies.

  • “No” : not active fsync. It is unsafe to let the operating system decide to synchronize disks.
  • Everysec: Fsync once per second, default compromise.
  • Always: Fsync every time you write a command, which is very slow.

Redis distributed lock

What is a distributed lock

  • Distributed locking is a lock implementation that controls distributed systems or different systems’ common access to shared resources. If different systems or hosts on the same system share a certain resource, they use mutual exclusion to prevent interference.

Train of thought

  • Provide a global and unique lock acquisition “thing” in the whole system, and then each system needs to add a lock, to ask this “thing” to get a lock, so that different systems can be considered the same lock.

Distributed lock design purpose

  • This ensures that the same method and operation can only be executed by one thread on one machine in a distributed application cluster.

Key points of Redis distributed locking

  • Use the redis2.8 optimized SET key value NX PX milliseconds command. If not, set the value first and then set the expiration time. This operation is not atomic and may break down before setting the expiration time, resulting in deadlock (the Key is permanent).
  • Value must be unique. When unlocking the lock, verify that the Value of the lock is the same as that of the lock.

disadvantages

  • In single-machine deployment mode, there is a single point of failure whenever Redis fails. Locking doesn’t work.
  • In the master-slave mode, only one node is locked during the lock process. Even if Sentinel is used for high availability, if the Master node fails and the master-slave switchover occurs, the lock loss may occur.
  • There is also the setting of expiration time, too long will lead to fewer units of time can be sold, too short will lead to buy more than one thing, oversold problem.

RedLock assumes that Redis is deployed in a Redis Cluster, with a total of five Master nodes. Obtain a lock by following these steps:

  • Gets the current timestamp in milliseconds.
  • Take turns trying to create locks on each Master node with short expiration times, typically tens of milliseconds.
  • Try to create a lock on most nodes, such as 3 nodes for 5 (n / 2 +1).
  • The client calculates the lock establishment time. If the lock establishment time is shorter than the timeout period, the lock establishment is successful.
  • If the lock fails, then the lock is removed in turn.

But this algorithm is quite controversial, there may be a lot of problems, can not guarantee the lock process must be correct.

**Redisson **Redisson solves the issue of redis distributed lock expiration setting by considering many details:

  • All Redisson instructions are executed through Lua scripts, and Redis supports atomic execution of Lua scripts.
  • Redisson sets the default expiration time of a Key to 30s. What if a client holds a lock for more than 30s? Redisson has a Watchdog that sets the Key timeout to 30 seconds every 10 seconds after you acquire the lock. This way, if the lock is held all the time, the Key will not expire and another thread will acquire the lock.
  • Redisson’s “watchdog” logic ensures that no deadlocks occur. (If the machine goes down, so does the watchdog. The lock will expire automatically after 30 seconds, and other threads can acquire the lock.

Zookeeper distributed lock

ZooKeeper can be a distributed lock consistency service. ZooKeeper has the following features:

  • Data can be stored as a file system.
  • There are many types of nodes that can be created, such as persistent nodes, temporary nodes, persistent ordered nodes, and temporary ordered nodes
  • There is an event callback mechanism.

Zookeeper distributed lock is basically implemented around its three features. The specific implementation process is as follows: Whenever a client is connected to a server node, Zookeeper creates a temporary ordered node, which is globally ordered and unique, just like a lock. Because it is an ordered node, each client is like queuing to access a resource. And only the smallest sequential directory node can access resources. When the client discovers that its node is not the smallest, it registers an event on its previous node, and when the previous directory node is released and deleted, it triggers a callback event on that client. The client then knows it’s time to access the resource. The advantage of ZooKeeper distributed lock is that it can save a lot of computing resources.

**Curator **Curator is an open source client of ZK that also provides an implementation of distributed locks.

InterProcessMutex interProcessMutex = new InterProcessMutex(client,"/anyLock"); 
interProcessMutex.acquire(); 
interProcessMutex.release(); 
Copy the code

Redis distributed lock VS Zookeeper distributed lock

Redis distributed lock:

  1. Redis set px nx + unique ID + lua script
  • Advantages: Redis has high read and write performance, so the distributed lock based on Redis is efficient.
  • Disadvantages: Dependent on middleware, there may be node data synchronization problems in distributed environment, reliability may be affected, if the occurrence requires manual intervention.
  1. Redisson Distributed lock
  • Advantages: Can solve the redis distributed lock timeout problem, through the watchdog.
  • Disadvantages:
  1. Redlock based on Redis
  • Advantages: It solves the problem of synchronous availability of Redis clusters.
  • disadvantages:
    • It relies on middleware, is not widely validated, has high maintenance costs and requires multiple independent master nodes; Multiple nodes need to be locked at the same time, which reduces some efficiency
    • The expiration time of lock deletion failure cannot be controlled
    • If the operation fails, polling is required, occupying CPU resources

Zookeeper distributed lock:

Advantages: There are no problems such as redis timeout, data synchronization (ZooKeeper is returned only after the synchronization is complete), and primary/secondary switchover (ZooKeeper services are unavailable during the primary/secondary switchover).

Disadvantages: Reliance on middleware, which ensures reliability at the expense of efficiency (but still high). Performance is not as good as Redis, because when there are more clients, the frequent creation and deletion of nodes will consume performance.

== Efficiency comparison: ==

  • Performance: Redis > ZooKeeper
  • Reliability: redis < ZooKeeper

www.cnblogs.com/owenma/p/12…


Implementation of master-slave synchronization

Implementation of replication function (SYNC) :

  • The secondary server sends the SYNC command to the primary server
  • The primary server receiving the SYNC command executes the BGSAVE command, generates an RDB file in the background, and uses a buffer to record all write commands during RDB file generation.
  • When the BGSAVE command is executed by the master server, the master server sends the RDB file to the slave server, which receives and loads the RDB file to update its database state to the state before the master server executed the BGSAVE command.
  • The master sends all write commands for the buffer to the slave, and the slave executes these write commands to update its database state to the current state of the master database.

Disadvantages: disconnection after the replication efficiency is relatively low, to copy the master server all content, obviously this is not necessary.

(new) Implementation of replication function (PSYNC) :

  • Full resynchronization: The steps are similar to SYNC.
  • Partial resynchronization: used for the case of replication after disconnection: after reconnection from the disconnection server, as long as the master server sends all write commands during its disconnection to it, the database can be updated to the current state of the master server.
    • Implementation principle (replication offset + replication backlog buffer + server run ID) :
      • Both primary and secondary servers need to maintain a replication offset. By comparing the replication offset, you can know whether the primary server is consistent with the secondary server. The replication backlog buffer is a fixed-length fifO queue maintained by the primary server, which is used to store write commands during command propagation.
        • Partial resynchronization occurs if the data after the offset copied from the slave server is in the replication backlog buffer of the primary server
        • Otherwise, complete resynchronization is performed

Specific steps:

  • Set the address and port of the primary server that you want to replicate
  • Establish a socket connection
  • Run the PING command to check whether the read and write of the socket is normal
  • Authentication, authentication passwords, that sort of thing
  • Send port information. The slave server sends its own port to the master server
  • Synchronization: The secondary server sends the PSYNC command to the primary server to perform synchronization.
  • Command transmission

Heartbeat detection: During command propagation, the secondary server sends heartbeat commands to the primary server once per second by default. The main functions are:

  • Check whether the network connection between the master and slave servers is normal: If the master server does not receive the REPLCONF ACK command from the slave server for more than 1s, it knows that the connection between the master and slave servers is faulty.
  • Assist to implement the Min-Slaves option
  • Detection command loss: If the command sent from the master server to the slave server is lost due to a network failure, the slave server sends the REPLCONF ACK to the master server and discovers that the offset is less than its own offset and sends the lost data to the slave server.

Redis transactions

Redis can implement transactions through commands such as MULTI, EXEC, DISCARD and WATCH.

**Redis transactions provide a way to package multiple command requests. All of the packaged commands are then executed sequentially without interruption. ** But rollback operations are not supported.


The sentry

Sentinel is a highly available solution for Redis: Composed of one or more Sentinel instances, the Sentinel system can monitor multiple master servers and all slave servers under these master servers. When monitoring a master server goes offline, the Sentinel system automatically selects one of its subordinate slave servers to upgrade to the new master server.

  • Sentinel sends an INFO command to the monitored primary server every 10 seconds. (Command connection)
  • Sentinel sends an INFO command to the monitored slave server every 10 seconds. (Command connection)
  • Sentinel sends a message every 2s to the master and slave’s __sentinel__: Hello channel. (Subscription connection)

Detection of subjective offline: Once per second, Sentinel sends a PING command to the instances (primary server, secondary server, Sentinel) with which it has created a connection. If no PONG returns the command, the other party is determined to be offline. Objective offline detection: After Sentinel determines that a primary server is offline, it queries other Sentinels that also monitor the primary server in order to confirm that the primary server is actually offline. See if they also think the main server has gone offline. When Sentinel receives a sufficient number of offline judgments from other Sentinels, it determines that the primary server is objectively offline. The number of sentinels is configured with the quorum parameter in the Sentinel configuration.

Election Lead Sentinel:

Complex version:

When a primary server is judged to be objectively offline, the sentinels monitoring the offline primary server will negotiate to elect a lead Sentinel, which will perform failover operations on the offline primary server:

  • All sentinels that monitor the objective offline Master are likely to become lead Sentinels.
  • After each lead Sentinel election, the configuration epochs of all sentinels are automatically increased once, regardless of whether the election is successful or not.
  • In a configuration era, all sentinels have one chance to set a Sentinel as a local lead Sentinel, and once set, the local lead Sentinel cannot be changed in this configuration era.
  • All online sentinels that monitor the Master’s objective downline have the opportunity to ask other Sentinels to set themselves up as local lead Sentinels.
  • When a Sentinel (source Sentinel) sends a Sentinel is-master-down-by-addr command to another Sentinel (target Sentinel), This means that the source Sentinel requires the target Sentinel to set itself as the lead Sentinel.
  • Sentinel sets the local lead. Sentinel is first come, first served. That is, the Sentinel that sends the setting request to the target Sentinel first will become the local lead Sentinel, and all requests received after that will be rejected.
  • After receiving the Sentinel IS-master-down-by-addr command, the target Sentinel will return a command reply to the source Sentinel. The leader_runid and Leader_EPOCH parameters in the reply record the run ID and configuration epoch of the local lead Sentinel for the target Sentinel, respectively.
  • After receiving the command reply from the target Sentinel, the source Sentinel will check whether the value of the leader_EPOCH parameter in the reply is the same as its own configuration era. If so, the source Sentinel will continue to fetch the leader_runid parameter in the reply. If the value of the Leader_runid parameter is the same as the run ID of the source Sentinel, then the target Sentinel has set the source Sentinel to the local lead Sentinel.
  • If a Sentinel is set up as a local lead Sentinel by more than half of the sentinels, the Sentinel becomes the lead Sentinel.
  • The generation of the lead Sentinel requires the support of more than half of the Sentinels, and each Sentinel can only set up a local Sentinel once in each configuration era, so only one lead Sentinel will appear in a configuration era.
  • If no Sentinel is elected as the lead Sentinel within a given time limit, each Sentinel will be elected again some time later until the lead Sentinel is elected.

Simple version:

  • The Sentinel node that finds the master offline (we’ll call him A) sends A command to each Sentinel, asking it to be selected as the local lead Sentinel
  • If no other Sentinels have been selected by the target Sentinel node, A will be selected as the local lead Sentinel
  • If more than half of the Sentinels agree to elect A as the lead, A is elected
  • If there are multiple Sentinel nodes running for the lead at the same time, there may be a round of voting in which no candidate wins. At this time, each node running for the lead Sentinel waits for a random time and then initiates the request again for the next round of voting until the lead Sentinel is elected

Failover:

  • Select a Slave under the offline Master to convert it to the Master server.
  • Make all remaining Slave servers copy the new Master server.
  • A server that has gone offline will become a secondary server to the new master server when it goes online.

Elects a new Master: The lead Sentinel keeps all slave servers that have taken the Master offline in a list and selects them according to the rules.

  • Delete all slaves in the offline or short state from the list.
  • Delete all slaves from the list that did not reply to the INFO command of the lead Sentinel in the last 5s.
  • Delete all slaves that disconnected from the offline Master for more than down-after-milliseconds * 10 milliseconds.

Then, the lead Sentinel sorts the remaining slaves in the list according to their priorities and selects the Slave with the highest priority. If there are multiple slaves with the same priority, the lead Sentinel will select the Slave with the highest offset according to the Slave replication offset. If there are multiple slaves with the highest priority and the highest offset, the new Master is selected according to the minimum running ID rule. After determining the new Master, the lead Sentinel will send the SLAVEOF no one command to the new Master once per second. When the correct response is received, the current server will be successfully upgraded to the Master server.


The cluster

Redis cluster: sharded cluster. This cluster is recommended when a large amount of cache data is stored. This cluster is mainly used in scenarios of massive data, high concurrency, and high availability.

Other clustering patterns, such as master-slave architecture + Sentinel, can be used for scenarios with small data volumes and high concurrency performance.

Shard cluster of slot assignment: for every cluster node distribution channel, and each node in addition to preserve their own channel information, will tell the oneself management which slot to other nodes, after you receive the other nodes, save in the structure of the corresponding node, that is to say, each node in the cluster will know which channel is belong to which node. When a client sends a command related to a database key to a node:

  • If the slot where the key is located happens to be assigned to the current node, the node executes the command directly.
  • If the slot in which the key is located is not assigned to the current node, the current node will return a MOVED error to the client directing it to the correct cluster node and send the previous command again.

The Hash algorithm:

The oldest hash algorithms and drawbacks (heavy cache reconstruction) are among the simplest data distribution algorithms. However, if a master goes down, 1/3 of the data (say three Redis nodes) will fail and a lot of data will enter MySQL.

Consistent hash algorithm:

A consistent Hash ring cannot solve the problem of cache hot spots. That is, there are too many values concentrated in a Hash interval. As a result, a large number of requests flood into a master node at the same time, while other nodes are idle, causing the problem of master hot spots.

At this time, the concept of virtual ring (virtual node) was introduced to make each master evenly distributed, so that the data in each interval can be evenly distributed to different nodes, rather than in a clockwise search, resulting in the problem of flooding into a master.

Hash Slot algorithm:

Instead of using consistent hash algorithms, Redis clusters introduce the concept of hash slots and adopt automatic sharding mechanism. The cluster has a fixed number of 16384 virtual hash slots. All the keys are mapped to integer slots from 0 to 16383 ** according to the hash function. Each key is verified by CRC16 and then modded to 16384 to determine which Slot to place **. For example, if there are three masters, each master may hold more than 5,000 Hash slots. ** The advantage of using a hash slot is that it is easy to add or remove nodes, ** and neither adding or deleting a node nor modifying a node will make the cluster unavailable.

  1. When you need to add nodes, you just need to move some hash slots of other nodes to the new node.
  2. When a node needs to be removed, simply move the hash slot on the removed node to another node.

Why was the Redis Cluster designed with 16,384 slots?

  • If the slot number is 65536 and the number of heartbeat message headers reaches 8 KB, the number of heartbeat packets sent is too large.

    • As mentioned above, in the header, the one that takes up the most space ismyslots[CLUSTER_SLOTS/8]. When the slot is 65536, the size of this block is:65536 present 8 present 1024 = 8 KBBecause the Redis node needs to send a certain number of ping messages as heartbeat packets every second, if the slot number is 65536, the ping message header is too large, which wastes bandwidth.
  • Redis is unlikely to have more than 1000 cluster master nodes.

    • As mentioned above, the more nodes in the cluster, the more data the heartbeat packet carries in its message body. If the number of nodes exceeds 1,000, network congestion will also occur. Therefore, redis authors do not recommend that the number of nodes in a Redis cluster exceed 1000. Therefore, for a Redis cluster with less than 1000 nodes, 16,384 slots are sufficient. There is no need to expand to 65536.
  • The smaller the slots and nodes are, the higher the compression rate is.

    • The configuration information of the Redis master node is saved in the form of a bitmap. During transmission, the bitmap is compressed. However, if the bitmap filling rate of slots/N is high (N indicates the number of nodes), the bitmap compression rate is very low. If the number of nodes is small and the number of hash slots is large, the bitmap compression rate is low.

16384÷8÷1024= 2KB.

To sum up, the author decided to take 16,384 slots, no more, no less, just right!


Redis publishing and subscribing

Involves commands: publish, subscribe, and psubscribe

A client can subscribe to one or more channels, become a subscriber to the channel, and every time another client sends a message to the channel, all subscribers of the channel receive the message.

Use psubscribe to subscribe channels according to the pattern. When there is a message on the channel that matches the pattern, the client can receive it.

Channel subscription and unsubscribe principle:

  • Subscriptions: Redis maintains subscriptions for all channels in a server, which is essentially a dictionary called Pubsub_Channels. The key is the channel subscribed, and the value is a linked list of all clients subscribed to the channel. The subscription process is as follows: when a client subscribes to a channel:
    • If the channel is not already subscribed to by any client, create a key in Pubsub_Channels, set the key to an empty list, and then add the client to the list as the first element.
    • If the channel already has other subscribers, add the client directly to the end of the linked list under the channel key.
  • Unsubscribe: When a client unsubscribe a channel or channels, the server disassociates the client from the unsubscribe channel from pubsub_channels.
    • After deleting the subscriber, delete the key when the list is empty.

Mode subscription and unsubscribe principle:

  • Subscriptions: Subscriptions are stored in pubsub_Patterns, which is a linked list where each node is pubsub_pattern and holds subscribers and subscription patterns. When you subscribe to a pattern, you create a new pubsub_PATTERN structure, fill in two attributes and add them to the end of the Pubsub_Patterns list.
  • Unsubscribe: delete the matching pubsub_pattern node.