Redis data type internals

1.string

The string is backed by the SDS (simple dynamic string) structure, which uses space preallocation and lazy space release to improve efficiency; the tradeoff is extra memory consumption.

struct sdshdr {
    int len;    // used length of buf
    int free;   // free space remaining in buf
    char buf[]; // character array holding the string
};

Space preallocation: when an SDS is modified and buf must grow, Redis allocates extra space beyond what the new string needs. Lazy space release: when an SDS is modified and buf shrinks, the surplus memory is not returned to the allocator but recorded in free. Summary: the core idea of this design is to trade space for time; as long as free has enough room left, the next time the string grows the system does not need to request memory again.
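
To make the preallocation rule concrete, here is a minimal Python sketch of the growth policy sdsMakeRoomFor uses in sds.c (double the required length below 1 MB, otherwise add a flat 1 MB); the function name and the printed examples are illustrative:

SDS_MAX_PREALLOC = 1024 * 1024  # 1 MB threshold, as in sds.c

def grow_capacity(needed_len: int) -> int:
    """Return the buf size to allocate when an SDS must hold needed_len bytes."""
    if needed_len < SDS_MAX_PREALLOC:
        return needed_len * 2                # small strings: double, leaving free space
    return needed_len + SDS_MAX_PREALLOC     # large strings: add a fixed 1 MB

print(grow_capacity(10))         # 20 -> len=10, free=10
print(grow_capacity(2_000_000))  # 3048576 -> 1 MB of free space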

2.list

Linked lists are used widely inside Redis, e.g. for list keys, publish/subscribe, the slow query log, monitors, and more.

struct listNode {
    struct listNode *prev; // previous node
    struct listNode *next; // next node
    void *value;           // node value
};

3.hash

The hash structure is not only used for the hash data type: all of Redis's own k and v live in one big hash table. For example, with the common set key value, key is the hash key and value is the hash value of that global table.

struct dict {
    dictht ht[2];   // two hash tables; ht[0] in normal use, ht[1] during rehash
    long rehashidx; // rehash progress; -1 when no rehash is in progress
};

Rehash: each dictionary holds two hash tables, one for normal use and one used during rehash. Rehashing is progressive, driven by two triggers:

  • serverCron periodically checks and migrates some buckets.
  • Every change to a kv (add, update, delete) migrates a bucket as a side effect.

Hash collisions: collisions are resolved by separate chaining with a singly linked list, and a new colliding element is inserted at the head of its list.
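
A toy Python sketch of both ideas, head-insertion chaining plus one-bucket-at-a-time progressive rehash (names are illustrative and this is far simpler than the real C implementation):

class Dict:
    """Toy model of Redis's dict: two tables plus progressive rehash.
    (Lookups, omitted here, would have to check both tables during a rehash.)"""

    def __init__(self, size=4):
        self.ht = [[None] * size, None]  # ht[0] live; ht[1] exists only during rehash
        self.rehashidx = -1              # -1 means no rehash in progress

    def start_rehash(self, new_size):
        self.ht[1] = [None] * new_size
        self.rehashidx = 0

    def put(self, key, value):
        if self.rehashidx != -1:
            self._rehash_step()          # each write also migrates one bucket
        table = self.ht[1] if self.rehashidx != -1 else self.ht[0]
        i = hash(key) % len(table)
        table[i] = (key, value, table[i])  # head insertion into the chain

    def _rehash_step(self):
        src, dst = self.ht
        entry = src[self.rehashidx]
        while entry is not None:         # move this whole bucket into ht[1]
            k, v, nxt = entry
            j = hash(k) % len(dst)
            dst[j] = (k, v, dst[j])
            entry = nxt
        src[self.rehashidx] = None
        self.rehashidx += 1
        if self.rehashidx == len(src):   # done: ht[1] becomes the live table
            self.ht = [dst, None]
            self.rehashidx = -1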

4.zset

Sorted sets fit ranking scenarios; the skip list is their underlying implementation.

struct zskiplistNode {
    robj *obj;                         // member object
    double score;                      // score
    struct zskiplistNode *backward;    // backward pointer
    struct zskiplistLevel {
        struct zskiplistNode *forward; // forward pointer
        unsigned int span;             // span: nodes crossed to reach forward
    } level[];                         // flexible array member, must come last
};

Level height: each skip list node has a level between 1 and 32. Jumping: higher levels let a search step over many nodes at once, which speeds up access.

For example, getting from node o1 to o3 needs only a single step at level L4, whose span is 2.
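
A new node's level is chosen randomly with decaying probability. Here is a Python rendering of what zslRandomLevel does in t_zset.c, where the promotion probability ZSKIPLIST_P is 0.25:

import random

ZSKIPLIST_MAXLEVEL = 32   # matches the 1..32 range above
ZSKIPLIST_P = 0.25        # probability of promoting a node one level higher

def random_level() -> int:
    """Pick the level for a new skiplist node."""
    level = 1
    while random.random() < ZSKIPLIST_P and level < ZSKIPLIST_MAXLEVEL:
        level += 1
    return level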

5.set

To save memory, a set's underlying structure depends on the type and number of its elements. When all elements are integers and there are not many of them, the set is stored as an integer set (intset).

struct intset {
    uint32_t encoding; // encoding: int16, int32, or int64 elements
    uint32_t length;   // number of elements the collection contains
    int8_t contents[]; // array holding the elements, interpreted per encoding
};

The underlying implementation of the integer set is an array; when an added element does not fit the current encoding, the whole array is upgraded as needed (e.g. from int16 to int32).
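
A rough Python sketch of that upgrade behaviour, using array typecodes to stand in for the intset encodings (all names here are illustrative):

import array

TYPECODES = ['h', 'i', 'q']            # int16, int32, int64
LIMITS    = [2**15, 2**31, 2**63]

def needed_code(value: int) -> int:
    """Index of the smallest encoding that fits value."""
    for idx, limit in enumerate(LIMITS):
        if -limit <= value < limit:
            return idx
    raise OverflowError(value)

def intset_add(s: array.array, value: int) -> array.array:
    code = needed_code(value)
    if TYPECODES.index(s.typecode) < code:   # upgrade: copy into a wider array
        s = array.array(TYPECODES[code], s)
    if value not in s:                       # intset keeps unique, sorted values
        s.append(value)
        s = array.array(s.typecode, sorted(s))
    return s

s = array.array('h', [1, 2, 3])
s = intset_add(s, 70000)    # forces the int16 -> int32 upgrade
print(s.typecode, list(s))  # i [1, 2, 3, 70000]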

Why is Redis so fast

Redis as a caching layer has become very common in large application architectures, and one of the reasons is that it is very fast.

1. In-memory database

Redis is an in-memory database and most operations are memory based.

2. Special data structures

As we've seen, Redis's data structures are purpose-built. The string type, for instance, is stored as an SDS: whenever a string runs out of space, it requests more memory than strictly needed, so future growth often requires no allocation at all, which keeps it fast. The skip list behind sorted sets accelerates node access through its levels, and progressive rehash is likewise efficient.

3. Single thread

The core of Redis remains single-threaded, aside from the forked child processes used for persistence. Compared with multiple threads, a single thread avoids locking and context-switching costs. Redis's bottleneck is not the CPU but memory (and the network).

4.IO multiplexing

IO multiplexing means one thread serves many TCP connections. The alternative, one process or thread per request, is heavy: besides process/thread switching, user space must poll each connection to see whether its event has arrived, which is inefficient. Redis supports select, poll, and epoll for multiplexing and by default picks the best mechanism the system offers. With IO multiplexing, user space does not traverse the whole fd set; the kernel notifies it of ready events, which is much more efficient.

What are the benefits of pipeline

A client command goes through: send request -> command queued -> command executed -> result returned. The time this round trip takes is the RTT (round trip time). In practice we might have a scenario where we repeatedly incr a key that must not live forever, so we also have to set an expiration time; the program might look like this:

incr key         # 1 RTT
expire key time  # 1 RTT
                 # 2 RTTs in total

So the whole flow costs two RTTs. In production we usually use a connection pool: while the pool has idle connections we avoid new TCP three-way handshakes, but whenever a new connection must be created the overall cost rises again. To cut the round-trip overhead of such command batches, the pipeline was born; with it we can send the two commands together:

pipeline->send(incr key)
pipeline->send(expire key time)
pipeline->execute()  # 1 RTT

So you can reduce RTT from 2 to 1.
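
As a concrete illustration, a short redis-py sketch (assuming a Redis server on localhost and the redis-py client installed):

import redis

r = redis.Redis(host="127.0.0.1", port=6379)

pipe = r.pipeline(transaction=False)  # plain pipeline, no MULTI/EXEC wrapper
pipe.incr("counter")
pipe.expire("counter", 60)
results = pipe.execute()              # both commands travel in one round trip
print(results)                        # e.g. [1, True]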

Saving resources: a pipeline consolidates multiple requests into one, reducing network overhead and saving time. But don't combine too many commands per pipeline: an oversized batch can occupy a lot of bandwidth and congest the network, and the client waits longer for the whole reply. It is better to split a big batch into several smaller pipelines.

Non-atomic: a pipeline is not atomic. Suppose you send incr key and expire key through one pipeline and the expire fails: the incr still took effect, and the key now has no expiration. Keep this in mind.

What does the Redis protocol look like

Redis designed its own serialization protocol, RESP, which is easy to implement, fast to parse, and human readable. Every part of the protocol ends with \r\n, and a leading character marks the type of the data:

  • Single line reply: starts with +.
  • Error reply: starts with -.
  • Integer reply: starts with :.
  • Bulk reply: starts with $.
  • Multi-bulk reply: starts with *.

Because redis-cli parses replies for you, the raw protocol is invisible there; I'll use nc, a TCP tool, to demonstrate it instead.

nc 127.0.0.1 6379
ping
+PONG

# send a non-existent command
hi
-ERR unknown command 'hi'

# incr age (age becomes 1)
incr age
:1

# get name1
get name1
$2
go

# mget name1 name2
mget name1 name2
*2
$2
go
$4
java

Each line above ends with \r\n. RESP's performance is comparable to a binary protocol's, and because it is so simple, most languages can implement it.
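
To make that concrete, a minimal Python sketch that encodes a command the way a client would before sending it over the socket:

def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP multi-bulk message."""
    parts = [f"*{len(args)}\r\n"]
    for arg in args:
        parts.append(f"${len(arg.encode())}\r\n{arg}\r\n")
    return "".join(parts).encode()

print(encode_command("mget", "name1", "name2"))
# b'*3\r\n$4\r\nmget\r\n$5\r\nname1\r\n$5\r\nname2\r\n'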

How does Redis guarantee atomicity

What is atomicity? During execution, either everything runs or nothing does; there is no state where half has executed and half is pending. We know Redis uses an IO-multiplexing model, one thread serving many TCP connections, so even concurrent client requests are queued and processed one at a time; this sidesteps, to an extent, the concurrency issues of a multi-threaded model. But the single-threaded model alone cannot make a sequence of commands atomic. Take incr and expire again:

command 1: incr key
command 2: expire key time

With or without a pipeline, the pair of operations is not atomic: if command 1 fails, command 2 still executes, and command 2's outcome does not affect command 1's result.

Commands that support atomic operations

Suppose two clients both want to change a value by fetching the old value and writing new = old + 1. Because of timing, events may unfold in this order:

# key starts at 10
client 1: get key   # returns 10
client 2: get key   # returns 10
client 1: set key 11
client 2: set key 11

Client 2's update is lost: client 1 had not yet written its new value when client 2 read the old one. For this scenario you can use Redis's incr instead. incr is atomic; it merges the get and the set into one command, and thanks to Redis's single-threaded execution, while the first incr runs the second must wait, so no update is lost. Other atomic commands of this kind include decr and setnx.

Transaction + Monitoring

It can also be solved by combining watch with a transaction: each client watches the key it is about to update, and if the watched key turns out to have been modified when the transaction executes, the exec is refused and the transaction fails. watch is essentially an optimistic lock. When a client runs watch, the watched key maintains a list of the clients watching it. When one of those clients completes its update, it is removed from the list, and every remaining client in the list gets its CLIENT_DIRTY_CAS flag set; when those clients reach exec they see CLIENT_DIRTY_CAS set and refuse to execute.

client1> WATCH key
OK
client1> MULTI
OK
client1> SET key 100
QUEUED
client1> EXEC
1) OK
client1> GET key
"100"

client2> WATCH key
OK
client2> MULTI
OK
client2> SET key 101
QUEUED
client2> EXEC
(nil)
client2> GET key
"100"

Both client 1 and client 2 try to change the key's value, but client 1 updates it first; during client 2's exec, watch detects that the value has been modified, so the transaction is rejected and nil is returned.
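
For completeness, here is the same optimistic-lock pattern as it is typically written with redis-py, retrying on WatchError (the function name is illustrative):

import redis

r = redis.Redis()

def safe_incr(key: str) -> int:
    """Increment key via WATCH/MULTI/EXEC, retrying if another writer races us."""
    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch(key)                # immediate mode from here on
                old = int(pipe.get(key) or 0)
                pipe.multi()                   # start buffering the transaction
                pipe.set(key, old + 1)
                pipe.execute()                 # raises WatchError if key changed
                return old + 1
            except redis.WatchError:
                continue                       # lost the race, try again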

Lua scripts

Since 2.6, Redis has let developers write Lua scripts that run inside Redis. The benefits:

  1. Less network overhead: a Lua script can bundle multiple requests into one.
  2. Atomicity: Redis runs a Lua script as a whole; it cannot be interrupted by other commands mid-execution, so there are no race problems.
  3. Reuse: scripts sent by clients are cached on the Redis server and can be reused.

Let's look at how Lua guarantees atomicity. Suppose the logic is: if the key does not exist, set it; if it already exists, do nothing. Without Lua:

# pseudo-code
client 1: if key not exist
client 2: if key not exist
client 1: set key 10086
client 2: set key 10086  # redundant

The missing atomicity above makes client 2 execute the set one extra time. With Lua:

127.0.0.1:6379> eval "if redis.call('get', KEYS[1]) == false then redis.call('set', KEYS[1], ARGV[1]) return 0 else return 1 end" 1 key "10086"
(integer) 0   # executed successfully
127.0.0.1:6379> get key
"10086"

Once Lua is used, Redis treats the whole script as a unit that no other command can interleave with. With Lua scripting, we can express complex business logic while keeping atomicity. When a script gets long, running it on the command line is not elegant; Redis provides the SCRIPT LOAD command to load a script into the server, returning a SHA1 id with which the script can later be invoked.
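
For illustration, a redis-py sketch that registers the script above once and then invokes it by SHA1 (assumes a local server):

import redis

r = redis.Redis()

# The same set-if-absent script as above, registered once and reused by SHA1.
script = """
if redis.call('get', KEYS[1]) == false then
    redis.call('set', KEYS[1], ARGV[1])
    return 0
else
    return 1
end
"""
sha = r.script_load(script)               # returns the SHA1 hex digest
print(r.evalsha(sha, 1, "key", "10086"))  # 0 on first run, 1 afterwards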

How to delete a big key in production

Deleting a big key in production can block Redis online, which is very dangerous. Depending on the situation, you can:

  1. Delete during off-peak hours, based on the business, to reduce the impact.
  2. Delete incrementally: a hash can be deleted in batches via HSCAN plus HDEL, set and zset members can likewise be removed a batch at a time, and a list can simply be popped down to empty (see the sketch after this list).
  3. Redis 4.0+ supports UNLINK, which deletes asynchronously without blocking the main thread.
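
As an illustration, a redis-py sketch of batched deletion for a big hash (batch size and function name are arbitrary choices):

import redis

r = redis.Redis()

def delete_big_hash(key: str, batch: int = 500) -> None:
    """Delete a big hash incrementally: scan a batch of fields, HDEL them, repeat."""
    cursor = 0
    while True:
        cursor, fields = r.hscan(key, cursor, count=batch)
        if fields:
            r.hdel(key, *fields.keys())
        if cursor == 0:
            break
    r.delete(key)  # remove the now-empty key itself (no-op if already gone)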

How to solve cache penetration, breakdown, and avalanche

In a high-performance architecture we usually add a cache layer in front of the DB layer. Cached data lives in memory, so with a high hit ratio the cache absorbs heavy traffic, protecting the DB and speeding up access. On a miss we read from the database and write a copy into the cache, so the next request is served from cache. But what if requests keep missing the cache, or a mass of cached keys expires right before a flood of requests?

1. Cache penetration

Cache penetration is when someone requests invalid data that exists in neither the cache nor the database, e.g. user_id=-1 (a user id can't be negative). The request misses the cache, queries the database, and finds nothing; because nothing is found, nothing gets cached, so the next identical request repeats the whole trip.

To solve:

  • Front-end validation: in some cases, e.g. searching by order id on the personal center page, the front end can reject obviously illegal ids (such as negatives) outright.
  • Back-end validation: at the start of the interface, validate the parameters; a negative user_id returns an error immediately.
  • Caching nulls: we can also cache a marker for data the database could not find, with a shorter expiration time.
  • Hash interception: suitable when the data set is small, e.g. a shop's product list. When an item goes on the shelves, mark it in a hash (map["itemId"] = 1); a requested item id absent from the hash table is rejected directly.
  • Bitmap marking: like the hash, but marked with bits to save space.
  • Bloom filter: when a hash or bitmap would be unrealistically large, use a Bloom filter. It cannot intercept illegal requests with 100% accuracy, but it intercepts most of them: it maps each item through several hash functions into a limited bit space, so if even one hashed bit is unset the item definitely does not exist, while if all bits are set the item may still not exist. Keep that in mind. (A toy implementation follows this list.)
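
To make the idea concrete, a toy Bloom filter in Python (bit-array size, hash count, and hash choice are illustrative; real deployments use tuned libraries):

import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash functions over an m-bit array."""

    def __init__(self, m: int = 1 << 20, k: int = 3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item: str):
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter()
bf.add("item:1001")
print(bf.might_contain("item:1001"))  # True
print(bf.might_contain("item:-1"))    # almost certainly False -> reject early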

2. Cache breakdown

At some moment the cache for a piece of hot data expires, and a burst of requests for it suddenly lands on the DB. If the DB cannot carry the load it may go down, causing a chain reaction online.

To solve:

  • Distributed lock: for concurrent requests in a distributed system, the first thought is a distributed lock that lets only one request through to rebuild the cache (implemented with Redis setnx, ZooKeeper, and so on).
  • Single-machine lock: a distributed lock is not always necessary. With few nodes in the cluster, a per-process lock is fine (sync.Mutex in Go, a JVM lock in Java), ensuring that only one of all the requests on a machine gets through; with 10 machines, at most 10 simultaneous calls reach the DB, which it can absorb. This is much cheaper than a distributed lock, but with thousands of machines, think carefully. (A sketch follows this list.)
  • Level-2 cache: when the level-1 cache expires, a level-2 cache can still intercept requests; it can be in-process memory or another cache store.
  • Hot data never expires: in some scenarios, simply give the hot key no expiration.
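
A minimal single-machine sketch in Python: a per-process lock plus a double check, so only one thread rebuilds an expired entry; load_from_db is a hypothetical stand-in for the real DB query:

import threading
import redis

r = redis.Redis()
rebuild_lock = threading.Lock()

def get_with_lock(key: str, load_from_db) -> bytes:
    value = r.get(key)
    if value is not None:
        return value
    with rebuild_lock:
        value = r.get(key)            # double-check: another thread may have rebuilt it
        if value is None:
            value = load_from_db(key)
            r.set(key, value, ex=60)  # 60s TTL, illustrative
    return value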

3. Cache avalanche

At some moment a large number of cached keys expire and all the affected requests hit the DB. Unlike breakdown, which involves one key, an avalanche involves many keys at once, so the pressure on the DB speaks for itself. Solutions:

  • Randomize cache times: give each key's expiration a random component to reduce the chance of mass simultaneous expiry.
  • Locking: lock around the DB access, depending on the scenario.
  • Level-2 cache: same as for cache breakdown.
  • Hot data never expires: same as for cache breakdown.

What are the persistence schemes of Redis

1.rdb

Save: save is the manual, blocking way; the Redis process is blocked until the RDB file is created, and no commands are processed in the meantime.

127.0.0.1:6379> save
OK
27004:M 31 Jul 15:06:11.761 * DB saved on disk

Bgsave: unlike save, bgsave does not block the Redis process; it forks a child process to write the RDB while the main process keeps executing commands.

127.0.0.1:6379> BGSAVE
Background saving started
27004:M 31 Jul 15:07:08.665 * Background saving terminated with success

BGSAVE is mutually exclusive with the other persistence commands:

  • During BGSAVE, SAVE is rejected, preventing the parent and child processes from writing at the same time and racing.
  • A second BGSAVE issued during BGSAVE is rejected for the same reason.
  • If BGREWRITEAOF is issued during BGSAVE, it is deferred until the BGSAVE completes.
  • If BGREWRITEAOF is running, the BGSAVE command is rejected.

BGSAVE and BGREWRITEAOF are each handled by their own child process and write different files, so they do not conflict in themselves; the mutual exclusion exists mainly because both involve heavy IO, which would be unfriendly to the service if they ran together.

Users can configure bgsave to run automatically when write activity crosses a threshold:

save 900 1      # at least 1 change within 900s
save 300 10     # at least 10 changes within 300s
save 60 10000   # at least 10000 changes within 60s

Bgsave runs if any one of these conditions is met. Two fields track the counts and times involved: the dirty counter and lastsave.

  • The dirty counter records how many changes (writes, deletes, updates, etc.) the server has made to the database state (all databases on the server) since the last SAVE or BGSAVE command was successfully executed.
  • The lastSave attribute is a UNIX timestamp that records the last time the server successfully executed the SAVE or BGSAVE command.

Both are maintained by serverCron, Redis's periodic job, which by default runs every 100ms. On each run it walks the save conditions; if a condition's change count and time window are both satisfied, bgsave executes, the lastsave time is recorded, and dirty is reset to 0.

Import: Redis has no dedicated import command; at startup it checks for an RDB file and loads it automatically if present.

27004:M 31 Jul 14:46:51.793 # Server started, Redis version 3.2.12
27004:M 31 Jul 14:46:51.793 * DB loaded from disk: 0.000 seconds
27004:M 31 Jul 14:46:51.793 * The server is now ready to accept connections on port 6379

The service is blocked while the RDB is loading ("DB loaded from disk"). If AOF is also enabled, AOF is preferred for recovery; the RDB is used only when AOF is off. Expired keys are filtered out automatically during import.

2.aof

AOF works by appending commands. Suppose we execute:

RPUSH list 1 2 3 4
RPOP list
LPOP list
LPUSH list 1

They end up stored in RESP form (shown one command per line, with the \r\n separators written out):

*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n
*6\r\n$5\r\nRPUSH\r\n$4\r\nlist\r\n$1\r\n1\r\n$1\r\n2\r\n$1\r\n3\r\n$1\r\n4\r\n
*2\r\n$4\r\nRPOP\r\n$4\r\nlist\r\n
*2\r\n$4\r\nLPOP\r\n$4\r\nlist\r\n
*3\r\n$5\r\nLPUSH\r\n$4\r\nlist\r\n$1\r\n1\r\n

AOF content is first written to the aof_buf buffer. Redis offers three policies for flushing the buffer to disk, applied by serverCron according to the configuration:

appendfsync always
appendfsync everysec
appendfsync no
  1. always: write the contents of aof_buf to the AOF file and fsync it immediately.
  2. everysec: write the contents of aof_buf to the AOF file; if more than 1s has passed since the last fsync, fsync again, handled by a dedicated thread.
  3. no: write aof_buf to the AOF file but never fsync explicitly; the operating system decides when to flush.

Modern operating systems buffer file writes for efficiency: when we call write, the OS does not hit the disk immediately but puts the data in a buffer, flushing it when the buffer fills or after some time. The risk is that if the machine goes down before the flush, the buffered data is lost; so the OS also provides a synchronization call, fsync, letting the application decide when to force data onto disk.
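
A small Python illustration of the distinction (the file name is made up): write() hands bytes to the OS buffer, and fsync() forces them onto disk, which is what the always policy does after each write:

import os

fd = os.open("appendonly.aof.demo", os.O_WRONLY | os.O_CREAT | os.O_APPEND)
os.write(fd, b"*2\r\n$4\r\nRPOP\r\n$4\r\nlist\r\n")  # may sit in the OS buffer
os.fsync(fd)                                          # now durable on disk
os.close(fd)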

AOF rewrite: as commands accumulate, the AOF file keeps growing. For example:

incr num
incr num
incr num
incr num

After four incr num commands the final value of num is 4, so they can be replaced with a single set num 4, saving a lot of space. Rewriting does not analyze the existing AOF: it simply reads each key from the database and emits as few commands as possible to recreate it. Not everything collapses into one command; sadd, for example, writes at most 64 elements per command. The flow: create a new AOF -> iterate the databases -> iterate all keys -> skip expired ones -> write commands to the new AOF.

Forking: the rewrite involves heavy IO and doesn't belong in the main process; doing it in a forked child also avoids locking. The issue with a child process is that while it rewrites, the main process keeps receiving new requests, so Redis adds an AOF rewrite buffer that comes into use the moment the child is created: each new write goes to the regular AOF buffer and also to the rewrite buffer, with no blocking. When the child finishes rewriting, it signals the main process, which appends the rewrite buffer's contents to the new AOF file and then renames the new AOF over the old one atomically, completing the rewrite. Only this last step blocks.

When to perform rewriting:

auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
  1. 100 means a rewrite is triggered when the current AOF has grown to twice its size after the previous rewrite.
  2. The AOF must be at least 64mb before an automatic rewrite is considered.

Import: when Redis starts, it creates a fake client and replays the commands in the AOF file.

How does Redis delete expired keys

  • Lazy deletion: every time a key is accessed, check whether it has expired and delete it if so.
  • Periodic deletion: Redis keeps keys with an expiration time in a separate dictionary from those without; each serverCron run samples some keys from the expiring dictionary and deletes the ones that have expired.

How do RDB and AOF handle expired keys

RDB: in master/slave mode, the master filters out expired keys when loading an RDB file; the slave does not, because its data will be cleared anyway when it next syncs with the master. AOF: if a key has expired but has not yet been removed by lazy or periodic deletion, the AOF file is not affected; once the expired key is actually deleted, a DEL command is appended to the AOF to record the deletion explicitly. AOF rewrite skips expired keys automatically.

How are expired keys handled in master/slave mode

In master/slave mode, the master serves writes and the slave serves reads. When a slave executes a read command from a client and hits an expired key, it does not remove the key; it answers as if the key were still live. It is the master that removes expired keys, explicitly sending a DEL command to all slaves to tell them to remove the key too.

What is Redis’ elimination strategy

The eviction policy is a configurable option; generally you pick one appropriate to the business. Eviction is considered when memory is over the limit and we add a key or update one to a larger value. The policies (a config example follows this list):

  • noeviction: when memory usage exceeds the limit, return an error; evict nothing.
  • allkeys-lru: evict the least recently used key, chosen from all keys.
  • volatile-lru: evict the least recently used key, chosen from keys with an expiration set.
  • allkeys-random: evict random keys, chosen from all keys.
  • volatile-random: evict random keys, chosen from keys with an expiration set.
  • volatile-ttl: evict the keys closest to expiring, chosen from keys with an expiration set.
  • volatile-lfu: evict the least frequently used key, chosen from keys with an expiration set.
  • allkeys-lfu: evict the least frequently used key, chosen from all keys.
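
For example, a redis.conf fragment capping memory and evicting by LRU across all keys (values are illustrative):

maxmemory 100mb
maxmemory-policy allkeys-lru

With allkeys-lru, even keys without a TTL are eviction candidates; note that the volatile-* policies behave like noeviction when no key has an expiration set.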

What does serverCron do

Redis has a periodic function servercron, which is executed every 100ms by default. Its functions are as follows:

  • Update the cached server time: many Redis functions need the current time, and each real lookup is a system call. For scenarios that don't need precise time, Redis uses a cached time to cut down on system calls; imprecise uses include printing logs, updating the server's LRU clock, deciding whether to run a persistence task, and computing server uptime. For setting expiration times or timestamping slow-query log entries, Redis still makes the system call to get the precise time.
  • Update the LRU clock: Redis keeps a server-wide LRU clock in the lruclock property; each object also has its own LRU clock, and an object's idle time is the server's lruclock minus the object's lru.
  • Update the commands-executed-per-second estimate: on each run, serverCron compares the current time and total command count against the previous sample, computes how many commands the server handled per millisecond over the interval, and multiplies by 1000 to get an estimate of commands processed per second:
INFO stats
...
instantaneous_ops_per_sec:1
  • Update the memory peak: on each run, check current memory usage against the recorded peak and update the peak if needed:
INFO stats
...
used_memory_peak:2026832
used_memory_peak_human:1.93M
  • Process SIGTERM: on receiving SIGTERM, Redis only sets a flag; the actual handling happens on the next serverCron run.
  • Clean up client resources: clients that have not communicated with the server for a long time are released; a client whose input buffer has grown past a threshold has that buffer freed and rebuilt at the default size, preventing input buffers from eating too much memory; clients whose output buffer exceeds its limit are closed.
  • Run a delayed AOF rewrite: if BGREWRITEAOF arrives while BGSAVE is running, it is deferred and the aof_rewrite_scheduled flag is set. Each time serverCron runs, it checks whether BGSAVE or BGREWRITEAOF is currently in progress; if neither is and aof_rewrite_scheduled is set, BGREWRITEAOF executes.
  • Persistence: if no AOF rewrite has been scheduled, check whether the RDB save conditions are met and run RDB persistence if so; otherwise run the delayed AOF rewrite. When AOF is enabled, also decide whether the AOF buffer needs to be written to the AOF file.
  • Record the run count: serverCron increments a counter every time it executes.

What happens when the slave is disconnected and reconnected

When the slave executes slaveof, it sends a sync command to the master. On receiving sync, the master starts a background bgsave while buffering new writes. When the bgsave completes, the master sends the RDB to the slave; the slave loads it, and the master then sends the buffered changes. After that, the master streams each new write to the slave as a command. But what if, at some point, the slave and master get disconnected by a network problem and the slave later reconnects? What happens to the changes made during the disconnection?

  • Before Redis 2.8, even a 1-second disconnection meant that on reconnect the slave sent sync, the master ran bgsave without a second thought and shipped the whole RDB, and the slave reloaded it, wasting time on both sides and bandwidth on the transfer.
  • Since Redis 2.8, incremental replication is supported: in some cases a reconnecting slave can receive just the writes it missed instead of a whole RDB file. Partial resync rests on three things: the master/slave replication offsets, the master's replication backlog buffer, and the server run ID (run_id). The slave saves the master's run_id at the first slaveof; whenever the master replicates a command it records its offset, the slave records the offset of what it has received, and the master also appends each replicated command to its backlog buffer. On reconnecting, the slave:
    • Sends the saved master run_id to the current master.
    • Sends its own replication offset to the current master.

    After receiving these, the master first verifies that the run_id sent by the slave is its own, then verifies that the slave's offset is still covered by the backlog buffer. If both conditions hold, the master sends the slave everything in the backlog after that offset, completing a partial resync. If either check fails, a full resync is performed.

How to solve the Problem of Redis split brain

What is split brain? In a Redis cluster, having two masters at the same time is split brain: each client writes to whichever master it happens to be connected to, and the data diverges. How it arises: typically a network problem on the master's side cuts it off from the slaves while its link to clients stays healthy; the sentinels elect a new master from the remaining slaves, and when the old master's network recovers it is demoted to slave. Impact: during the partition, clients attached to the original master kept writing changes to it; once it recovers and becomes a slave of the new master, it clears its own data and resyncs from the new master, so everything written to it during the outage is lost. Mitigation: two configuration parameters:

min-slaves-to-write 1
min-slaves-max-lag 10
  • min-slaves-to-write: the minimum number of slaves that must be connected for the master to keep accepting writes.
  • min-slaves-max-lag: the maximum delay, in seconds, of the slaves' replication ACKs before they are considered disconnected.

If these conditions are not met, the master rejects client write requests; with the settings above, data loss is bounded to about 10 seconds.

What does a Redis cluster look like

  • A cluster is made up of multiple nodes; a node joins another's cluster through a handshake.
  • The cluster has 16384 slots.
  • Each node records which slots it is responsible for and which nodes handle the rest.
  • When a node receives a command, it checks whether the key's slot belongs to it; if not, it returns a MOVED error whose payload redirects the client to the node responsible for that slot.
  • Slots can be redistributed with the redis-trib tool.
  • While node A is migrating slot i to node B, a request to A for a key in slot i may get an ASK error redirecting the client to node B.
  • Each master node gets slave nodes for high availability.

Are Redis distributed locks necessarily safe

We know Redis is single threaded and processes commands one by one, so using Redis for a distributed lock is workable. The most common form:

set key value PX milliseconds NX

Lock expiry: in production we normally give a lock an automatic expiration time for safety, so that even if something goes wrong and the unlock never happens, the lock eventually frees itself and the damage is limited. But what if the expiration fires before our business logic has finished? Common approaches:

  • Budget generous time in advance, making sure the work completes before the lock auto-expires.
  • Run a watchdog thread that checks, say, every 1/3 of the lock's TTL and extends the lock if the work is not yet finished.
  • If, when unlocking, we find the lock has already been acquired by someone else, treat the run as unsafe; rolling back is a good option (a token-based sketch of safe unlocking follows this list).
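
A common pattern for that last point: store a unique token in the lock and release it with a Lua compare-and-delete, so a client never deletes a lock that has since passed to someone else. A redis-py sketch (names illustrative):

import uuid
from typing import Optional

import redis

r = redis.Redis()

UNLOCK_LUA = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def acquire(key: str, ttl_ms: int = 30000) -> Optional[str]:
    token = str(uuid.uuid4())
    # SET key token PX ttl NX: create the lock only if it doesn't exist
    if r.set(key, token, nx=True, px=ttl_ms):
        return token
    return None

def release(key: str, token: str) -> bool:
    # delete only if the lock still holds our token
    return r.eval(UNLOCK_LUA, 1, key, token) == 1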

Master/slave mode: a master has at least one slave; the master takes writes, the slave serves reads. When we set a lock it is written to the master, but if the master fails before the write is replicated to the slave, the promoted slave has no lock, and concurrent access becomes unsafe.

  • For this problem the Redis author proposed RedLock. RedLock abandons master/slave entirely and uses several independent master nodes, five being the official recommendation. To lock, the client records the current time, then tries to acquire the lock on the five nodes one by one. Each acquisition attempt uses a timeout much smaller than the lock's validity time, so a downed node can't block the client; if a node fails, the client moves on to the next immediately. Once the lock is acquired on a majority of nodes (3), it is considered held, and its real validity is the configured validity minus the time spent acquiring. If a majority can't be reached for whatever reason, the client unlocks every node it did lock, and the acquisition fails. RedLock sees little production use, mainly because:
    • It is expensive, requiring multiple master nodes.
    • It depends on the system clock: if the lock was obtained via three nodes but one of them has a fast clock, that node's lock expires early, and a later client can then also assemble a majority of three, breaking mutual exclusion.

Summary: a Redis distributed lock may not fit scenarios that demand strong consistency, but it is fine in many others. For example, if the lock fails at some moment and a few extra requests slip into the critical section, the cost may be just a few more DB queries, with no impact on the business as a whole.

How to solve the hot key problem

What is a hot key? A key that is accessed very frequently, such as product data during Double 11 or a flash sale. When a hot key's concurrency pushes QPS to the hundreds of thousands, any unprotected weakness can turn into a disaster. Mitigations:

  • Hot keys must be cached: load the hot data into the cache ahead of time, so it is served from cache from the moment it goes live.
  • The cache should be at least a cluster with multiple slaves, so that losing one slave still leaves replicas.
  • Add a level-2 cache in the machine's local memory as one more layer of interception; basics such as flash-sale product details can be served straight from local memory.
  • Rate-limit: estimate the QPS you can support and reject the excess requests.