Preface

I stared into the distance in a small, dark room lit by a few scattered, multicolored lights. A paunchy, middle-aged man in a plaid shirt and thick glasses came in, sat down opposite me, and looked me up and down. Glancing at my 10-page resume, before I could even introduce myself, he scratched his head and asked, "Are you familiar with Redis?" "Practically a master," I shot back. The middle-aged man pushed up his glasses: "I advise you to take care of yourself on the way out..."

What is Redis?

Redis is a non-relational, in-memory key-value storage system that supports multiple data structures, offers optional persistence, and provides APIs in many languages.

Application scenarios

  1. Caching hot data
  2. Distributed sessions
  3. Counters and flags
  4. Leaderboards
  5. Distributed locks (a minimal sketch follows this list)
  6. Data from other services that never changes or changes with low probability
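For the distributed-lock scenario, here is a minimal sketch using the redis-py client. The key names, token, and TTLs are assumptions for illustration, and the release step would need a Lua script to be fully atomic.

```python
# Minimal distributed-lock sketch with redis-py (pip install redis).
# Key names, token and TTLs are illustrative assumptions.
import uuid
import redis

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)

def acquire_lock(name, ttl=10):
    token = str(uuid.uuid4())
    # SET key value NX EX ttl: succeeds only if the key does not exist yet
    if r.set(f"lock:{name}", token, nx=True, ex=ttl):
        return token
    return None

def release_lock(name, token):
    # only the holder releases the lock; this check-and-delete is not atomic,
    # a Lua script would be required for full correctness
    if r.get(f"lock:{name}") == token:
        r.delete(f"lock:{name}")
```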

Data types (required)

  1. String
  2. Hash
  3. List
  4. Set
  5. Zset (sorted set)
  6. Advanced: HyperLogLog, Geo, Pub/Sub, BloomFilter, RediSearch, Redis-ML, etc. (a few basic commands are sketched below)
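As a quick illustration, the sketch below touches each of the five core types through redis-py; the key names and values are made up for the example.

```python
# One or two basic commands per core data type, via redis-py; keys are illustrative.
import redis

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)

r.set("page:views", 1)                                     # String
r.incr("page:views")                                       # strings double as counters
r.hset("user:1", mapping={"name": "Tom", "age": "18"})     # Hash
r.lpush("queue:jobs", "job-1", "job-2")                    # List
r.sadd("tags:redis", "cache", "nosql")                     # Set
r.zadd("leaderboard", {"alice": 100, "bob": 95})           # Zset (sorted set)
print(r.zrevrange("leaderboard", 0, 2, withscores=True))   # top entries with scores
```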

How it works (event loop)

  1. When the Redis server starts, it registers the AE_READABLE event of the listening socket with the eventLoop (I/O multiplexing).
  2. When a client asks to establish a connection, the server creates a socket (S1) channel and binds the AE_READABLE event to it.
  3. The client sends SET key value (a write command) over S1; the AE_READABLE event fires, and the command request handler reads the key and value into memory and modifies the data.
  4. After the value is set, S1 is bound to the AE_WRITABLE event, which triggers the write; on success the command reply handler returns the result "OK".
  5. Finally, the AE_WRITABLE event of S1 is unbound from the command reply handler.

IO multiplexing

I/O multiplexing: the I/O refers to network I/O; "multi" refers to multiple TCP connections (sockets or channels); "plexing" refers to reusing one or a few threads. In other words, one thread (or a small group of threads) handles multiple TCP connections. The biggest advantage is that you do not have to create or maintain a large number of processes/threads. The main mechanisms are select, poll, and epoll.

The select mechanism:

Basic principle: when a client operates on the server, three sets of file descriptors (FDs for short) are involved: writefds (write), readfds (read), and exceptfds (exception). select blocks while monitoring these three classes of file descriptors and returns when one becomes readable or writable, an exception occurs, or a timeout is reached; after it returns, the ready FDs are found by traversing the entire set, and the corresponding I/O operations are performed. Advantages: supported on almost all platforms, so cross-platform support is good. Disadvantages: 1. Because it does a full scan by polling, performance degrades as the number of FDs grows. 2. Each call to select() copies the FD set from user space to kernel space and traverses it (and the results are passed back from kernel space to user space). 3. A single process is limited to 1024 FDs by default; the macro definition can be modified, but efficiency remains poor.

Poll mechanism:

Basic principle: the same as select, also polling plus traversal; the only difference is that poll has no maximum file descriptor limit (it stores FDs in a linked list).

Epoll mechanism:

Basic principle: there is no limit on the number of FDs, the copy from user space to kernel space happens only once, and an event-notification mechanism is used to trigger handling. epoll_ctl() registers an FD, and once that FD is ready, a callback mechanism activates it for the corresponding I/O operation. epoll's high performance comes from its three functions: 1. epoll_create() creates an epoll object in the Linux kernel (backed by a red-black tree) and returns it as an FD. 2. epoll_ctl() is called on the epoll object to add, modify, or remove an FD whenever a connection is created or changed. 3. epoll_wait() collects the FDs whose callbacks have fired (the ready list) so the corresponding I/O operations can be performed. Advantages: 1. No FD limit: the maximum number of FDs supported is the operating system's maximum number of file handles, and 1 GB of memory supports roughly 100,000 handles. 2. Efficiency does not drop as the number of FDs increases, because it uses callback notification instead of polling. 3. The kernel and user space share the same block of memory via mmap (mmap maps a file or other object into a process's address space), avoiding copies. Comparing select, poll, and epoll for 1 million connections: select, with its default macro limit of 1024, would need 1,000,000 / 1024 ≈ 977 processes, which would make CPU performance terrible. poll has no maximum FD limit, but 1 million FDs means huge traversal and copy overhead. With epoll, when a request comes in an FD is created and a callback is bound, so only the (say) 10,000 active connections fire callbacks; this is efficient and involves no memory copy.
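To make the idea concrete, here is a minimal I/O-multiplexing echo server sketch in Python; the standard-library selectors module picks epoll on Linux and falls back to poll/select elsewhere. The port and buffer size are illustrative assumptions.

```python
# Minimal I/O-multiplexing echo server using Python's selectors module,
# which uses epoll on Linux. Illustrative sketch only; error handling omitted.
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server_sock):
    conn, addr = server_sock.accept()               # new client connection
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)  # watch it for readability

def read(conn):
    data = conn.recv(1024)          # the fd is readable: consume the data
    if data:
        conn.sendall(data)          # echo it back
    else:
        sel.unregister(conn)        # client closed the connection
        conn.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 6999))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:
    for key, _ in sel.select():     # block until some fd is ready
        key.data(key.fileobj)       # dispatch to accept() or read()
```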

Persistence

Persistence means writing the data in memory to disk so that it is not lost when the service goes down. Redis provides two kinds of persistence: RDB (the default) and AOF.

RDB:

RDB persistence writes a snapshot of the in-memory data set to disk at specified intervals. In practice, Redis forks a child process that first writes the data set to a temporary file; once the write succeeds, the temporary file replaces the previous one, and the data is stored in compressed binary form.

Default configuration:

save 900 1 # dump the memory snapshot after 900 seconds (15 minutes) if at least 1 key has changed.

save 300 10 # dump the memory snapshot after 300 seconds (5 minutes) if at least 10 keys have changed.

save 60 10000 # dump the memory snapshot after 60 seconds (1 minute) if at least 10000 keys have changed.
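These snapshot rules can also be inspected from a client, and a snapshot can be triggered manually. A hedged redis-py sketch (host and port are assumptions):

```python
# Sketch: triggering and checking an RDB snapshot with redis-py (pip install redis).
import redis

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)

print(r.config_get("save"))   # current snapshot rules, e.g. {'save': '900 1 300 10 60 10000'}
r.bgsave()                    # ask the server to fork and dump an RDB file in the background
print(r.lastsave())           # timestamp of the last successful RDB save
```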

Advantages:

1. RDB files are compact, full backups, ideal for backups (cold backup) and disaster recovery.

2. It maximizes performance. The only thing the Redis server process has to do to start persistence is fork a child process; the child then does all the persistence work, so the server process itself performs no disk I/O for it.

3. RDB recovers large data sets faster than AOF.

AOF:

A full backup is always time-consuming, so Redis also provides a more efficient option: AOF. Its working mechanism is simple: every write command Redis receives is appended to a file via the write function. Think of it as command logging. Default configuration:

appendfsync always # sync to the AOF file every time a data change occurs.

appendfsync everysec # sync once per second; this is the default AOF policy.

appendfsync no # never sync explicitly; efficient, but the data is not guaranteed to be persisted.
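These settings can also be changed at runtime; a hedged redis-py sketch (connection details are assumptions):

```python
# Sketch: enabling AOF and triggering a background rewrite with redis-py.
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

r.config_set("appendonly", "yes")        # turn AOF on at runtime
r.config_set("appendfsync", "everysec")  # the default fsync policy
r.bgrewriteaof()                         # compact the AOF file in the background
```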

Advantages:

1. AOF gives better protection against data loss. AOF typically performs an fsync every second via a background thread, so at most one second of data is lost.

2. AOF log files have no disk-addressing overhead, write performance is high, and the file is not easily corrupted.

3. Even if the AOF log file is too large, the background rewrite operation does not affect client read and write operations.

4. Commands in the AOF log file are recorded in a very readable way, which makes it ideal for emergency recovery from catastrophic deletions. For example, if someone accidentally runs FLUSHALL and empties all the data, then as long as a background rewrite has not happened yet you can immediately copy the AOF file, delete the final FLUSHALL command, put the file back, and recover all the data automatically through the recovery mechanism.

How to choose between RDB and AOF



Only kids make choices; adults want both. In practice the two are generally used together.

Cluster modes

Redis provides three cluster modes: master-slave replication/synchronization, Sentinel mode, and Cluster.

Master-slave replication/synchronization

To avoid a single point of failure, it is common to deploy multiple copies of the database on different servers, so that even if one server fails the others can keep serving. For this purpose Redis provides replication, which automatically synchronizes updated data from one database to the others. In replication, databases fall into two categories: the master database and the slave databases. The master can perform both read and write operations; when a write changes its data, the change is automatically synchronized to the slaves. Slaves are generally read-only and accept the data synchronized from the master. A master can have multiple slaves, while a slave can have only one master.

Principle:

  1. The slave connects to the master and sends the SYNC command.
  2. On receiving SYNC, the master starts BGSAVE to generate an RDB file and uses a buffer to record all write commands executed from then on.
  3. When BGSAVE finishes, the master sends the snapshot file to all slaves, continuing to record write commands while sending.
  4. When a slave receives the snapshot file, it discards all of its old data and loads the snapshot.
  5. After the snapshot has been sent, the master sends the buffered write commands to the slave.
  6. When the slave finishes loading the snapshot, it starts accepting command requests and executes the write commands from the master's buffer.
  7. From then on, every time the master executes a write command it sends the same command to the slaves, which receive and execute it.
  8. Since 2.8, commands issued while a slave was disconnected are sent to it on reconnection as incremental replication.
  9. The first time a master and slave connect they perform a full synchronization; afterwards they perform incremental synchronization. A slave can of course initiate a full synchronization at any time if needed. Redis's policy is always to try incremental synchronization first, and to fall back to a full synchronization if that fails.

Advantages:

  1. Master-slave replication is supported; the master automatically synchronizes data to the slaves, enabling read/write splitting.
  2. Slaves can serve read-only queries for clients, sharing the master's read load.
  3. A slave can also accept connection and synchronization requests from other slaves, effectively offloading the master's synchronization pressure.
  4. The master serves the slaves in a non-blocking way, so clients can still submit queries or modifications during master-slave synchronization.
  5. Slaves also synchronize in a non-blocking way; if a client queries a slave during synchronization, it returns the pre-synchronization data.

Disadvantages:

  1. Redis has no automatic fault tolerance or recovery; if the master or a slave goes down, some front-end reads or writes fail until the machine restarts or the front-end IP is switched manually.
  2. When the master goes down, some data may not have been synchronized to the slaves in time; switching the IP over then leaves the data inconsistent, reducing system availability.
  3. If multiple slaves are disconnected, do not restart them at the same time: as soon as a slave starts it sends a SYNC request for a full synchronization with the master, and when many slaves restart together the surge in master I/O can bring the master down.
  4. Redis is hard to scale online; expansion becomes very complicated once the cluster reaches its capacity limit.
  5. Because every slave holds a full copy of the master's data, the total amount of data the deployment can hold is limited.
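A hedged sketch of wiring this up from a client with redis-py (hosts and ports are assumptions; in redis.conf the equivalent is the slaveof/replicaof directive):

```python
# Sketch: pointing a second instance at a master with redis-py.
import redis

replica = redis.Redis(host="127.0.0.1", port=6380, decode_responses=True)
replica.slaveof("127.0.0.1", 6379)            # SLAVEOF <master-host> <master-port>
print(replica.info("replication")["role"])    # reports 'slave' once the link is set up

master = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)
master.set("greeting", "hello")               # written on the master...
print(replica.get("greeting"))                # ...readable from the slave after it syncs
```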

Sentinel mode

In master-slave synchronization/replication mode, when the master breaks down you have to manually switch a slave over to be the master. This requires manual intervention, takes effort, and can leave the service unavailable for a while, so it is not a recommended approach; more often we prefer Sentinel mode. Sentinel mode is a special mode: Redis provides the sentinel command, and a sentinel is an independent process that runs on its own. The idea is that the sentinel monitors multiple running Redis instances by sending commands and waiting for the Redis servers to respond.

Functions:

  1. By sending commands, it has the Redis servers report their running status, monitoring both the master and the slaves.
  2. When the sentinel detects that the master is down, it automatically switches a slave over to be the master and notifies the other slaves via publish/subscribe to modify their configuration files and switch to the new master.

Principle:

  1. Each Sentinel process sends a PING command once per second to the master, the slaves, and the other Sentinel processes in the cluster.
  2. If a master is marked as subjectively down (SDOWN), every Sentinel process monitoring it confirms once per second that it really is in the subjectively-down state.
  3. When a sufficient number of Sentinel processes (greater than or equal to the value specified in the configuration file) confirm within the specified time window that the master is subjectively down (SDOWN), the master is marked as objectively down (ODOWN).
  4. Under normal circumstances, each Sentinel process sends an INFO command to every master and slave in the cluster every 10 seconds.
  5. When a master is marked ODOWN, the Sentinel processes send INFO to all the slaves of the offline master once per second instead of once every 10 seconds.
  6. If not enough Sentinel processes agree that the master is offline, its objectively-down status is removed; if the master again returns a valid reply to a Sentinel's PING, its subjectively-down status is removed as well.

Advantages:

  1. Sentinel mode is based on the master-slave mode and has all of its advantages.
  2. Master and slave can be switched automatically, making the system more robust and more available.
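For clients, the usual pattern is to ask Sentinel where the current master is rather than hard-coding it. A hedged redis-py sketch (the Sentinel address and the service name "mymaster" are assumptions):

```python
# Sketch: discovering the current master/slave through Sentinel with redis-py.
from redis.sentinel import Sentinel

sentinel = Sentinel([("127.0.0.1", 26379)], socket_timeout=0.5)

master = sentinel.master_for("mymaster", socket_timeout=0.5)  # routed to the current master
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)  # routed to one of the slaves

master.set("counter", 1)        # writes go to the master
print(replica.get("counter"))   # reads can be served by a slave
```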

Cluster

Redis Sentinel mode can basically achieve high availability and read/write splitting, but in this mode every Redis server stores the same data, which wastes memory. So Redis 3.0 added Cluster mode, which implements distributed storage: each Redis node stores different content.

  1. Every Redis node carries two things. One is the slots, whose values range from 0 to 16383; the other is cluster, which can be understood as a cluster-management plugin. When a key arrives, Redis computes a CRC16 checksum of it and takes the result modulo 16384, so every key maps to a hash slot numbered between 0 and 16383 (see the sketch below). From that value it finds the node responsible for the slot and automatically routes the request to it.
  2. To ensure high availability, redis-cluster uses master-slave replication: each master node has one or more slave nodes, and when a master breaks down one of its slaves takes over. When the other masters ping master A and more than half of them time out, master A is considered down. If both master A and its slave A1 go down, the cluster can no longer provide service.

Advantages:

  1. The cluster is completely decentralized, with multiple masters and multiple slaves; all Redis nodes ping-pong each other and use a binary protocol internally to optimize transfer speed and bandwidth.
  2. Clients connect directly to the Redis nodes with no intermediate proxy layer; a client does not need to connect to every node in the cluster, only to any one available node.
  3. Each shard consists of one Redis master and several slaves, and the shards are peers of one another.
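The key-to-slot mapping is easy to reproduce. A simplified sketch of the idea follows (Redis uses the CRC16/XMODEM variant; the real implementation also honours "{hash tags}", which this sketch ignores):

```python
# Sketch of the key -> hash-slot mapping Redis Cluster uses: CRC16(key) mod 16384.
def crc16(data: bytes) -> int:
    # CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if (crc & 0x8000) else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    return crc16(key.encode()) % 16384

print(hash_slot("user:1000"))   # a slot number between 0 and 16383
```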

Cache avalanche, breakdown, penetration

Cache avalanche

A cache avalanche happens when the cache server restarts or a large number of cache entries expire at the same point in time, sending a flood of requests to the database and making it unavailable. Solutions: 1. When caching data in batches, add a random value to each entry's expiration time (see the sketch below). 2. Never expire hotspot data; instead update the cache whenever there is an update operation.
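A hedged sketch of the jitter idea with redis-py (key names, base TTL, and jitter range are assumptions):

```python
# Sketch: spreading out expiration times with random jitter to avoid an avalanche.
import random
import redis

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)

BASE_TTL = 3600  # one hour

def cache_set(key, value):
    # add 0-300 seconds of jitter so keys cached together do not expire together
    r.set(key, value, ex=BASE_TTL + random.randint(0, 300))

for i in range(1000):
    cache_set(f"product:{i}", f"detail-{i}")
```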

Cache breakdown

Cache breakdown is similar to an avalanche, except that a single very hot key expires at some point in time and all of the requests for it hit the database. With a mutex, only one request is allowed to reload the latest data from the database into Redis at the moment the key expires (see the sketch below). You can also make the hotspot data never expire and update the cache whenever there is an update operation.
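A hedged sketch of the mutex approach with redis-py; load_from_db() is a hypothetical helper standing in for the real database query, and the lock key, TTLs, and retry delay are assumptions:

```python
# Sketch: a simple mutex (SET NX EX) so only one request rebuilds a hot key.
import time
import redis

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)

def load_from_db(key):
    return "value-from-db"      # placeholder for the real database query

def get_with_mutex(key):
    while True:
        value = r.get(key)
        if value is not None:
            return value
        # the key has expired: only the request that wins the lock rebuilds it
        if r.set(f"lock:{key}", "1", nx=True, ex=10):
            try:
                value = load_from_db(key)
                r.set(key, value, ex=3600)
                return value
            finally:
                r.delete(f"lock:{key}")
        time.sleep(0.05)        # the other requests wait briefly and retry
```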

Cache penetration

Cache penetration refers to requests for data that exists in neither the cache nor the database. In a typical caching system users keep sending requests, and some malicious requests deliberately query keys that do not exist; a large volume of such requests puts great pressure on the backend. Solutions: 1. Add parameter validation at the API layer to filter out invalid requests. 2. Store a "not found" marker in Redis for keys that cannot be retrieved from the database and give it an expiration time (see the sketch below). 3. Use a Bloom filter and check whether the data exists before querying; if it does not, return immediately.
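A hedged sketch of caching the "not found" result with redis-py; query_db() is a hypothetical helper, and the marker string and TTLs are assumptions:

```python
# Sketch: caching a "not found" marker with a short TTL to absorb penetration.
import redis

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)
NULL_MARKER = "__NULL__"

def query_db(key):
    return None                          # placeholder: pretend the row does not exist

def get_value(key):
    cached = r.get(key)
    if cached == NULL_MARKER:
        return None                      # known-missing key: do not hit the database
    if cached is not None:
        return cached
    value = query_db(key)
    if value is None:
        r.set(key, NULL_MARKER, ex=60)   # short TTL for missing keys
        return None
    r.set(key, value, ex=3600)
    return value
```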

Expiration and eviction policies

Redis has two expiration strategies: periodic deletion plus lazy deletion. Periodic deletion: by default, every 100 ms Redis randomly samples some keys that have an expiration set, checks whether they have expired, and deletes the ones that have. Lazy deletion: when a client queries a key, Redis checks whether it has expired; if it has, the key is deleted and nothing is returned, otherwise it is served as usual. When the memory limit is reached and a client tries to execute a command that would use more memory, the eviction policy decides what happens:

  1. noeviction: return an error for commands that would use more memory (most write commands, with DEL and a few other exceptions).
  2. allkeys-lru: try to evict the least recently used (LRU) keys to make room for the newly added data.
  3. volatile-lru: try to evict the least recently used (LRU) keys, but only among keys that have an expiration set.
  4. allkeys-random: evict random keys to make room for the newly added data.
  5. volatile-random: evict random keys, but only among keys that have an expiration set.
  6. volatile-ttl: evict only keys that have an expiration set, preferring those with a shorter remaining TTL, to make room for the newly added data.
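A hedged redis-py sketch for checking and switching the eviction policy at runtime (the 256 MB cap is an illustrative assumption):

```python
# Sketch: inspecting and changing the eviction policy at runtime with redis-py.
import redis

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)

print(r.config_get("maxmemory-policy"))          # current policy, 'noeviction' by default
r.config_set("maxmemory", 256 * 1024 * 1024)     # cap memory at 256 MB
r.config_set("maxmemory-policy", "allkeys-lru")  # evict least-recently-used keys when full
```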

To be continued...

That's it for this installment. If anything here is wrong, please point it out, experts; and if you liked it, give it a like, a share, and a follow.