1. What are the types of cache?
Caches are classified into local cache, distributed cache, and multi-level cache.
Local caching means caching in the process's own memory, for example in the JVM heap, using an LRUMap or a tool such as Ehcache. A local cache is pure memory access with no remote round trip, so it has the best performance, but it is limited by single-machine capacity: it generally stays small and cannot scale out.
Distributed caches generally have good horizontal scaling capability and can cope well with large data volume scenarios. The disadvantage is that remote requests are required and the performance is not as good as local caching.
To balance the two, real systems generally use a multi-level cache: the local cache holds only the most frequently accessed hotspot data, and the rest of the hot data lives in the distributed cache.
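As a rough sketch of that read path, here is the multi-level lookup order, with two plain Python dicts standing in for the local and distributed tiers (the class and names are illustrative, not a real client):

```python
# Hypothetical two-level cache: a per-process dict stands in for the local
# cache, a second dict stands in for the remote distributed cache (e.g. Redis).
class TwoLevelCache:
    def __init__(self, load_from_db):
        self.local = {}        # L1: in-process, fastest, smallest
        self.remote = {}       # L2: stands in for the distributed cache
        self.load_from_db = load_from_db

    def get(self, key):
        if key in self.local:            # 1) hit the local cache
            return self.local[key]
        if key in self.remote:           # 2) fall back to the distributed cache
            value = self.remote[key]
            self.local[key] = value      #    promote hot data locally
            return value
        value = self.load_from_db(key)   # 3) last resort: the database
        self.remote[key] = value
        self.local[key] = value
        return value
```

A second `get` of the same key never touches the database again: it is served from the local tier.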
2. Why use Redis?
Redis processes requests on a single thread, for two reasons: it handles events asynchronously with non-blocking I/O, and cached data lives in memory, so each operation's I/O time is short and a single thread avoids the cost of thread context switches.
Redis supports persistence, so Redis can be used not only as a cache, but also as a NoSQL database.
Redis also supports rich data structures beyond plain K-V, such as list, set, sorted set, and hash.
Redis provides the master/slave synchronization mechanism and Cluster deployment capability to provide high availability services.
Finally, speed: under high concurrency, sending every query straight to the database would overwhelm it, so a fast cache in front is essential.
3. Redis cache avalanche, cache penetration and cache breakdown
Cache penetration
This problem may be caused by external malicious attacks. For example, user information is cached, but an attacker keeps requesting interfaces with non-existent user ids. Each request misses the cache, then misses the DB as well, so a large number of requests penetrate the cache and land directly on the DB.
The solutions are as follows.
1. For a non-existent user, cache an empty object as a marker so the same id no longer reaches the DB. This does not always work well, though, and can fill the cache with a large amount of useless data.
2. Use a Bloom filter, whose defining property is existence testing: if the Bloom filter says a key does not exist, it definitely does not exist; if it says the key exists, the data may still be absent. That makes it a good fit for this problem.
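Solution 1 can be sketched as follows; the `MISSING` sentinel and function names are illustrative, and a real deployment would also give the empty marker a short TTL:

```python
# Sketch of the null-object approach: cache a sentinel for missing users so
# repeated lookups for a bogus id stop hitting the DB.
MISSING = object()  # sentinel marking "known not to exist"

cache = {}

def get_user(user_id, db_lookup):
    if user_id in cache:
        value = cache[user_id]
        return None if value is MISSING else value
    user = db_lookup(user_id)          # may return None for a bogus id
    cache[user_id] = MISSING if user is None else user
    return user
```

The second lookup for the same bogus id is answered from the cache; the DB is touched only once.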
Cache breakdown
When a single hotspot key expires, the flood of requests for that key all pass through to the data source.
There are several ways to solve this problem.
1. Use a mutex when rebuilding the cache, so that for the same key only one request goes to the DB at a time, reducing DB pressure.
2. Use random back-off: on a miss, sleep for a very short random time and query again.
3. If several hotspot keys expire together, give each a fixed TTL plus a small random offset so they do not all expire at the same moment.
Cache avalanche
A large amount of cached data expires at once while query volume stays high, overloading or even crashing the database. Unlike cache breakdown, where many requests chase the same expired key, a cache avalanche is many different keys expiring together, sending a flood of queries to the database.
Solutions:
1. Avoid simultaneous expiry by adding a random offset to each TTL.
2. Mark hotspot data as never expiring.
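The random-offset idea from solution 1 is tiny but worth showing; the base TTL and jitter window below are arbitrary illustrative values:

```python
import random

BASE_TTL = 600  # ten minutes, an illustrative base expiry in seconds

def jittered_ttl(base=BASE_TTL, jitter=60):
    # Spread expirations over [base, base + jitter) so a batch of keys
    # written together does not all expire in the same instant.
    return base + random.randint(0, jitter - 1)
```

Each key written in the same batch then gets a slightly different expiry, smoothing the reload spike.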
4. Persistence mechanisms: AOF and RDB
Redis provides two persistence mechanisms, RDB and AOF. RDB writes the in-memory data set to disk as a snapshot; in practice Redis forks a child process and stores the data in a compressed binary format. AOF is a text log recording every write or delete operation Redis processes.
RDB saves the entire Redis data set in a single file, which suits disaster recovery. However, if the system crashes before a snapshot is saved, the data from that interval is lost. In addition, the service may briefly stall while a snapshot is being taken.
AOF writes to its log file in append mode and has a flexible sync policy: sync every second, sync on every modification, or no explicit sync (left to the OS). The disadvantages are that on the same data set an AOF file is larger than an RDB file, and AOF runs slower than RDB.
What happens if the machine suddenly loses power?
It depends on how the AOF sync property is configured. If performance does not matter, sync the disk on every write and no data is lost. That is rarely practical given performance requirements, so sync usually runs on a timer, for example once per second, in which case at most 1 s of data is lost.
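The three sync policies map to the `appendfsync` directive; an illustrative redis.conf excerpt:

```conf
# AOF durability trade-offs (pick exactly one appendfsync setting):
appendonly yes
# appendfsync always     -> fsync on every write: safest, slowest
appendfsync everysec     # fsync once per second: lose at most ~1s on power loss
# appendfsync no         -> leave flushing to the OS: fastest, least safe
```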
Advantages and disadvantages of AOF and RDB
RDB
Advantages: RDB generates multiple data files, each representing the Redis data at one moment in time, which makes it well suited to cold backups. You can set up a scheduled job to copy the snapshots to a remote server, such as a cloud storage service; then if production goes down, you just fetch the snapshot from however many minutes back you want to restore.
RDB has very little impact on Redis's performance, because persisting only requires forking a child process while the main process keeps serving, and RDB recovers data faster than AOF.
Disadvantages: RDB snapshot files are generated every five minutes or more by default, which means all data written between one snapshot and the next can be lost, whereas AOF loses at most one second of data.
Also, if the data set is very large when RDB takes a snapshot, clients may pause for a few milliseconds or even seconds. Imagine the fork producing a huge snapshot right in the middle of your company's flash sale.
Let’s talk about AOF
Advantages: as mentioned above, RDB snapshots every five minutes, while AOF fsyncs through a background thread once per second, so it loses at most one second of data.
AOF writes its log in append-only mode: data is only ever appended, which naturally avoids disk-seek overhead. Write performance is excellent and the file is hard to corrupt.
AOF logs are written in a very readable format, which makes them suitable for emergency recovery after a catastrophic deletion. For example, if an intern runs flushall and wipes everything, you can copy the AOF log file before a background rewrite happens, delete the trailing flushall command, and replay the log to get everything back.
5. Sentinel mechanism
Sentinel is a tool for monitoring master status in Redis clusters and is Redis's high-availability solution; Sentinel mode has been integrated since Redis 2.4. A Sentinel system can monitor one or more Redis masters and all of their slaves. When a master goes offline, one of its slaves is automatically promoted to master and takes over handling requests in place of the offline master.
Sentinel is what makes Redis master/slave replication highly available: when a master in the cluster fails, Sentinel elects a new master to take over its work automatically, and the other Redis servers in the cluster resynchronize their data from the new master. An odd number of Sentinel nodes is generally recommended so that votes about the master's state cannot tie.
Redis Sentinel role
The Sentinel system is used to manage multiple Redis servers (instances); it performs the following three tasks:
1. Monitoring: Sentinel continuously checks whether the master and slave servers are operating properly.
2. Notification: when a monitored Redis server has a problem, Sentinel can notify administrators or other applications through an API.
3. Automatic failover: when a master fails, Sentinel starts a failover operation, promotes one of the failed master's slaves to be the new master, and points the other slaves at it for replication. When a client tries to connect to the failed master, the cluster returns the address of the new master, so the cluster keeps serving with the new master in place of the failed one.
How Sentinel works
1. Each Sentinel sends a PING command once per second to the masters, slaves, and other Sentinels it knows about.
2. If an instance takes longer than the value of the down-after-milliseconds option to reply, that Sentinel flags it as subjectively offline.
3. If a master is marked subjectively offline, all Sentinels monitoring it confirm its subjective-offline state at a rate of once per second.
4. When enough Sentinels (at least the number configured as the quorum) confirm within the specified time window that the master is subjectively offline, the master is marked objectively offline.
5. Normally, each Sentinel sends the INFO command to every known master and slave once every 10 seconds.
6. Once a master is marked objectively offline, Sentinel sends INFO to all of that master's slaves once per second instead of once every 10 seconds.
7. If not enough Sentinels agree that the master is offline, its objective-offline status is removed; if the master replies to a Sentinel's PING with a valid response again, its subjective-offline status is removed.
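A minimal illustrative sentinel.conf shows where the quorum and the subjective-offline threshold from the steps above are configured (the master name, address, and timings are placeholders):

```conf
# Monitor a master named "mymaster"; 2 is the quorum of Sentinels
# needed to mark it objectively offline.
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000   # subjective-offline threshold
sentinel failover-timeout mymaster 60000
```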
6. Master/slave synchronization
Redis supports master-to-slave and slave-to-slave synchronization. During the first sync, the master runs a BGSAVE and records subsequent modifications in an in-memory buffer. When the save completes, the RDB file is transferred in full to the replica, which loads the image into memory; the master is then notified to send the buffered modification records to the replica for replay. Subsequent increments are synchronized through an AOF-style log, similar to a database binlog.
When you start a slave, it sends a psync command to the master. If the slave connects to the master for the first time, it triggers a full copy. The master will start a thread, generate a snapshot of the RDB, and cache all the new write requests in memory. When the RDB file is generated, the master will send the RDB to the slave. The first thing the slave does when it gets the RDB file is write it to its local disk and load it into memory. The master then sends the new commands cached in memory to the slave.
7. Cache eviction policies
Eviction happens when the memory limit is reached and a client executes a command that requires more memory (most write commands; DEL and a few others are exceptions). Besides noeviction (reject the write with an error), the policies are:
allkeys-lru: evict the least recently used (LRU) keys to make room for the new data.
volatile-lru: evict the least recently used keys, but only among keys with an expire set.
allkeys-random: evict random keys to make room for the new data.
volatile-random: evict random keys, but only among keys with an expire set.
volatile-ttl: evict keys with an expire set, preferring those with the shortest remaining TTL.
The policies volatile-lru, volatile-random, and volatile-ttl are designed to behave like noeviction when no key satisfies the eviction precondition (i.e. no key has an expire set).
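As a sketch of what allkeys-lru does conceptually (Redis itself uses an approximated, sampled LRU rather than a strict one), here is a strict LRU cache built on Python's OrderedDict:

```python
from collections import OrderedDict

# Minimal strict-LRU sketch: once maxsize is reached, the least recently
# used key is evicted to make room for the new one.
class LRUCache:
    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)  # evict the LRU entry
```

Touching a key with `get` saves it from eviction; the coldest key goes first.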
8. Redis vs. Memcache
Redis advantages:
1. Redis supports not only simple K/V data but also list, set, hash, and other data structures.
2. Redis supports data backup, i.e. master/slave replication.
3. Redis supports persistence: in-memory data can be kept on disk and reloaded on restart.
Memcache limits: a key cannot exceed 250 bytes, a value cannot exceed 1 MB, and the maximum key expiry is 30 days; it supports only the K-V structure and provides neither persistence nor master/slave synchronization.
9. Redis data types
String:
The String type is the most commonly used type in Redis; internally it is stored as an SDS (Simple Dynamic String). Much like Java's ArrayList, an SDS pre-allocates spare space to reduce frequent memory allocation.
This is the simplest type, just plain set and get, simple KV cache.
The actual application scenarios of String are as follows:
Caching: String is the most basic type in Redis, as in most languages. Using Redis as a cache in front of another database as the storage layer exploits Redis's support for high concurrency, greatly speeding up reads and writes and reducing pressure on the back-end database.
Counters: many systems use Redis as a real-time counter, which makes counting and querying fast; the final results can then be flushed to a database or other storage at chosen times for permanent keeping.
Shared user sessions: without a shared store, a user who refreshes the page may have to log in again or re-fetch cached cookies. Redis can instead manage user sessions centrally; then only Redis needs to be highly available, and every session read and update completes quickly, greatly improving efficiency.
Hash:
Hash is a Map-like structure, generally used to cache structured data such as an object (provided the object does not nest other objects) in Redis, so that each read or write from the cache can operate on a single Hash field.
That said, this scenario is somewhat limited, because many objects are complex now; a product object, for example, may contain many properties, some of which are themselves objects. I don't use it that much in my own scenarios.
List:
A List is an ordered list, and you can do a lot of things with it.
For example, you can store a List of data structures, such as a List of fans, a List of comments on articles, and so on.
For example, the lrange command reads elements in a closed interval, which lets you build paged queries on top of a List. This is a great feature: with Redis you get simple, high-performance paging, like a microblog feed that keeps loading as you pull down, page by page.
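LRANGE's closed-interval semantics make the paging arithmetic slightly different from Python slices; a small simulation over a plain list (function names are illustrative):

```python
# LRANGE uses closed, zero-based intervals; page n of size s is
# LRANGE key n*s (n+1)*s-1. Simulated here on a plain Python list.
def lrange(lst, start, stop):
    if stop == -1:
        return lst[start:]
    return lst[start:stop + 1]   # closed interval, unlike Python slices

def page(lst, page_no, page_size):
    start = page_no * page_size
    return lrange(lst, start, start + page_size - 1)
```

A feed of ten posts paged three at a time yields pages of 3, 3, 3, and then 1.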
For example, you could build a simple message queue: push in at the head of the List and pop out from the tail.
The List itself is one of the most common data structures we use during development, not to mention hot data.
Message queue: with Redis's linked-list structure it is easy to build a blocking queue using a left-in, right-out pair of commands. Producers insert data on the left with LPUSH, and multiple consumers "grab" data from the right end of the list with the blocking BRPOP.
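The left-in, right-out flow can be mimicked with a deque to see the FIFO ordering (non-blocking here, whereas BRPOP would block until a message arrives):

```python
from collections import deque

# LPUSH/RPOP queue semantics sketched on a deque: producers push on the
# left, consumers pop from the right, so messages come out in FIFO order.
queue = deque()

def lpush(msg):
    queue.appendleft(msg)

def rpop():
    return queue.pop() if queue else None  # BRPOP would block instead
```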
Article list or data page display applications.
For example, a blog site shows each user's article list; as users grow, and every user's articles must be displayed in pages, a Redis List is a good fit: it is ordered and supports range access, so it solves the paged-query problem perfectly and greatly improves query efficiency.
Set:
A Set is an unordered collection that automatically de-duplicates.
If you need fast global de-duplication of some data, a HashSet in JVM memory works on one machine, but what if your system is deployed on several machines? Then use a Redis Set for global de-duplication.
We can use Sets for intersection, union, and difference operations. For example, intersect two people's friend lists to see their common friends. These scenarios come up a lot because the operations are fast and simple: a couple of queries and one set operation and you're done.
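Python's set operators line up one-to-one with SINTER/SUNION/SDIFF, so the common-friends example can be sketched locally (names are made up):

```python
# Two follow lists; the Redis commands map directly onto set operators.
alice = {"bob", "carol", "dave"}
erin = {"carol", "dave", "frank"}

common = alice & erin    # SINTER -> mutual friends
either = alice | erin    # SUNION -> everyone either follows
only_a = alice - erin    # SDIFF  -> friends unique to alice
```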
Sorted Set:
A Sorted Set is a sorted set: members cannot repeat, but they are ordered, each assigned a score when it is written.
Sorted Sets are used much like plain Sets, except that a Set is not automatically ordered while a Sorted Set orders its members by score, sorting on insertion. So when you need an ordered, non-repeating collection, choose the Sorted Set.
Leaderboards: the classic Sorted Set use case. For example, a video site ranking user-uploaded videos, maintained along several dimensions: by time, by play count, by likes, and so on.
Sorted Sets can also build weighted queues: give ordinary messages a score of 1 and important messages a score of 2, and worker threads fetch tasks in descending score order, handling important tasks first.
Weibo's hot-search ranking works this way: the score is the heat value, the member is the topic name.
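A leaderboard sketch with a plain dict standing in for the sorted set (ZADD/ZINCRBY/ZREVRANGE equivalents; function and member names are illustrative):

```python
# member -> score; sorting on read simulates the sorted set's ordering.
scores = {}

def zadd(member, score):
    scores[member] = score

def zincrby(member, delta):
    # Bump a member's score, creating it at 0 if absent.
    scores[member] = scores.get(member, 0) + delta

def top(n):
    # ZREVRANGE 0 n-1 WITHSCORES: highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
```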
10. Redis transactions
The last feature is transactions, but Redis does not provide strict transactions. It only guarantees that the commands in a transaction execute serially and all get executed; when a command fails, it does not roll back but keeps executing the rest.
11. Redis implements distributed locking
Have you ever used Redis for asynchronous queues? How do you use it?
The list structure is typically used as the queue: RPUSH produces messages and LPOP consumes them. When LPOP returns nothing, sleep for a while and retry; alternatively, BLPOP blocks when there is no message until one arrives.
Using the PUB/SUB topic subscriber pattern, a 1:N message queue can be implemented.
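On the distributed-locking half of the question, the common single-instance pattern is SET key token NX PX ttl, releasing only when the token matches. A dict-based sketch of the logic (TTL expiry omitted for brevity; in real Redis the check-and-delete on release must be a single atomic Lua script):

```python
import uuid

# A dict stands in for Redis; each lock value is a random token so that
# only the holder can release the lock.
store = {}

def acquire(lock_name):
    token = str(uuid.uuid4())
    if lock_name in store:            # NX: only set if not already held
        return None
    store[lock_name] = token
    return token

def release(lock_name, token):
    # Real Redis must do this compare-and-delete atomically (Lua script),
    # otherwise a lock that expired and was re-acquired could be deleted.
    if store.get(lock_name) == token:
        del store[lock_name]
        return True
    return False
```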
12. Why is Redis so fast
Redis is entirely memory-based: the vast majority of requests are pure memory operations, which are very fast. Data is stored in memory in structures similar to a HashMap, giving O(1) time complexity for both lookups and updates.
Its data structures are simple and so are the operations on them; structures such as the skip list were designed specifically for Redis.
It uses a single thread, avoiding unnecessary context switches and race conditions: no CPU burned on process or thread switching, no locks to worry about, no acquire/release overhead, and no performance lost to potential deadlocks.
It uses an I/O multiplexing model with non-blocking I/O.
Its underlying model is different: the low-level implementation and the protocol used to talk to clients are its own. Redis builds its own VM mechanism directly, because going through the usual system calls would waste time on data movement and requests.
Redis IO multiplexing technology
Redis skip list
A skip list has the following properties:
(1) It consists of many layers.
(2) Each layer is a sorted linked list.
(3) The lowest layer (level 1) contains all the elements.
(4) If an element appears in the level-i list, it also appears in every list below level i.
(5) Each node contains two pointers: one to the next element in the same list, and one down to the same element one level below.
The number of levels a node gets is decided like repeated coin tosses: a node reaches the second layer with probability p, the third with probability p², and so on.
The average number of layers per node (i.e. the average number of pointers it contains) is therefore 1/(1 - p); with p = 1/2 that works out to 2 pointers per node.
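The coin-toss level assignment is only a few lines. Redis itself uses p = 1/4 and caps levels at 32; the sketch below uses p = 1/2 to match the coin-toss framing above, so the sample mean should hover near 1/(1 - 0.5) = 2:

```python
import random

P = 0.5  # promotion probability; expected levels per node = 1/(1-P)

def random_level(max_level=32, p=P):
    # Flip a coin: each additional level is granted with probability p.
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level
```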
13. Cache consistency issues
First, be clear: we delete the cache rather than update it. There are two orderings for deleting the cache:
1. Delete the cache first, then update the database; the fix for its race condition is delayed double delete.
2. Update the database first, then delete the cache; the fix is a message queue or other binlog-based resynchronization. Introducing a message queue brings more problems of its own and is not recommended.
For scenarios where cache-consistency requirements are not high, simply set a timeout.
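The "update the database first, then delete the cache" ordering can be sketched with two dicts; a real system would retry the delete (e.g. via a queue or a binlog subscriber) if it failed:

```python
# Cache-aside sketch: reads fill the cache, writes update the DB and then
# invalidate (not update) the cached copy.
db, cache = {}, {}

def read(key):
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value

def write(key, value):
    db[key] = value          # 1) update the source of truth
    cache.pop(key, None)     # 2) invalidate, don't update, the cache
```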
14. Bloom filter
The Bloom filter was proposed by Bloom in 1970. It is essentially a long binary vector plus a series of random mapping functions, used to test whether an element is in a set. Its advantage is that space efficiency and query time far exceed ordinary algorithms; its disadvantages are a certain false-positive rate and the difficulty of deletion.
Bloom filter data structure
In the initial state, all m bits of a bit array of length m are set to 0.
When a variable is added to the set, it is mapped by K mapping functions to K points in the bit array, and those bits are set to 1 (imagine two variables, each passed through three mapping functions).
To look up a variable, we just check whether all of those bits are 1; if so, the variable is very likely in the set. If any of the bits is 0, the queried variable is definitely absent; if all are 1, it probably exists. Why only probably, not certainly? Because the mapping functions are hash functions, and hash functions are subject to collisions.
An element judged present may actually be absent, but one judged absent is certainly absent. A Bloom filter can add elements but cannot remove them, because clearing bits that other elements may share would raise the misjudgment rate.
Add elements
Pass the element to be added through the k hash functions to obtain k positions in the bit array, and set those k positions to 1.
Query element
Pass the element to be queried through the k hash functions to obtain k positions in the bit array. If any of the k positions is 0, the element is definitely not in the set; if all k positions are 1, it is probably in the set.
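A minimal Bloom filter fits in a few lines; here k salted MD5 hashes stand in for independent hash functions and an integer serves as the bit array (the sizes are illustrative):

```python
import hashlib

# Minimal Bloom filter: m bits, k hash functions derived from MD5 with
# different salts. add() can only set bits, which is why deletion is
# impossible without raising the error rate.
class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0          # an integer used as a bit array

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```

Added items always test positive; unseen items test negative except for the small false-positive probability.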
15. What is the election algorithm and process of Redis
Raft election process
The heartbeat mechanism triggers the Leader election. After the system starts, all nodes are initialized as followers and the term is 0. A node retains its Follower status if it receives RequestVote or AppendEntries. If no AppendEntries message is received for a period of time until the election times out, the Follower becomes a Candidate and starts running for the Leader. Once converted to Candidate, the node immediately starts the following things:
1. Increment its term.
2. Start a new election timer.
3. Vote for itself.
4. Send RequestVote to all other nodes and wait for replies. If it receives AppendEntries during this process, a Leader has already been established, so it reverts to Follower and the election ends.
If the node receives a majority of the nodes’ consent votes before the timer expires, it becomes the Leader. At the same time send AppendEntries to all other nodes to indicate that you have become the Leader.
Each node can vote only once per term, on a first-come, first-served basis: a Follower votes for the first node whose RequestVote it receives. Each Follower also has a timer; when the timer expires and the Follower still has not received the heartbeat RPC from the Leader, it becomes a Candidate and begins requesting votes, which is the Leader-election step above.
If multiple Candidates start elections and the vote is split so that no Candidate wins a majority (a split vote), each Candidate waits until its timer expires, becomes a Candidate again, and repeats the steps of running for Leader.
The Raft protocol's timers use random timeouts, which is the key to electing a Leader. Each node's timeout is set randomly to between one and two times the configured value, so followers generally do not become Candidates at the same moment. Within a term, the node that becomes a Candidate first votes for itself first and so tends to collect the majority; it is very unlikely that multiple nodes convert to Candidate simultaneously. Even if several Candidates split the votes evenly, that term simply elects no Leader, and because each node's next timeout is again random, the node that enters the next term first has the better chance of winning. In theory a node could tie repeatedly across many consecutive terms, but in practice this is essentially impossible; a Leader is usually elected within one or two terms.
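The randomized timeout in [T, 2T) is simple to sketch, and sampling it shows why simultaneous candidacies are rare (the base value is an illustrative placeholder):

```python
import random

BASE_MS = 150  # illustrative configured election timeout in milliseconds

def election_timeout(base=BASE_MS):
    # Raft picks each node's timeout uniformly in [base, 2*base) so that
    # followers rarely time out and become Candidates at the same instant.
    return random.uniform(base, 2 * base)
```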
Sentinel’s election process
When the Sentinel cluster runs normally, every node has the same epoch. When a failover is required, a Leader is elected from the cluster to perform it. Sentinel uses the Raft protocol to implement Leader election among Sentinels, though not exactly as the paper describes. Once the failover completes, all Sentinels return to equal status; the Leader is simply a role held for the duration of the failover operation.
The election process
1. After a Sentinel determines that a master is objectively offline, it first checks whether it has already voted. If it has voted for another Sentinel, then for a period of twice the failover timeout it cannot become the Leader; it is, in effect, a Follower.
2. If it has not yet voted, it becomes a Candidate. As the Raft protocol describes, a Candidate then does several things:
1) update the failover state to start;
2) increment the current epoch, equivalent to entering a new term (the epoch in Sentinel plays the role of the term in Raft);
3) set its election timeout to the current time plus a random period, where the random part is a number of milliseconds within 1 s;
4) send the is-master-down-by-addr command to the other nodes to request votes, carrying its own epoch;
5) vote for itself: in Sentinel, voting means writing its own id and epoch into the leader and leader_epoch fields of its master structure.
3. When another Sentinel receives a Candidate's is-master-down-by-addr, if its current epoch already equals the epoch in the request, it has already written another Candidate into the leader and leader_epoch of its master structure, i.e. its vote has gone to that other Candidate. Having voted, it can only act as a Follower for the current epoch.
4. The Candidate keeps counting its votes until it finds that more than half of the Sentinels, and at least its configured quorum, approve it as Leader (see Redis Sentinel Design and Implementation). Sentinel adds the quorum on top of Raft: whether a Sentinel can be elected Leader also depends on its configured quorum.
5. If a Candidate does not receive both a majority and its quorum of votes within the election period, it loses the election.
6. If no Candidate wins within an epoch, the Candidates wait until more than twice the failover timeout has passed, increment the epoch, and vote again. If a Candidate then obtains a majority of votes that also meets its configured quorum, it becomes the Leader.
Unlike the Raft protocol, the Leader does not broadcast a message announcing that it has become Leader. The other Sentinels wait for the Leader to pick a new master from the slaves; when they observe the new master working properly, they clear the objective-offline flag, so they never need to enter the failover process themselves.