
Persist in what others cannot persist in, and you will have what others cannot have. Follow the "Programming Avenue" official account; let us stick to what we believe in and grow together!

[Interview shock] – Redis – Frequently asked questions about Redis Cluster, cache usage, and architecture design

In this series, I will organize some interview questions to share with you, to help readers who want to change jobs during the "Golden March, Silver April" hiring season consolidate and review the questions interviewers often ask. Come on!

Redis data types? Which scenarios is each suited to?

Redis's thread model: why is a single thread so efficient?

[Interview shock] – Redis – Redis master-slave replication? Sentinel mechanism?

Redis sentinel principle and persistence mechanism

Tell me about Redis cluster

A Redis Cluster can be composed of N Redis master nodes, and each master can have multiple slave nodes attached. Data is automatically sharded, so each master holds part of the data. High availability is built in: the cluster keeps working even if some masters become unavailable, because each master has slave nodes, and if a master fails the Redis Cluster mechanism automatically promotes one of its slaves to master. It also supports read/write separation: for each shard, writes go to the master and reads go to its slaves.
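As a rough illustration of what this looks like from application code, here is a minimal sketch using the redis-py client; the node address 127.0.0.1:7000 and the read_from_replicas option are assumptions for the example, not part of the answer above:

```python
# Minimal sketch, assuming a running Redis Cluster with a node at 127.0.0.1:7000.
from redis.cluster import RedisCluster

# The client discovers the masters, replicas, and slot layout from this one node.
rc = RedisCluster(host="127.0.0.1", port=7000, read_from_replicas=True)

rc.set("user:1001", "alice")   # routed to the master that owns this key's slot
print(rc.get("user:1001"))     # with read_from_replicas, reads may be served by a replica
```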

Redis Cluster (Multi-master + read-write separation + high availability)

If that is what you need (multiple masters, read/write separation, high availability), just build the cluster on Redis Cluster itself; there is no need to hand-build it from replication + a master-slave architecture + read/write separation + a sentinel cluster.

What are the usage scenarios for Redis Cluster and for Redis replication + Sentinel?

Redis replication + Sentinel: master-slave replication plus the sentinel mechanism.

If the amount of data is small and the workload is mainly high concurrency and high performance, for example a cache of just a few gigabytes, a single machine is enough. Set up Redis replication with one master and multiple slaves (how many slaves depends on the read throughput you need), and set up a Sentinel cluster to ensure the high availability of the replication setup.
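A minimal sketch of that setup from the client side, using redis-py's Sentinel support; the sentinel addresses and the master group name "mymaster" are assumptions for the example:

```python
from redis.sentinel import Sentinel

# Assumed: three sentinels watching a master group named "mymaster".
sentinel = Sentinel(
    [("10.0.0.1", 26379), ("10.0.0.2", 26379), ("10.0.0.3", 26379)],
    socket_timeout=0.5,
)

master = sentinel.master_for("mymaster", socket_timeout=0.5)   # writes go to the master
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)   # reads go to a slave

master.set("page:home", "<html>...</html>")
print(replica.get("page:home"))
```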

Redis Cluster, on the other hand, is mainly for massive data + high concurrency + high availability scenarios: if your data volume is very large, Redis Cluster is recommended.

How does Redis Cluster distribute data? What are the advantages of this approach?

Redis Cluster has a fixed number of 16384 hash slots. For every key, a CRC16 value is computed and taken modulo 16384 to obtain the hash slot the key belongs to.
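A minimal sketch of that slot calculation in Python, using the CRC16 variant Redis documents (hash tags in {...} are ignored here for simplicity):

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if (crc & 0x8000) else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots."""
    return crc16(key.encode()) % 16384

print(key_slot("user:1001"))  # a value in the range [0, 16383]
```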

Each master in a Redis Cluster holds a portion of the slots. For example, with three masters, each master may hold roughly 5,000-odd hash slots.

Hash slots make it easy to add and remove nodes: adding a master simply moves some hash slots from the other masters to it, and removing a master moves its hash slots onto the remaining masters. Because keys are always hashed modulo the fixed 16384 slots rather than modulo the number of masters, adding or removing a master does not change which slot a key maps to, so existing data does not have to be rehashed across the cluster. The cost of moving hash slots when a master is added or removed is therefore very low.

What is the communication mechanism between Redis Cluster nodes?

Redis Cluster nodes communicate with each other through the Gossip protocol, and every node holds a copy of the cluster metadata. When metadata changes on a node, that node keeps sending the updated metadata to the other nodes so that they can apply the change as well.

The nodes communicate with each other continuously to keep the metadata on all nodes in the cluster complete and consistent. They exchange failure information, node additions and removals, and hash slot assignments.

The advantage of this mechanism is that metadata updates are decentralized rather than concentrated in one place; update messages are sent to the nodes a little at a time, which adds some delay but reduces the pressure on any single node.

Disadvantages: Metadata updates are delayed, which may cause some delays in cluster operations.
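To get a feel for the metadata each node holds, you can ask any node for its own view of the cluster; a small sketch, assuming a cluster node at 127.0.0.1:7000:

```python
import redis

node = redis.Redis(host="127.0.0.1", port=7000, decode_responses=True)

# This node's copy of the cluster metadata: node IDs, roles, slot ranges,
# and failure flags that reached it via gossip.
print(node.execute_command("CLUSTER NODES"))
print(node.execute_command("CLUSTER INFO"))
```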

What would you do if you had a system with extremely high read concurrency that uses Redis to absorb most of the read requests?

First look at the order of magnitude of the read concurrency. A single Redis machine can serve read QPS on the order of 10,000, so tens of thousands of reads per second is no problem for one instance. To carry 100,000+ reads per second, use a one-master, many-slave cache architecture plus a sentinel cluster, i.e. master-slave replication with read/write separation. The sentinel cluster mainly improves the availability of the cache architecture and removes the single point of failure. The master handles writes and multiple slaves handle reads, which supports horizontal scaling: decide how many Redis slave instances you need from the read QPS, and if read concurrency keeps growing, simply add more slave instances.

What do you do if the system needs to cache 1 TB+ of data because traffic has grown?

Because the bottleneck for Redis with massive data is single-machine memory capacity, I would choose Redis Cluster mode here. Each master node stores part of the data: if each master stores 32 GB, then N masters with N * 32 GB >= 1 TB are enough, and that many master nodes can support 1 TB+ of data.
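A back-of-the-envelope version of that sizing; the 32 GB per master is just the assumption used in the answer above:

```python
import math

total_gb = 1024        # roughly 1 TB of data to cache
per_master_gb = 32     # assumed memory budget per master node

masters_needed = math.ceil(total_gb / per_master_gb)
print(masters_needed)  # 32, i.e. N * 32 GB >= 1 TB holds for N = 32
```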

The bottleneck of a single Redis master is not read/write concurrency but memory capacity. Even one master with many slaves cannot solve this, because in that architecture every slave holds exactly the same data as the master: if the master holds 10 GB, each slave can also only hold 10 GB, so the total data volume is limited by a single master. To cache a large amount of data there must be multiple masters, each storing different data, and Redis's official Redis Cluster mode solves exactly this problem.

Do you understand what Redis cache avalanche and cache penetration are? How do you deal with them?

This is actually one of the must-ask questions about caching, because cache avalanche and cache penetration are the two biggest problems with caches: either they never show up, or when they do they can be fatal. So the interviewer will definitely ask you about them.

Let me describe how a cache avalanche comes about.

For example, suppose the system receives 5,000 requests per second at peak time, the cache can absorb 4,000 of them, and the remaining 1,000 fall through to the database (assume the database can handle 2,000 requests per second). Now 5,000 requests per second come in, but Redis fails for some reason and the whole cache becomes unavailable, so all 5,000 requests hit the database; obviously the database collapses. If there is no special plan for handling the failure and you simply rush to restart the database, it is immediately killed again by the incoming traffic, because the cache is still empty. This is a cache avalanche.

Dealing with a cache avalanche can be split into before, during, and after the event. Beforehand: if the cache becomes unavailable because most of the cached data expires at the same time, we can add a random value to each key's expiration time to spread the expirations out and avoid a mass of keys expiring together. If instead the cache becomes unavailable because Redis itself goes down for some other reason, we need to prepare a highly available Redis architecture in advance, such as master-slave + sentinel or Redis Cluster, so that a Redis failure does not make the whole cache unavailable and bring everything down.
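A minimal sketch of the randomized-expiration idea with redis-py; the base TTL and jitter range are arbitrary example values:

```python
import random
import redis

r = redis.Redis()  # assumed local Redis instance

def cache_set(key: str, value: str, base_ttl: int = 3600, jitter: int = 600) -> None:
    """Add a random offset to the TTL so a batch of keys does not expire all at once."""
    r.set(key, value, ex=base_ttl + random.randint(0, jitter))

cache_set("product:42", "cached page fragment")
```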

During the event: a small amount of data can also be cached locally in EhCache, combined with Hystrix rate limiting and degradation components, to keep MySQL from being overwhelmed.

Afterwards: if an avalanche does occur, we can rely on Redis's RDB or AOF files to quickly reload cached data from disk when Redis restarts. This requires enabling the Redis persistence mechanism in advance, so that after an avalanche the data can be restored from disk to memory and the cache recovers quickly.

The other problem, cache penetration, is generally caused by malicious attacks or by bugs in your own system. For example, an attacker forges requests for keys that simply do not exist in the database; the cache cannot help with such requests, so a large number of malicious requests fall through to the database, and won't the database be brought down?

There are two common solutions: 1. Whenever the database lookup finds nothing, still write a null placeholder value into the cache (with a short expiration), so repeated queries for that key are answered by the cache. 2. Use a Bloom filter on the requested keys: keys that cannot possibly exist in the system are filtered out before they ever reach the database.
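A minimal sketch of the first idea, caching a null placeholder with a short TTL; the placeholder value and TTLs are arbitrary, and query_db is a hypothetical stand-in for the real database lookup:

```python
import redis

r = redis.Redis(decode_responses=True)  # assumed local Redis instance
NULL_PLACEHOLDER = "__NULL__"           # sentinel meaning "not in the database"

def query_db(key: str):
    """Hypothetical database lookup; returns None when the row does not exist."""
    return None

def get_with_penetration_guard(key: str, null_ttl: int = 60, ttl: int = 3600):
    cached = r.get(key)
    if cached is not None:
        return None if cached == NULL_PLACEHOLDER else cached

    value = query_db(key)
    if value is None:
        # Cache the miss briefly so repeated bogus requests hit Redis, not the DB.
        r.set(key, NULL_PLACEHOLDER, ex=null_ttl)
        return None

    r.set(key, value, ex=ttl)
    return value
```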

Let’s talk about Redis’s expiration strategy

You can refer to the previous article in this series: "What? What happened to the data I wrote into Redis?"

Talk about cache + database double write inconsistency

Refer to the previous article for the analysis and solution design of cache + database double-write inconsistency in high-concurrency scenarios.

Finally

The "[Interview shock] – Redis" series is coming to an end, so that's all for now. If you have more questions to add, let me know and I will supplement them.

This series is for quick interview preparation, not a tutorial. Each topic can go much deeper if you dig into it; in an interview you mainly need to explain the principles clearly, and it is even better if you can draw diagrams as you talk. This series is meant for a quick assault: pick it up fast and review.

Please give it a thumbs up

Follow the "Programming Avenue" official account to get new articles pushed to you as soon as they are published.

If this was helpful, please like, follow, and share!