Interviewer: Let’s talk about Redis sharded clusters first.
Interviewer: Redis Cluster is the official cluster solution introduced in Redis 3.x. How much do you know about it?
Candidate: Well, why don’t we start with the basics?
Candidate: When we discussed Redis earlier, we treated it as a “single instance” that stores all the data.
Candidate: 1. In a master/slave setup with read/write separation, multiple replicas can absorb the read traffic, but only the master handles the writes.
Candidate: 2. “Vertical scaling” means upgrading the Redis server’s hardware, but beyond a certain point it stops being economical.
Candidate: Vertical scaling also means a lot of memory on one box, which makes Redis persistence more expensive.
Candidate: So a single instance becomes the bottleneck.
Candidate: If you can’t keep scaling vertically, scale horizontally.
Candidate: Cluster multiple Redis instances together and “distribute” the data across them according to certain rules; put together, the data held by all the instances in the cluster is the complete dataset.
Candidate: That’s really just the idea of “distributed” storage (although in the Redis world more people seem to call it a “sharded cluster”?)
Candidate: From what I said above, you can’t avoid the question of how to “distribute” the data (also known as routing).
Candidate: Let’s start with Redis Cluster, where “routing” is done on the client side (the SDK already integrates the routing and forwarding logic).
Candidate: Redis Cluster’s logic for distributing data is built around the concept of the “hash slot”.
Candidate: By default, a Redis Cluster has 16,384 hash slots, which are allocated to the different Redis instances.
Candidate: As for how to divide them, we can split them evenly or “manually” assign each Redis instance its own range of slots; it’s up to us.
Candidate: The important thing is that all 16,384 slots must be assigned; none can be left over!
Candidate: When writing data, the client first computes a 16-bit hash of the key with the CRC16 algorithm and then takes that value modulo 16384.
Candidate: The result of the modulo is the hash slot, and the data is then written to the Redis instance that owns that slot.
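(As a minimal sketch of that key-to-slot step: Redis Cluster uses the CRC16-CCITT/XMODEM variant, and this bit-by-bit version is for readability only; it skips the “{hash tag}” rule that real clients also apply, and real SDKs such as Jedis ship their own table-driven CRC16.)

```java
public class SlotSketch {

    static final int SLOT_COUNT = 16384;

    // Bit-by-bit CRC16 with polynomial 0x1021 and initial value 0x0000 (XMODEM variant).
    static int crc16(byte[] bytes) {
        int crc = 0x0000;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // hash slot = CRC16(key) % 16384
    static int slotFor(String key) {
        return crc16(key.getBytes(java.nio.charset.StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        System.out.println("key 'user:10086' -> slot " + slotFor("user:10086"));
    }
}
```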
Interviewer: So the question is: the client has worked out which hash slot the key lands in, but how does it know which Redis instance that hash slot lives on?
Candidate: Well, each Redis instance in the cluster “gossips” to the other instances which hash slots it is responsible for. That way, every Redis instance ends up recording the full mapping between hash slots and instances (:
Candidate: The client also “caches” a local copy of this mapping, so it knows which Redis instance each operation should go to.
Candidate: Redis instances can also be added to or removed from the cluster.
Candidate: Whenever a Redis instance is added or removed, some hash slots inevitably change owners.
Candidate: The change is broadcast to the whole cluster by message, so all the Redis instances learn about it and update their stored mappings.
Candidate: But at this point the client doesn’t know about it yet (:
Candidate: So when the client requests a key, it still goes to the “original” Redis instance. That instance returns a “MOVED” response telling the client to go to the new Redis instance instead.
Candidate: When the client receives the “MOVED” response, it retries against the new Redis instance and updates its cached “hash slot to instance” mapping.
Candidate: To sum up: once the data has been fully migrated, the client receives a “MOVED” response and updates its local cache.
Interviewer: And what if the data hasn’t been fully migrated yet?
Candidate: If the data hasn’t been fully migrated, the client gets an “ASK” response instead. It also tells the client to go to the new Redis instance, but in this case the client does not update its local cache.
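(A rough sketch of how a cluster-aware client might honor the two redirects. Reply, send() and sendAsking() are hypothetical stand-ins for the wire layer; real clients such as Jedis or Lettuce hide this logic inside their cluster command path.)

```java
import java.util.Map;

abstract class RedirectAwareClient {

    interface Reply {
        boolean isMoved();     // slot has been fully migrated to another instance
        boolean isAsk();       // slot is still mid-migration
        String targetNode();   // instance address carried by the redirect
        String value();
    }

    protected Map<Integer, String> slotToNode;   // locally cached slot -> instance mapping

    abstract Reply send(String node, String... command);
    abstract void sendAsking(String node);       // one-off ASKING handshake before the retry
    abstract int slotFor(String key);            // CRC16(key) % 16384

    String get(String key) {
        int slot = slotFor(key);
        Reply reply = send(slotToNode.get(slot), "GET", key);
        if (reply.isMoved()) {
            // MOVED: migration finished, so refresh the local cache and retry.
            slotToNode.put(slot, reply.targetNode());
            return send(reply.targetNode(), "GET", key).value();
        }
        if (reply.isAsk()) {
            // ASK: migration in progress, retry once on the target but keep the cache as-is.
            sendAsking(reply.targetNode());
            return send(reply.targetNode(), "GET", key).value();
        }
        return reply.value();
    }
}
```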
Interviewer: I see
Interviewer: To put it bluntly: when the instances in the cluster change, because the Redis instances “communicate” with each other,
Interviewer: any instance the client hits always knows which Redis instance actually holds the data the client is asking for.
Interviewer: If the migration is complete, it returns a “MOVED” response telling the client which Redis instance to go to, and the client updates its cached mapping.
Interviewer: If the migration is still in progress, it returns an “ASK” response telling the client which Redis instance to go to, without updating the cache.
Candidate: You’ve got it…
Interviewer: Do you know why there are 16,384 hash slots?
Candidate: Hmm, about that. Redis instances communicate with each other by exchanging slot information, so if there were too many slots the messages would get bigger and eat up too much network bandwidth.
Candidate: The other reason is that the Redis author argues a cluster generally has fewer than 1,000 instances.
Candidate: So 16,384 slots lets data be split fairly evenly across the instances in the cluster without the slot-information exchange costing too much bandwidth.
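(A rough back-of-the-envelope check, assuming the slot ownership table travels as a bitmap inside the cluster’s ping/gossip messages: 16384 / 8 = 2 KB of bitmap per message, whereas 65,536 slots would need 65536 / 8 = 8 KB, four times the overhead for clusters that are expected to stay under 1,000 nodes anyway.)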
Interviewer: I see
Interviewer: Do you know why Redis uses “hash slots” to partition data rather than consistent hashing?
Candidate: As I understand it, consistent hashing works on a “hash ring”. When a request comes in, the client hashes the key, locates its position on the ring, and then walks clockwise until it finds the first real node.
Candidate: The advantage of consistent hashing over plain fixed modulo is that when an instance is added to or removed from the cluster, only a small portion of the data is affected.
Candidate: But when you do add or remove instances, under consistent hashing you still have to work out “which piece of data” is affected and migrate that data yourself.
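(For contrast, a minimal consistent-hash ring sketch. CRC32 stands in for the hash function purely for illustration; production rings usually add virtual nodes and a stronger hash to even out the distribution.)

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

class ConsistentHashRing {

    private final TreeMap<Long, String> ring = new TreeMap<>();

    private long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node)    { ring.put(hash(node), node); }
    void removeNode(String node) { ring.remove(hash(node)); }

    // Walk clockwise from the key's position and take the first node found.
    String nodeFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```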
Interviewer: Hmm…
Candidate: With hash slots, as we saw above, every instance in the cluster ends up holding the slot information.
Candidate: After the client hashes the key, if the instance it hits doesn’t have the data, that instance returns a “redirect” response telling the client where to go.
Candidate: Scaling the cluster up or down uses the “hash slot” as the basic unit of operation, so the overall implementation is simpler (concise, efficient, flexible). The process is essentially reassigning some slots and migrating the data in those slots, without touching all the data on an instance.
Interviewer: Do you understand the general principles of server-side routing?
Candidate: Well, server-side routing generally means there is a proxy layer that accepts requests from clients and then forwards them to the Redis cluster for processing.
Candidate: As I mentioned in the last interview, Codis is a popular choice.
Candidate: The biggest difference from Redis Cluster is that with Redis Cluster the client connects directly to the Redis instances, whereas with Codis the client connects to a Proxy, which then dispatches requests to the appropriate Redis instance.
Candidate: Codis routes keys in a similar way to Redis Cluster: it initializes 1,024 hash slots and allocates them to the different Redis servers.
Candidate: The mapping between hash slots and Redis instances is stored and managed in Zookeeper. The Proxy obtains the latest mapping through the Codis Dashboard and caches a copy locally.
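(A hedged sketch of a Codis-style slot lookup with 1,024 slots. Whether Codis uses exactly CRC32 here is an assumption made for illustration; the point is simply hash(key) % 1024, with the Proxy resolving the slot to an instance from its cached mapping.)

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class CodisSlotSketch {

    // CRC32 is an illustrative choice of hash, not necessarily what Codis itself uses.
    // slot = hash(key) % 1024; the Proxy then looks the slot up in its local mapping.
    static int slotFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 1024);
    }

    public static void main(String[] args) {
        System.out.println("key 'user:10086' -> slot " + slotFor("user:10086"));
    }
}
```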
Interviewer: What’s the process if I want to scale out Codis with more Redis instances?
Candidate: Simply add the new Redis instance to the cluster and migrate part of the data to it.
Candidate: The process is roughly: 1. The “source instance” sends part of a slot’s data to the “target instance”. 2. After receiving the data, the target instance returns an ACK to the source. 3. On receiving the ACK, the source instance deletes the data it just sent locally. 4. Repeat steps 1, 2 and 3 until the whole slot has been migrated.
Candidate: Codis also supports asynchronous migration: in step 2 above, the source instance sends the data and keeps serving client requests without waiting for the target instance to return an ACK.
Candidate: Data that hasn’t finished migrating is marked read-only, so data consistency isn’t affected. If a client does “write” to data that is mid-migration, the client is made to “retry” and the write eventually lands on the “target instance”.
Candidate: For example, if a Set key has 10,000 elements, the “source instance” might send 10,000 commands to the “target instance” rather than migrating the whole big key in one go (because large objects tend to block).
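(A purely illustrative sketch of the synchronous migration loop in steps 1-4 above. Node and Batch are hypothetical abstractions, not Codis APIs.)

```java
class SlotMigrationSketch {

    interface Batch { }

    interface Node {
        boolean hasDataInSlot(int slot);
        Batch nextBatch(int slot);       // next chunk of keys belonging to the slot
        void receive(Batch batch);
        void ack(Batch batch);
        void deleteLocally(Batch batch);
    }

    void migrateSlot(int slot, Node source, Node target) {
        while (source.hasDataInSlot(slot)) {
            Batch batch = source.nextBatch(slot); // 1. source sends part of the slot's data
            target.receive(batch);
            target.ack(batch);                    // 2. target acknowledges the batch
            source.deleteLocally(batch);          // 3. source deletes what was just acked
        }                                         // 4. repeat until the whole slot has moved
    }
}
```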
Interviewer: I see.
To summarize this article:

- Why sharded clusters exist: under high concurrency, write performance hits a bottleneck on a single instance, and vertical scaling can’t go on forever (it stops being cost-effective).
- A sharded cluster has to solve two problems: data routing and data migration.
- Redis Cluster data routing:
  - A Redis Cluster has 16,384 hash slots by default, and the hash slots are allocated to the instances in the cluster.
  - The instances in the cluster “communicate” with each other, exchanging their hash slot information (so eventually every instance holds the complete mapping).
  - When a client makes a request, it computes the CRC16 hash of the key and takes it modulo 16384; that gives the hash slot, and from it the corresponding Redis instance.
- Why 16,384 hash slots: 16,384 is enough for the data to be spread fairly evenly across the Redis instances, without the slot-information exchange between instances causing serious network overhead.
- The hash slot implementation is relatively simple and efficient: each scaling operation only needs to move the data of the affected slots, not an entire Redis instance.
- Codis data routing: 1,024 hash slots are allocated by default; the mapping is stored in a Zookeeper cluster, and the Proxy caches a local copy. When the Redis instances in the cluster change, the Dashboard updates the mapping in Zookeeper and on the Proxies.
- Data migration in Redis Cluster and Codis: Redis Cluster supports synchronous migration; Codis supports both synchronous and asynchronous migration.
  - To scale out, add the new Redis instance to the cluster and migrate part of the data to it (online).