Interviewer: Why don’t you tell me about something you’ve been reading lately? We can talk about that (I’m not sure what to ask today).
Candidate: I’ve been reading about Redis recently.
Interviewer: Right, I remember we already covered Redis basics and persistence.
Interviewer: Why don’t you tell me about your company’s Redis architecture?
Candidate: My former company ran Redis as a “sharded cluster”, with a Proxy layer that routes keys to different Redis servers.
Candidate: It supports dynamic scaling, fault recovery, and so on.
Interviewer: Can you talk about the architecture and basic implementation principle of the Proxy layer?
Candidate: Sorry, the middleware team is responsible for that, and I haven’t looked into the details.
Candidate: …
Interviewer: …
Candidate: However, I can walk you through the common open-source Redis architectures (:
Interviewer: Fine. Go ahead.
Candidate: Why don’t I start with the basics?
Candidate: As mentioned earlier, Redis has a persistence mechanism. Even if Redis is restarted, the data can be reloaded from RDB or AOF files.
Candidate: But at this point there is only one Redis server holding all the data. If that server goes down and cannot be repaired for a while, every service that depends on Redis goes down with it.
Candidate: So, to make Redis “highly available”, the usual practice is to keep a “backup”: start one more Redis server and form a “master-slave architecture”.
Candidate: The slave’s data is replicated from the master, so both servers hold the same data.
Candidate: If the master goes down, you can “manually” promote the slave to master to shorten the downtime.
Interviewer: How does the master “copy” its data to the slave?
Candidate: Replication, also called synchronization, is done with the PSYNC command in Redis, which has two modes: full resynchronization and partial resynchronization.
Candidate: Full resynchronization is performed when the slave is replicating for the first time, or when the master it is about to replicate is different from the one it replicated last time.
Candidate: If the slave was only disconnected for a “short period” because of a network interruption, replication resumes in “partial resynchronization” mode.
Candidate: (If the data gap between master and slave has grown too large, “full resynchronization” is used instead.)
Interviewer: Can you talk a little bit about the process of synchronization?
Candidate: Well, no problem
Candidate: Replication runs over a socket connection between the master and the slave. Setting it up involves some checks, such as authentication.
Candidate: The slave then sends the PSYNC command to the master to start synchronization, carrying the master’s “server ID” (runid) and its own “replication progress” (offset). A brand-new slave has neither to send.
Candidate: When the master sees that this is a new slave (because those parameters are missing), it chooses “full resynchronization” and sends its runid and offset to the slave, which records them.
Interviewer: Hmm…
Candidate: The master then generates an RDB file in the background and sends it to the slave over the connection established earlier.
Candidate: After receiving the RDB file, the slave first clears its own data, then loads the RDB file to restore the dataset.
Candidate: The master is not idle during this process (it keeps accepting requests from clients).
Interviewer: Hmm…
Candidate: The master records the write commands that arrive after it starts generating the RDB file in a buffer. Once the slave has loaded the RDB, the master sends it all the commands recorded in that buffer.
Candidate: In this way, master and slave reach a consistent state (replication is asynchronous, so this is “eventual consistency”).
Interviewer: Hmm…
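The full resynchronization steps above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: the classes `Master` and `Slave` and the function `full_resync` are hypothetical names, a deep-copied dict stands in for the RDB file, and nothing here is Redis’s actual code.

```python
import copy


class Master:
    """Toy master: a dict of data plus a buffer of writes made during a snapshot."""

    def __init__(self):
        self.data = {}
        self.snapshotting = False
        self.buffer = []  # writes received while the "RDB" is being produced

    def set(self, key, value):
        self.data[key] = value
        if self.snapshotting:
            # The master keeps serving clients; it just remembers these writes.
            self.buffer.append((key, value))

    def begin_snapshot(self):
        """Stand-in for generating an RDB file in the background."""
        self.snapshotting = True
        self.buffer = []
        return copy.deepcopy(self.data)

    def end_snapshot(self):
        """Return the writes that arrived after the snapshot was taken."""
        self.snapshotting = False
        pending, self.buffer = self.buffer, []
        return pending


class Slave:
    def __init__(self):
        self.data = {"stale": "value"}

    def load_snapshot(self, snapshot):
        # Drop the old data first, then load the snapshot.
        self.data = dict(snapshot)

    def apply(self, commands):
        for key, value in commands:
            self.data[key] = value


def full_resync(master, slave, writes_during_transfer):
    snapshot = master.begin_snapshot()
    for key, value in writes_during_transfer:  # clients keep writing meanwhile
        master.set(key, value)
    slave.load_snapshot(snapshot)
    slave.apply(master.end_snapshot())  # replay the buffered commands


master = Master()
master.set("a", 1)
slave = Slave()
full_resync(master, slave, [("b", 2), ("a", 3)])
print(slave.data == master.data)
```

The point of the buffer is visible in the last call: the write to `a` made during the transfer is not in the snapshot, so the slave only catches up once the buffered commands are replayed.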
Interviewer: What about the “partial resynchronization” process?
Candidate: Partial resynchronization relies on the offset. Each time the master propagates commands, it advances its own offset, and the slave advances its offset as it receives them.
Candidate: Both master and slave keep an offset (if the two offsets differ, the data is not fully synchronized).
Candidate: After a slave is disconnected and reconnects, it sends the PSYNC command to the master, again carrying the runid and offset (the slave still has this information after reconnecting).
Interviewer: Hmm…
Candidate: When the master receives the command, it first checks whether the runid matches its own. A match means the slave has replicated from this master before.
Candidate: Then it checks whether the data after that offset is still available in the master’s records.
Candidate: (To explain: the master keeps the recently propagated commands in a fixed-size circular buffer, the replication backlog. Once the buffer is full, the oldest records are overwritten.)
Candidate: If it is found, the master sends the slave the write commands starting from the missing offset.
Candidate: If it is not found in the circular buffer, master and slave have to replicate again in full resynchronization mode.
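The decision the master makes on PSYNC can be sketched roughly as follows. The names `ReplicationBacklog` and `handle_psync` are hypothetical, the backlog is modeled with a bytearray trimmed from the front rather than a true ring buffer, and the offset convention is simplified (the slave sends the number of bytes it has already received), so treat this as an illustration of the idea, not the real protocol.

```python
class ReplicationBacklog:
    """Fixed-size buffer holding the most recent bytes of the replication stream."""

    def __init__(self, size):
        self.size = size
        self.buf = bytearray()
        self.master_offset = 0  # total number of bytes ever propagated

    def feed(self, data):
        """Record propagated bytes, discarding the oldest ones when full."""
        self.buf += data
        self.master_offset += len(data)
        if len(self.buf) > self.size:
            del self.buf[: len(self.buf) - self.size]

    def first_offset(self):
        """Offset of the oldest byte still held in the backlog."""
        return self.master_offset - len(self.buf)

    def read_from(self, offset):
        """Return the bytes from `offset` onward, or None if they are gone."""
        if offset < self.first_offset() or offset > self.master_offset:
            return None
        return bytes(self.buf[offset - self.first_offset():])


def handle_psync(master_runid, backlog, slave_runid, slave_offset):
    """Decide between partial ("CONTINUE") and full ("FULLRESYNC") resync."""
    if slave_runid != master_runid:
        return ("FULLRESYNC", None)  # new slave, or it followed another master
    missing = backlog.read_from(slave_offset)
    if missing is None:
        return ("FULLRESYNC", None)  # the gap was overwritten in the backlog
    return ("CONTINUE", missing)


backlog = ReplicationBacklog(size=10)
backlog.feed(b"SET a 1;")  # the slave has received these 8 bytes
backlog.feed(b"SET b 2;")  # the slave missed these while disconnected

short_gap = handle_psync("m1", backlog, "m1", 8)
backlog.feed(b"SET c 3;SET d 4;")  # more writes push offset 8 out of the backlog
long_gap = handle_psync("m1", backlog, "m1", 8)
new_slave = handle_psync("m1", backlog, None, -1)
print(short_gap, long_gap, new_slave)
```

This also shows why a larger backlog tolerates longer disconnections: the same slave offset succeeds while it is still inside the buffer and falls back to a full resynchronization once it has been overwritten.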
If the Redis master fails, you still have to manually promote a slave to master.
Interviewer: Do you know of any way to “automate” failover?
Candidate: Absolutely. That’s where Sentinel comes in.
Interviewer: Go ahead, the stage is yours.
Candidate: Sentinel does four things: monitoring (watching the state of the master), master selection (when the master goes down, one of the slaves is chosen as the new master), notification (sending messages to administrators when a failure occurs), and configuration (acting as a configuration center that provides information about the current master).
Candidate: You can think of a Sentinel as a Redis server running in a “special” mode. Sentinels themselves are usually deployed as a cluster for “high availability”.
Candidate: First, a Sentinel creates connections to the Redis master and slaves (to obtain their information).
Candidate: Each Sentinel keeps pinging the master to check whether it is offline. If the master does not respond within the configured time, that Sentinel “subjectively” considers it offline.
Candidate: The other Sentinels ping the master too. If enough of them (again, this depends on the configuration) consider the master offline, it is marked “objectively” offline and a failover is started.
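The two-step judgment described above (subjectively offline, then objectively offline) can be illustrated with a small sketch. The function names and the numbers are made up for the example. In a real deployment the timeout corresponds to Sentinel’s `down-after-milliseconds` setting and the required number of agreeing Sentinels is the quorum given in the `sentinel monitor` directive, but the code below is only a model of the idea.

```python
def is_subjectively_down(last_pong_time, now, down_after_seconds):
    """One Sentinel's own view: no valid reply for longer than the timeout."""
    return now - last_pong_time > down_after_seconds


def is_objectively_down(votes, quorum):
    """Shared view: at least `quorum` Sentinels see the master as down."""
    return sum(1 for vote in votes if vote) >= quorum


# Hypothetical example: three Sentinels, each with the time (in seconds)
# at which it last received a reply from the master.
now = 130
last_pongs = [95, 99, 128]
votes = [is_subjectively_down(t, now, down_after_seconds=30) for t in last_pongs]

print(votes)
print(is_objectively_down(votes, quorum=2))
```

With a quorum of 2, two Sentinels losing contact is enough to trigger failover; raising the quorum to 3 would make the same situation insufficient, which trades slower failover for fewer false alarms.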
Interviewer: Hmm…
Candidate: The Sentinels elect a “leader” among themselves. There are quite a few rules for the election, but broadly speaking it is first come, first served.
Candidate: The leader Sentinel then performs the failover for the offline master.
Interviewer: Hmm…
Candidate: First, it picks one of the slaves to be the new master.
Candidate: (The choice considers each slave’s configured priority, which slave has the largest replication offset, the ordering of runids, how long the slave has been disconnected from the master…)
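A minimal sketch of such a selection is shown below. It assumes the ordering documented for Sentinel as I understand it: slaves disconnected for too long are skipped, a lower priority number is preferred (with 0 meaning “never promote”), then the larger replication offset, then the lexicographically smaller runid. The `SlaveInfo` type, the `pick_new_master` function, and the threshold value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlaveInfo:
    runid: str
    priority: int                # lower is preferred; 0 means never promote
    offset: int                  # replication offset: higher means fresher data
    seconds_disconnected: float  # time since the link to the old master broke


def pick_new_master(slaves, max_disconnect_seconds):
    """Return the best candidate for promotion, or None if there is none."""
    candidates = [
        s for s in slaves
        if s.priority != 0 and s.seconds_disconnected <= max_disconnect_seconds
    ]
    if not candidates:
        return None
    # Sort key: priority ascending, offset descending, runid ascending.
    return min(candidates, key=lambda s: (s.priority, -s.offset, s.runid))


slaves = [
    SlaveInfo("c", 100, 500, 1),
    SlaveInfo("b", 100, 900, 1),
    SlaveInfo("a", 100, 900, 1),
    SlaveInfo("d", 10, 999, 600),  # best priority, but disconnected too long
    SlaveInfo("e", 0, 999, 1),     # priority 0: excluded from promotion
]
chosen = pick_new_master(slaves, max_disconnect_seconds=60)
print(chosen.runid)
```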
Candidate: Then all the other slaves need to start “master-slave replication” with the new master.
Candidate: The old master that went offline becomes a slave of the new master once it reconnects.
Interviewer: Hmm… Let me ask: can Redis lose data during master-slave replication and failover?
Candidate: Obviously yes. As the “master-slave replication” process above shows, replication is asynchronous (the master accepts requests and sends the write commands to the slave afterwards).
Candidate: If the master dies before it has finished sending commands to the slave, and you then promote that slave to master, the slave’s data is incomplete (:
Candidate: There is another case: Sentinel thinks the master is down when it actually isn’t (network jitter), and it has already promoted a slave to master. Clients that have not yet picked up the switch keep writing to the old master.
Candidate: By the time the old master reconnects, it has been demoted to a slave of the new master, so the data clients wrote to it during that window is lost.
Candidate: In both cases (replication lag and split brain), configuration can reduce data loss “as far as possible”.
Candidate: (Once a threshold is reached, the master simply refuses write requests, which limits how much data can be lost.)
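The threshold idea can be modeled in a few lines. In Redis itself this is controlled by the `min-replicas-to-write` and `min-replicas-max-lag` options (named `min-slaves-to-write` and `min-slaves-max-lag` before Redis 5). The function below is a hypothetical model of that check, not Redis code, and the lag values are invented for the example.

```python
def accepts_writes(replica_lags, min_replicas_to_write, min_replicas_max_lag):
    """Return True if enough replicas acknowledged recently enough.

    replica_lags: seconds since each connected replica last acknowledged.
    """
    healthy = sum(1 for lag in replica_lags if lag <= min_replicas_max_lag)
    return healthy >= min_replicas_to_write


# Normal operation: both replicas are keeping up.
print(accepts_writes([1, 2], min_replicas_to_write=1, min_replicas_max_lag=10))

# Replication lag: every replica is too far behind, so writes are refused.
print(accepts_writes([15, 30], min_replicas_to_write=1, min_replicas_max_lag=10))

# Split brain: the old master is isolated and sees no replicas at all.
print(accepts_writes([], min_replicas_to_write=1, min_replicas_max_lag=10))
```

This does not eliminate data loss; it only bounds the window, since an isolated master stops accepting writes once its replicas have been silent for longer than the allowed lag.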
Interviewer: Would you like to talk about Redis sharded clusters?
Candidate: Hmm… Put simply, a sharded cluster stores part of the data on each Redis server, and the data on all the servers together makes up the complete dataset (distributed).
Candidate: To build a sharded cluster, you need to “route” different keys to different servers (sharding).
Candidate: There are two common routing schemes: “client-side routing” (SDK) and “proxy routing” (Proxy).
Candidate: Client-side routing is represented by Redis Cluster, and proxy (server-side) routing by Codis.
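Whichever side does the routing, it has to turn a key into a server. As one concrete illustration, Redis Cluster maps every key to one of 16384 hash slots using CRC16 of the key modulo 16384, and each node owns a range of slots. The sketch below follows that rule but ignores hash tags; the node names and slot ranges in `SLOT_RANGES` are made up for the example.

```python
def crc16_xmodem(data):
    """Bitwise CRC16 (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


TOTAL_SLOTS = 16384

# Hypothetical three-node layout; each node owns a contiguous slot range.
SLOT_RANGES = {
    "node-a": range(0, 5461),
    "node-b": range(5461, 10923),
    "node-c": range(10923, 16384),
}


def key_slot(key):
    """Hash slot for a key (hash tags are ignored in this sketch)."""
    return crc16_xmodem(key.encode("utf-8")) % TOTAL_SLOTS


def node_for_key(key):
    """Find the node whose slot range contains the key's slot."""
    slot = key_slot(key)
    for node, slots in SLOT_RANGES.items():
        if slot in slots:
            return node
    raise ValueError(f"slot {slot} is not assigned to any node")


for key in ["user:1", "user:2", "order:42"]:
    print(key, key_slot(key), node_for_key(key))
```

With client-side routing, the SDK performs this lookup and connects to the right node directly; with proxy routing, the client talks to the proxy and the proxy performs an equivalent lookup on its behalf.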
Interviewer: Why don’t you explain the differences in detail?
Candidate: I’m a little sleepy today. Maybe next time?
Summary:
- How Redis achieves high availability:
  - AOF/RDB persistence
  - Master-slave architecture (when the master goes down, a slave is promoted manually)
  - Sentinel, for automatic failover
- Master-slave replication:
  - PSYNC has two modes: full resynchronization and partial resynchronization
  - Full resynchronization: master and slave establish a connection; the master generates an RDB file and sends it to the slave; the master does not block in the meantime (it records the write commands in a buffer) and then sends those commands to the slave
  - Partial resynchronization: the slave reconnects after a disconnect and sends the runid and offset to the master; the master checks both and sends the slave the commands after that offset that it has not yet received
- Sentinel:
  - A Sentinel can be understood as a special Redis server; Sentinels are usually deployed as a cluster
  - Sentinels handle monitoring, notification, configuration, and master selection
  - When the master fails and is marked “objectively offline”, the leader Sentinel selects a slave to take over and performs the switch
- Data loss:
  - Redis can lose data during both master-slave replication and failover (configuration can reduce this as far as possible)
Follow my WeChat official account [Java3y] for more on Java interviews. The interview series is updated continuously!