Hello everyone, I am brother Seven.
In the previous article, I walked everyone through the principle of Redis master-slave replication. Master-slave replication copes well with high-concurrency read scenarios, because when one slave goes down, requests can still be sent to the master or to the other slaves. But once the master goes down, the whole cluster can only serve reads and cannot handle write requests.
That is to say, the biggest problem with master-slave replication is that it cannot automatically promote a slave to master: it has no automatic failover.
So master-slave replication alone is not a highly available architecture. What does highly available mean? In companies we often hear that an architecture offers five nines or six nines of availability all year round; six nines, for example, means the service must be available 99.9999% of the time. To achieve high availability, we need the Sentinel covered today and the Redis cluster covered in the next article: RedisCluster. Follow me to get more practical technical content in time.
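To make the "nines" concrete, here is a minimal Python sketch that converts an availability target into the downtime it allows per year. The function names and the 365-day year are my own assumptions for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap year


def availability(nines: int) -> float:
    """Availability as a fraction, e.g. 3 nines -> 0.999."""
    return 1 - 10 ** (-nines)


def max_downtime_seconds(nines: int) -> float:
    """Downtime per year that still meets the availability target."""
    return SECONDS_PER_YEAR * 10 ** (-nines)


for n in (3, 4, 5, 6):
    print(f"{n} nines: {availability(n):.4%} available, "
          f"{max_downtime_seconds(n):.1f} seconds of downtime per year")
```

Under these assumptions, five nines leaves a bit over five minutes of downtime per year and six nines roughly half a minute, which is why a failover done by hand is not fast enough.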
One more thing to add here: there is a saying that "master-slave replication is the cornerstone of high availability", so it is necessary to understand how replication works.
0. Content summary
Here are some things you can learn about sentinels:
- What Redis Sentinel is
- What Sentinel does
- How to configure Redis Sentinel
- How Sentinel works
Stick with it to the end: this is useful both for interviews and for your day-to-day work. Let's cut the small talk and get started!
Redis Sentinel mode is available starting from Redis 2.8.
1. What is Redis Sentinel
The disadvantage of master/slave replication in Redis is that there is no way to elect a master dynamically (when the master goes down, a new master has to be chosen manually).
After the master dies, the slave brothers are left waiting to be fed, and users' new data cannot be written. That's when our sentinel comes to the rescue.
The sentinel said: Big brother is gone, little brothers, so who will step up as the new boss? I'll solve this for you, it's a piece of cake! As an impartial third-party observer, I will pick the chosen one from among you, and all of you will recognize him as the boss. And once I have chosen a new boss, even if the old master comes back, he can only rejoin our circle as a little brother.
So what is the sentinel, and what is it used for?
Sentinel is a distributed system: you can run multiple Sentinel processes in one architecture. These processes use a gossip protocol (widely used in distributed systems) to exchange information about whether the master is offline, and an agreement protocol (the Raft algorithm) to decide whether to perform automatic failover and which slave to choose as the new master.
2. The role of sentinels
The Sentinel process monitors the status of the master server in a Redis cluster. If the master fails, it switches a slave over to take the master's place, ensuring high availability (HA) of the system.
We will first describe in plain words what the sentinel does, and then explain how it works in the following sections.
The Redis Sentinel system manages multiple Redis servers and performs the following three tasks:
- Monitoring: Who is being monitored? Sentinels constantly check whether your master and slaves are working properly.
- Notification: When a sentinel detects a problem with a monitored server, it notifies the other sentinels. The sentinels are like a WeChat group: each sentinel posts the problems it finds to the group.
- Automatic failover: When the master is detected to be down, the sentinels disconnect all slaves from the failed master, select one of the slaves as the new master, and connect the other slaves to it. They also inform clients of the new server address.
Each sentinel periodically sends messages to the other sentinels, the master, and the slaves to confirm whether they are alive. If an instance does not respond within a specified (configurable) time, the sentinel temporarily considers it dead: "Subjectively Down" (SDOWN for short).
If most of the sentinels report that the master is not responding, the system considers the master to be truly dead: "Objectively Down" (ODOWN for short).
As shown in the figure, subjectively down means that one sentinel thinks the master is down, while objectively down means that, after the sentinel communicates with the other sentinels, most of them agree that big brother is dead. When you deploy multiple sentinels, use an odd number of them.
Then, through a voting algorithm (the Raft algorithm), one of the remaining slaves is selected as the new master, and the configuration of the other Redis servers is modified automatically so that the slaves point to the new master.
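The difference between subjectively down and objectively down can be sketched in a few lines of Python. This is only an illustration of the idea, not Sentinel's real implementation: the SentinelView class, its fields, and the function names are hypothetical, and the quorum is simply passed in as a number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentinelView:
    """What one sentinel currently knows about the master (hypothetical model)."""
    name: str
    last_reply_ms_ago: int  # milliseconds since the master last answered this sentinel


def is_sdown(view: SentinelView, down_after_ms: int) -> bool:
    """Subjectively down: this single sentinel has waited too long for a reply."""
    return view.last_reply_ms_ago > down_after_ms


def is_odown(views: List[SentinelView], down_after_ms: int, quorum: int) -> bool:
    """Objectively down: at least `quorum` sentinels each see the master as SDOWN."""
    votes = sum(1 for view in views if is_sdown(view, down_after_ms))
    return votes >= quorum


views = [
    SentinelView("sentinel1", 40_000),
    SentinelView("sentinel2", 35_000),
    SentinelView("sentinel3", 100),
]
print(is_odown(views, down_after_ms=30_000, quorum=2))
```

With three sentinels and a quorum of two, two sentinels that both lost contact with the master are enough to mark it objectively down, even though the third one can still reach it.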
Here's a quick tip: a sentinel is actually a Redis server, but it does not serve data. Later, when we configure it, you will see that Sentinel is just a Redis server running in a special mode. You can start a sentinel by launching a regular Redis server with the --sentinel option.
3. Configure the Sentinel mode
My local Redis installation directory: /usr/local/redis
- Copy sentinel.conf to the etc directory
cp sentinel.conf /usr/local/redis/etc
The sentinel uses a configuration file called sentinel.conf. Let’s read this configuration file and see what we need to do.
- Modify the sentinel.conf configuration file
# sentinel monitor <master-name> <master ip> <master port> <quorum>
sentinel monitor mymaster 192.168.137.6 6379 1
# Run in the background
daemonize yes
Very simple, we just need to specify which master the sentry needs to monitor.
<master-name>: The name of the master node. You can choose it freely as long as it consists of the letters A-Z and a-z, the digits 0-9, and the characters ".-_". I use mymaster.
<master ip>: The IP address of the monitored master node.
<master port>: The port number of the monitored master node.
<quorum>: The number of sentinels that must consider the master dead before it is marked objectively down. Set it to half the number of sentinels plus one.
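To show how the four fields of the monitor line fit together, here is a small Python sketch that parses such a line and computes the "half plus one" quorum. It is a reading aid, not Sentinel's own parser; MonitorRule, parse_monitor and majority_quorum are names I made up for the example.

```python
import re
from typing import NamedTuple


class MonitorRule(NamedTuple):
    master_name: str
    ip: str
    port: int
    quorum: int


# Allowed characters for the master name, as described above.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9.\-_]+$")


def parse_monitor(line: str) -> MonitorRule:
    """Split 'sentinel monitor <master-name> <ip> <port> <quorum>' into its fields."""
    parts = line.split()
    if len(parts) != 6 or [p.lower() for p in parts[:2]] != ["sentinel", "monitor"]:
        raise ValueError(f"not a sentinel monitor directive: {line!r}")
    name, ip, port, quorum = parts[2:]
    if not NAME_PATTERN.match(name):
        raise ValueError(f"invalid master name: {name!r}")
    return MonitorRule(name, ip, int(port), int(quorum))


def majority_quorum(sentinel_count: int) -> int:
    """Half of the sentinels plus one."""
    return sentinel_count // 2 + 1


print(parse_monitor("sentinel monitor mymaster 192.168.137.6 6379 1"))
print(majority_quorum(3))
```

With a single sentinel, as in my local setup, the quorum is 1; with three sentinels it would be 2.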
Other sentinel.conf configuration items
All the configuration items are listed in detail below. I recommend reading through them, as they give you a better understanding of how Sentinel is implemented.
# Sentinel instance runs on port 26379 by default
port 26379
# Sentinel's working directory
dir /tmp
# IP and port of the Redis master node monitored by Sentinel
# master-name: the name of the master node, chosen freely. It can contain only the letters A-Z and a-z, the digits 0-9, and the characters ".-_".
# quorum: when this many sentinels consider the master node unreachable, the master node is objectively considered down
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
sentinel monitor mymaster 127.0.0.1 6379 1
# When requirepass is enabled on the Redis instance (for example "requirepass foobared"), all clients connecting to it must provide the password
# The password must match the authentication password of the master and slaves. If there is no password, this item can be omitted
# sentinel auth-pass <master-name> <password>
sentinel auth-pass mymaster MySUPER--secret-0123passw0rd
# How long, in milliseconds, the master node must fail to respond before the sentinel subjectively considers it down. The default is 30 seconds
# sentinel down-after-milliseconds <master-name> <milliseconds>
sentinel down-after-milliseconds mymaster 30000
# The maximum number of slaves that may synchronize with the new master at the same time during a failover. The smaller the number, the longer the failover takes to complete; the larger the number, the more slaves are unavailable at once because of replication. Setting this value to 1 ensures that only one slave at a time is unable to process command requests.
# sentinel parallel-syncs <master-name> <numslaves>
sentinel parallel-syncs mymaster 1
# failover-timeout is used in the following ways:
# 1. The interval between two failover attempts by the same sentinel for the same master.
# 2. The time counted from when a slave is found replicating from a wrong master until it is corrected to replicate from the right master.
# 3. The time needed to cancel a failover that is already in progress.
# 4. The maximum time allowed, during a failover, for reconfiguring all slaves to point to the new master. Even after this timeout the slaves will still be reconfigured to point to the master, but no longer following the parallel-syncs rule.
# The default is three minutes
# sentinel failover-timeout <master-name> <milliseconds>
sentinel failover-timeout mymaster 180000
# SCRIPTS EXECUTION
# Configure scripts to be executed when certain events occur. A script can notify the administrator, for example by sending an email when the system is not working properly.
# The following rules apply to the exit status of a script:
# If the script exits with 1, it is executed again later; the maximum number of retries is currently 10.
# If the script exits with 2 or a higher value, it is not executed again.
# If the script is terminated by a signal during execution, the behavior is the same as exiting with 1.
# The maximum execution time of a script is 60 seconds. If it is exceeded, the script is terminated with SIGKILL and executed again.
# Notification script: called whenever Sentinel generates a warning-level event (for example, a Redis instance becoming subjectively or objectively down). The script should notify the system administrator of the abnormal condition via email, SMS, etc. It is called with two arguments: the event type and the event description.
# If a script path is configured in sentinel.conf, the script must exist at that path and be executable, otherwise Sentinel will not start.
# Notification script
# sentinel notification-script <master-name> <script-path>
sentinel notification-script mymaster /var/redis/notify.sh
# Client reconfiguration script
# This script is called when the master changes because of a failover, so that clients can be told the master address has changed.
# The following arguments are passed to the script when it is called:
# <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
# <state> is currently always "failover"
# <role> is either "leader" or "observer"
# The from-ip, from-port, to-ip, and to-port arguments are the addresses of the old master and the new master (i.e. the old slave)
# This script should be generic and safe to call multiple times.
# sentinel client-reconfig-script <master-name> <script-path>
sentinel client-reconfig-script mymaster /var/redis/reconfig.sh
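As an illustration of what a client-reconfig-script has to deal with, here is a hedged Python sketch that turns the seven arguments listed in the configuration comments into a structured value. ReconfigEvent, parse_reconfig_args and new_master_address are hypothetical names; a real script would receive the arguments on its command line (for example from sys.argv[1:]) and then do whatever your clients need.

```python
from typing import NamedTuple, Sequence


class ReconfigEvent(NamedTuple):
    master_name: str
    role: str       # "leader" or "observer"
    state: str      # "failover" according to the configuration comments
    from_ip: str
    from_port: int
    to_ip: str
    to_port: int


def parse_reconfig_args(args: Sequence[str]) -> ReconfigEvent:
    """Map <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port> to fields."""
    if len(args) != 7:
        raise ValueError(f"expected 7 arguments, got {len(args)}")
    name, role, state, from_ip, from_port, to_ip, to_port = args
    return ReconfigEvent(name, role, state, from_ip, int(from_port), to_ip, int(to_port))


def new_master_address(event: ReconfigEvent) -> str:
    """Address that clients should use after the failover."""
    return f"{event.to_ip}:{event.to_port}"


event = parse_reconfig_args(
    ["mymaster", "leader", "failover", "127.0.0.1", "6379", "127.0.0.1", "6380"]
)
print(new_master_address(event))
```

Because the script may be called more than once for the same failover, it is safest to make whatever it does with the new address idempotent.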
- Start the Sentinel service with redis-sentinel
./redis-sentinel sentinel.conf
Note:
- When sentinel mode is enabled and your master server goes down, the sentinels automatically vote to promote one of the slave servers to master. The new master can serve both reads and writes!
- Once the failed master has been repaired and is running again, it can only serve reads; that is, the former big brother is now a little brother.
- You can run ./redis-cli and enter info replication to check the status information.
4. Working principle of Sentinel
Now that the sentinel is configured, it is worth analyzing how it works. Only by knowing its workflow can we really understand the sentinel.
I will try to keep this easy to read; with the sentinel's functions and configuration already in mind, it is not hard to follow.
Getting down to business: the sentinel's job is to monitor, notify, and fail over, so the working principle also revolves around these three points.
Monitoring workflow
- After a sentinel starts, it sends the INFO command to the master named in its configuration, then obtains and saves the state of all sentinels, the master node, and the slave nodes.
- The master keeps track of all slave nodes and sentinel instances connected to it.
- Using the slave information obtained from the master, the sentinel connects to each slave and sends it the INFO command as well.
- Then sentinel 2 arrives. It also sends the INFO command to the master and obtains the information about the slaves and the other sentinel.
- Sentinel 2 stores the same information as sentinel 1, except that its list now contains two sentinels.
- To keep every sentinel's information consistent, the sentinels set up a publish/subscribe channel among themselves and keep sending PING commands to each other, so that their views stay in sync over time.
- When a third sentinel, sentinel 3, arrives, it does the same thing: it sends INFO to the master and the slaves and establishes connections to sentinel 1 and sentinel 2.
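The discovery steps above can be imitated with a toy in-memory simulation. This is only a sketch of the flow, with no networking and no real Redis commands: the Master and Sentinel classes, the hello list, and the tick method are all hypothetical stand-ins for INFO and the publish/subscribe channel.

```python
class Master:
    """Toy master: knows its slaves and relays hello messages to subscribers."""

    def __init__(self, slaves):
        self.slaves = list(slaves)
        self.hello_subscribers = []  # sentinels subscribed to the hello channel

    def info(self):
        return {"slaves": list(self.slaves)}

    def publish_hello(self, sender_name):
        for sentinel in self.hello_subscribers:
            sentinel.on_hello(sender_name)


class Sentinel:
    def __init__(self, name):
        self.name = name
        self.known_slaves = []
        self.known_sentinels = set()

    def on_hello(self, sender_name):
        if sender_name != self.name:  # ignore our own announcement
            self.known_sentinels.add(sender_name)

    def start(self, master):
        self.known_slaves = master.info()["slaves"]  # step 1: INFO on the master
        master.hello_subscribers.append(self)        # subscribe to the channel
        master.publish_hello(self.name)              # announce ourselves

    def tick(self, master):
        master.publish_hello(self.name)              # periodic re-announcement


master = Master(["slave1", "slave2"])
sentinels = [Sentinel(f"sentinel{i}") for i in (1, 2, 3)]
for s in sentinels:
    s.start(master)
for s in sentinels:
    s.tick(master)
for s in sentinels:
    print(s.name, sorted(s.known_sentinels), s.known_slaves)
```

A sentinel that starts late learns about the earlier ones only when they announce themselves again, which is why the periodic re-announcement matters in this model.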
How does the slave server synchronize data with the master server?
By default, the slave server sends the following command to the master server once per second:
REPLCONF ACK <replication_offset>
// replication_offset is the current replication offset of the slave server.
Here's an example: if a write command propagated from the master to a slave is lost on the way because of a network fault, then when the slave sends REPLCONF ACK to the master, the master finds that the slave's replication offset is smaller than its own. The master then uses the offset submitted by the slave to find the missing data in the replication backlog buffer and resends it to the slave.
This question is often asked in interviews, and if you can answer it at the level of principles, it is easy to impress the interviewer.
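The resend logic described above can be sketched with a tiny model of the replication backlog. This is a simplification under my own assumptions: ReplicationBacklog and missing_since are hypothetical names, the buffer is a plain byte string instead of a circular buffer, and the commands are made-up 8-byte strings.

```python
from typing import Optional


class ReplicationBacklog:
    """Fixed-size window over the most recent bytes the master has propagated."""

    def __init__(self, size: int):
        self.size = size
        self.buf = b""
        self.master_offset = 0  # total bytes propagated so far

    def append(self, data: bytes) -> None:
        self.buf = (self.buf + data)[-self.size:]  # keep only the newest bytes
        self.master_offset += len(data)

    @property
    def first_offset(self) -> int:
        """Offset of the oldest byte still held in the backlog."""
        return self.master_offset - len(self.buf)

    def missing_since(self, slave_offset: int) -> Optional[bytes]:
        """Bytes the slave lacks, b'' if it is up to date, None if they are gone."""
        if slave_offset >= self.master_offset:
            return b""
        if slave_offset < self.first_offset:
            return None  # too far behind: the backlog no longer holds the data
        return self.buf[slave_offset - self.first_offset:]


backlog = ReplicationBacklog(size=16)
backlog.append(b"SET a 1;")
backlog.append(b"SET b 2;")
# The slave acknowledges offset 8, so it missed the second command.
print(backlog.missing_since(8))
```

When the slave is so far behind that the data has already been pushed out of the backlog, resending from the buffer is no longer possible and the slave has to resynchronize from scratch, which the sketch signals with None.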
What is the role of heartbeat detection?
- Detect the network connection status between the master and slave servers;
By sending the INFO replication command to the master, you can list its slave servers and see how many seconds have passed since each slave last sent a command to the master.
localhost:6377> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6379,state=online,offset=110180,lag=0
slave1:ip=127.0.0.1,port=6378,state=online,offset=110180,lag=1 # the last REPLCONF ACK was received 1 second ago
master_replid:55c2177dd69fc21dbea4e9f8a3f4fb0ee948855d
master_replid2:a80967516d1b0821c315fd2eb550f2ff0597010c
master_repl_offset:110313
second_repl_offset:25348
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11612
repl_backlog_histlen:98702
The lag value of a slave should hover between 0 and 1. If it exceeds 1, the connection between the master and that slave is faulty.
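If you want to check the lag values automatically, the slave lines of the INFO replication output are easy to parse. The following Python sketch works on the text only; parse_slave_lines, lagging_slaves and the sample text (including the lag of 3) are made up for illustration, with the threshold of 1 second taken from the rule above.

```python
from typing import Dict, List


def parse_slave_lines(info_text: str) -> Dict[str, Dict[str, str]]:
    """Collect the 'slaveN:key=value,...' lines of INFO replication output."""
    slaves = {}
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep or not key.startswith("slave") or not key[5:].isdigit():
            continue  # skip headers and fields such as connected_slaves
        slaves[key] = dict(item.split("=", 1) for item in value.split(","))
    return slaves


def lagging_slaves(info_text: str, max_lag: int = 1) -> List[str]:
    """Names of slaves whose lag is above the threshold."""
    return [
        name
        for name, fields in parse_slave_lines(info_text).items()
        if int(fields["lag"]) > max_lag
    ]


SAMPLE = """\
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6379,state=online,offset=110180,lag=0
slave1:ip=127.0.0.1,port=6378,state=online,offset=110180,lag=3
"""

print(lagging_slaves(SAMPLE))
```

In practice you would feed it the text returned by info replication instead of the hard-coded sample.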
- Help implement the min-slaves feature;
This is a safety check for the cluster. Redis can be configured to prevent the master from executing write commands under unsafe conditions.
min-slaves-to-write 3
min-slaves-max-lag 10
The preceding configuration means that the master rejects write commands when it has fewer than three slaves whose lag is within 10 seconds, that is, when fewer than three slaves are connected or their lag is too high. The lag used here is the lag value shown by the INFO replication command above.
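The min-slaves check boils down to counting healthy slaves. Here is a minimal Python sketch of that decision; accepts_writes is a hypothetical helper, and treating a lag equal to min-slaves-max-lag as still healthy is my reading of the option, so take the boundary case as an assumption.

```python
from typing import Iterable


def accepts_writes(slave_lags: Iterable[int],
                   min_slaves_to_write: int,
                   min_slaves_max_lag: int) -> bool:
    """True if enough slaves have acknowledged recently for the master to accept writes."""
    healthy = sum(1 for lag in slave_lags if lag <= min_slaves_max_lag)
    return healthy >= min_slaves_to_write


# min-slaves-to-write 3, min-slaves-max-lag 10
print(accepts_writes([0, 1, 1], 3, 10))   # three healthy slaves
print(accepts_writes([0, 1], 3, 10))      # only two slaves connected
print(accepts_writes([0, 1, 15], 3, 10))  # one slave is lagging too much
```

The lag values would come from the INFO replication output, so the heartbeat is what makes this check possible.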
Notification workflow
Each sentinel sends commands to the master and slave nodes it monitors to obtain their status, and publishes that information to the channel the sentinels subscribe to.
Failover principles (Critical)
- A sentinel keeps publishing to the __sentinel__:hello channel through the master node; once it stops receiving responses, it marks the master as SDOWN. At this point that sentinel believes the master is down, and it sends a message to the other sentinels on its network saying so. The command it sends is sentinel is-master-down-by-addr, which carries the master's address and port.
- The other sentinels, on receiving this, think: is the master really dead? Let me go check for myself. If one of them gets no response either, it also sends sentinel is-master-down-by-addr to the sentinels on its network. So every sentinel gets the broadcast claiming that the master has died, and they count the votes: if more than half of them think the master is dead, its status is changed to ODOWN. Because this decision needs more than half of the sentinels, the number of sentinels should be odd.
- When one sentinel thinks the master is down, that is subjectively down; when more than half of the sentinels think so, that is objectively down.
- Once the master is considered objectively down, the sentinels move on to the next step: electing a new big brother.
At this point the sentinels have detected the problem, but which sentinel is responsible for picking the new master node? They can't each pick a different one; that would be a mess. So the sentinels first have to choose a leader among themselves. How?
This is when all five sentinels hold a meeting. All of them are on the same network, and each of them sends the command sentinel is-master-down-by-addr at the same time, carrying its election epoch and its runid.
Each sentinel is both a candidate and a voter. Each sentinel has one vote, and the envelope represents its vote.
Let me give an example of the voting rule: when sentinel1 and sentinel4 send their requests to sentinel2 at the same time to stand for election, sentinel2 says: I'll vote for whoever's request reaches me first. If sentinel1's request arrives first, sentinel2's vote goes to sentinel1.
Voting continues under this rule until some sentinel holds more than half of all the sentinels' votes. If, say, sentinel1 gets more than half of the votes, sentinel1 is elected leader, and the process moves to the next stage.
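The "first request wins my vote" rule can be sketched in Python as a single voting round. This is a simplified model, not the real protocol: elect_leader and the arrival_order mapping are hypothetical, and epochs, timeouts, and repeated rounds are left out.

```python
from collections import Counter
from typing import Dict, List, Optional


def elect_leader(arrival_order: Dict[str, List[str]]) -> Optional[str]:
    """One voting round.

    arrival_order maps each sentinel to the candidates whose vote requests
    it received, earliest first. Each sentinel votes for the first one.
    Returns the winner, or None if nobody got more than half of the votes.
    """
    votes = Counter()
    for voter, candidates in arrival_order.items():
        if candidates:
            votes[candidates[0]] += 1  # one vote per sentinel, first come first served
    if not votes:
        return None
    winner, count = votes.most_common(1)[0]
    return winner if count > len(arrival_order) // 2 else None


round_one = {
    "sentinel1": ["sentinel1"],               # candidates vote for themselves
    "sentinel2": ["sentinel1", "sentinel4"],  # sentinel1's request arrived first
    "sentinel3": ["sentinel1"],
    "sentinel4": ["sentinel4"],
    "sentinel5": ["sentinel4", "sentinel1"],
}
print(elect_leader(round_one))
```

If the round ends without a majority, the function returns None, which corresponds to the sentinels simply voting again.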
Above, sentinel1 was elected to act on behalf of all the sentinels and pick a new master node. The new master is not picked at random; there are rules. This article does not go through the leader election algorithm in detail (it is the Raft algorithm); if you are interested, say so in the comments and we will arrange a detailed discussion later.
Here is a brief description of the conditions used to choose the new master:
- The sentinel sends messages to all the Redis slaves, and any slave that responds slowly is ruled out.
- Compare the replication offsets. If slave4's offset is 90 and slave5's offset is 100, the sentinel concludes that slave4 has a network problem and lags behind, so slave5 is chosen as the new master. If slave4 and slave5 have the same offset, move on to the next rule.
- The final step compares the runid, which is like seniority in the workplace: the slave with the smaller runid is chosen.
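The three rules above amount to a filter followed by a sort. Here is a minimal Python sketch of that idea; the Slave class and pick_new_master are hypothetical, "responsive" stands in for the slow-response check, and I am assuming that the smaller runid wins the final tie-break. The real Sentinel also looks at a configured slave priority, which is left out here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    name: str
    responsive: bool  # did it answer the sentinel in time?
    offset: int       # replication offset
    runid: str


def pick_new_master(slaves: List[Slave]) -> Optional[Slave]:
    """Drop unresponsive slaves, prefer the largest offset, then the smaller runid."""
    candidates = [s for s in slaves if s.responsive]
    if not candidates:
        return None
    return min(candidates, key=lambda s: (-s.offset, s.runid))


slaves = [
    Slave("slave3", responsive=False, offset=120, runid="aaa"),
    Slave("slave4", responsive=True, offset=90, runid="bbb"),
    Slave("slave5", responsive=True, offset=100, runid="ccc"),
]
chosen = pick_new_master(slaves)
print(chosen.name if chosen else None)
```

Here slave3 has the largest offset but is ruled out for being unresponsive, so slave5 wins on offset before the runid is even consulted.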
After the new master is selected, the leader sends instructions to all nodes: every brother is called together and told who the new master is. The slaves then synchronize data from the new master, and the identity of the new master is broadcast to all clients. The switchover is complete.
5. Summary
That's everything you need to know about the sentinel; the most important parts are what it does and how it works. It is also a key topic when interviewers probe Redis high availability, so let's briefly summarize.
Redis master/slave replication is the foundation of high availability. The Redis Sentinel mechanism switches between master and slave automatically, which is another big step on the road to high availability.
Sentinels do three main things:
- Monitor the running status of the master and slaves to determine whether they are objectively down.
- When the master is objectively down, select a slave and promote it to master.
- Notify the slaves and clients of the new master's information.
How Sentinel works
- First, the sentinels monitor the master and slaves and synchronize information with each other.
- The sentinels publish messages to the channel they subscribe to.
- A sentinel finds that the master node has gone offline.
- The sentinels vote to elect a leader.
- The leader selects the new master node.
- The new master node is disconnected from the original master, and the other slaves are told to connect to the new master. When the original master comes back online, it joins as a slave.
That's all for Redis Sentinel. Thanks for reading. If you spot a mistake, please point it out and Seven will correct it promptly. ~
In the next article, "RedisCluster: The Officially Recommended Redis High Availability Solution", follow me for the real core knowledge.