Redis master-slave replication is the cornerstone of high availability. So what is high availability?
High availability means minimizing the time during which the system cannot provide service. You will often hear it expressed as a number of nines, such as six nines (99.9999%). Sentinel and Cluster are both key to high availability in Redis. This article focuses on the Sentinel mechanism.
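To make the "nines" concrete, here is a small Python sketch that converts a number of nines into the downtime allowed per year. The function name downtime_per_year is made up for this illustration, and the year is assumed to be 365 days long.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: ignore leap years


def downtime_per_year(nines: int) -> float:
    """Allowed downtime in seconds per year for an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 6 nines -> 0.000001
    return SECONDS_PER_YEAR * unavailability


for n in (3, 4, 5, 6):
    seconds = downtime_per_year(n)
    print(f"{n} nines: about {seconds:.1f} seconds of downtime per year")
```

Six nines leaves only about half a minute of downtime per year, which is why a manual switch to a new master is not good enough and an automatic mechanism like Sentinel is needed.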
This article covers the following aspects of the sentinel
- An introduction to the sentinel
- Sentinel configuration
- How the sentinel works
Environment used in this article
- CentOS 7.3, Redis 4.0
- Redis working directory: /usr/local/redis
- All operations are simulated on a VM
What is a sentinel
Let me set the scene first. Once master/slave replication is configured, we can run into the situation where the master node goes down. Who provides the service then?
Master-slave replication loses its meaning once the master node is down. In an era where data is king, no data means no high availability.
Then a big brother named Sentinel comes along and says, “Let me help you with this problem.”
“Since your boss, the master node, no longer takes you along to play, I'm going to pick one of the four of you to be the new boss, and the rest of you will play with him.”
“When the old boss who stopped playing with you comes back, he's no longer your boss. He has to play with the boss I picked, too.”
That dialogue is the whole point of configuring a sentinel. Here, “who you play with” means who you get your data from. Now that we know what the sentinel is for, let's continue.
Finally, let's explain in technical terms what a sentinel is.
Sentinel is a distributed system that monitors every server in a master-slave structure. When the master node fails, the sentinels elect a new master node through a voting mechanism and connect all the slave nodes to it.
Two, the role of the sentinel
The dialogue above describes one of the sentinel's functions: automatic failover.
A role is defined by what the sentinel does on the job. We'll state its functions plainly here and explain how they work in the following sections.
The sentinel has three functions: monitoring, notification, and automatic failover
Monitoring
- Monitor what? The sentinel exists to support a master-slave structure, which consists of a master node and slave nodes, so those are what it monitors.
- It monitors whether the master and slave nodes are running properly
- It checks whether the master node is alive and whether the master and slave nodes are working as expected
Notification
When a sentinel detects a problem with a server, it notifies the other sentinels. The sentinels are like a WeChat group: each sentinel posts the problems it finds to the group.
Automatic failover
When the master node is detected to be down, the sentinel disconnects all slave nodes from it, selects one slave node as the new master node, and connects the other slave nodes to the new master node. It also informs clients of the new server address.
One caveat: a sentinel is itself a Redis server, but it does not serve data to the outside world.
The number of sentinels is usually odd. So why configure an odd number of sentinel servers? Keep this question in mind and you'll find the answer below.
How to configure sentinels
1. Preparation
In this chapter we start configuring the sentinels, beginning with the preparation. The picture below shows Kaka's setup: 8 terminal windows in total, for 3 sentinels, 1 master node, 2 slave nodes, 1 master node client, and 1 slave node client.
2. Read the sentinel.conf configuration
The sentinel uses a configuration file called sentinel.conf
Let's go through the sentinel.conf configuration
Most of the file is comments, so here is a command from Kaka that filters out the noise: cat sentinel.conf | grep -v '#' | grep -v '^$'
- port 26379: the port on which the sentinel listens
- dir /tmp: the working directory where the sentinel stores its information
- sentinel monitor mymaster 127.0.0.1 6379 2: monitor the master node named mymaster at 127.0.0.1:6379. The trailing 2 is the quorum, i.e. the number of sentinels that must agree the master node is down before it is judged to be down.
- sentinel down-after-milliseconds mymaster 30000: how long the master node may go without responding to the sentinel before the sentinel considers it down. The 30000 is in milliseconds, i.e. 30 seconds.
- sentinel parallel-syncs mymaster 1: the maximum number of slave nodes that may synchronize with the new master node at the same time during a failover. The smaller the value, the longer the failover takes; the larger the value, the more slave nodes are unavailable at once because they are synchronizing.
- sentinel failover-timeout mymaster 180000: the timeout for the failover, including synchronization. The default value is 180000 milliseconds, i.e. 3 minutes.
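The filtering command and the directives above can be sketched in Python as follows. This is only an illustration: filter_conf, parse_monitor and the SAMPLE text are hypothetical names and data invented for this example, not part of Redis.

```python
def filter_conf(text: str) -> list[str]:
    """Mimic grep -v '#' | grep -v '^$': drop lines containing '#' and empty lines."""
    return [line for line in text.splitlines() if "#" not in line and line != ""]


def parse_monitor(line: str) -> dict:
    """Split 'sentinel monitor <name> <ip> <port> <quorum>' into named fields."""
    _, _, name, ip, port, quorum = line.split()
    return {"name": name, "ip": ip, "port": int(port), "quorum": int(quorum)}


# Hypothetical, shortened stand-in for a real sentinel.conf
SAMPLE = """\
# Example sentinel.conf
port 26379

dir /tmp
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
"""

lines = filter_conf(SAMPLE)
monitor = parse_monitor(next(l for l in lines if l.startswith("sentinel monitor")))
print(lines)
print(monitor)
```

Only the six effective directives survive the filter, and the monitor line breaks down into the master name, its address, and the quorum.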
3. Start the configuration
Run cat sentinel.conf | grep -v '#' | grep -v '^$' and redirect the output, so that the filtered configuration is saved as sentinel-26379.conf under the /usr/local/redis working directory.
Then open sentinel-26379.conf and change the directory where the sentinel stores its information.
Then quickly create the other two sentinel configuration files, for ports 26380 and 26381, by copying the first one, for example: sed 's/26379/26381/g' sentinel-26379.conf > sentinel-26381.conf
Test that master/slave replication is working properly. Start three Redis servers on ports 6379, 6380, and 6381.
Check the master node's information. Two slave nodes (ports 6380 and 6381) are connected.
Their lag values differ slightly: one is 1 and the other is 0. Lag is the replication delay. Because this is a local test, a lag of 0 shows up, which is rare on a cloud server. A lag of 0 or 1 is normal.
On the master node, add a hash value to test: hset kaka name kaka
Read the kaka value from slave1 and slave2 to check that master-slave replication is running properly. The test confirms that our master-slave structure works.
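Before the sentinels are started, it may help to see what the sed step used earlier to clone the sentinel configuration actually does. The sketch below mirrors it in Python on an in-memory string; clone_sentinel_conf and the base text are hypothetical stand-ins, and nothing is written to disk.

```python
def clone_sentinel_conf(conf_text: str, old_port: int, new_port: int) -> str:
    """Equivalent of sed 's/<old_port>/<new_port>/g': plain global substitution."""
    return conf_text.replace(str(old_port), str(new_port))


# Hypothetical minimal content of sentinel-26379.conf
base = "port 26379\ndir /tmp\nsentinel monitor mymaster 127.0.0.1 6379 2\n"

configs = {port: clone_sentinel_conf(base, 26379, port) for port in (26380, 26381)}
for port, text in configs.items():
    print(f"--- sentinel-{port}.conf ---")
    print(text)
```

Note that the master address 127.0.0.1 6379 is left alone, because the text "6379" preceded by a space does not contain "26379". Only the sentinel's own port changes.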
Start the first sentinel: redis-sentinel sentinel-26379.conf
Connect to the sentinel on 26379 and look at its information. The last line is the important one: it monitors a master node named mymaster, the status is normal, there are two slave nodes, and the number of sentinels is 1.
Check the configuration file of sentinel 26379 again. It has been changed.
Start the second sentinel: redis-sentinel sentinel-26380.conf
Then go back to the client of sentinel 26379. A new line has appeared, and it carries the ID of the new sentinel 26380.
Now check the configuration file of sentinel 26379 once more. The first time we looked, nothing about sentinel 26380 was configured. This second time, the information for sentinel 26380 has been added.
Finally, start the third sentinel on port 26381. After it starts, the configuration files and server information change again: sentinel 26379 now holds the information of both sentinel 26380 and sentinel 26381.
At this point our sentinel configuration is complete, so let's take the master node down. After 30 seconds, the client of sentinel 26379 shows some new messages. What do these messages mean? Let's walk through them.
Here is what we need to know about these messages
- +sdown: this sentinel, one of the three, thinks the master node is down
- +odown: the other two sentinels also tried the master node and found it down, so the master node is judged to be down
- Then a vote is taken. Kaka used Redis 4.0 here, and the messages differ slightly between versions
- +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380: the result of the sentinels' vote, with the Redis on port 6380 becoming the new master node
- +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380: the nodes on ports 6381 and 6379 are attached as slaves to the new master node 6380
- +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380: 6379 is still down, so as a slave of the new master node it is marked subjectively offline
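A client or script that watches these messages mostly cares about +switch-master, because it carries the old and new master addresses. The following is a minimal parsing sketch; SwitchMaster and parse_switch_master are hypothetical names for this example, and the field order is taken from the message shown above.

```python
from typing import NamedTuple, Optional


class SwitchMaster(NamedTuple):
    name: str
    old_addr: tuple
    new_addr: tuple


def parse_switch_master(line: str) -> Optional[SwitchMaster]:
    """Parse '+switch-master <name> <old-ip> <old-port> <new-ip> <new-port>'."""
    parts = line.split()
    if "+switch-master" not in parts:
        return None  # some other event, e.g. +sdown or +odown
    start = parts.index("+switch-master") + 1
    fields = parts[start:start + 5]
    if len(fields) != 5:
        return None  # truncated line
    name, old_ip, old_port, new_ip, new_port = fields
    return SwitchMaster(name, (old_ip, int(old_port)), (new_ip, int(new_port)))


event = parse_switch_master("+switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380")
print(event)
```

Lines for other events simply return None, so the same function can be run over the whole sentinel log.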
When we bring the Redis server 6379 back online, the sentinel server responds with two lines. The first clears the offline state of 6379. The second reconnects 6379 to the new master node.
At this point the master node is 6380. Set a value in the Redis client of 6380 to check whether master/slave replication is working properly.
Add a list-type value on the new master node 6380. Read this value on 6379 and 6381, and there it is! Our Sentinel mode is configured.
Three, how the sentinel works
With the sentinels configured, it's time to analyze how they work. Only by knowing the working process can we really understand the sentinel.
The explanation here won't be dry! You can read this technical article like a story.
Down to business. The sentinel's job is monitoring, notification, and failover, so its working principle revolves around these three points.
1. Monitoring workflow
- The sentinel sends the INFO command to the master node and saves the status of all sentinels plus the master and slave node information
- The master node also records information about the Redis instances. What it records looks the same as what the sentinel records, but there are slight differences.
- Based on the slave node information obtained from the master node, the sentinel sends the INFO command to each slave node
- Then sentinel 2 arrives. It also sends the INFO command to the master node and establishes a cmd connection
- Sentinel 2 now stores the same information as sentinel 1, except that its sentinel list contains two sentinels.
- The sentinels then set up publish/subscribe among themselves to keep their information consistent, and they ping each other so that the information stays in sync over time.
- When a third sentinel, sentinel 3, comes in, it does the same thing: it sends INFO to the master and slave nodes and establishes connections with sentinel 1 and sentinel 2.
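The steps above can be sketched as a toy simulation. Everything here is a simplified stand-in invented for illustration: master_info plays the role of the INFO reply, and hello_channel plays the role of the publish/subscribe link between sentinels. No real Redis connection is made.

```python
class Sentinel:
    def __init__(self, name: str):
        self.name = name
        self.master = None
        self.slaves = []
        self.known_sentinels = {name}  # every sentinel knows at least itself

    def monitor(self, master_info: dict, hello_channel: list) -> None:
        # Step 1: INFO to the master yields its address and its slaves
        self.master = master_info["addr"]
        self.slaves = list(master_info["slaves"])
        # Step 2: announce ourselves on the shared channel
        hello_channel.append(self)
        # Step 3: every subscriber refreshes its sentinel list from the channel
        names = {s.name for s in hello_channel}
        for s in hello_channel:
            s.known_sentinels = set(names)


master_info = {
    "addr": ("127.0.0.1", 6379),
    "slaves": [("127.0.0.1", 6380), ("127.0.0.1", 6381)],
}
channel = []
sentinels = [Sentinel(f"sentinel{i}") for i in (1, 2, 3)]
for s in sentinels:
    s.monitor(master_info, channel)
    print(s.name, "joined; sentinel1 now knows", sorted(sentinels[0].known_sentinels))
```

After the third sentinel joins, all three hold the same master, slave, and sentinel information, which matches what we saw appear in the configuration file of sentinel 26379.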
2. Notification workflow
A sentinel sends commands to all the master and slave nodes it monitors to obtain their status, and publishes that information to the channel the sentinels subscribe to.
3. Failover principle (the focus of this article)
- A sentinel keeps sending hello messages (publish sentinel:hello) to the master node until the master node stops responding, at which point the sentinel marks it sdown. This is exactly the +sdown message the sentinel server reported when we took the master node down. Reporting sdown is not the end of it: the sentinel also tells the other sentinels on their internal network that the master node is down. The command it sends is sentinel is-master-down-by-addr, followed by the master node's IP and port.
- When the other sentinels receive this, their reaction is: the master node died? Let me go and check for myself. They also send hello to the master node, then report what they found on the internal network with the same sentinel is-master-down-by-addr command, confirming to the first sentinel: you're right, this guy really is dead. When one sentinel thinks the master node is down, it flags the master node as sdown. When more than half of the sentinels think so, the flag is changed to odown. This is why the number of sentinels is configured to be odd.
- One sentinel thinking the master node is down is called subjective offline (sdown). More than half of the sentinels thinking so is called objective offline (odown).
- Once the master node is considered objectively offline, the sentinels proceed to the next step
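The difference between subjective and objective offline, and the reason an odd number of sentinels helps, can be shown with a few lines of Python. This sketch assumes the strict "more than half" rule described above; in a real deployment the threshold is the quorum given in the sentinel monitor line. The function name master_state is hypothetical.

```python
def master_state(down_votes: int, total_sentinels: int) -> str:
    """Classify the master from how many sentinels consider it down."""
    if down_votes == 0:
        return "ok"
    if down_votes > total_sentinels // 2:  # more than half agree
        return "odown"  # objectively offline
    return "sdown"  # only subjectively offline


for total in (3, 4):
    for votes in range(total + 1):
        print(f"{votes}/{total} sentinels say down -> {master_state(votes, total)}")
```

With 3 sentinels, 2 votes are enough for odown. With 4 sentinels, 2 votes are a tie and the master stays at sdown, so the fourth sentinel adds cost without making the decision any easier.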
At this point the sentinels have detected the problem. So which sentinel is responsible for choosing the new master node? It can't be that every one of them goes off to do it, because that would be chaos. One leader has to be chosen among all the sentinels. Look at the picture below.
Now all five sentinels hold a meeting. They are all on the same internal network, and all five send the sentinel is-master-down-by-addr command at the same time, carrying their campaign count and their runid.
Each sentinel is both a candidate and a voter. Each sentinel has one vote, and in the picture the envelope represents its vote.
When sentinel1 and sentinel4 send their requests to the group at the same time, sentinel2 says: I'll vote for whoever's request reaches me first. If sentinel1's request arrives first, sentinel2's vote goes to sentinel1.
The rule is to keep voting until some sentinel holds more than half of the total number of votes. Say sentinel1 gets more than half. Then sentinel1 is elected and the next stage begins.
Above, sentinel1 was elected to choose a new master node on behalf of all the sentinels. The new master node is chosen according to certain rules, not picked at random.
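The voting rule above, first request wins each vote and more than half of all votes wins the election, can be sketched like this. The function elect_leader and the dictionaries are hypothetical illustration data, with each entry recording which candidate's request reached that sentinel first.

```python
from collections import Counter
from typing import Optional


def elect_leader(first_request_seen: dict) -> Optional[str]:
    """Each sentinel votes for the candidate whose request reached it first.

    Returns the winner if it holds more than half of all votes, else None,
    which means another round of voting is needed.
    """
    if not first_request_seen:
        return None
    votes = Counter(first_request_seen.values())
    candidate, count = votes.most_common(1)[0]
    return candidate if count > len(first_request_seen) // 2 else None


# sentinel1 and sentinel4 campaign at the same time; each votes for itself
round1 = {
    "sentinel1": "sentinel1",
    "sentinel2": "sentinel1",
    "sentinel3": "sentinel1",
    "sentinel4": "sentinel4",
    "sentinel5": "sentinel4",
}
# a split round where nobody reaches a majority
split = {
    "sentinel1": "sentinel1",
    "sentinel2": "sentinel1",
    "sentinel3": "sentinel3",
    "sentinel4": "sentinel4",
    "sentinel5": "sentinel4",
}
print(elect_leader(round1))
print(elect_leader(split))
```

In the first round sentinel1 holds 3 of 5 votes and becomes the leader. In the split round no candidate passes the halfway mark, so the sentinels would have to vote again.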
- First, rule out the slaves that aren't online
- The sentinel sends a message to every Redis node, and the ones that respond slowly are ruled out
- Rule out the ones that have been disconnected from the original master node the longest. (The demo ran out of candidates here, so a new slave5 was added.)
After these three checks, slave4 and slave5 remain, and they are ranked in order of the following criteria.
- First, compare the configured priority. If the priorities are the same, go on to the next check
- Next, compare the replication offset. If slave4's offset is 90 and slave5's offset is 100, the sentinel concludes that slave4 has fallen behind, perhaps because of a network problem, and slave5 is chosen as the new master node. If slave4 and slave5 have the same offset, there is one final check
- The last check is the runid, which is like seniority in the workplace: the node with the earlier (smaller) runid is chosen
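The filtering and ranking rules above can be sketched as one selection function. The Slave fields are hypothetical names chosen for this example, and two assumptions are made explicit: a lower priority number is preferred, and runids are compared as plain strings with the smaller one winning.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slave:
    name: str
    online: bool
    responsive: bool          # answered the sentinel quickly
    long_disconnected: bool   # cut off from the old master for too long
    priority: int             # assumption: lower number is preferred
    offset: int               # replication offset, higher means more data
    runid: str


def pick_new_master(slaves: list) -> Optional[Slave]:
    # Filtering: drop offline, slow, and long-disconnected slaves
    candidates = [
        s for s in slaves
        if s.online and s.responsive and not s.long_disconnected
    ]
    if not candidates:
        return None
    # Ranking: priority first, then larger offset, then smaller runid
    return min(candidates, key=lambda s: (s.priority, -s.offset, s.runid))


slaves = [
    Slave("slave1", False, True, False, 100, 100, "aaa"),  # offline
    Slave("slave2", True, False, False, 100, 100, "bbb"),  # slow
    Slave("slave3", True, True, True, 100, 100, "ccc"),    # disconnected too long
    Slave("slave4", True, True, False, 100, 90, "ddd"),
    Slave("slave5", True, True, False, 100, 100, "eee"),
]
print(pick_new_master(slaves).name)
```

Here slave4 and slave5 survive the filters with equal priority, and slave5 wins on its larger offset. If their offsets were equal too, the runid comparison would decide.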
After the new master node is selected, the sentinel sends instructions to all nodes.
Four, summary
That's everything you need to know about the sentinel. The most important part of this article is how it works, so let's briefly review that.
- First comes monitoring, and all sentinels synchronize their information
- The sentinels publish information to the subscribed channel
- Failover
The sentinels find that the master node has gone offline
The sentinels vote to elect a leader
The leader selects the new master node
The new master node cuts its link to the original master node, and the other slave nodes connect to the new master node. When the original master node comes back online, it connects to the new master node as a slave node.
That is my understanding of the sentinel. If you spot any mistakes, please point them out and I will correct them promptly.
— — — — — — — —
The original link: blog.csdn.net/fangkang7/a…