Why Sentinel mode is needed
- Persistence alone cannot restore service once the server itself goes offline
- With master-slave replication, you can manually promote a slave after the master goes offline, but failover cannot happen automatically
Sentinel mode
Main functions
- Monitoring: Sentinel continuously checks whether your master and slave nodes are working properly.
- Notification: Sentinel can notify system administrators or other programs through an API (pub/sub) when a monitored Redis instance has a problem.
- Automatic failover: If a master goes offline, Sentinel starts a failover. One of the master's slaves is promoted to be the new master, and the other slaves start replicating from it. Applications can learn the new master address through the Redis notification mechanism.
- Configuration provider: Clients can treat Sentinel as the authoritative source for the current master address. If a failover occurs, the Sentinel cluster notifies clients of the new master address so that they can refresh their Redis configuration.
Main configuration
Sentinel is a configuration provider for Redis, not a proxy. The client only obtains the data node's configuration from Sentinel, so the IP addresses configured here must be reachable from the Redis client. A Sentinel configuration template, sentinel.conf, is provided in the Redis source code.
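For orientation, the sketch below embeds a small sentinel.conf fragment and reads it with a helper named parse_sentinel_conf. The helper is hypothetical and not part of Redis, and the master name mymaster, the address 192.168.1.10:6379, and the quorum of 2 are assumptions chosen for illustration. The directive names are the ones found in the sentinel.conf template.

```python
# Illustrative sentinel.conf fragment. The master name, address and quorum
# are assumptions made up for this example.
SAMPLE_CONF = """
port 26379
sentinel monitor mymaster 192.168.1.10 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
"""


def parse_sentinel_conf(text):
    """Collect per-master settings from 'sentinel <directive> <name> ...' lines.

    Hypothetical helper for illustration only; Redis parses this file itself.
    This sketch only understands 'monitor' and numeric settings.
    """
    masters = {}
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 4 or parts[0] != "sentinel":
            continue  # skip blank lines and non-sentinel directives
        directive, name, args = parts[1], parts[2], parts[3:]
        settings = masters.setdefault(name, {})
        if directive == "monitor":
            # sentinel monitor <name> <ip> <port> <quorum>
            settings["ip"] = args[0]
            settings["port"] = int(args[1])
            settings["quorum"] = int(args[2])
        else:
            settings[directive] = int(args[0])
    return masters


conf = parse_sentinel_conf(SAMPLE_CONF)
print(conf["mymaster"])
```

The ip and port recorded by the monitor line are what Sentinel hands back to clients, which is why that address has to be reachable from them.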
Starting Sentinel
$ redis-server sentinel.conf --sentinel
- Initialize an ordinary Redis server
- Load the Sentinel-specific configuration, such as the command table and parameters. Sentinel uses the command table and functions in sentinel.c, whereas a regular Redis server uses those in redis.c
- In addition to the general server state, Sentinel also keeps Sentinel-related state
Preparation
Sentinel and master: Sentinel monitors the master and establishes two asynchronous network connections to it, through which it discovers other Sentinels and slaves:
Command connection: used to send commands to the Redis master data node. For example, the INFO command returns:
- The master's own runtime information, used to update the local master dictionary (the same data structure as the dictionary in the Redis hash implementation).
- Slave information (role, IP, port, connection status, priority, replication offset), used to update the local slave dictionary.
Subscription connection: subscribes to the __sentinel__:hello channel to discover other Sentinels. The information in the channel includes:
- Sentinel information (IP, Port, RunID, Epoch)
- Information about the monitored Master node (Name, IP, Port, Epoch)
Sentinel and slave: Sentinel discovers slaves automatically
- Sentinel sends the INFO command to the master node to obtain information about all of its slaves
- Sentinel establishes a command connection and a subscription connection with each slave
Between Sentinels: automatic discovery mechanism
- Sentinel uses the pub/sub (publish/subscribe) mechanism, subscribing to the __sentinel__:hello channel of every master and slave data node, to automatically discover other Sentinels that monitor the same master
- Every 2 seconds, each Sentinel publishes a message to __sentinel__:hello containing the latest master configuration it currently holds. If a Sentinel finds that its own configuration version is lower than the one it received, it replaces its master configuration with the newer one
- A command connection is established with each discovered Sentinel, and views on the master data node are exchanged over it
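The version comparison applied to hello messages can be sketched as follows. This is a minimal illustration rather than Sentinel's actual code: the MasterConfig type, the merge_hello function, and the field name config_epoch are assumptions standing in for the "configuration version" mentioned above, and the addresses are made up.

```python
from dataclasses import dataclass


@dataclass
class MasterConfig:
    name: str
    ip: str
    port: int
    config_epoch: int  # version of this master configuration (assumed name)


def merge_hello(local: MasterConfig, received: MasterConfig) -> MasterConfig:
    """Return the configuration a Sentinel keeps after reading a hello message."""
    if received.name == local.name and received.config_epoch > local.config_epoch:
        return received  # the peer advertises a newer configuration
    return local  # same or older version: keep what we already have


local = MasterConfig("mymaster", "10.0.0.1", 6379, config_epoch=3)
newer = MasterConfig("mymaster", "10.0.0.2", 6379, config_epoch=4)
older = MasterConfig("mymaster", "10.0.0.9", 6379, config_epoch=2)

print(merge_hello(local, newer).ip)  # 10.0.0.2
print(merge_hello(local, older).ip)  # 10.0.0.1
```

Because every Sentinel applies the same rule, the configuration with the highest version eventually spreads to all Sentinels that monitor the same master.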
Monitoring
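1. Every 10 seconds, each Sentinel sends the INFO command to the master and slaves to obtain the latest topology.
2. Every 2 seconds, each Sentinel exchanges information with the other Sentinels through the master node's channel (named __sentinel__:hello) using pub/sub. The messages carry the Sentinel information and the monitored master information.
3. Every 1 second, each Sentinel sends the PING command to the other Sentinels and to the Redis master and slaves as a heartbeat, which is the basis for judging whether a node is alive.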
- Subjective offline and objective offline (fault discovery)
1. Subjective offline (subjectively down, SDOWN): the current Sentinel instance considers a Redis service unavailable. This happens when the Sentinel receives no valid response (+PONG, -LOADING, or -MASTERDOWN) within down-after-milliseconds (30 seconds by default) of sending PING to the Redis master. The Sentinel then marks the master as subjectively offline by setting the SRI_S_DOWN flag in the master structure's flags.
2. Objective offline (objectively down, ODOWN): if enough Sentinel instances consider the master to be in the SDOWN state, the master enters the ODOWN state. ODOWN simply means that the cluster has judged the master unavailable and a failover will be started. To reach this decision, the Sentinel sends the SENTINEL is-master-down-by-addr command to the other Sentinels to ask for their view of the data node. Once the number of Sentinels that report it offline reaches the quorum, the data node is considered objectively offline.
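The two checks can be sketched in a few lines. This is a simplified model under stated assumptions, not the logic in sentinel.c: the function names and parameters are made up for illustration, timestamps are plain seconds, and peer_opinions stands for the answers other Sentinels gave to SENTINEL is-master-down-by-addr.

```python
import time


def is_subjectively_down(last_valid_reply, down_after_ms, now=None):
    """SDOWN: no valid reply for longer than down-after-milliseconds."""
    now = time.time() if now is None else now
    return (now - last_valid_reply) * 1000 > down_after_ms


def is_objectively_down(self_sdown, peer_opinions, quorum):
    """ODOWN: this Sentinel sees SDOWN and at least `quorum` Sentinels agree.

    peer_opinions is a list of booleans, one per other Sentinel, where True
    means "I also consider the master down".
    """
    if not self_sdown:
        return False
    agreeing = 1 + sum(1 for down in peer_opinions if down)  # count ourselves
    return agreeing >= quorum


# Last valid reply 40 s ago, threshold 30 s: subjectively down.
sdown = is_subjectively_down(last_valid_reply=60.0, down_after_ms=30000, now=100.0)
print(sdown)                                          # True
print(is_objectively_down(sdown, [True, False], 2))   # True
print(is_objectively_down(sdown, [False, False], 2))  # False
```

SDOWN is a purely local judgment, while ODOWN additionally requires agreement from other Sentinels, which is what protects against a single Sentinel with a bad network link triggering a failover.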
Handling
Sentinel election (based on the Raft algorithm) to select a Leader
Vote: update the local leader and leader_epoch
- A Sentinel that has already voted becomes a follower and does not start an election of its own for twice the failover timeout (the failover timeout is 3 minutes by default). A Sentinel that has not voted becomes a candidate and continues with the following steps
- Set the failover state to start, increase the current epoch by 1, and set a random timeout of up to 1 s
- Send the SENTINEL is-master-down-by-addr command to the other Sentinels to request their votes. The command carries the candidate's own epoch
- The election is held within twice the failover timeout
The Sentinel election algorithm differs from Raft in the following ways:
- Elections are held only when a failover is required
- A quorum parameter is added: a candidate needs votes from more than half of the Sentinels and also at least as many votes as the configured quorum
- The Leader does not announce to the other Sentinels that it has become Leader. The other Sentinels wait for the Leader to promote a slave to master
Once the new master has been selected and is detected to be working properly, the other Sentinels clear the objective-offline status, so they do not need to enter the failover process
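The winning condition for a candidate, a majority of the Sentinels and at least the configured quorum, can be written as a short check. The function wins_election and its parameter names are assumptions for illustration; known_sentinels counts every Sentinel monitoring the master, including the candidate itself.

```python
def wins_election(votes_received, known_sentinels, quorum):
    """Return True if a candidate has enough votes to become Leader.

    The candidate needs more than half of all known Sentinels and at
    least as many votes as the configured quorum.
    """
    majority = known_sentinels // 2 + 1
    return votes_received >= majority and votes_received >= quorum


print(wins_election(3, known_sentinels=5, quorum=2))  # True
print(wins_election(2, known_sentinels=5, quorum=2))  # False: quorum met, no majority
print(wins_election(3, known_sentinels=5, quorum=4))  # False: majority met, quorum not
print(wins_election(2, known_sentinels=4, quorum=2))  # False: a 2-2 split has no winner
```

The last case shows why an even number of Sentinels is awkward: two candidates can each collect half of the votes, and neither reaches a majority.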
Note:
1. Sentinel cannot perform automatic failover when only a minority of the Sentinel processes are running properly.
2. In general, deploy an odd number of Sentinels to avoid tied votes during a switch
A competing voting process occurs at both nodes
Failover (switching the Redis master data node)
The Sentinel Leader selects a suitable Redis slave to become master
Slave selection criteria:
1. Healthy nodes:
- Online
- Communicated successfully in the recent past (replied to a PING command within the last 5 s)
- Data is fresh (disconnected from the master for no more than 10 * down-after-milliseconds)
- Among the healthy nodes, prefer the slave with the highest priority, that is, the lowest non-zero slave-priority value
- If priorities are equal, prefer the slave with the largest replication offset (the most complete copy)
- If offsets are also equal, select the slave with the lexicographically smallest run ID, which serves as a deterministic tie-breaker
2. Send SLAVEOF NO ONE to the selected slave to promote it to master. Sentinel then sends the INFO command to it every second until it reports that it has become master
3. Send the SLAVEOF command to the remaining slaves so that they become slaves of the new master
4. Have the remaining slaves replicate the new master's data. The parallel-syncs setting (sentinel.conf) specifies how many slaves may start replicating from the new master at the same time. A larger value lets the slaves finish replication sooner but puts more load on the new master's network and disk, and a slave cannot serve requests while it loads the RDB file sent by the master
5. Mark the original master as a slave and keep monitoring it. Once it recovers, Sentinel instructs it to replicate from the new master
6. The Leader Sentinel publishes the +switch-master message and resets the master. The reset releases all slave objects and Sentinel objects associated with the master. Whether a slave can return data to clients depends on slave-serve-stale-data (redis.conf)
7. Keep monitoring the old master and make it a slave of the new master when it comes back online
Running SENTINEL failover <master-name> on a Sentinel forces that Sentinel to perform a failover without asking the other Sentinels for agreement
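The slave selection criteria from step 1 can be sketched as a filter followed by a sort. This is an illustrative model, not the code in sentinel.c: the Slave type, its field names, and select_slave are assumptions. It also assumes the Redis convention that a lower slave-priority value means a higher priority and that a value of 0 means the slave is never promoted.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    run_id: str
    online: bool
    ms_since_ping_reply: int          # time since the last PING reply
    ms_disconnected_from_master: int  # how long the replication link was down
    priority: int                     # slave-priority; lower is preferred
    repl_offset: int                  # replication offset; higher is fresher


def select_slave(slaves, down_after_ms):
    """Pick the slave to promote, or None if no healthy candidate exists."""
    healthy = [
        s for s in slaves
        if s.online
        and s.ms_since_ping_reply <= 5000
        and s.ms_disconnected_from_master <= 10 * down_after_ms
        and s.priority != 0  # assumption: priority 0 is never promoted
    ]
    if not healthy:
        return None
    # Lowest priority value first, then largest offset, then smallest run ID.
    return min(healthy, key=lambda s: (s.priority, -s.repl_offset, s.run_id))


candidates = [
    Slave("aaa", True, 1000, 0, 100, 500),
    Slave("bbb", True, 1000, 0, 100, 900),
    Slave("ccc", False, 1000, 0, 10, 999),  # offline, filtered out
    Slave("ddd", True, 1000, 0, 100, 900),
]
print(select_slave(candidates, down_after_ms=30000).run_id)  # bbb
```

In this example ccc is excluded for being offline, aaa loses on replication offset, and bbb beats ddd only through the run ID tie-breaker.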
Sentinel defects
- In Sentinel mode, writes can only go to the single master data node that Sentinel reports, so write load cannot be balanced
- During persistence, the master node may block while flushing to disk, which lowers the success rate of service requests
- Each slave holds a full copy of the data, so storage capacity is limited to that of a single node
- Partition problem: suppose the original master, Redis 3, loses its connection to Redis 1 and Redis 2. Redis 1 and Redis 2 then perform a failover, and Redis 1 is selected as master. Both Redis 1 and Redis 3 now accept write requests, but their data cannot be synchronized, so the data becomes inconsistent
Why not use Cluster mode
- A smart client must be implemented on the client side to handle redirection
- Batch operations are limited: cross-slot queries are not supported, which makes batch operations awkward
- Transaction support is limited: transactions work only on multiple keys on the same node. When the keys are distributed across different nodes, transactions cannot be used
- The key is the smallest unit of data partitioning, so a large key-value object such as a hash or list cannot be spread across different nodes
- Multiple databases are not supported. A standalone Redis server supports 16 databases by default, but in cluster mode only one database, DB 0, can be used
Author: Liu Cong