👮

The best way to live is to run along the road of your ideals with a group of like-minded people: a story behind you, a firm step beneath you, and a clear road ahead.


Why 👮 Sentinel exists

The background behind the emergence of 👮 Sentinel

Earlier in the Redis technology series, we explored the Redis persistence mechanism and the Redis master-slave architecture. The two complement each other: together they give Redis data durability, scalability, high availability, and load sharing. But persistence and master-slave replication alone cannot fail over automatically when a server goes down; an operator still has to switch over by hand, which costs significant manual effort and leaves the service unstable in the meantime.

👮 Pain points remaining after persistence + master-slave replication

The service cannot recover on its own after the master server goes offline. If the master goes down, you can only promote a slave node to master manually, because failover is not performed automatically.

Adding 👮 Sentinel completes the picture

Sentinel is the Redis high availability solution: a Sentinel system made up of one or more Sentinel instances can monitor any number of master servers and all of their slaves, and when a monitored master goes offline, automatically promote one of its slaves to be the new master.

Master-slave + persistence vs. Sentinel:


Main functions of 👮Sentinel

Redis Sentinel provides a complete high availability solution for Redis. In practice this means Sentinel can be used to deploy a Redis setup that survives a variety of failure events without human intervention. It also provides other functions, such as monitoring, notification, and configuration provisioning for clients.

Conceptual definition of 👮Sentinel

Redis-sentinel is the official high availability (HA) solution recommended by Redis. When Redis is used in a master-slave high availability setup, neither Redis itself nor most of its clients will perform the master/slave switchover automatically if the master crashes. Redis-sentinel is an independent process that monitors multiple master-slave clusters and switches over automatically when a master goes down.

Redis has shipped a stable version of Redis Sentinel since 2.8. The current version is called Sentinel 2: a rewrite of the original Sentinel implementation using stronger and simpler-to-predict algorithms. Sentinel 1 shipped with Redis 2.6, but it had a number of problems.


The functional distribution of 👮Sentinel

  • Monitoring: Sentinel continuously checks whether your master and slave nodes are working properly.

  • Notification: Sentinel can notify system administrators or other programs through its API (Pub/Sub) when a monitored Redis instance has a problem.

  • Automatic failover: If a master node is not working as expected, Sentinel starts a failover: it promotes one of the slaves to master, reconfigures the remaining slaves to replicate the new master, and tells applications that use the Redis service the new address when they connect.

  • Configuration provider: Clients can use Sentinel as the authoritative configuration source to obtain the latest master address. If a failover occurs, the Sentinel cluster notifies the client of the new master address and the Redis configuration is refreshed. (Sentinel always returns the latest master address.)

Distribution characteristics of 👮Sentinel

  • If a single Sentinel process monitors the Redis cluster, the setup is unreliable: that Sentinel is itself a single point of failure, and when it goes down the whole system no longer behaves as expected. Sentinel therefore needs to be clustered as well.

  • Redis Sentinel is a distributed system: it is designed to run as many Sentinel processes cooperating with each other. Cooperation between Sentinel processes has several advantages:

    1. Failure detection is performed only when multiple Sentinels agree that a master is no longer available, which significantly reduces the probability of false positives.

    2. Sentinel keeps working even when not every Sentinel process is up, which makes the system robust. (Preferably deploy an odd number of Sentinels, so that a majority vote cannot tie.)

The overall distribution mode is shown in the following figure:

Fundamentals of 👮Sentinel

Overall: multiple Sentinel processes use gossip protocols to learn whether the master server appears to be offline, and agreement protocols to decide whether to perform an automatic failover and which slave to promote as the new master.

Subjective Downline (SDOWN) for 👮Sentinel

A server must keep returning invalid replies for down-after-milliseconds before Sentinel flags it as subjectively offline.

  • During normal operation, Sentinel sends messages to the other Sentinels, the master, and the slaves to confirm they are alive. If one of them does not respond normally within the specified time, it is provisionally considered down and marked as subjectively offline (SDOWN).

    • [Note: when only a single Sentinel instance judges a Redis instance unresponsive, the judgment is subjective; automatic failover and other actions are not triggered.]

Objective Downline (ODOWN) of 👮Sentinel

  • When enough Sentinels (the number set by the quorum parameter) report that the same master is unresponsive, the system judges the master dead and marks it as objectively offline (ODOWN).

    • After multiple Sentinel instances reach an SDOWN judgment about the same server and exchange views via the SENTINEL is-master-down-by-addr command, the server is judged offline.
    • A Sentinel can ask another Sentinel whether it considers a given server offline by sending it the SENTINEL is-master-down-by-addr command.
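The quorum check described above can be sketched as follows. This is an illustrative sketch, not Sentinel's internal code; the function and variable names are hypothetical:

```python
def is_objectively_down(sdown_reports, quorum):
    """A master is marked ODOWN once at least `quorum` Sentinels
    (including ourselves) have reported it subjectively down (SDOWN)."""
    return len(sdown_reports) >= quorum

# Three Sentinels report SDOWN for the same master; quorum is 2
reports = {"sentinel-a", "sentinel-b", "sentinel-c"}
print(is_objectively_down(reports, quorum=2))  # True
```

With a quorum of 2, a single Sentinel's subjective judgment is never enough to mark the master objectively down.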

👮Sentinel offline operation

  • The transition from the subjectively offline state to the objectively offline state does not use a strict strong-consistency quorum algorithm, but a gossip protocol: if a Sentinel receives a sufficient number of master-offline reports from other Sentinels within a given time range, it changes the master's status from subjectively offline to objectively offline. If other Sentinels subsequently stop reporting the master as offline, the objectively-offline status is removed.

  • The objectively-offline state applies only to master servers: for any other kind of Redis instance (other Sentinels and slave nodes), Sentinel does not negotiate before judging them offline, so slaves and other Sentinels are never marked objectively offline.

Primary/secondary switchover for 👮Sentinel

  • At this point, the Sentinel cluster elects a leader to carry out failure recovery: one of the existing slave nodes is chosen (the algorithm is introduced later) and promoted to master, and the remaining slaves are pointed at the new master to preserve the master-slave relationship.

👮Sentinel automatic discovery mechanism

  • So how do the machines in a Sentinel cluster find the other machines in the cluster?

    • By broadcast? Clearly unsuitable. Since Sentinel is a Redis product, it naturally makes full use of Redis features: Sentinel cluster nodes use the Redis master's publish/subscribe mechanism to discover the other nodes automatically.

Each Sentinel uses publish/subscribe to continuously propagate its version of the master configuration. The publish/subscribe channel is __sentinel__:hello; we can subscribe to this channel and view its messages, as follows:

👮 Sentinel's use of Pub/Sub (publish/subscribe):

Each Sentinel subscribes to the __sentinel__:hello channel of every master and slave data node to automatically discover the other Sentinels monitoring the same master. Every two seconds, each Sentinel publishes a message to __sentinel__:hello containing the latest master configuration it currently maintains.

  • If a Sentinel finds that its own configuration version is lower than the one it receives, it updates its master configuration with the new one and establishes a command connection to the newly discovered Sentinel; that connection is then used to exchange views about the master data node.

  • Sentinel's state is persisted to the Sentinel configuration file: whenever a new configuration is received or created, it is written to disk together with its version stamp. This means the Sentinel process can be stopped and restarted safely.
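The version-comparison step above can be sketched like this. The field names (`config_epoch`, `master_addr`) are illustrative, not Sentinel internals:

```python
def on_hello(local, hello):
    """Adopt the advertised master configuration only if the received
    version (epoch) is newer than the one we hold locally."""
    if hello["config_epoch"] > local["config_epoch"]:
        local.update(master_addr=hello["master_addr"],
                     config_epoch=hello["config_epoch"])
    return local

local = {"master_addr": ("10.0.0.1", 6379), "config_epoch": 3}
hello = {"master_addr": ("10.0.0.2", 6379), "config_epoch": 5}
print(on_hello(local, hello)["master_addr"])  # ('10.0.0.2', 6379)
```

A message with an older epoch is ignored, so stale Sentinels can never roll the cluster's view of the master backwards.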

How the 👮Sentinel leader is elected

As mentioned in the principles above, when Sentinel finds that the master database is objectively offline, it elects a lead Sentinel (more than half of the votes, and no fewer than the quorum threshold) to carry out failure recovery. The election uses the Raft algorithm, which is why its design resembles that of ZooKeeper. The election proceeds as follows:

  • When a Sentinel node (call it A) finds that the master database is objectively offline, it sends a command to every other Sentinel node asking to be elected leader.

  • If the target Sentinel has not yet voted for anyone else, it agrees to elect A as the lead Sentinel;

  • If A finds that more than half of the Sentinels, and no fewer than the quorum parameter value, agree to elect it, then A is successfully elected lead Sentinel.

    • [Majority means a majority of the Sentinel nodes in the cluster. A Sentinel node becomes leader only when max(quorum, majority) nodes vote for it, where majority = num(sentinels) / 2 + 1.]
  • If several Sentinel nodes run for leader at the same time, it is possible that no node is elected. In that case each node waits a random time before starting the next round of voting, until a leader is elected.
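The vote threshold described in the note above can be expressed directly in code. This is a sketch of the max(quorum, majority) rule, not Sentinel's actual implementation:

```python
def elected_leader(votes, num_sentinels, quorum):
    """A candidate becomes leader only with at least max(quorum, majority)
    votes, where majority = num_sentinels // 2 + 1."""
    majority = num_sentinels // 2 + 1
    return votes >= max(quorum, majority)

# 5 Sentinels, quorum 2: majority is 3, so 3 votes suffice
print(elected_leader(3, num_sentinels=5, quorum=2))  # True
print(elected_leader(2, num_sentinels=5, quorum=2))  # False
```

Because at most one candidate can gather a majority in a round, two Sentinels can never both be elected leader in the same round.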

👮 Algorithm for selecting Master from Slave during fault recovery

  • Slaves are first sorted by slave-priority: the lower the slave-priority value, the higher the priority.

  • If the slave-priority values are equal, compare the replication offsets: the greater the offset, the closer the slave's data is to the old master's, and the higher its priority.

  • If both conditions are equal, select the slave with the smallest run ID;

In short, the selection sorts first by slave_priority (lower value wins), then by slave_offset (higher value wins), and finally by runid (lower value wins, i.e. the earlier-started instance).
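The three-step selection above maps naturally onto a single sort key. A minimal sketch, with illustrative field names rather than Sentinel internals:

```python
def pick_new_master(slaves):
    """Select the promotion candidate: lowest slave-priority value first,
    then largest replication offset, then smallest run ID."""
    return min(slaves, key=lambda s: (s["priority"], -s["offset"], s["runid"]))

slaves = [
    {"runid": "c", "priority": 100, "offset": 800},
    {"runid": "a", "priority": 100, "offset": 900},  # most data replicated
    {"runid": "b", "priority": 200, "offset": 999},  # loses on priority despite offset
]
print(pick_new_master(slaves)["runid"])  # 'a'
```

Note how slave "b" is eliminated first even though it has the largest offset: priority is compared before the replication offset.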

👮Sentinel operation process

  1. Each Sentinel sends a PING command once per second to the Master, Slave, and other Sentinel instances that it knows of. (Heartbeat mechanism)

  2. If an instance has taken longer than the value specified in the down-after-milliseconds option since it last responded to the PING command, it will be flagged as subjective offline by Sentinel.

  3. If a Master is marked as subjectively offline, all Sentinels monitoring that Master confirm, at a rate of once per second, that the Master really is subjectively offline. (Confirmation vote on the offline state.)

  4. When a sufficient number of sentinels (greater than or equal to the value specified in the configuration file) confirm that the Master has indeed gone subjectively offline within the specified time frame, the Master is marked as objectively offline.

  5. In general, each Sentinel sends INFO commands to all known masters and slaves every 10 seconds. (Synchronize data)

  6. When the Master is marked as objectively offline by Sentinel, Sentinel sends the INFO command to all slaves of the offline Master once every second instead of once every 10 seconds.

  7. If not enough sentinels agree that the Master is offline, the Master’s objective offline status is removed.

  8. If the Master returns a valid reply to the PING command of Sentinel, the Master’s subjective offline status will be removed.
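Steps 2 and 8 above amount to a simple timing rule: an instance is SDOWN exactly while its last valid PING reply is older than down-after-milliseconds. A sketch with hypothetical names:

```python
def is_sdown(now_ms, last_valid_reply_ms, down_after_ms):
    """Flag an instance subjectively down once the time since its last
    valid PING reply exceeds down-after-milliseconds; a fresh valid
    reply implicitly clears the flag."""
    return (now_ms - last_valid_reply_ms) > down_after_ms

print(is_sdown(now_ms=40_000, last_valid_reply_ms=5_000, down_after_ms=30_000))  # True
print(is_sdown(now_ms=20_000, last_valid_reply_ms=5_000, down_after_ms=30_000))  # False
```

The moment a valid PING reply arrives, `last_valid_reply_ms` resets and the instance is no longer subjectively offline, which is exactly step 8.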


👮Sentinel deployment configuration

  • The sentinel configuration template is provided in the redis source code: sentinel.conf

  • To deploy Sentinel, you simply configure the /etc/redis-sentinel.conf file, for example:

# Working directory; the "sentinel myid ..." line below is generated automatically
dir /var/lib/redis/sentinel
# sentinel myid f0104ad153f34db5a29b8cbb51ef21a31d6d5827

# Monitor the master by name and address. The trailing 2 means the master is
# only considered faulty once two Sentinels in the cluster agree it is down.
sentinel monitor mymaster 10.130.2.155 6379 2

# Password of the monitored master (can be omitted if no password is set)
sentinel auth-pass mymaster Password

# Log file
logfile "/var/log/redis/sentinel.log"

# Subjective-down (SDOWN) time in milliseconds. Default: 30 seconds.
sentinel down-after-milliseconds mymaster 30000

# How many slaves may resynchronize with the new master at the same time during
# failover. The smaller the number, the longer failover takes to complete; the
# larger the number, the more slaves are unavailable because of replication.
# Setting it to 1 ensures only one slave at a time cannot serve command requests.
sentinel parallel-syncs mymaster 1

# Failover timeout in milliseconds. If the failover does not complete within
# this period, Sentinel considers it failed. Default: 3 minutes.
sentinel failover-timeout mymaster 180000

Once the configuration is complete, start the service with systemctl start redis-sentinel.

👮 Core Configuration

# Monitor the master <master-name> at <ip>:<redis-port>; at least <quorum>
# Sentinels must agree before the master is marked objectively down
sentinel monitor <master-name> <ip> <redis-port> <quorum>

# Password of the master server (can be omitted if no password is set)
sentinel auth-pass mymaster 123456

# Flag the master as subjectively down after 5 seconds without a valid reply
sentinel down-after-milliseconds mymaster 5000
  • Sentinel is a provider of Redis configuration, not a proxy: clients only obtain the data node's configuration from Sentinel, so the IP configured here must be reachable by the Redis clients.

👮 Sentinel start

Although Sentinel ships as a separate executable, redis-sentinel is actually just a Redis server running in a special mode: you can start Sentinel by starting a regular Redis server with the --sentinel option.

If you use the redis-sentinel executable, you can run sentinel with the following command:

$ redis-sentinel /path/to/sentinel.conf

Of course, you can also use the redis service to start:

$ redis-server sentinel.conf --sentinel &

The two ways are the same.

However, Sentinel must be run with a configuration file, because the system uses that file to store its current state, which is reloaded on restart. Sentinel refuses to start if the configuration file is missing or its path is wrong.

By default, Sentinel listens on TCP port 26379, so for Sentinel to work, port 26379 on your machine must be open to connections from the other Sentinel instances. Otherwise the Sentinels cannot talk to each other, cannot agree on what to do, and never perform a failover.

During startup, Sentinel loads its own special configuration, such as its command table and parameters: Sentinel uses the command table and function configuration in sentinel.c, while regular Redis uses the configuration in redis.c. Besides the general server state, Sentinel also keeps Sentinel-specific state.
Note:

1. When Sentinel mode is enabled and your master server goes down, Sentinel automatically elects a new master from among the Redis servers. That new master can serve both reads and writes!

  1. If the master that went down is repaired and comes back up, it can then only serve reads, and it automatically follows the new master elected by Sentinel!

  2. You can open ./redis-cli and type info to check the instance's status.

👮Problems Redis still has at this point

  • [Solved by Sentinel]: Once the master node goes down, a slave has to be promoted to master, the application side has to be told the new master's address, and every slave has to be ordered to replicate the new master. Without Sentinel, this whole process required manual intervention.

  • [Solved by Cluster]: The write capability of the system is limited by a single node.

  • [Solved by Cluster]: The storage capacity of the system is limited by a single node.