This blog provides a brief introduction to the principles of Redis Sentinel. At the end of the article there is a hardcore tutorial so that you can get hands-on experience with the whole process.

Previous articles have covered master-slave replication in Redis, its underlying principles, and its drawbacks. For the details, check out my earlier article on master-slave replication in Redis.

Overall, master-slave replication alone is clearly not enough to keep Redis highly available in a real, complex production environment. For example, when the master node breaks down, someone has to perform a failover and switch a slave over to become the new master.

Meanwhile, in terms of traffic, the master-slave architecture can only scale read requests by adding slave nodes; write capacity cannot be expanded because it is bounded by the resources of the single master node.

That’s why we need to introduce Sentinel.

Sentinel

Functions overview

The basic functions of Sentinel are shown below.

Sentinel is one of Redis's high-availability solutions. It is itself a distributed system consisting of multiple Sentinel nodes and multiple Redis nodes, where each Sentinel node monitors the Redis nodes and the other Sentinel nodes.

When a Sentinel node finds that a node is unreachable, and that node is the master, it negotiates with the remaining Sentinel nodes. Once most Sentinel nodes consider the master unreachable, one Sentinel node is elected to fail over the master and notify the Redis clients of the change.

As opposed to manual failover under plain master-slave replication, Sentinel's failover is fully automatic and requires no human intervention.

Sentinel itself is highly available

Nice. But how do I know how many Sentinel nodes I need to deploy so that Sentinel itself is highly available?

Because Sentinel itself is distributed, you need to deploy multiple instances to keep the Sentinel cluster highly available, and the minimum is 3 instances.

Oh come on. Three? I was only going to deploy two today.

Hold on, don't argue yet… let me tell you why it has to be three…

Because a Sentinel failover requires the consent of most Sentinels. Suppose there are only two Sentinel instances, like this.

If the machine hosting one of the Sentinels goes down completely in an extreme scenario, say a power outage in the server room or a fiber cable being dug up, then even if the other Sentinel detects the master failure and wants to fail over, it can never get the approval of the other Sentinel node, so the failover can never happen. Wouldn't Sentinel then just be a decoration?

So we need at least three nodes to ensure that the Sentinel cluster itself is highly available. Of course, the three Sentinel nodes should be deployed on different machines. If they are all deployed on the same machine, then when that machine dies, the entire Sentinel cluster dies with it.

quorum & majority

"Most"? Bro, this is going into production. "Most" is way too hand-wavy. Can we be a bit more professional?

The "most Sentinels agree" mentioned above actually involves two parameters. One is called quorum: if at least quorum Sentinels in the cluster consider the master to be down, the master is objectively considered down. The other one is called majority…

Wait, wait, wait. Isn't there already something called quorum? Why do we need majority too?

Can you just wait till I’m done…

Quorum, as we just said, is only used to decide whether or not the master is down; that is just a judgment. In real production, deciding that the master is down is not enough, we also have to perform a failover so that the cluster keeps working.

And that is where majority comes in: only when at least majority Sentinels agree can a Sentinel node be elected to perform the failover.
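To make the two numbers concrete, here is a hypothetical configuration line and how it plays out. The IP, port, and node counts below are illustrative examples only, not part of the tutorial setup later in this post.

# hypothetical example: 5 Sentinels monitoring one master, quorum = 2
sentinel monitor mymaster 10.0.0.1 6379 2

# - if 2 (>= quorum) Sentinels mark the master SDOWN, it becomes ODOWN
# - the failover itself still has to be authorized by a majority of all
#   Sentinels, i.e. 3 out of 5 here, which is also how the leader Sentinel is elected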

Subjective downtime & objective downtime

Did you just say "objectively considered down"? Funny. Is there such a thing as a subjective outage, then?

There are two types of node failure in Sentinel:

  • Subjective Down (SDOWN for short): a single Sentinel subjectively believes that the master is down
  • Objective Down (ODOWN for short): the Sentinels objectively consider that the master is down

When a Sentinel node communicates with a monitored Redis node A and finds that it cannot establish a connection, that Sentinel node subjectively considers Redis node A to be SDOWN. Why subjective? First we need to know what "subjective" means.

Drawing conclusions, making decisions, and reacting without analysis or calculation, and without carefully discussing them with others who hold different views, is what we call being subjective.

To put it simply, it is possible that only the current Sentinel node has a network problem reaching node A, while the remaining Sentinel nodes can still communicate with A normally.

This is why we need to introduce ODOWN. When the number of Sentinel nodes that deem a node to be down is greater than or equal to quorum, the node is objectively considered down.

When the Sentinel cluster objectively considers that the master is down, a Sentinel node will be selected from all Sentinel nodes to perform the master failover.

So what exactly does this failover do? Let’s look at it through a graph.

  • Notify the calling clients that the master has changed
  • Tell the remaining slave nodes to replicate the new master node elected by Sentinel
  • If the original master comes back online, Sentinel also makes it replicate the new master, so it becomes a new slave node (see the sketch below)
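As a rough, command-level sketch of what this means in practice (a conceptual summary rather than the literal internal implementation; the IPs are placeholders):

# what the leader Sentinel roughly does during a failover:
#   on the chosen slave:                 SLAVEOF NO ONE            # promote it to master
#   on each remaining slave:             SLAVEOF <new-master-ip> 6379
#   on the old master, once it is back:  SLAVEOF <new-master-ip> 6379
# clients and subscribers learn about the change via the +switch-master event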

Hardcore tutorial

This hardcore tutorial is designed to give you the fastest way to set up a Redis master-slave architecture and a Sentinel cluster locally, and to experience the entire failover process.

Prerequisites

  1. Install docker
  2. Install docker-compose

Prepare the compose file

First prepare a directory, and then create two subdirectories inside it, as follows.

$ tree .
.
├── redis
│   └── docker-compose.yml
└── sentinel
    ├── docker-compose.yml
    ├── sentinel1.conf
    ├── sentinel2.conf
    └── sentinel3.conf

2 directories, 5 files

Set up Redis master and slave servers

The contents of docker-compose.yml in the redis directory are as follows.

version: '3'
services:
  master:
    image: redis
    container_name: redis-master
    ports:
      - "6380:6379"
  slave1:
    image: redis
    container_name: redis-slave-1
    ports:
      - "6381:6379"
    command: redis-server --slaveof redis-master 6379
  slave2:
    image: redis
    container_name: redis-slave-2
    ports:
      - "6382:6379"
    command: redis-server --slaveof redis-master 6379

Let me briefly explain the slaveof option in the command above.

The two slave nodes replicate the node whose container_name is redis-master, which gives us a simple three-node, one-master-two-slaves architecture.

Then let docker-compose do the rest: a single docker-compose up in the redis directory starts all the nodes you need.
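For example (the -d flag is optional and only runs the containers in the background; the last command is just a sanity check, not required):

$ cd redis
$ docker-compose up -d

# optional sanity check: the master should report two connected slaves
$ docker exec -it redis-master redis-cli info replication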

At this point, we also need to get the IP address of the master node we just started. The brief steps are as follows:

  1. Find the container ID of the master node using docker ps

    $ docker ps
    CONTAINER ID   IMAGE   COMMAND                  CREATED         STATUS         PORTS                    NAMES
    9f682c199e9b   redis   "docker-entrypoint.s…"   3 seconds ago   Up 2 seconds   0.0.0.0:6381->6379/tcp   redis-slave-1
    2572ab587558   redis   "docker-entrypoint.s…"   3 seconds ago   Up 2 seconds   0.0.0.0:6382->6379/tcp   redis-slave-2
    f70a9d9809bc   redis   "docker-entrypoint.s…"   3 seconds ago   Up 2 seconds   0.0.0.0:6380->6379/tcp   redis-master

    Here that is f70a9d9809bc.

  2. Run docker inspect f70a9d9809bc to get the container's IP, and look under NetworkSettings -> Networks -> IPAddress.

Write this value down; in my case it is 172.28.0.3.
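If you would rather not scan through the full docker inspect JSON, a format template can pull out just the IP. This is purely a convenience and not required for the tutorial:

$ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' redis-master
# prints the container IP, e.g. 172.28.0.3 in my case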

Setting up Sentinel Cluster

The content of docker-compose.yml in the sentinel directory is as follows.

version: '3'
services:
  sentinel1:
    image: redis
    container_name: redis-sentinel-1
    ports:
      - "26379:26379"
    command: redis-sentinel /usr/local/etc/redis/sentinel.conf
    volumes:
      - ./sentinel1.conf:/usr/local/etc/redis/sentinel.conf
  sentinel2:
    image: redis
    container_name: redis-sentinel-2
    ports:
      - "26380:26379"
    command: redis-sentinel /usr/local/etc/redis/sentinel.conf
    volumes:
      - ./sentinel2.conf:/usr/local/etc/redis/sentinel.conf
  sentinel3:
    image: redis
    container_name: redis-sentinel-3
    ports:
      - "26381:26379"
    command: redis-sentinel /usr/local/etc/redis/sentinel.conf
    volumes:
      - ./sentinel3.conf:/usr/local/etc/redis/sentinel.conf
networks:
  # join the network created by the redis/ compose project so the Sentinels can reach the Redis nodes
  default:
    external:
      name: redis_default

Let me also explain the command here.

The redis-sentinel command lets Redis start in sentinel mode, which is essentially a Redis server running in a special mode.

The difference between a sentinel and a normal redis-server is that they load different command tables; a sentinel cannot execute regular Redis data commands such as SET and GET.
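You can see this for yourself once the Sentinel cluster is up later on. Roughly, connecting to a Sentinel and trying a data command fails, while Sentinel-specific commands work; the exact error wording may differ between Redis versions:

$ redis-cli -p 26379
127.0.0.1:26379> set foo bar       # rejected: sentinels don't load data commands
(error) ERR unknown command 'set'
127.0.0.1:26379> sentinel masters  # works: lists the masters this sentinel monitors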

Create three files with identical content, named sentinel1.conf, sentinel2.conf, and sentinel3.conf. The content is as follows:

port 26379
dir "/tmp"
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 172.28.0.3 6379 2
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1

The line sentinel monitor mymaster 172.28.0.3 6379 2 tells Sentinel to monitor the master node named mymaster. The IP address must be the IP of your master node (the one you noted down earlier), and the trailing 2 is the quorum we discussed above.
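For completeness: a few other commonly tuned options are not set here, so Sentinel falls back to its defaults. If you want the demo to react faster or slower, you could add lines like the following; the values shown are the documented defaults:

sentinel down-after-milliseconds mymaster 30000   # how long a node must be unreachable before SDOWN
sentinel failover-timeout mymaster 180000         # overall failover timing limits
sentinel parallel-syncs mymaster 1                # how many slaves resync with the new master at once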

Then go to the sentinel directory and run docker-compose up. At this point, the Sentinel cluster is up.
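Optionally, you can check that the Sentinels have found the master and each other before moving on; the key fields to look for are slaves=2 and sentinels=3 in the master0 line:

$ redis-cli -p 26379 info sentinel
# expect one monitored master, something like:
# master0:name=mymaster,status=ok,address=172.28.0.3:6379,slaves=2,sentinels=3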

Manually simulate a master failure

We then need to manually simulate the master failure to verify that the Sentinel cluster we built can properly perform failover.

In your terminal, go to the directory named redis and type the following command.

docker-compose pause master

After a short wait (by default, Sentinel waits down-after-milliseconds, i.e. 30 seconds, before declaring SDOWN), the Sentinels will output logs like the following.

redis-sentinel-1 | 1:X 07 Dec 2020 01:58:05.459 # +sdown master mymaster 172.28.0.3 6379
......
redis-sentinel-1 | 1:X 07 Dec 2020 01:58:06.932 # +switch-master mymaster 172.28.0.3 6379 172.28.0.2 6379

Hey, why are you just dumping a pile of logs on me? How on earth am I supposed to read that?

Fair enough. Reading the log line by line, even I would be confused coming back to it two weeks later. The log is a complete record of what the Sentinel cluster did from the start of the failover to the end, but it is too long to walk through here line by line.

So, to give you a more intuitive understanding of the process, I have abstracted it into a diagram. Reading the diagram together with the log should make things much easier to follow.

I have also annotated the key steps in the diagram.
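As a reading aid, these are the standard Sentinel log events you will see interleaved across the three sentinel containers, roughly in the order they occur (a simplified list; your log will contain more entries):

# +sdown            a single Sentinel subjectively marks the master as down
# +odown            enough Sentinels (>= quorum) agree, so the master is objectively down
# +vote-for-leader  the Sentinels elect a leader to run the failover
# +selected-slave   the leader picks the slave to promote
# +promoted-slave   the chosen slave has been promoted to master
# +failover-end     the failover is finished
# +switch-master    the master address officially changes from the old IP to the new one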

The end result is that the master has been switched from the original 172.28.0.3 to 172.28.0.2, which is one of the original slave nodes. We can connect to 172.28.0.2 and run info replication to confirm it.
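For example (the container name redis-slave-1 below is an assumption; use whichever of your slave containers now holds 172.28.0.2, which you can check with docker inspect as before):

$ docker exec -it redis-slave-1 redis-cli info replication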

role:master
connected_slaves:1
slave0:ip=172.28.0.4,port=6379,state=online,offset=18952,lag=0
master_replid:f0bf5d1c843ec3ab005c5ac2b864f7ffdc6a8217
master_replid2:72c43e1f9c05d4b08bea6bf9b2549997587e261c
master_repl_offset:18952
second_repl_offset:16351
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:18952

As you can see, 172.28.0.2 is now the master node, with only one slave connected to it: the original master has not been brought back yet, so only two instances are alive.

The original master is restarted

Let’s simulate the original master reboot and see what happens.

Run docker-compose unpause master to simulate the original master coming back online after the fault is fixed, then connect to the original master machine again.
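For reference, the unpause command is run from the redis directory:

$ docker-compose unpause master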

$ docker exec -it f70a9d9809bc1e924a5be0135888067ad3eb16552f9eaf82495e4c956b456cd9 /bin/sh; exit
# redis-cli
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:172.28.0.2
master_port:6379
master_link_status:up
......

After the original master disconnects and then reconnects, its role has become slave and it now replicates the new master (172.28.0.2).

We can also verify this by looking at the replication info of the new master node again.

# Replication
role:master
connected_slaves:2
slave0:ip=172.28.0.4,port=6379,state=online,offset=179800,lag=0
slave1:ip=172.28.0.3,port=6379,state=online,offset=179800,lag=1
......

After the original master reconnects from its brief outage, connected_slaves becomes 2, and the original master 172.28.0.3 is clearly listed as slave1, which matches the principles and the diagram described at the beginning.

Well, that's all for this blog post.

Welcome to follow "SH's Full Stack Notes" on WeChat to check out more related articles:

  • Redis Basics – Dissects the underlying data structures and their usage
  • Redis Basics – How does Redis persist data
  • Take a look at master/slave replication in Redis
  • WebAssembly is all about learning about WASM
  • Introduction to the JVM and garbage collection