A failover test of the Consul cluster shows how its high-availability mechanism works.

Consul Node   IP Address        Role              Consul Version   Log Path
consul01      192.168.101.11    server            0.9.3            /var/log/consul
consul02      192.168.101.12    server            0.9.3            /var/log/consul
consul03      192.168.101.13    server (leader)   0.9.3            /var/log/consul

Note: we are verifying the Consul cluster's high availability, not the geo failover capability that ships with Consul.

Initial cluster status (client nodes are omitted)

[root@consul01 consul]# ./consul operator raft list-peers
Node      ID                   Address              State     Voter  RaftProtocol
consul01  192.168.101.11:8300  192.168.101.11:8300  follower  true   2
consul02  192.168.101.12:8300  192.168.101.12:8300  follower  true   2
consul03  192.168.101.13:8300  192.168.101.13:8300  leader    true   2

[root@consul01 consul]# ./consul members
Node      Address              Status  Type    Build  Protocol  DC  Segment
consul01  192.168.101.11:8301  alive   server  0.9.3  2         dc
consul02  192.168.101.12:8301  alive   server  0.9.3  2         dc
consul03  192.168.101.13:8301  alive   server  0.9.3  2         dc

The Consul cluster in a TSOP domain consists of three server nodes, so in theory it can tolerate the failure of at most one server node. We therefore test whether a single server node failure affects the cluster.
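For reference, this tolerance follows from Raft's quorum requirement of floor(N/2) + 1 servers (standard Raft arithmetic, not specific to this environment):

Servers   Quorum   Failures tolerated
1         1        0
3         2        1
5         3        2

With three servers the quorum is 2, so the cluster survives the loss of one server but can no longer elect a leader or commit writes if two are lost.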

Consul Cluster simulated fault test

§1. Stop a Follower Server node

Take the consul01 node as an example

[root@consul01 consul]# systemctl stop consul 

Logs of other nodes

[root@consul02 ~]# tail -100 /var/log/consul
2019/02/12 02:30:38 [INFO] serf: EventMemberFailed: consul01 192.168.101.11
2019/02/12 02:30:38 [INFO] consul: Handled member-failed event for server "consul01.dc" in area "wan"
2019/02/12 02:30:39 [INFO] serf: EventMemberLeave: consul01 192.168.101.11

View cluster information on other nodes

[root@consul03 consul]# ./consul operator raft list-peers
Node      ID                   Address              State     Voter  RaftProtocol
consul02  192.168.101.12:8300  192.168.101.12:8300  follower  true   2
consul03  192.168.101.13:8300  192.168.101.13:8300  leader    true   2

[root@consul03 consul]# ./consul members
Node      Address              Status  Type    Build  Protocol  DC  Segment
consul01  192.168.101.11:8301  left    server  0.9.3  2         dc
consul02  192.168.101.12:8301  alive   server  0.9.3  2         dc
consul03  192.168.101.13:8301  alive   server  0.9.3  2         dc

Check whether the cluster still works properly: query the registered services; if no service exists, register one manually through the Consul API (an example call is sketched after the output below).

[root@consul03 consul]# ./consul catalog services
consul
test-csdemo-v0-snapshot
test-zuul-v0-snapshot
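If no service has been registered yet, a throwaway one can be created through the agent's HTTP API. A minimal sketch, assuming the local agent listens on the default HTTP port 8500; the service name test-svc and port 8080 are made up for illustration:

# Register a hypothetical test service against the local agent
curl -s -X PUT http://127.0.0.1:8500/v1/agent/service/register \
     -d '{"Name": "test-svc", "Port": 8080}'

# Confirm it appears in the catalog
curl -s http://127.0.0.1:8500/v1/catalog/services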

As you can see, the stopped server node is in the left state, but the cluster is still available.

Conclusion: stopping a single Follower Server node does not affect the Consul cluster service.

Restore this server node (start the consul service on consul01 again) and check the cluster:

[root@consul03 consul]# ./consul operator raft list-peers
Node      ID                   Address              State     Voter  RaftProtocol
consul02  192.168.101.12:8300  192.168.101.12:8300  follower  true   2
consul01  192.168.101.11:8300  192.168.101.11:8300  follower  true   2
consul03  192.168.101.13:8300  192.168.101.13:8300  leader    true   2

[root@consul03 consul]# ./consul members
Node      Address              Status  Type    Build  Protocol  DC  Segment
consul01  192.168.101.11:8301  alive   server  0.9.3  2         dc
consul02  192.168.101.12:8301  alive   server  0.9.3  2         dc
consul03  192.168.101.13:8301  alive   server  0.9.3  2         dc

The other nodes have detected this node and added it back:

[root@consul02 ~]# tail -100 /var/log/consul
2019/02/12 02:43:51 [INFO] serf: EventMemberJoin: consul01.dc 192.168.101.11
2019/02/12 02:43:51 [INFO] consul: Handled member-join event for server "consul01.dc" in area "wan"
2019/02/12 02:43:51 [INFO] serf: EventMemberJoin: consul01 192.168.101.11
2019/02/12 02:43:51 [INFO] consul: Adding LAN server consul01 (Addr: tcp/192.168.101.11:8300) (DC: dc)

§2. Stop the Leader Server node

Take the consul03 node as an example

[root@consul03 consul]# systemctl stop consul

The other Follower Server nodes detect that the Leader node is offline and re-elect a Leader.

[root@consul02 ~]# tail -100 /var/log/consul
2019/02/12 02:48:27 [INFO] serf: EventMemberLeave: consul03.dc 192.168.101.13
2019/02/12 02:48:27 [INFO] consul: Handled member-leave event for server "consul03.dc" in area "wan"
2019/02/12 02:48:28 [INFO] serf: EventMemberLeave: consul03 192.168.101.13
2019/02/12 02:48:28 [INFO] consul: Removing LAN server consul03 (Addr: tcp/192.168.101.13:8300) (DC: dc)
2019/02/12 02:48:37 [WARN] raft: Rejecting vote request from 192.168.101.11:8300 since we have a leader: 192.168.101.13:8300
2019/02/12 02:48:39 [ERR] agent: Coordinate update error: No cluster leader
2019/02/12 02:48:39 [WARN] raft: Heartbeat timeout from "192.168.101.13:8300" reached, starting election
2019/02/12 02:48:39 [INFO] raft: Node at 192.168.101.12:8300 [Candidate] entering Candidate state in term 5
2019/02/12 02:48:43 [ERR] http: Request GET /v1/catalog/services, error: No cluster leader from=127.0.0.1:44370
2019/02/12 02:48:43 [ERR] http: Request GET /v1/catalog/nodes, error: No cluster leader from=127.0.0.1:36548
2019/02/12 02:48:44 [INFO] raft: Node at 192.168.101.12:8300 [Follower] entering Follower state (Leader: "")
2019/02/12 02:48:44 [INFO] consul: New leader elected: consul01

From the log, you can see that consul01 has been elected as the new leader. Check the cluster information:

[root@consul02 consul]# ./consul operator raft list-peers
Node      ID                   Address              State     Voter  RaftProtocol
consul02  192.168.101.12:8300  192.168.101.12:8300  follower  true   2
consul01  192.168.101.11:8300  192.168.101.11:8300  leader    true   2

[root@consul02 consul]# ./consul members
Node      Address              Status  Type    Build  Protocol  DC  Segment
consul01  192.168.101.11:8301  alive   server  0.9.3  2         dc
consul02  192.168.101.12:8301  alive   server  0.9.3  2         dc
consul03  192.168.101.13:8301  left    server  0.9.3  2         dc

You can see that the stopped Leader Server node is in the left state, but the cluster is still available. Query the services to verify:

[root@consul02 consul]# ./consul catalog services
consul
test-csdemo-v0-snapshot
test-zuul-v0-snapshot
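Besides consul operator raft list-peers, the status HTTP API is another quick way to confirm which server currently holds leadership; a minimal sketch, assuming the local agent's default HTTP port 8500:

# Address of the current Raft leader, e.g. "192.168.101.11:8300"
curl -s http://127.0.0.1:8500/v1/status/leader

# All Raft peers known to the cluster
curl -s http://127.0.0.1:8500/v1/status/peers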

Conclusion: stopping the Leader Server node does not affect the Consul cluster service; the remaining servers elect a new Leader automatically.

Then restore the node:

[root@consul03 consul]# systemctl start consul

You can see from its logs that it is now a Follower server

[root@consul03 ~]# tail -f /var/log/consul
2019/02/12 03:01:33 [INFO] raft: Node at 192.168.101.13:8300 [Follower] entering Follower state (Leader: "")
2019/02/12 03:01:33 [INFO] serf: Ignoring previous leave in snapshot
2019/02/12 03:01:33 [INFO] agent: Retry join LAN is supported for: aws azure gce softlayer
2019/02/12 03:01:33 [INFO] agent: Joining LAN cluster...
2019/02/12 03:01:33 [INFO] agent: (LAN) joining: [consul01 consul02 consul03]
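The "(LAN) joining: [consul01 consul02 consul03]" line comes from the agent's join settings. The article does not show how the agents are started, so the following is only an illustrative sketch of server flags that would produce this behavior (data directory and bind address are assumptions):

# Illustrative only: a server agent with retry-join pointed at all three nodes
consul agent -server -bootstrap-expect=3 -datacenter=dc \
    -data-dir=/var/lib/consul -bind=192.168.101.13 \
    -retry-join=consul01 -retry-join=consul02 -retry-join=consul03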

Node information is also updated on the new leader and follower

[root@consul01 ~]# tail -f /var/log/consul
2019/02/12 03:01:33 [INFO] serf: EventMemberJoin: consul03.dc 192.168.101.13
2019/02/12 03:01:33 [INFO] consul: Handled member-join event for server "consul03.dc" in area "wan"
2019/02/12 03:01:33 [INFO] serf: EventMemberJoin: consul03 192.168.101.13
2019/02/12 03:01:33 [INFO] consul: Adding LAN server consul03 (Addr: tcp/192.168.101.13:8300) (DC: dc)
2019/02/12 03:01:33 [INFO] raft: Updating configuration with AddStaging (192.168.101.13:8300, 192.168.101.13:8300) to [{Suffrage: Voter ID: 192.168.101.12:8300 Address: 192.168.101.12:8300} {Suffrage: Voter ID: 192.168.101.11:8300 Address: 192.168.101.11:8300} {Suffrage: Voter ID: 192.168.101.13:8300 Address: 192.168.101.13:8300}]
2019/02/12 03:01:33 [INFO] raft: Added peer 192.168.101.13:8300, starting replication
2019/02/12 03:01:33 [WARN] raft: AppendEntries to {Voter 192.168.101.13:8300 192.168.101.13:8300} rejected, sending older logs (next: 394016)
2019/02/12 03:01:33 [INFO] consul: member 'consul03' joined, marking health alive
2019/02/12 03:01:33 [INFO] raft: pipelining replication to peer {Voter 192.168.101.13:8300 192.168.101.13:8300}

Check the cluster information again

[root@consul01 consul]# ./consul operator raft list-peers
Node      ID                   Address              State     Voter  RaftProtocol
consul02  192.168.101.12:8300  192.168.101.12:8300  follower  true   2
consul01  192.168.101.11:8300  192.168.101.11:8300  leader    true   2
consul03  192.168.101.13:8300  192.168.101.13:8300  follower  true   2

[root@consul01 consul]# ./consul members
Node      Address              Status  Type    Build  Protocol  DC  Segment
consul01  192.168.101.11:8301  alive   server  0.9.3  2         dc
consul02  192.168.101.12:8301  alive   server  0.9.3  2         dc
consul03  192.168.101.13:8301  alive   server  0.9.3  2         dc

Throughout this process, only the leader role changed; the services the cluster provides externally were not affected.
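To watch availability during a failover like this, a simple polling loop against the catalog can be left running on any node while a server is stopped; a minimal sketch, assuming the local agent's HTTP API on port 8500:

# Poll the service catalog once per second; -f makes curl fail on HTTP errors
# such as the "No cluster leader" responses seen in the election logs above
while true; do
    if curl -sf http://127.0.0.1:8500/v1/catalog/services > /dev/null; then
        echo "$(date '+%T') catalog query OK"
    else
        echo "$(date '+%T') catalog query FAILED"
    fi
    sleep 1
done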