What is Keepalived?

Keepalived is a high-performance high-availability (hot-standby) solution for servers. It can be used to eliminate a server's single point of failure and, combined with Nginx, to build a highly available web front end. Although Nginx handles load well and rarely crashes, without a hot standby any Nginx outage takes the whole service down with it. Hot standby is therefore a must, depending of course on your actual business requirements.

Keepalived principle

Keepalived is based on the Virtual Router Redundancy Protocol (VRRP).

VRRP can be thought of as a protocol for making routers highly available: N routers providing the same function form a router group, with one master and several backups. The master owns a virtual IP (VIP) that provides the service (the other machines on the router's LAN use this VIP as their default route) and periodically sends multicast VRRP advertisements. If the backups stop receiving these advertisements, they conclude that the master is down and elect a new master from among themselves based on VRRP priority. This keeps the router highly available.
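As a toy illustration of the election rule above (among the routers still alive, the one with the highest priority becomes master), here is a shell sketch; the hostnames and priority values are invented for illustration only:

```shell
# Hypothetical router group as "name:priority" pairs (values invented).
routers="s166:100
s167:50
s168:80"

# VRRP elects the reachable router with the highest priority as master.
master=$(printf '%s\n' "$routers" | sort -t: -k2,2 -nr | head -n1 | cut -d: -f1)
echo "elected master: $master"

# If the current master (s166) fails, drop it from the list and re-elect.
survivors=$(printf '%s\n' "$routers" | grep -v '^s166:')
new_master=$(printf '%s\n' "$survivors" | sort -t: -k2,2 -nr | head -n1 | cut -d: -f1)
echo "new master: $new_master"
```

This is only the election rule in miniature; real VRRP also involves advertisement timers and preemption, which Keepalived handles for us.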

Keepalived has three main modules: core, check, and vrrp. The core module is the heart of Keepalived, responsible for starting and maintaining the main process and for loading and parsing the global configuration file. The check module is responsible for health checks, including the common ones such as TCP and HTTP checks. The vrrp module implements the VRRP protocol.

Keepalived structure

Keepalived has only one configuration file, keepalived.conf. Its main configuration areas are global_defs, vrrp_instance, and virtual_server.

global_defs area

This area mainly configures the notification targets and the machine identity used when a failure occurs. Generally speaking, it is the configuration for sending email notifications on failure.

global_defs {
    notification_email {                      # who to notify by email when a failure occurs
        [email protected]
        [email protected]
        ...
    }
    notification_email_from [email protected]  # address the notification emails are sent from
    smtp_server smtp.hws.com                  # SMTP server used for notification emails
    smtp_connect_timeout 30                   # timeout for connecting to the SMTP server
    enable_traps                              # enable SNMP traps
    router_id Host163                         # identifier of the local node, usually the hostname
}
vrrp_instance area

vrrp_instance defines the VIP that serves external traffic, together with its related attributes.

vrrp_instance VI_1 {
    state MASTER               # MASTER or BACKUP
    interface ens33            # name of the local network interface
    virtual_router_id 51       # value ranges from 0 to 255
    priority 100               # weight used in the master election
    advert_int 1               # interval between VRRP advertisements, in seconds
    authentication {           # authentication area
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {        # virtual IP address
        192.168.27.160
    }
}
virtual_server

This area configures LVS virtual servers for very large deployments; I am not using it in this article, but the layout is shown below.

virtual_server 192.168.200.100 443 {
    delay_loop 6                        # health-check polling interval, in seconds
    lb_algo rr                          # back-end scheduling algorithm
    lb_kind NAT                         # LVS forwarding type
    persistence_timeout 50
    protocol TCP
    real_server 192.168.201.100 443 {   # real server
        weight 1
        SSL_GET {
            url {
                path /
                digest ff20ad2481f97b1754ef3e12ecd3a9cc   # result computed with genhash
            }
            url {
                path /mrtg/
                digest 9b3a0c85a887a256d6939da88aabd8cd
            }
            connect_timeout 3
            nb_get_retry 3              # number of retries
            delay_before_retry 3        # delay before the next retry
        }
    }
}

Keepalived installation

yum install keepalived -y 

Environmental simulation

I prepared four hosts with IPs 192.168.27.166-169, installed the Nginx service on each, and then used 166 and 167 as the master and backup hosts respectively.

Nginx configuration
upstream centos_pool{
        server s168:80;
        server s169:80;
}
server {
    listen       80;
    server_name  localhost;

    #charset koi8-r;
    #access_log /var/log/nginx/host.access.log main;

    location / {
       # root /usr/share/nginx/html;
       # index index.html index.htm;
        proxy_pass http://centos_pool;
    }
}

All four hosts start with this configuration, which makes it look like four identical Nginx services. That is not quite the case here: 166 and 167 act as the Nginx load balancers, while 168 and 169 are the web services (Nginx listening on port 80 to mimic a real service).

In other words, 166 and 167 are used for load balancing, and 168 and 169 are Web service hosts.

On hosts 168 and 169, I put a simple identifying marker in /usr/share/nginx/html/index.html:
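For reference, a minimal way to drop such a marker (the docroot path matches the default Nginx layout used above; here a temporary directory stands in for it so the snippet is safe to run anywhere):

```shell
# Stand-in for /usr/share/nginx/html; on the real hosts use that path directly.
docroot=$(mktemp -d)

# Write a page that identifies the serving host.
echo "<h1>served by $(hostname)</h1>" > "$docroot/index.html"

cat "$docroot/index.html"
```

With a distinct page on each backend, refreshing the load balancer's address shows which host answered each request.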

Okay, next let's configure Keepalived.

Configuring Keepalived

166 Host Configuration:

! Configuration File for keepalived

global_defs {
    router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.27.160
    }
}

This is the most central configuration, and also the simplest. To configure the mail service, refer to the annotated global_defs module above; the same goes for the LVS configuration.

167 hot-standby configuration:

! Configuration File for keepalived

global_defs {
    router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 50
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.27.160
    }
}

As you can see, compared with the master only the state (MASTER/BACKUP) and priority (100/50) differ; everything else, including virtual_router_id, is identical, and must be. Okay, now start Keepalived on both nodes and check the IP addresses to see which one is the master (the active node's NIC is bound to the VIP 192.168.27.160).
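A quick way to tell which node currently holds the VIP is to grep it out of `ip addr`. The snippet below runs against captured sample output so it is self-contained; on a real node, pipe `ip addr show ens33` into the same grep instead:

```shell
# Sample `ip addr` lines from the node that currently holds the VIP.
sample='inet 192.168.27.166/24 brd 192.168.27.255 scope global ens33
inet 192.168.27.160/32 scope global ens33'

vip=192.168.27.160
if printf '%s\n' "$sample" | grep -q "inet $vip/"; then
    holder=yes
    echo "this node holds the VIP $vip"
else
    holder=no
fi
```

On the node that lost the VIP the grep finds nothing, so running this on both hosts shows exactly where the address lives.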

Testing

1. Access virtual IP:

2. View the host route

3. Rehearse scenarios

Once everything is configured correctly, what happens when I stop Nginx on 166?

Does 167 take over the virtual IP and complete the hot standby? The answer is no. Looking back, Keepalived knows nothing about Nginx: the backup only monitors the VRRP heartbeat sent by Keepalived on the master. So I stopped Keepalived as well.

[root@s166 keepalived]# nginx -s stop
[root@s166 keepalived]# service keepalived stop
Redirecting to /bin/systemctl stop keepalived.service
Copy the code

Then look at IP route 166

[root@s166 keepalived]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:7b:59:07 brd ff:ff:ff:ff:ff:ff
    inet 192.168.27.166/24 brd 192.168.27.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::83ee:6998:a0d4:7974/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::2513:4c77:5da7:f031/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::99b3:c79:5377:c3fe/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever

The virtual IP 192.168.27.160 is no longer bound on 166. Let's check whether 167 now has it; if it does, the configuration works.

192.168.27.160

Optimizing with a script

Since Keepalived is not aware of Nginx, we can write a script that watches Nginx: if Nginx is down and cannot be restarted, the script stops Keepalived as well. This completes the hot-standby failover.

Create the script check_nginx.sh

#!/bin/bash
A=`ps -C nginx --no-header | wc -l`
if [ $A -eq 0 ]; then
    echo "restart the nginx server" >> /etc/keepalived/keepalived_error.log
    /usr/sbin/nginx
    sleep 2
    if [ `ps -C nginx --no-header | wc -l` -eq 0 ]; then
        echo "keepalived is closed" >> /etc/keepalived/keepalived_error.log
        /usr/bin/ps -ef | grep "keepalived" | grep -v "grep" | cut -c 9-15 | xargs kill -9
        /usr/bin/ps -ef | grep "keepalived" >> /etc/keepalived/keepalived_error.log
    fi
fi

Oh, and remember to make the script executable. Because the echo output does not reach the console, we can tail keepalived_error.log to confirm that the script ran.
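Setting the executable bit looks like this; the snippet uses a temporary file as a stand-in so it runs anywhere, but on the real host the target is /etc/keepalived/check_nginx.sh:

```shell
# Stand-in for /etc/keepalived/check_nginx.sh.
script=$(mktemp)
printf '#!/bin/bash\nexit 0\n' > "$script"

# keepalived's vrrp_script needs the check file to be executable.
chmod +x "$script"

[ -x "$script" ] && echo "script is executable"
```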

In that case, how does the script get triggered, and how often? The sleep time also needs tuning: failover should be as fast as possible while keeping performance high. So we register the script in the Keepalived configuration, and Keepalived executes it periodically at the configured interval as part of its health tracking.
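With the values used in this article (a 2-second check interval and a 2-second sleep in the script while waiting for the restart attempt), a rough worst-case estimate of how long a dead Nginx can go unnoticed before Keepalived is stopped:

```shell
interval=2        # vrrp_script interval from the keepalived configuration
restart_wait=2    # sleep in check_nginx.sh after the restart attempt

# Worst case: nginx dies right after a check, so one full interval passes
# before the script runs, then it waits restart_wait before stopping keepalived.
worst_case=$((interval + restart_wait))
echo "worst-case detection time: ${worst_case}s"
```

This ignores VRRP's own failover timing, but it shows why both the interval and the sleep should be kept short.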

Add the script to the Keepalived task

! Configuration File for keepalived

global_defs {
    router_id LVS_DEVEL
}

vrrp_script chk_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2                 # run the check every 2 seconds
    weight -20                 # lower the priority by 20 if the check fails
}

vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.27.160
    }
    track_script {
        chk_nginx              # nginx liveness check script
    }
}

Similarly, the BACKUP host must carry the same addition:

! Configuration File for keepalived

global_defs {
    router_id LVS_DEVEL
}

vrrp_script chk_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2
    weight -20
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 50
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.27.160
    }
    track_script {
        chk_nginx
    }
}
Post-optimization test

Then, to test without Nginx actually restarting, I let the script skip the restart and stop Keepalived directly, so the BACKUP takes over. I commented out the restart lines of the script and ran it again.

#!/bin/bash
A=`ps -C nginx --no-header | wc -l`
if [ $A -eq 0 ]; then
#   echo "restart the nginx server" >> /etc/keepalived/keepalived_error.log
#   /usr/sbin/nginx
#   sleep 2
#   if [ `ps -C nginx --no-header | wc -l` -eq 0 ]; then
        echo "keepalived is closed" >> /etc/keepalived/keepalived_error.log
        /usr/bin/ps -ef | grep "keepalived" | grep -v "grep" | cut -c 9-15 | xargs kill -9
        /usr/bin/ps -ef | grep "keepalived" >> /etc/keepalived/keepalived_error.log
#   fi
fi

This confirms that the script and Keepalived integrate correctly; remove the comments afterwards. Mission accomplished.

Reflection

[root@s166 keepalived]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:7b:59:07 brd ff:ff:ff:ff:ff:ff
    inet 192.168.27.166/24 brd 192.168.27.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet 192.168.27.160/32 scope global ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::83ee:6998:a0d4:7974/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::2513:4c77:5da7:f031/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::99b3:c79:5377:c3fe/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
[root@s167 keepalived]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:d4:26:34 brd ff:ff:ff:ff:ff:ff
    inet 192.168.27.167/24 brd 192.168.27.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet 192.168.27.160/32 scope global ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::99b3:c79:5377:c3fe/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

Although we turned off Keepalived on s166, its NIC still has the virtual IP 192.168.27.160 bound; this is probably because Keepalived was killed before it could clean up after itself. However, when I refreshed, there was no error page, so normal service was not affected, and it is not a split-brain problem. I solved it by replacing the script's forced kill of Keepalived with the gentler /usr/sbin/service keepalived stop.

Extension: The split-brain problem of high availability