preface

On a sunny afternoon, a piece of email reminder came from the lower right corner of the computer, accompanied by the vibration of wechat nail. When I opened it, I found various errors in application, rabbit alarm, database server memory explosion and MySql database instance hanging.

screening

First explain the database version:


     

    mysql> status

    --------------

    Mysql Ver 14.14 Distrib 5.7.22-22, for Linux (x86_64) using 6.2

    Connection id:          59568

    Current database:

    Current user:           root@localhost

    SSL:                    Not in use

    Current pager:          stdout

    Using outfile:          ''

    Using delimiter:        ;

    Server version: 5.7.22-22-log Percona Server (GPL), Release 22, Revision F62D93C

    Protocol version:       10

Copy the code

Crash troubleshooting is never a fun task, especially if MySQL doesn’t report the cause of the crash. For example, when MySQL runs out of memory.

The following message is sent by the database email alarm notification:


     

    Type: mysql

    Tags: Production master library

    Host: 172.16.1.66:3306

    Level: critical

    Item: connect

    Value: down

    Message: mysql server down

Copy the code

Log in to the Grafana monitor panel to see which period of time the database connection has increased significantly.

A quick check of the server’s mail monitoring record shows that at the last point in time, the memory usage was 99%, indicating that the increase in database connections may have been the last straw for the server.

In fact, the direct cause of OOM is not complicated, because the server memory is insufficient, and the kernel needs to reclaim memory. To reclaim memory means to kill the programs that use the most memory on the server, while MySQL service may use the most memory, so OOM is created.


     

    Type: os

    Tags: 66 database

    Host: 172.16.1.66:

    Level: critical

    Item: memory

    Value: 99%

    Message: too more memory usage

Copy the code

Viewing System Logs

Let’s go through the logs with this question:


     

    # check log

    tail -500f  /var/log/messages

    # Here is oom-killer

    Nov 27 14:55:48 itstyledb1 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0

    Nov 27 14:55:48 itstyledb1 kernel: mysqld cpuset=/ mems_allowed=0-1

    Nov 27 14:55:48 itstyledb1 kernel: CPU: 2 PID: 895 Comm: mysqld Kdump: loaded Not tainted 3.10.0-862.3.2.el7.x86_64 #1

    Nov 27 14:55:48 itstyledb1 kernel: Hardware name: Huawei RH1288 V3/BC11HGSC0, BIOS 3.22 05/16/2016

    Nov 27 14:55:48 itstyledb1 kernel: Call Trace:

Copy the code

Read on, guys:


     

    0 pages HighMem/MovableOnly

    Nov 27 14:55:48 itstyledb1 kernel: 291281 pages reserved

    Nov 27 14:55:48 itstyledb1 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name

    Nov 27 14:55:48 itstyledb1 kernel: [  468]     0   468    28271     4326      62       55             0 systemd-journal

    Nov 27 14:55:48 itstyledb1 kernel: [  490]     0   490    11492        2      24      553         -1000 systemd-udevd

    Nov 27 14:55:48 itstyledb1 kernel: [  787]     0   787    13877       18      27       96         -1000 auditd

    Nov 27 14:55:48 itstyledb1 kernel: [  810]    81   810    14552       81      34       89          -900 dbus-daemon

    Nov 27 14:55:48 itstyledb1 kernel: [  815]     0   815    55956        1      60      466             0 abrtd

    Nov 27 14:55:48 itstyledb1 kernel: [  816]     0   816    55327        9      64      346             0 abrt-watch-log

    Nov 27 14:55:48 itstyledb1 kernel: [  818]     0   818   121607      220      90      495             0 NetworkManager

    Nov 27 14:55:48 itstyledb1 kernel: [  822]     0   822     5415       49      16       33             0 irqbalance

    Nov 27 14:55:48 itstyledb1 kernel: [  823]   997   823   134634       97      60     1306             0 polkitd

    Nov 27 14:55:48 itstyledb1 kernel: [  825]     0   825     6594       42      20       41             0 systemd-logind

    Nov 27 14:55:48 itstyledb1 kernel: [  830]     0   830    31578       28      21      139             0 crond

    Nov 27 14:55:48 itstyledb1 kernel: [  839]     0   839    27522        2      10       31             0 agetty

    Nov 27 14:55:48 itstyledb1 kernel: [ 1142]     0  1142   143454      114      97     2672             0 tuned

    Nov 27 14:55:48 itstyledb1 kernel: [ 1144]     0  1144    28203       11      59      246         -1000 sshd

    Nov 27 14:55:48 itstyledb1 kernel: [ 1145]     0  1145    97438      694     103      328             0 rsyslogd

    Nov 27 14:55:48 itstyledb1 kernel: [ 1369]     0  1369    22526       20      44      256             0 master

    Nov 27 14:55:48 itstyledb1 kernel: [ 1371]    89  1371    22596       32      46      251             0 qmgr

    Nov 27 14:55:48 itstyledb1 kernel: [ 5140]     0  5140     5102     1617      15      239             0 mysqld_exporter

    Nov 27 14:55:48 itstyledb1 kernel: [ 9430]     0  9430    55966      378      62      790             0 snmpd

    Nov 27 14:55:48 itstyledb1 kernel: [30320]    27 30320 22951376 13928375   43437  8163662             0 mysqld

    Nov 27 14:55:48 itstyledb1 kernel: [  688]    89   688    22552      271      46        0             0 pickup

    Nov 27 14:55:48 itstyledb1 kernel: Out of memory: Kill process 30320 (mysqld) score 984 or sacrifice child

    Nov 27 14:55:48 itstyledb1 kernel: Killed process 30320 (mysqld) total-vm:91805504kB, anon-rss:55713500kB, file-rss:0kB, shmem-rss:0kB

    Nov 27 14:56:00 itstyledb1 systemd: mysqld.service: main process exited, code=killed, status=9/KILL

    Nov 27 14:56:00 itstyledb1 systemd: Unit mysqld.service entered failed state.

    Nov 27 14:56:00 itstyledb1 systemd: mysqld.service failed.

    Nov 27 14:56:00 itstyledb1 systemd: mysqld.service holdoff time over, scheduling restart.

    Nov 27 14:56:01 itstyledb1 systemd: Starting MySQL Server...

Copy the code

When outofmemory occurs, the outofMemory function selects a process that the kernel considers guilty of allocating too much memory and kills it. Mysql is clearly the culprit.

MySql will then restart automatically. After the reboot, the memory came down, but by the end of the day, it was almost full.


     

    [root@itstyledb1 ~]# free -m

                 total        used        free      shared  buff/cache   available

    Mem:          55803       54976         241          10         585         349

    Swap:         32064       25036        7028

Copy the code

Find the MySql process and execute the following top -P PID, using 52.4GB of memory


     

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND

    935 mysql 20 0 79.7g 52.4g 7336s 0.3 96.1 255:44.76 mysqld

Copy the code

Computational memory usage

1) Check how much memory MySQL is using globally


     

    SELECT (@@innodb_buffer_pool_size

    +@@innodb_log_buffer_size

    +@@key_buffer_size) / 1024 /1024 AS MEMORY_MB;

Copy the code

Query results are as follows:


     

    +----------------+

    | MEMORY_MB      |

    +----------------+

    20512.00000000 | |

    +----------------+

Copy the code

2) Check how much memory performance_schema occupies


     

    SELECT SUBSTRING_INDEX(event_name,'/',2) AS

          code_area, sys.format_bytes(SUM(current_alloc))

          AS current_alloc

          FROM sys.x$memory_global_by_current_bytes

          GROUP BY SUBSTRING_INDEX(event_name,'/',2)

          ORDER BY SUM(current_alloc) DESC;

Copy the code

Query results are as follows:


     

    +---------------------------+---------------+

    | code_area                 | current_alloc |

    +---------------------------+---------------+

    | the memory/performance_schema | 349.80 MiB |

    +---------------------------+---------------+

Copy the code

3) Check how much memory each thread takes up


     

    SELECT ( ( @@read_buffer_size

    + @@read_rnd_buffer_size

    + @@sort_buffer_size

    + @@join_buffer_size

    + @@binlog_cache_size

    + @@thread_stack

    + @@max_allowed_packet

    + @@net_buffer_length )

    ) / (1024*1024) AS MEMORY_MB;

Copy the code

Query results are as follows:


     

    +-----------+

    | MEMORY_MB |

    +-----------+

    87.5156 | |

    +-----------+

Copy the code

View current thread


     

    show full processlist

Copy the code

The final result is:


     

    +-----------+

    | MEMORY_MB |

    +-----------+

    87.5156 * 37 | |

    +-----------+

Copy the code

4) Check how much memory the memory storage engine takes up


     

    SELECT SUM(max_data_length)/1024/1024 AS MEMORY_MB FROM information_schema.tables WHERE ENGINE='memory';

Copy the code

Query results are as follows:


     

    +---------------+

    | MEMORY_MB     |

    +---------------+

    3857.37713909 | |

    +---------------+

Copy the code

The above four entries add up to approximately 27975MB, not a bad 28gb, but the MySql process appears to be taking up 52.4 gb, so where does the remaining 24.4 gb go?

The thread pool

This thread pool is not the other connection pool. There is a big difference between the connection pool and the connection pool. The connection pool is usually configured on the client side, while the thread pool is configured on the DB server. In addition, the connection pool can avoid frequent connection creation and destruction, but it cannot control the number of active MySQL threads. In high concurrency scenarios, it cannot protect DB. A better approach is to combine connection pooling with thread pooling.

Some parameters about thread pools:


     

    mysql> show variables like 'thread%';

    +-------------------------------+---------------------------+

    | Variable_name                 | Value                     |

    +-------------------------------+---------------------------+

    | thread_handling               | one-thread-per-connection |

    | thread_pool_high_prio_mode    | transactions              |

    | thread_pool_high_prio_tickets | 4294967295                |

    | thread_pool_idle_timeout      | 60                        |

    | thread_pool_max_threads       | 100000                    |

    | thread_pool_oversubscribe     | 3                         |

    | thread_pool_size              | 12                        |

    | thread_pool_stall_limit       | 500                       |

    +-------------------------------+---------------------------+

Copy the code
Thread_handling:

This parameter is used to configure the thread model. By default, it is one-thread-per-connection, that is, thread pooling is not enabled. Setting this parameter to pool-of-threads enables thread pooling.

threadpoolThe size:

This parameter is used to set the number of thread pool groups. The default value is the number of system cpus to fully utilize CPU resources.

threadpoolOversubscribe:

This parameter sets the maximum number of threads in the group, threadPoolOversubscribe +1 for each group. Note that listener threads are not included.

threadpoolhighprioMode:

The control parameters of high priority queue, there are three values (the transactions/statements/none), the default is the transactions, the meanings of the three values are as follows:

  • The transactions that have already starting the transaction statement on high priority queue, but also depends on the back of the threadpoolhighpriotickets parameters

  • Statements: In this mode, all statements are placed in the higher priority queue and the lower priority queue is not used

  • None: This mode does not use high-priority queues

threadpoolhighprioTickets:

This parameter controls the maximum number of times each connection word order is placed on the high-priority queue. The default is 4294967295. Note that this parameter is valid only if ThreadPoolHighPriomode is transactions.

threadpoolIdle_timeout:

The maximum idle time of worker threads is 60 seconds by default.

threadpoolMax_threads:

This parameter is used to limit the maximum number of threads in the thread pool, beyond which no more threads can be created. The default is 100000.

threadpoolStall_limit:

This parameter sets the interval for checking whether the group of the timer thread is abnormal. The default value is 500ms.

The final configuration is as follows:


     

    #thread pool

    thread_handling=pool-of-threads

    The default value is the number of cpus in the system, which makes full use of CPU resources

    thread_pool_size=24

    The maximum number of threads per group is thread_pool_oversubscribe+1

    thread_pool_oversubscribe=3

    performance_schema=off

    #extra Connection to prevent MySQL from logging in when the thread pool is full

    extra_max_connections = 8

    extra_port = 33333

Copy the code

Note: Thread pooling is provided in Percona,MariaDB, and Oracle MySQL Enterprise edition, but not in Oracle MySQL Community Edition.

Percona Server 5.7.21-20+ does not have a Thread pool that causes memory leakage. Percona Server 5.7.21-20+ does not have a Thread pool.

The slow query

Since this is a production environment, this issue is dragging on for a bit longer, so will slow queries affect memory usage issues? With this problem in mind, I looked at the background list of slow queries, and there were indeed many slow queries in the period before the database crash. However, this does not explain the problem to a certain extent, because the MySql service of the server had run out of memory before it was killed. At this time, the number of connections was not many, only about 30 or 40, and most of them were in hibernation state, and had occupied most of Swap space. In other words, there are bound to be many slow query statements in the case of limited resources.

summary

In fact, this “accident” is not an accident at all, in fact, it has happened many times. But to summarize, as the problem was not identified in the end, so I released it, in case there are professional DBAs who encounter similar problems.

reference

https://bugs.mysql.com/bug.php?id=91861

https://bugs.mysql.com/bug.php?id=91710

https://dev.mysql.com/doc/refman/5.7/en/memory-use.html

https://www.percona.com/blog/2018/06/28/what-to-do-when-mysql-runs-out-of-memory-troubleshooting-guide/

https://dev.mysql.com/doc/refman/5.7/en/thread-pool-tuning.html

A wechat public account with temperature

I look forward to making progress together with you and sharing beautiful articles

Share Java, O&M, and monitoring learning resources