preface
On a sunny afternoon, a piece of email reminder came from the lower right corner of the computer, accompanied by the vibration of wechat nail. When I opened it, I found various errors in application, rabbit alarm, database server memory explosion and MySql database instance hanging.
screening
First explain the database version:
mysql> status
--------------
Mysql Ver 14.14 Distrib 5.7.22-22, for Linux (x86_64) using 6.2
Connection id: 59568
Current database:
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.7.22-22-log Percona Server (GPL), Release 22, Revision F62D93C
Protocol version: 10
Copy the code
Crash troubleshooting is never a fun task, especially if MySQL doesn’t report the cause of the crash. For example, when MySQL runs out of memory.
The following message is sent by the database email alarm notification:
Type: mysql
Tags: Production master library
Host: 172.16.1.66:3306
Level: critical
Item: connect
Value: down
Message: mysql server down
Copy the code
Log in to the Grafana monitor panel to see which period of time the database connection has increased significantly.
A quick check of the server’s mail monitoring record shows that at the last point in time, the memory usage was 99%, indicating that the increase in database connections may have been the last straw for the server.
In fact, the direct cause of OOM is not complicated, because the server memory is insufficient, and the kernel needs to reclaim memory. To reclaim memory means to kill the programs that use the most memory on the server, while MySQL service may use the most memory, so OOM is created.
Type: os
Tags: 66 database
Host: 172.16.1.66:
Level: critical
Item: memory
Value: 99%
Message: too more memory usage
Copy the code
Viewing System Logs
Let’s go through the logs with this question:
# check log
tail -500f /var/log/messages
# Here is oom-killer
Nov 27 14:55:48 itstyledb1 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Nov 27 14:55:48 itstyledb1 kernel: mysqld cpuset=/ mems_allowed=0-1
Nov 27 14:55:48 itstyledb1 kernel: CPU: 2 PID: 895 Comm: mysqld Kdump: loaded Not tainted 3.10.0-862.3.2.el7.x86_64 #1
Nov 27 14:55:48 itstyledb1 kernel: Hardware name: Huawei RH1288 V3/BC11HGSC0, BIOS 3.22 05/16/2016
Nov 27 14:55:48 itstyledb1 kernel: Call Trace:
Copy the code
Read on, guys:
0 pages HighMem/MovableOnly
Nov 27 14:55:48 itstyledb1 kernel: 291281 pages reserved
Nov 27 14:55:48 itstyledb1 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Nov 27 14:55:48 itstyledb1 kernel: [ 468] 0 468 28271 4326 62 55 0 systemd-journal
Nov 27 14:55:48 itstyledb1 kernel: [ 490] 0 490 11492 2 24 553 -1000 systemd-udevd
Nov 27 14:55:48 itstyledb1 kernel: [ 787] 0 787 13877 18 27 96 -1000 auditd
Nov 27 14:55:48 itstyledb1 kernel: [ 810] 81 810 14552 81 34 89 -900 dbus-daemon
Nov 27 14:55:48 itstyledb1 kernel: [ 815] 0 815 55956 1 60 466 0 abrtd
Nov 27 14:55:48 itstyledb1 kernel: [ 816] 0 816 55327 9 64 346 0 abrt-watch-log
Nov 27 14:55:48 itstyledb1 kernel: [ 818] 0 818 121607 220 90 495 0 NetworkManager
Nov 27 14:55:48 itstyledb1 kernel: [ 822] 0 822 5415 49 16 33 0 irqbalance
Nov 27 14:55:48 itstyledb1 kernel: [ 823] 997 823 134634 97 60 1306 0 polkitd
Nov 27 14:55:48 itstyledb1 kernel: [ 825] 0 825 6594 42 20 41 0 systemd-logind
Nov 27 14:55:48 itstyledb1 kernel: [ 830] 0 830 31578 28 21 139 0 crond
Nov 27 14:55:48 itstyledb1 kernel: [ 839] 0 839 27522 2 10 31 0 agetty
Nov 27 14:55:48 itstyledb1 kernel: [ 1142] 0 1142 143454 114 97 2672 0 tuned
Nov 27 14:55:48 itstyledb1 kernel: [ 1144] 0 1144 28203 11 59 246 -1000 sshd
Nov 27 14:55:48 itstyledb1 kernel: [ 1145] 0 1145 97438 694 103 328 0 rsyslogd
Nov 27 14:55:48 itstyledb1 kernel: [ 1369] 0 1369 22526 20 44 256 0 master
Nov 27 14:55:48 itstyledb1 kernel: [ 1371] 89 1371 22596 32 46 251 0 qmgr
Nov 27 14:55:48 itstyledb1 kernel: [ 5140] 0 5140 5102 1617 15 239 0 mysqld_exporter
Nov 27 14:55:48 itstyledb1 kernel: [ 9430] 0 9430 55966 378 62 790 0 snmpd
Nov 27 14:55:48 itstyledb1 kernel: [30320] 27 30320 22951376 13928375 43437 8163662 0 mysqld
Nov 27 14:55:48 itstyledb1 kernel: [ 688] 89 688 22552 271 46 0 0 pickup
Nov 27 14:55:48 itstyledb1 kernel: Out of memory: Kill process 30320 (mysqld) score 984 or sacrifice child
Nov 27 14:55:48 itstyledb1 kernel: Killed process 30320 (mysqld) total-vm:91805504kB, anon-rss:55713500kB, file-rss:0kB, shmem-rss:0kB
Nov 27 14:56:00 itstyledb1 systemd: mysqld.service: main process exited, code=killed, status=9/KILL
Nov 27 14:56:00 itstyledb1 systemd: Unit mysqld.service entered failed state.
Nov 27 14:56:00 itstyledb1 systemd: mysqld.service failed.
Nov 27 14:56:00 itstyledb1 systemd: mysqld.service holdoff time over, scheduling restart.
Nov 27 14:56:01 itstyledb1 systemd: Starting MySQL Server...
Copy the code
When outofmemory occurs, the outofMemory function selects a process that the kernel considers guilty of allocating too much memory and kills it. Mysql is clearly the culprit.
MySql will then restart automatically. After the reboot, the memory came down, but by the end of the day, it was almost full.
[root@itstyledb1 ~]# free -m
total used free shared buff/cache available
Mem: 55803 54976 241 10 585 349
Swap: 32064 25036 7028
Copy the code
Find the MySql process and execute the following top -P PID, using 52.4GB of memory
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
935 mysql 20 0 79.7g 52.4g 7336s 0.3 96.1 255:44.76 mysqld
Copy the code
Computational memory usage
1) Check how much memory MySQL is using globally
SELECT (@@innodb_buffer_pool_size
+@@innodb_log_buffer_size
+@@key_buffer_size) / 1024 /1024 AS MEMORY_MB;
Copy the code
Query results are as follows:
+----------------+
| MEMORY_MB |
+----------------+
20512.00000000 | |
+----------------+
Copy the code
2) Check how much memory performance_schema occupies
SELECT SUBSTRING_INDEX(event_name,'/',2) AS
code_area, sys.format_bytes(SUM(current_alloc))
AS current_alloc
FROM sys.x$memory_global_by_current_bytes
GROUP BY SUBSTRING_INDEX(event_name,'/',2)
ORDER BY SUM(current_alloc) DESC;
Copy the code
Query results are as follows:
+---------------------------+---------------+
| code_area | current_alloc |
+---------------------------+---------------+
| the memory/performance_schema | 349.80 MiB |
+---------------------------+---------------+
Copy the code
3) Check how much memory each thread takes up
SELECT ( ( @@read_buffer_size
+ @@read_rnd_buffer_size
+ @@sort_buffer_size
+ @@join_buffer_size
+ @@binlog_cache_size
+ @@thread_stack
+ @@max_allowed_packet
+ @@net_buffer_length )
) / (1024*1024) AS MEMORY_MB;
Copy the code
Query results are as follows:
+-----------+
| MEMORY_MB |
+-----------+
87.5156 | |
+-----------+
Copy the code
View current thread
show full processlist
Copy the code
The final result is:
+-----------+
| MEMORY_MB |
+-----------+
87.5156 * 37 | |
+-----------+
Copy the code
4) Check how much memory the memory storage engine takes up
SELECT SUM(max_data_length)/1024/1024 AS MEMORY_MB FROM information_schema.tables WHERE ENGINE='memory';
Copy the code
Query results are as follows:
+---------------+
| MEMORY_MB |
+---------------+
3857.37713909 | |
+---------------+
Copy the code
The above four entries add up to approximately 27975MB, not a bad 28gb, but the MySql process appears to be taking up 52.4 gb, so where does the remaining 24.4 gb go?
The thread pool
This thread pool is not the other connection pool. There is a big difference between the connection pool and the connection pool. The connection pool is usually configured on the client side, while the thread pool is configured on the DB server. In addition, the connection pool can avoid frequent connection creation and destruction, but it cannot control the number of active MySQL threads. In high concurrency scenarios, it cannot protect DB. A better approach is to combine connection pooling with thread pooling.
Some parameters about thread pools:
mysql> show variables like 'thread%';
+-------------------------------+---------------------------+
| Variable_name | Value |
+-------------------------------+---------------------------+
| thread_handling | one-thread-per-connection |
| thread_pool_high_prio_mode | transactions |
| thread_pool_high_prio_tickets | 4294967295 |
| thread_pool_idle_timeout | 60 |
| thread_pool_max_threads | 100000 |
| thread_pool_oversubscribe | 3 |
| thread_pool_size | 12 |
| thread_pool_stall_limit | 500 |
+-------------------------------+---------------------------+
Copy the code
Thread_handling:
This parameter is used to configure the thread model. By default, it is one-thread-per-connection, that is, thread pooling is not enabled. Setting this parameter to pool-of-threads enables thread pooling.
threadpoolThe size:
This parameter is used to set the number of thread pool groups. The default value is the number of system cpus to fully utilize CPU resources.
threadpoolOversubscribe:
This parameter sets the maximum number of threads in the group, threadPoolOversubscribe +1 for each group. Note that listener threads are not included.
threadpoolhighprioMode:
The control parameters of high priority queue, there are three values (the transactions/statements/none), the default is the transactions, the meanings of the three values are as follows:
-
The transactions that have already starting the transaction statement on high priority queue, but also depends on the back of the threadpoolhighpriotickets parameters
-
Statements: In this mode, all statements are placed in the higher priority queue and the lower priority queue is not used
-
None: This mode does not use high-priority queues
threadpoolhighprioTickets:
This parameter controls the maximum number of times each connection word order is placed on the high-priority queue. The default is 4294967295. Note that this parameter is valid only if ThreadPoolHighPriomode is transactions.
threadpoolIdle_timeout:
The maximum idle time of worker threads is 60 seconds by default.
threadpoolMax_threads:
This parameter is used to limit the maximum number of threads in the thread pool, beyond which no more threads can be created. The default is 100000.
threadpoolStall_limit:
This parameter sets the interval for checking whether the group of the timer thread is abnormal. The default value is 500ms.
The final configuration is as follows:
#thread pool
thread_handling=pool-of-threads
The default value is the number of cpus in the system, which makes full use of CPU resources
thread_pool_size=24
The maximum number of threads per group is thread_pool_oversubscribe+1
thread_pool_oversubscribe=3
performance_schema=off
#extra Connection to prevent MySQL from logging in when the thread pool is full
extra_max_connections = 8
extra_port = 33333
Copy the code
Note: Thread pooling is provided in Percona,MariaDB, and Oracle MySQL Enterprise edition, but not in Oracle MySQL Community Edition.
Percona Server 5.7.21-20+ does not have a Thread pool that causes memory leakage. Percona Server 5.7.21-20+ does not have a Thread pool.
The slow query
Since this is a production environment, this issue is dragging on for a bit longer, so will slow queries affect memory usage issues? With this problem in mind, I looked at the background list of slow queries, and there were indeed many slow queries in the period before the database crash. However, this does not explain the problem to a certain extent, because the MySql service of the server had run out of memory before it was killed. At this time, the number of connections was not many, only about 30 or 40, and most of them were in hibernation state, and had occupied most of Swap space. In other words, there are bound to be many slow query statements in the case of limited resources.
summary
In fact, this “accident” is not an accident at all, in fact, it has happened many times. But to summarize, as the problem was not identified in the end, so I released it, in case there are professional DBAs who encounter similar problems.
reference
https://bugs.mysql.com/bug.php?id=91861
https://bugs.mysql.com/bug.php?id=91710
https://dev.mysql.com/doc/refman/5.7/en/memory-use.html
https://www.percona.com/blog/2018/06/28/what-to-do-when-mysql-runs-out-of-memory-troubleshooting-guide/
https://dev.mysql.com/doc/refman/5.7/en/thread-pool-tuning.html
A wechat public account with temperature
I look forward to making progress together with you and sharing beautiful articles
Share Java, O&M, and monitoring learning resources