An SQL statement that normally executes very fast, but sometimes, without knowing what’s going on, becomes very slow, and this scenario is very difficult to repeat, not only is it random, but it also lasts very short. It looked as if the database had been “shaken”.

@[toc]

1 Why are SQL statements slow

WAL mechanism in Mysql, InnoDB only does the disk operation of writing log when processing update statement. This log is called a redo log.

By analogy, the store keeper’s ledger is a data file, the log board is a redo log, and the store keeper’s memory is memory.Quick updates are written to memory and logs, and the occasional slow MySQL update is a dirty page flush.

When the contents of a memory data page are inconsistent with those of a disk data page, the memory page is called a dirty page. After memory data is written to disk, the contents of memory and data pages on disk are the same, called “clean pages.”

1.1 What causes the database flush process?

  1. In the first scenario, InnoDB’s redo log is full. At this point, the system stops all updates, pushes the checkpoint forward, and leaves room for redo log writing.

Checkpoint is not just a matter of checking up. For example, in Figure 2, pushing checkpoint from CP to CP ‘requires flushing all dirty pages between the two points (light green) to disk. After that, the redo log area is between write pos and CP ‘.

  1. In the second scenario, the system runs out of memory. When new pages are needed and memory is insufficient, some data pages are weeded out to free up memory for other data pages. If dirty pages are eliminated, the dirty pages are written to disk first. If the dirty page must write disk, it ensures that each data page has two states:
  • If it exists in memory, it must be the correct result in memory, and it returns directly;
  • If there is no data in memory, you can be sure that the correct result is on the data file, which is read into memory and returned.
  1. The third scenario is when MySQL considers the system “idle.” Even if you’re using it a lot, take the time to brush dirty pages whenever you can.

  2. The fourth scenario is when MySQL normally shuts down. In this case, MySQL flush all dirty pages from memory to disk, so that the next time MySQL starts up, it can read data directly from disk, which is faster.

1.2 Impact of the preceding four scenarios on Performance

The third case is when MySQL is idle, when the system is not under pressure, and the fourth case is when the database is about to shut down. In both cases, you don’t pay much attention to performance. The first is “Redo log full, flush dirty pages”, which InnoDB is trying to avoid. Because when that happens, the entire system can no longer accept updates, and all updates must be blocked. If you look at the monitor, the update count will drop to 0. The second is “out of memory, write the dirty pages to disk first”, which is actually the normal situation. InnoDB uses a buffer pool to manage memory. The pages in the buffer pool have three states:

  • The first is, not yet used;
  • The second is, use and are clean pages;
  • The third is that the pages are used and dirty.

InnoDB’s policy is to use as much memory as possible, so there are very few unused pages for a long running library. When the data page to be read is not in memory, you must request a data page from the buffer pool. At this point, only the most unused data pages can be eliminated from memory: if a clean page is to be eliminated, it can be released for reuse; However, if the page is dirty, you must first brush the dirty page to the disk and turn it into clean page before reuse. Therefore, although it is normal to brush dirty pages, the following two situations will significantly affect performance:

  1. If a query has too many dirty pages to eliminate, the response time of the query will be significantly longer.
  2. The log is full, updates are blocked, and write performance drops to 0. This situation is unacceptable for sensitive services.

So InnoDB needs a mechanism to control the percentage of dirty pages in order to avoid both of these situations.

2 InnoDB dirty page flushing control policy

First, InnoDB needs to be properly informed about the IO capacity of the host, so InnoDB knows how fast it can swipe dirty pages when it needs to do all it can. This is done using the innodb_io_capacity parameter, which represents the disk capacity of InnoDB. You are advised to set this value to the IOPS of the disk. The IOPS of a disk can be tested using the fio tool:

fio -filename=$filename -direct=1 -iodepth 1 -thread -rw=randrw -ioengine=psync -bs=16k -size=500M -numjobs=10 -runtime=10 -group_reporting -name=mytest 
Copy the code

2.1 What factors will be considered in the design strategy to control the speed of dirty pages?

The question is, what happens if you brush too slowly? There are too many dirty pages in memory, and the redo log is full. So InnoDB’s flush speed is based on two factors: the percentage of dirty pages and the redo log write speed. InnoDB calculates two separate numbers based on these two factors. The innodb_max_dirty_pages_pct parameter is the maximum number of dirty pages. The default value is 75%. InnoDB calculates a number between 0 and 100 based on the current percentage of dirty pages (let’s say M). The pseudocode for calculating this number looks something like this:

F1(M)
{
  if M>=innodb_max_dirty_pages_pct then
      return 100;
  return 100*M/innodb_max_dirty_pages_pct;
}
Copy the code

Every time InnoDB writes logs, it has a sequence number. The difference between the current write sequence number and the corresponding checkpoint sequence number is set to N. InnoDB calculates a number between 0 and 100 based on this N. This calculation formula can be called F2(N). The higher the N, the higher the value.

Then, the engine can control the speed of dirty pages by multiplying innodb_io_capacity by R%, using the larger values of F1(M) and F2(N).

F1 and F2 are the values we calculated above from the percentage of dirty pages and the redo log write speed.InnoDB brushes dirty pages in the background by writing memory pages to disk. Therefore, whether the query statement may require the elimination of a dirty page when the memory needs, or because the dirty page flushing logic will occupy IO resources and may affect the update statement, may be caused by the business side perception of MySQL “shake” a little.

To avoid this, set innodb_io_capacity properly and pay more attention to the dirty page ratio. Do not let it approach 75% too often.

The percentage of dirty pages is calculated by dividing Innodb_buffer_pool_pages_dirty by Innodb_buffer_pool_pages_total.

select VARIABLE_VALUE into @a from global_status where VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty';
select VARIABLE_VALUE into @b from global_status where VARIABLE_NAME = 'Innodb_buffer_pool_pages_total';
select @a/@b;
Copy the code

An interesting strategy: once a query request needs to flush a dirty page during execution, the query may be slower than usual. MySQL has a mechanism that can make your query even slower: when you are about to swipe a dirty page, if the data page next to it happens to be a dirty page, it will swipe the “neighbor” along with it. And the same logic that drags neighbors down can be extended, meaning that for each neighbor page, if the neighboring pages are also dirty, they will also be shared.

The innodb_flush_neighbors parameter controls this behavior. Innodb_flush_neighbors = 1 and innodb_flush_neighbors = 0.

This optimization makes sense in the age of mechanical hard drives and can reduce random I/OS. The random IOPS of a mechanical disk is usually only a few hundred. Reducing the random I/O for the same logical operation means a significant improvement in system performance.

If you use high-IOPS devices such as SSDS, set innodb_flush_NEIGHBORS to 0. In this case, IOPS is not the bottleneck. By “brushing myself”, necessary dirty page operations can be performed more quickly, reducing THE RESPONSE time of SQL statements.

Innodb_flush_neighbors defaults to 0 in MySQL 8.0.