At the end of the meal, learn more about the Buffer Pool principle

The introduction

In the last article, we focused on the architecture of InnoDB’s storage engine and made a key point: both queries and updates are performed using Buffer pools. Therefore, the importance of Buffer Pool is self-evident. Today, we will further analyze the Buffer Pool component.

Buffer Pool size

A Buffer Pool is essentially a memory component, so it is limited in size. By default, the Buffer Pool size is 128MB. In a production environment, you can adjust the size of the Buffer Pool according to the actual situation. You can check the size of the current database’s Buffer Pool in bytes by running show variables like ‘innodb_buffer_pool_size’.

Buffer Pool structure

Now that we know the size of the Buffer Pool, how does the Buffer Pool store data?

Cache pages

Those who are familiar with operating system memory management know that there are three main ways of discontinuous allocated memory management in operating system: a) page management, b) section management, and c) section page management. Page management is to divide the logical address space into pages of fixed size, and the physical memory into page frames of the same size. When the program is loaded, any page can be put into any page frame, and the page frame can be discontinuous to achieve discontinuous allocation.

Similarly, since the Buffer Pool is also a memory component, it takes a similar approach. In a Buffer Pool, data is stored as pages. These pages are called cache pages. Corresponding to this are data pages on disk files. On the operating system, the default page size is 4KB, and the cache page size in the Buffer Pool is 16KB. In the Buffer Pool, each cached page also has a corresponding description: the cached page’s table space, page number, and so on. These descriptions are placed first in the Buffer Pool, followed by the corresponding cached page, as shown in the figure below

The chain of free

The InnoDB storage engine first looks for the data in the Buffer Pool. If it does, it returns it. If it doesn’t, it loads it from disk. Load pages of data into the Buffer Pool. Then the question arises:

How to determine whether a Buffer Pool has cached the required data?
Which cache pages are empty and can be used to cache data pages?

Now, let’s look at how InnoDB does it.

Problem # 1: The first thing you want to do to efficiently and quickly determine the presence of an element is to use a hash. Indeed InnoDB does this. It has a hash table where the key is “table space number + data page number” and the value is “cache page address”. Therefore, every time a page is cached in the Buffer Pool, an entry is added to the hash table. This allows you to quickly determine whether the data you need has been cached in the Buffer Pool.

Problem 2: Free chain. A free chain is a bidirectional linked list (each node has Pre and Next Pointers) that links all the free cached pages together by the description of the cached page, and has a base node that records how many free pages there are currently and points to the head and tail of the list.

The loading process is as follows: See first recorded in the basic nodes of the current free cached page number, if required, will obtain a cached page description from the chain of free information, through the description information can find one to one correspondence of free page caching, then put the disk data information load to the free cached pages, and then modify the cached page description block information, will be cached page is removed from the free chain.

Flush chain

In the previous article, we mentioned in the InnoDB storage engine execution process that the return I/O thread flushed the dirty data from the Buffer Pool back to disk. This is actually a dirty cache page. The reason for the dirty cache page was explained in the previous article. Again, there is a question here, which cache pages are dirty cache pages?

The answer is the Flush chain, which, like the free chain, is a two-way linked list that uses the description of each cached page. It also has a base node that records the number of dirty cached pages.

The I/O thread uses the Flush chain to know which data pages are dirty, flush them back to disk, and flush them out of the Flush chain.

According to the above description, the main structure of the buffer pool is as follows:

Cache flushing mechanism

Now suppose that the number of free pages recorded in the free chain is 0, and you want to perform a query, but the hash table lookup finds that the data is not stored in the Buffer Pool, then you need to go to disk and load the corresponding data pages into the Buffer Pool. However, the free link shows that there are no free pages in the Buffer Pool.

The first thing that comes to mind is to flush some dirty cache pages back to disk. The question is which dirty cache pages are flushed back to disk? This involves the elimination mechanism of cached pages. In the operating system, similar scenes occur in the occurrence of page loss interruption and physical memory has been fully used. The page replacement algorithm of the operating system generally has: first in, first out (FIFO), least recently used (LRU), least recently used (LFU) and so on. In InnoDB, LRU is used as a cache page flushing strategy.

How LRU works

LRU usually uses a two-way linked list as a data structure. To speed up queries, it also uses hash tables. When a cached page is accessed (queried, modified), the cached page is placed at the head of the list, so the end of the list must be a cached page that has not been accessed recently.

To improve cache hit ratio, InnoDB considers that flushing the cache pages at the end of the linked list back to disk has the lowest impact on Buffer Pool hit ratio, so it flushes the cache pages at the end of the linked list back to disk to make room for loading the data pages on disk and placing the data pages at the head of the linked list.

conclusion

This article focuses on the important memory component of InnoDB storage engine: Buffer Pool.

The default size of Buffer Pool and the format of data storage are explained first. Then, InnoDB can know the idle Buffer pages in the Buffer Pool through the free bidirectional linked list. Through the hash table, InnoDB can know whether the required data already exists in the Buffer Pool. InnoDB knows which cached pages are dirty by using Flush bidirectional linked lists. Finally, it explains how LRU works.