MySQL series – Buffer pool supplements, data pages, and tablespace briefs

preface

In this section we continue our discussion of buffer pools, data pages, and tablespaces. The content is basic and simple.

Overview

  1. Supplementary buffer pool content: background flush threads, locking under multi-threaded access to the buffer pool, etc.
  2. The structure of data rows and data pages, with a brief look at the internal details.
  3. Tablespaces and data extents, and the logical structure of an entire MySQL table.

Buffer pool supplements

Before going into the other topics, here are a few more details about buffer pools.

Background threads periodically flush cold data

As mentioned in the previous section on hot/cold data separation, cold data is not flushed only when the buffer pool fills up. A background thread runs a periodic task that picks a few cache pages from the cold end of the LRU list, flushes them to disk, and then adds the freed pages to the free list. So the LRU list periodically flushes data to disk and reclaims unused cache pages before the cache actually runs out.

Flush list data is periodically flushed to disk

The flush list holds dirty pages, and it too has a timed task that periodically flushes its pages to disk. In general we can picture the whole system in constant motion: the LRU list keeps shifting, the flush list keeps shrinking, and the free list keeps growing.

Is multi-threaded concurrent access locked?

Multi-threaded access is locked. Reading a cache page touches three linked lists (the free list, the flush list, and the LRU list) plus the hash lookup for the data page, so the whole operation must be serialized under a lock. Even though three lists are involved, it does not cost much performance: the list operations are just pointer manipulations and lookups, essentially constant in time and space, so even queuing up and handling requests one by one has little impact.

Parallel optimization with multiple buffer pools

If the MySQL buffer pool is larger than 1GB, you can configure multiple buffer pool instances. MySQL's default rule is that if you allocate less than 1GB of memory to the buffer pool, you get only one instance. For example, on an 8GB MySQL server, you could use the following configuration:

[server]
innodb_buffer_pool_size = 8589934592
innodb_buffer_pool_instances = 4

This sets up four buffer pool instances, each occupying 2GB. Tuning with multiple buffer pools is important in real production environments.

Can I resize the buffer pool while running?

With the model described so far, resizing at run time is not practical. Why? Copying the entire buffer pool into a newly allocated region of memory would simply be too slow. To solve this problem, MySQL introduced the concept of the chunk.

MySQL's chunk mechanism splits up the buffer pool

The chunk size is controlled by the innodb_buffer_pool_chunk_size parameter, which defaults to 128MB. In other words, the buffer pool is allocated in chunk-sized units, and the cache pages and their description blocks are likewise divided up by chunk. Suppose the buffer pool is 2GB and each chunk is 128MB: the pool is cut into about 16 chunks.

With chunks, growing the buffer pool no longer means copying the existing cache into a brand-new memory region: MySQL simply allocates new chunks and adds them to the pool, leaving the existing chunks in place.

How many buffer pools are appropriate for production?

If a 32GB MySQL machine dedicates 30GB to the buffer pool, that doesn't make sense! CRUD operations would then mostly be memory operations, which is great for performance, but the operating system and other processes on a 32GB machine still need several gigabytes for themselves. So as a first cut we can allocate half of the memory to MySQL, or perhaps around 60%.

innodb_buffer_pool_size is 128MB by default. The maximum value depends on the CPU architecture: (2^32 - 1) bytes on 32-bit platforms and (2^64 - 1) bytes on 64-bit platforms. When the buffer pool size is greater than 1GB, setting innodb_buffer_pool_instances to a value greater than 1 improves server scalability. Finally, a large buffer pool reduces repeated disk I/O for the same table data. If the database runs on a dedicated server, you can set the buffer pool size to 60-80% of the server's physical memory; for 32GB of memory, 24GB to 26GB is a good choice.

Buffer pool allocation formula:

There is a key rule for buffer pools: the total buffer pool size must be a multiple of (chunk size * number of buffer pool instances). For example, with the default 128MB chunk size and 16 instances, a 20GB buffer pool works out exactly: 128MB * 16 * 10 = 20GB. So again: total buffer pool size = a multiple of (chunk size * number of buffer pool instances).

The configuration of a buffer pool has the following requirements:

  • The buffer pool size must always be equal to, or a multiple of, innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances (innodb_buffer_pool_instances is the number of instances).
  • If you change the buffer pool size to a value that is not such a multiple, the buffer pool size is automatically adjusted to a value that is equal to, or a multiple of, innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances.
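The auto-adjustment rule above is easy to sketch in code. This is a minimal Python illustration of the arithmetic, not MySQL's actual implementation; rounding up to the next multiple is an assumption about the adjustment direction:

```python
GB = 1024 ** 3
MB = 1024 ** 2

def adjusted_pool_size(requested: int,
                       chunk_size: int = 128 * MB,
                       instances: int = 1) -> int:
    """Round a requested buffer pool size up to the nearest multiple of
    (chunk_size * instances), mirroring the auto-adjustment described above.
    (Rounding up, rather than to the nearest multiple, is assumed here.)"""
    unit = chunk_size * instances
    return -(-requested // unit) * unit  # ceiling division, then scale back

# 1000MB requested with one instance: rounded up to 8 chunks = 1024MB
print(adjusted_pool_size(1000 * MB) // MB)              # 1024
# 20GB with 16 instances: the unit is 2GB, and 20GB is already a multiple
print(adjusted_pool_size(20 * GB, instances=16) // GB)  # 20
```

This also shows why the 20GB example in the formula above works: 20GB is an exact multiple of 128MB * 16.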

Checking the status online

Once your database is running, just execute SHOW ENGINE INNODB STATUS. You will see output like the following:

Total memory allocated xxxx; Dictionary memory allocated xxx
Buffer pool size xxxx
Free buffers xxx
Database pages xxx
Old database pages xxxx
Modified db pages xx
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young xxxx, not young xxx
xx youngs/s, xx non-youngs/s
Pages read xxxx, created xxx, written xxx
xx reads/s, xx creates/s, 1xx writes/s
Buffer pool hit rate xxx / 1000, young-making rate xxx / 1000 not xx / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: xxxx, unzip_LRU len: xxx
I/O sum[xxx]:cur[xx], unzip sum[xxx]:cur[xx]

Let’s explain what’s going on here, mainly about the buffer pool.

Related Explanation:

(1) Total memory allocated: the total memory allocated to the buffer pool
(2) Buffer pool size: the total number of cache pages in the buffer pool
(3) Free buffers: the number of free cache pages in the free list
(4) Database pages and Old database pages: the number of cache pages in the LRU list, and the number in its cold data region
(5) Modified db pages: the number of cache pages in the flush list
(6) Pending reads and Pending writes: the number of pages waiting to be loaded from disk into the cache, and the number waiting to be flushed to disk from the LRU list and the flush list
(7) Pages made young and not young: the number of cache pages in the LRU cold data region that were accessed and moved to the hot data region, and the number accessed within 1s of loading that therefore did not enter the hot data region
(8) youngs/s and non-youngs/s: the number of cache pages per second moved from the cold data region to the hot data region, and the number per second accessed in the cold region without entering the hot region
(9) Pages read xxxx, created xxx, written xxx and xx reads/s, xx creates/s, xx writes/s: the number of cache pages read, created, and written, and the corresponding per-second rates
(10) Buffer pool hit rate xxx / 1000: out of every 1000 accesses, how many hit the buffer pool cache directly
(11) young-making rate xxx / 1000 not xx / 1000: out of every 1000 accesses, how many moved a cache page from the cold data region to the hot data region, and how many did not
(12) LRU len: the number of cache pages in the LRU list
(13) I/O sum: the total number of disk pages read in the last 50s
(14) I/O cur: the number of disk pages being read now
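Item (10), the hit rate, is the number worth watching in production. As a hedged sketch (the counter names correspond to the status variables Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads; the sample numbers are made up), the ratio can be computed like this:

```python
def buffer_pool_hit_rate(read_requests: int, disk_reads: int) -> float:
    """Fraction of logical page reads served from the buffer pool.
    read_requests ~ Innodb_buffer_pool_read_requests (all logical reads)
    disk_reads    ~ Innodb_buffer_pool_reads (reads that had to hit disk)"""
    if read_requests == 0:
        return 1.0  # no reads yet: treat as a perfect hit rate
    return 1.0 - disk_reads / read_requests

# 1,000,000 logical reads of which 2,000 went to disk -> 998 / 1000
print(round(buffer_pool_hit_rate(1_000_000, 2_000) * 1000))  # 998
```

A healthy production system usually stays very close to 1000/1000; a persistently lower figure suggests the buffer pool is too small for the working set.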

Structure of data rows and data pages

Before we look at these concepts, we need to understand the following questions:

Why can’t mysql update disks directly?

Because having every request read and write disk files directly, while technically feasible, performs extremely poorly. Disk read/write performance is very bad, so it is simply not viable to read and update disk files on every request.

Why introduce the concept of data pages?

Loading data is never done one row at a time from disk. It is like gathering firewood: you don't walk back and forth for a single stick; you carry a whole bundle at once, which is much faster. As mentioned before, a data page is 16KB, so many rows must be loaded together into one data page.

Data rows

How do I put rows on disk?

We've covered how data sits in cache pages; now let's step back and look at how data rows are laid out within a data page. This involves a concept called the row format. A table can specify the format in which its rows are stored, for example:

CREATE TABLE customer (
name VARCHAR(10) NOT NULL,
address VARCHAR(20),
gender CHAR(1),
job VARCHAR(30),
school VARCHAR(50)
) ROW_FORMAT=COMPACT;

Rows are stored in the following format:

Variable-length field length list, null value list, data header, value of column 1, value of column 2, … value of column N

How is variable length data stored?

Suppose we have a varchar(5) field holding "abcd" and a varchar(10) field holding "bcd". The actual storage would look like the format below. A char(1) field, by contrast, needs no extra length parameter; its value simply goes in the corresponding position:

0x03 0x04 [null value list] [data header] abcd bcd

Notice that the variable-length field lengths here are stored in reverse order: the length of the last variable-length column comes first.

Why aren't null values stored directly?

Null values are stored in binary form, and variable-length fields only store actual values: if a column is null, no length entry is stored for it at all. Null values are tracked as bits, with a 1 or 0 placed in the corresponding slot of the null value list, where 1 means null and 0 means not null.

For example, if two of four fields are null and two are not, giving the flags 1010, then the list is stored in reverse order, which is 0101.

In addition, the null list is not stored as exactly four bits; it is padded to a multiple of eight bits (a bit like the padding in a Java object header). If there are fewer than 8 bits, zeros are prepended, so the final result looks like this:

0x09 0x04 00000101 [data header] column1=value1 column2=value2 … columnN=valueN
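The two prefix pieces above can be reproduced with a short sketch. This is only an illustration of the reversal-and-padding rules just described (the helper names are made up, not InnoDB internals):

```python
def null_bitmap(null_flags):
    """Build the null value list: one bit per nullable column (1 = NULL),
    reversed, then left-padded with zeros to a multiple of 8 bits."""
    bits = ''.join('1' if f else '0' for f in reversed(null_flags))
    return '0' * ((-len(bits)) % 8) + bits

def varlen_length_list(byte_lengths):
    """Lengths of the variable-length columns, stored in reverse order."""
    return ['0x%02x' % n for n in reversed(byte_lengths)]

# Four columns where columns 1 and 3 are NULL: flags 1010, reversed to 0101,
# padded to one byte -> 00000101, exactly as in the text above.
print(null_bitmap([True, False, True, False]))   # 00000101
# Two varchar columns holding 4 and 9 bytes of data -> 0x09 0x04
print(varlen_length_list([4, 9]))                # ['0x09', '0x04']
```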

Why store it this way?

In fact, storing a row compactly like this not only saves space; it also turns accessing the row in memory into sequential access, much like scanning an array.

The 40-bit data header (used when indexing; skim this for now):

Each row of data carries a 40-bit data header used to describe the row. Here we take a brief look at its structure; it will come up again in later content:

  • The first and second bits are reserved and have no meaning.

  • One bit, delete_mask (the third bit), indicates whether the row has been deleted. So no matter how you design it, a delete inside MySQL is a logical (fake) delete.

  • The next position is the 1-bit min_rec_mask (the fourth bit), marking the minimum record on each non-leaf level of the B+ tree.

  • Next come the 4 bits of n_owned (bits five through eight); its specific role will not be described here.

  • The next 13 bits, heap_no, record the row's position in the page heap; the heap is also covered under indexing.

  • Next is the 3-bit record_type, the row's record type: 0 ordinary record, 1 B+ tree non-leaf (node pointer) record, 2 minimum (infimum) record, 3 maximum (supremum) record.

  • Finally, there is the 16-bit next_record, a pointer to the next row of data.
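The bullet list above can be sanity-checked by adding up the field widths, which do come to exactly 40 bits. A small sketch of the layout (the field table mirrors the list above; it is a description, not InnoDB code):

```python
# Field widths of the 40-bit record header, in the order listed above.
RECORD_HEADER_BITS = [
    ("reserved",     2),  # two unused bits
    ("delete_mask",  1),  # 1 = row is logically (fake) deleted
    ("min_rec_mask", 1),  # minimum record of a non-leaf B+ tree level
    ("n_owned",      4),  # records owned by this page-directory slot
    ("heap_no",     13),  # position of the record in the page heap
    ("record_type",  3),  # 0 ordinary, 1 non-leaf, 2 infimum, 3 supremum
    ("next_record", 16),  # offset pointer to the next record
]

total = sum(width for _, width in RECORD_HEADER_BITS)
print(total)  # 40
```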

The actual physical storage structure of each row of data:

In a real disk file, what is stored differs slightly, and the difference lies in the column data itself. Above, we showed how a row of data is stored logically:

0x09 0x04 00000101 0000000000000000000010000000000000011001 jack m xx_school

In practice, however, there is a slight difference: on disk, values are stored according to the character set encoding, so a row of data actually looks like this:

0x09 0x04 00000101 0000000000000000000010000000000000011001 616161 636320 6262626262

This storage structure essentially means that however your string fields are defined, their values end up on disk as bytes encoded by the character set.

Hidden fields that the database actually stores

MySQL adds some hidden fields to each data row, partly for MVCC and partly for transaction purposes:

  • DB_ROW_ID: a field that uniquely identifies a row; it is not the primary key ID. If no primary key or unique key is specified, MySQL adds a ROW_ID internally.

  • DB_TRX_ID: the transaction-related field; it records the ID of the transaction that last updated this row.

  • DB_ROLL_PTR: the rollback pointer, used for transaction rollback.

What if rows overflow?

Sometimes the data we define is too large for a single data page to hold the row. MySQL then uses a linked-list approach to connect multiple data pages together, as the following structure shows:

As the figure shows, when row overflow occurs, a data page points to the next data page via something like a linked-list pointer, chaining many data pages together in list form.

To summarize: when we insert a row into the database, we are really inserting it into an in-memory page with a complex storage structure, and at some later point a condition triggers that page to be flushed to a disk file.

Data page

The smallest unit is the data page

The data page is the database's smallest unit, but a data page is not a single row of data. In fact, a data page contains the following parts: the file header, the data page header, the minimum and maximum records, multiple data rows, the free space, the data page directory, and the file trailer.

Sizes

The file header is 38 bytes, the data page header is 56 bytes, the minimum and maximum records take 26 bytes, the sizes of the row data area and the free space are not fixed, the data page directory is also variable in size, and the file trailer is 8 bytes. This is a storage format specially designed by MySQL.

Data pages are stored in this way: each data page contains many rows, and each row is stored in the format described above. A data page starts out empty.
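The fixed overheads listed above can be tallied to see how much of a 16KB page is actually available for rows, free space, and the page directory. A minimal sketch using the byte counts from this section:

```python
PAGE_SIZE        = 16 * 1024  # default InnoDB data page size
FILE_HEADER      = 38         # fixed parts of every data page, in bytes
PAGE_HEADER      = 56
MIN_MAX_RECORDS  = 26         # the built-in minimum/maximum records
FILE_TRAILER     = 8

# What remains is shared by the row data, the free space,
# and the variable-size page directory.
variable_area = (PAGE_SIZE - FILE_HEADER - PAGE_HEADER
                 - MIN_MAX_RECORDS - FILE_TRAILER)
print(variable_area)  # 16256
```

So only 128 bytes of each 16KB page are fixed overhead; the rest is available for rows and bookkeeping that grows with them.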

With many data pages in play, you can see the LRU list moving hot and cold data back and forth as cache pages are updated, while dirty pages are flushed to disk via the LRU and flush lists.

Tablespaces and data extents

In fact, whenever we create a table we are also dealing with the concepts of the tablespace and the data extent.

Table space

From the perspective of InnoDB's logical storage structure, all data is logically stored in a space, called a tablespace. A tablespace consists of segments, extents, and pages.

When we create a table, a corresponding .ibd disk file named after the table appears on disk. A tablespace's disk file contains many data pages, each at most 16KB. Since it is impractical to manage a disk file one data page at a time, the concept of the data extent was introduced.

A data extent corresponds to 64 data pages of 16KB each, so one extent is 1MB, and 256 extents form a group. For a tablespace, the first three data pages of the first extent of the first group are fixed and store descriptive data. The FSP_HDR page, for example, holds attributes of the tablespace and this group of extents; the IBUF_BITMAP page holds insert buffer information; and the INODE page holds special information.

Once again: when we create a table, we get a corresponding tablespace. Each tablespace corresponds to a data file on disk, and within a tablespace there are many groups of extents.

Segment (segment)

Segments are divided into index segments, data segments, rollback segments, and so on. The index segment holds the non-leaf node part of the B+ tree, the data segment holds the leaf node part, and the rollback segment is used for data rollback and multi-version concurrency control. A segment contains 256 extents (256MB in size).

Extent

An extent is a collection of pages; an extent contains 64 consecutive pages, with a default size of 1MB (64 × 16KB).
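The extent and group sizes used throughout this section follow directly from the page size; a short sketch of the arithmetic:

```python
PAGE_SIZE         = 16 * 1024                        # 16KB per page
PAGES_PER_EXTENT  = 64                               # 64 consecutive pages
EXTENT_SIZE       = PAGES_PER_EXTENT * PAGE_SIZE     # 1MB per extent
EXTENTS_PER_GROUP = 256                              # 256 extents per group
GROUP_SIZE        = EXTENTS_PER_GROUP * EXTENT_SIZE  # 256MB per group

print(EXTENT_SIZE // (1024 * 1024))  # 1
print(GROUP_SIZE // (1024 * 1024))   # 256
```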

Page

The InnoDB page is the smallest unit of management; common types include FSP_HDR, INODE, and INDEX. All pages share the same overall structure: a header (the first 38 bytes), the page data, and a trailer (the last 8 bytes). The page data varies with the type of page.

Each space is divided into multiple pages, usually 16KiB each. Each page in a space is assigned a 32-bit integer page number, often called its "offset", which is really the page's offset from the beginning of the space (not necessarily the file offset, for multi-file spaces). So page 0 is at file offset 0, page 1 is at file offset 16384, and so on. InnoDB's 64TiB data limit is actually a per-space limit, which follows mainly from the 32-bit page number combined with the default page size.
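Both claims in the paragraph above, the page-number-to-offset mapping and the 64TiB per-space limit, are simple arithmetic that can be checked directly:

```python
PAGE_SIZE = 16 * 1024  # default page size in bytes (16KiB)

def page_offset(page_no: int) -> int:
    """Byte offset of a page inside a (single-file) tablespace."""
    return page_no * PAGE_SIZE

print(page_offset(0))  # 0
print(page_offset(1))  # 16384

# 32-bit page numbers times 16KiB pages gives the 64TiB per-space limit.
print((2 ** 32) * PAGE_SIZE == 64 * 1024 ** 4)  # True
```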

Finally, the details can be represented with the following figure:

conclusion

In this section we continued to fill in details about the buffer pool and looked at the storage structure of data rows and data pages on disk. Finally, we briefly covered the logical storage structure of a table, mainly the tablespace, data extents, and data pages. At this point you should have a general picture of MySQL's basic physical and logical structure.

Write in the last

This article ran a little long; thank you for reading patiently. My personal level is limited, so if there are mistakes, comments and corrections are welcome.

Historical articles:

Mysql, InnoDB storage engine, Binlog workflow – Juejin (cn)

Juejin: How do you stress test Mysql?

The internal structure of the Mysql buffer pool (1)

Consider:

Why does MySQL store data this way, and why so compactly?

The answer is not hard to arrive at after some thought; the main reasons are the following:
  1. Store as much content as possible: Compact means more data and content can be stored, and buffer pool space utilization is guaranteed
  2. Easy sequential reads and writes: Sequential reads and writes from disk can rival memory to some extent, so storage in this format is good for I/O operations

Why are null lists stored as bits?

Again, for compactness: one bit per nullable column takes far less space than a full byte or a marker value for each null.

With the hidden fields included, the final data looks like this:

0x09 0x04 00000101 0000000000000000000010000000000000011001 00000000094C(DB_ROW_ID) 00000000032D(DB_TRX_ID) EA000010078E(DB_ROL_PTR) 616161 636320 6262626262

If, during a CRUD operation, you need to load a data page from disk into the buffer pool and there are no free pages, you must first flush a page from the cold end of the LRU list to disk to free up a cache page; only then can you carry out the operation on cached data. But if this happens frequently, wouldn't most of your CRUD operations have to flush a cache page to disk and then read a data page from disk into the freed page? Wouldn't that mean two disk I/Os per operation, with terrible performance?

So consider this question: which MySQL kernel parameters would you tune, and which behavior would you optimize, to avoid frequently having to flush a cache page to disk before reading a data page from disk into a freed cache page?

In fact, based on the buffer pool mechanics we have learned, the way to avoid this problem is to avoid running out of free cache pages in the first place. As we know, while cache pages are being used, a background thread periodically flushes some cache pages from the cold region of the LRU list to disk. So in essence, cache pages are consumed by your workload while being continually released by background threads.

How to read a data page?

The pseudocode to read a data page looks like this:

dataFile.setStartPosition(25347) 
dataFile.setEndPosition(28890) 
dataFile.read(cachePage)

In pseudocode, reading a data page first requires its start and end offsets: through the tablespace you locate the corresponding segment, then the corresponding data extent, then the corresponding data page within that extent, and finally the rows inside the page, as follows:

Because the size of a data page is fixed, a given data page occupies a fixed range of bytes, from a start position to an end position, in the disk file. Writing back works the same way: pick the page's fixed position and write the cache page's data directly, overwriting the original data page. See the pseudocode above.