InnoDB storage engine
Redo log
InnoDB writes data to memory first and flushes it to disk later. Because data in memory can be lost on a failure before it reaches disk, the redo log exists to guarantee durability: the redo log record must reach disk before the data page does. The redo log is written to disk sequentially, and to guarantee durability a transaction commit completes only after its redo log has been written to disk. Since this write sits on the commit path, it affects performance, so redo records are kept as small as possible. Redo records must also be idempotent, so that replaying them multiple times produces the same result. For performance, a single redo record should ideally touch only one page, to avoid the cost of switching between multiple pages. Logs are commonly classified as logical logs or physical logs: logical logs are idempotent, physical logs are compact. To get both properties, the redo log is designed as physiological logging, which combines the advantages of the two, and it defines many record types.

Redo log files are created at initialization and reused in a circular fashion (ib_logfile0, ib_logfile1). Each file is divided into blocks, and a single redo record may span several blocks; positions are identified by LSN and SN offsets. The first four blocks of a file hold the file header and checkpoint information. For a data change, the redo record is first written into the log buffer, then flushed to the page cache, then synced to disk, and only then does the transaction commit. A single transaction commit may involve multiple redo records, possibly for the same page or for several pages, so those records must be applied atomically. InnoDB groups them with a mini-transaction (mtr): the records accumulate in the mtr's local log and are copied into the log buffer when the mtr commits. Under high concurrency many mtrs run at the same time; to keep writes into the log buffer thread-safe, each mtr atomically advances a global offset to reserve the space it needs before copying its data. When flushing the log buffer to disk, concurrent mtrs mean that the data up to a given LSN may not all have been copied yet, leaving holes. InnoDB solves this with a link_buf structure that tracks the offset up to which the log buffer is contiguous; data from the contiguous point up to that LSN is written to the page cache, the write offset is recorded, and the log_flusher thread is then told to fsync.
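A minimal sketch of the knobs behind this behaviour (variable names assume MySQL 5.7/8.0 and may differ between versions):

```sql
-- Sketch only: inspect redo-log-related settings.
SHOW VARIABLES LIKE 'innodb_log_buffer_size';         -- size of the in-memory log buffer
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit'; -- 1 = write and fsync the redo log at every commit
SHOW VARIABLES LIKE 'innodb_log_file_size';           -- size of each redo log file
```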
To bound the amount of redo that has to be replayed during recovery, the redo log files are limited in size and reused. InnoDB records flushing progress with checkpoints: recovery only needs to start from the latest checkpoint, and the file space before the checkpoint can be reused. A checkpoint guarantees that all changes described by redo records before it have already been flushed to disk. The buffer pool tracks two values, lwm_lsn and dpa_lsn: the smallest LSN among the redo records written to the log, and the smallest oldest-modification LSN among all dirty pages in the buffer pool. With concurrent mtrs, some records may not yet have been added to the buffer pool's flush list, so using lwm_lsn alone to place the checkpoint would be unsafe. The smaller of the two values is therefore used as the checkpoint, which guarantees that everything covered by redo records before the checkpoint has already been flushed to disk.
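The LSN positions discussed here can be observed in the LOG section of the InnoDB status output; the field names below are from MySQL 8.0 and may differ in other versions:

```sql
-- Sketch only: the LOG section reports the current LSN, the LSN flushed to disk,
-- and the LSN of the last checkpoint.
SHOW ENGINE INNODB STATUS\G
-- ---
-- LOG
-- ---
-- Log sequence number          ...
-- Log flushed up to            ...
-- Last checkpoint at           ...
```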
Buffer pool
The buffer pool exists to bridge the gap between memory and disk read/write performance: data operations work on pages in the buffer pool instead of interacting with the disk directly, which improves database performance. The memory is allocated when the database starts and held until it shuts down. The buffer pool is divided into innodb_buffer_pool_instances instances of innodb_buffer_pool_size / innodb_buffer_pool_instances bytes each; when innodb_buffer_pool_size < 1 GB the number of instances is forced to 1, to avoid many small instances hurting performance. Each instance is made up of buffer pool chunks, which are contiguous physical memory units, 128 MB by default. Every data page stored in a chunk has a control block holding information such as space_no, page_no, and the page's address. A page hash maps (space_id, page_no) to pages: since the LRU/FLU/FREE lists in the buffer pool are linked lists, the hash lets a page be located quickly when it is accessed.
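As a small sketch (MySQL 5.7/8.0 variable and table names assumed), the sizing described above and per-instance statistics can be inspected like this:

```sql
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_buffer_pool_instances';
SHOW VARIABLES LIKE 'innodb_buffer_pool_chunk_size';

-- Per-instance statistics:
SELECT pool_id, pool_size, free_buffers, database_pages, modified_database_pages
FROM information_schema.innodb_buffer_pool_stats;
```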
- Linked lists in the buffer pool
Free List: records which pages in the buffer pool are free and ready to use. When SQL execution needs a new page, it is taken from the Free List; if no page can be obtained, a node is evicted from the LRU or Flush List. LRU List: the usual eviction strategy; frequently used pages stay near the head, rarely used pages drift toward the tail, and tail pages are evicted first. The list is split into a hot data area and a cold data area, and newly read pages are inserted into the cold area first. Flush List: records dirty pages, i.e. pages whose contents in the buffer pool differ from those on disk; background threads flush them to disk. The Flush List tracks the dirty pages waiting to be flushed and points to the same data pages as the LRU List. The three lists cooperate: if a requested page is not in the buffer pool, memory for it is requested from the Free List; if the Free List has no free node, a clean page is evicted from the tail of the LRU List; if there are no clean pages, a single-page flush is performed from the tail of the Flush List to free a node back to the Free List.
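The hot/cold split of the LRU list is controlled by two settings, and the pages currently on the LRU list can be peeked at through information_schema; a minimal sketch, assuming MySQL 5.7/8.0 names (the LRU page table can be expensive to query on large pools):

```sql
SHOW VARIABLES LIKE 'innodb_old_blocks_pct';   -- share of the LRU list reserved for the cold (old) area
SHOW VARIABLES LIKE 'innodb_old_blocks_time';  -- ms a page must stay in the cold area before promotion

-- Peek at pages on the LRU list (expensive on large buffer pools):
SELECT table_name, page_type, is_old
FROM information_schema.innodb_buffer_page_lru
LIMIT 10;
```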
- Buffer pool flushing mechanism
- The redo log is almost full
- The page cleaner thread flushes dirty pages in the background
- The dirty page ratio reaches innodb_max_dirty_pages_pct (see the settings sketch after this list)
- MySQL is shut down
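A minimal sketch of the settings and counters behind these triggers (names assume MySQL 5.7/8.0):

```sql
SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';   -- dirty page ratio that triggers aggressive flushing
SHOW VARIABLES LIKE 'innodb_page_cleaners';         -- number of page cleaner threads
SHOW STATUS LIKE 'Innodb_buffer_pool_pages_dirty';  -- dirty pages currently in the buffer pool
```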
Indexes
B+ tree
The B+ tree is a further optimization of the B tree. In a B tree, every node stores both keys and data, the degree determines how many entries each node can hold, all leaf nodes sit at the same depth, internal nodes carry pointers to their child nodes while leaf nodes carry no pointers, and keys within each node are sorted. This layout already allows data in a key range to be found reasonably quickly. A B+ tree differs in that actual data values are stored only in the leaf nodes, while non-leaf nodes store only keys and child pointers. Each internal entry is therefore small, a node can hold more keys, and the height of the tree is reduced. Because data lives only in leaves, every lookup descends to a leaf, and since the tree height is uniform, every query has the same depth, which makes query cost stable. In addition, the leaf nodes form an ordered linked list, which makes range scans convenient: to read the data from 18 to 20, only the entry for 18 needs to be located, after which the matching entries are read by following the list. In InnoDB, only the leaves of the primary key (clustered) index store the full rows; the leaves of secondary indexes store the primary key values, so if a queried column is not in the secondary index, a second lookup on the primary key index is needed to fetch the row. Each tree node is the size of one page, because InnoDB reads data page by page, so loading a node takes only one IO.
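To make the clustered/secondary index distinction concrete, here is a small hypothetical example (the table and column names are made up for illustration):

```sql
-- Hypothetical table: the primary key index leaves store the full row,
-- the secondary index idx_name leaves store (name, id).
CREATE TABLE user (
  id   INT PRIMARY KEY,
  name VARCHAR(50),
  age  INT,
  KEY idx_name (name)
);

-- Covered by idx_name (both name and id live in the secondary index), no extra lookup:
EXPLAIN SELECT id FROM user WHERE name = 'alice';

-- age is not in idx_name, so InnoDB must go back to the primary key index for the row:
EXPLAIN SELECT age FROM user WHERE name = 'alice';
```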
Index optimization
Composite indexes follow the leftmost prefix matching principle. For example, if a composite index covers the fields a, b, c, the query conditions that can use the index are a; a and b; a, b and c. In some cases the MySQL optimizer can still make partial use of the index, for example where a = 'x' and b like '%xx%': InnoDB locates the index entries matching a = 'x' and filters b like '%xx%' inside the storage engine, instead of returning every row matching a = 'x' and filtering b like '%xx%' at the server layer. This is called index condition pushdown, and the execution plan shows "Using index condition". A query such as where b = 'x' cannot use the index, because the composite index is ordered by a first: without the leading field, the index cannot locate any entries. For a range query on the first field of the composite index, the index tree can still locate the data range, because the first field is ordered in the tree. However, in some cases the optimizer decides not to use the index for a range query: when the amount of data to be scanned reaches roughly 30% of the table (the threshold varies between versions), the index is skipped in favor of a full scan.
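A minimal sketch of the pushdown example above, assuming a hypothetical table t with a composite index on (a, b, c); with index condition pushdown the plan typically shows "Using index condition" in the Extra column:

```sql
ALTER TABLE t ADD INDEX idx_abc (a, b, c);

-- a = 'x' locates the index range; b LIKE '%xx%' is filtered inside the storage engine (ICP):
EXPLAIN SELECT * FROM t WHERE a = 'x' AND b LIKE '%xx%';

-- No leading column a, so idx_abc cannot be used to locate rows:
EXPLAIN SELECT * FROM t WHERE b = 'x';
```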
- The FORCE INDEX keyword can force a SQL statement to use a specific index (see the sketch after this list)
- Build indexes on fields that are frequently queried
- Fields with low selectivity (few distinct values) do not need to be indexed
- Do not create too many indexes; otherwise insert and update performance suffers
- Fields that are frequently modified are not good index candidates, because every change also updates the index tree
- Implicit type conversion: if a VARCHAR field is queried without quotes, the index cannot be used
- Applying a function to an indexed field in the condition prevents the index from being used, for example max(id) = 1
- ORDER BY also affects index use; the index can be used if the leftmost prefix rule is followed and the sort order is consistent
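A small sketch of two of the points above, using a hypothetical table t with a VARCHAR column code indexed by idx_code:

```sql
-- Force a specific index:
SELECT * FROM t FORCE INDEX (idx_code) WHERE code = '1001';

-- Implicit type conversion: comparing a VARCHAR column to a number prevents index use:
EXPLAIN SELECT * FROM t WHERE code = 1001;

-- Matching the column type keeps the index usable:
EXPLAIN SELECT * FROM t WHERE code = '1001';
```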
count(*), count(1) and count(column): count(*) covers all columns and is equivalent to the number of rows; rows are not ignored just because some column value is NULL. count(1) counts the constant 1 for every row, so rows are not ignored either. count(column) ignores rows where that column is NULL (NULL meaning a real NULL, not an empty string or 0), that is, rows whose value for the field is NULL are not counted.
Execution efficiency: if the table has multiple columns and no primary key, count(1) performs better than count(*); if the table has a primary key, select count(primary key) is optimal; if the table has only one field, select count(*) is optimal.
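To see the NULL-handling difference directly, a tiny sketch assuming a hypothetical user table with a nullable nick_name column:

```sql
-- COUNT(*) and COUNT(1) count every row; COUNT(nick_name) skips rows where nick_name IS NULL.
SELECT COUNT(*), COUNT(1), COUNT(nick_name) FROM user;
```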
Transactions
Transaction problems associated with database operations in a concurrent environment
- Lost update: two transactions read the same data at the same time; when transaction A commits, it overwrites data that transaction B has already committed, so B's update is lost.
- Dirty read: Transaction A reads data that transaction B has not committed yet, and then transaction B rolls back, causing transaction A to get incorrect data.
- Non-repeatable read: while transaction A reads the same data several times, transaction B updates or deletes the data and commits, so transaction A's repeated reads return inconsistent results
- Phantom read: while transaction A reads several times, transaction B inserts new rows, so later reads within transaction A return more rows
MySQL provides four transaction isolation levels
- Read uncommitted: reading data not yet committed by other transactions is allowed; dirty reads, non-repeatable reads and phantom reads are all possible
- Read committed: only data already committed by other transactions can be read; this solves dirty reads, but not non-repeatable reads or phantom reads
- Repeatable read: data read multiple times within a transaction is guaranteed to be consistent unless the transaction itself modifies it; this solves dirty reads and non-repeatable reads, though the SQL standard still allows phantom reads at this level
- Serializable: transactions are executed one after another, which solves dirty reads, non-repeatable reads and phantom reads, but performance is low
In InnoDB, repeatable read additionally uses gap locks and next-key locks to deal with phantom reads, so InnoDB's default isolation level (repeatable read) already provides essentially complete transaction isolation.
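The isolation level can be checked and changed per session; a minimal sketch (the @@transaction_isolation variable name assumes MySQL 5.7.20+/8.0):

```sql
SELECT @@transaction_isolation;                          -- InnoDB defaults to REPEATABLE-READ
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- change it for the current session
SELECT @@transaction_isolation;
```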
MVCC
RR transaction isolation is implemented through MVCC: when a transaction runs its first query, MVCC creates a snapshot of the data, and subsequent queries read from that snapshot, so the data they see does not change. MVCC records transaction version numbers on each row: a creation (update) version and a deletion version. A query sees only rows whose creation version is no later than the querying transaction and whose deletion version is undefined or later than the querying transaction. On insert, the row's creation version is set to the current transaction's version. On update, a new row is written with the current transaction's version as its creation version, and the old row's deletion version is set to the current transaction's version. On delete, the row's deletion version is set to the current transaction's version. These version numbers keep the data read several times within one transaction consistent, which solves non-repeatable reads. Before a transaction commits, InnoDB writes undo logs to preserve the previous state of the data; on rollback, the logical undo records are applied to bring the data back to its state before the transaction.

Locks in InnoDB are taken on indexes: when the primary key index can locate the row, only that row is locked; queries that cannot locate rows through an index end up locking the whole table. For range queries InnoDB adds gap locks, which lock a range of index entries; for queries on a non-unique index it uses next-key locks, i.e. gap lock plus record lock. Gap locks and next-key locks solve the phantom read problem, which is why InnoDB's RR isolation level can deal with all of the transaction problems listed above.
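A sketch of the snapshot behaviour described above, using a hypothetical user table and two sessions under repeatable read (session B's statements are shown as comments since they run in a different connection):

```sql
-- Session A: the first SELECT builds the read view (snapshot).
START TRANSACTION;
SELECT age FROM user WHERE id = 1;   -- suppose this returns 20

-- Session B, meanwhile:
--   UPDATE user SET age = 30 WHERE id = 1;
--   COMMIT;

-- Back in session A: still 20, because the query reads from the snapshot.
SELECT age FROM user WHERE id = 1;
COMMIT;

-- A new query after COMMIT sees the committed change (30).
SELECT age FROM user WHERE id = 1;
```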
Performance optimization
- The server side
MySQL optimization is not only about SQL optimization. The number of connections allowed on the server also affects clients; the current connection situation can be checked through the server status.
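A minimal sketch of how the server-side connection situation can be inspected (standard MySQL status/variable names):

```sql
SHOW VARIABLES LIKE 'max_connections';   -- server-side connection limit
SHOW STATUS LIKE 'Threads_connected';    -- connections currently open
SHOW STATUS LIKE 'Threads_running';      -- connections actively executing statements
SHOW PROCESSLIST;                        -- what each connection is doing right now
```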
- The client side
First, SQL optimization: the execution plan can be inspected with EXPLAIN to see whether an index is used and whether it is used correctly, and the plan's access type should be pushed toward the more efficient types where possible. Appropriate indexes speed up queries, and selecting only the required fields can reduce lookups back to the table. On the application side, connection pooling should be used; different pool frameworks differ somewhat in performance, but the impact is usually small. Pool size also matters: a sensible size reduces the cost of CPU time-slice switching, and it should be tuned together with the application server. Another lever is splitting tables: vertical splitting moves rarely used or large fields into a separate table, while horizontal splitting distributes rows into tables with the same structure according to some rule, such as id modulo or time. Finally, the business access pattern should match the split: for example, history can be queried by month, or limited to the most recent three months, instead of scanning all historical records at once.
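A rough sketch of the id-modulo horizontal split and the "select only the needed columns" advice; the order_2 table and its columns are made up for illustration:

```sql
-- Horizontal split into order_0 .. order_3 by id modulo 4.
-- The application routes the query: for order id 10, 10 % 4 = 2, so the row lives in order_2.
SELECT order_id, user_id, amount        -- return only the columns that are needed
FROM order_2
WHERE order_id = 10;

-- Check the plan and index usage of the routed query:
EXPLAIN SELECT order_id, user_id, amount FROM order_2 WHERE order_id = 10;
```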