
1. Mysql architecture diagram

MySQL is organized into three layers:

  • Connection layer (Connectors): clients connect to MySQL through the connector, which listens on the default port 3306 and receives user requests. max_connections sets the maximum number of connections; in production it is typically 1000-2000.
  • Service layer (Server): mainly the SQL interface, parser, optimizer, query cache (cache and buffer), and system management and control tools.
  • Storage engine layer (Storage Engine): MyISAM, InnoDB, and others. Storage engines are pluggable, because an engine is chosen per table rather than for the whole database.

2. Mysql query execution process

2.1 Flow chart

2.2 Connectors Establish connections

In the first step, the client connects to the database and the connection is received by the connector. The connector is responsible for establishing connections with clients, checking permissions, and maintaining and managing connections.

2.3 Querying cache Information

In the second step, if the query SQL exactly matches a previously cached query (even one extra space makes it a different query), the result is returned from the query cache. However, the cache is invalidated very frequently: any change to the table structure or any data update invalidates the cached entries for that table.

Therefore, MySQL does not recommend relying on the query cache, because its drawbacks outweigh its benefits.
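
For reference, on MySQL 5.7 and earlier the query cache settings and hit statistics can be inspected with standard SHOW commands (the query cache was removed entirely in MySQL 8.0). A minimal check might look like this:

# Whether the query cache is enabled and how large it is (MySQL 5.7 and earlier)
SHOW VARIABLES LIKE 'query_cache%';
# Hit/miss statistics; frequent invalidation shows up as low Qcache_hits
SHOW GLOBAL STATUS LIKE 'Qcache%';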

2.4 The parser performs SQL parsing

In the third step, the parser first performs lexical analysis, splitting the complete SQL statement into tokens. It then performs syntax analysis to determine whether the statement is grammatically valid and, based on the syntax rules defined by MySQL, generates a parse tree from the SQL.

2.5 The preprocessor checks the parse tree

In the fourth step, the preprocessor further checks whether the parse tree is valid, such as whether the table name exists, whether the column of the table in the statement exists, etc. MySQL checks whether the user has permission to operate on the table. After preprocessing, you get a new parse tree.

2.6 The Optimizer performs SQL optimization

In the fifth step, the query optimizer generates different execution plans from the parse tree and selects the optimal one. MySQL's optimizer is cost-based: the lower the estimated cost of an execution plan, the higher its priority. The optimizer's work includes deciding which index to use when several indexes are available, and, for multi-table joins, deciding the join order and the driving table.
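
EXPLAIN shows which plan the optimizer finally settled on. A hedged illustration, assuming a hypothetical orders table that has indexes on both customer_id and created_at:

# Hypothetical table: the optimizer chooses between an index on customer_id and one on created_at
EXPLAIN
SELECT * FROM orders
WHERE customer_id = 42 AND created_at >= '2021-08-01';
# possible_keys lists every candidate index the optimizer considered;
# the key column shows the index it actually chose.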

2.7 Execute SQL Statements according to the Execution Plan

In the sixth step, the executor runs the SQL statement according to the execution plan, fetching data through the interfaces defined by the storage engine. The user's permission on the table is checked before the data is retrieved.

3. Mysql storage engine

Common MySQL storage engines:

  • MyISAM: the default engine before MySQL 5.5; very fast for inserts and queries, but does not support transactions, row-level locking, or foreign keys
  • InnoDB: the default engine since MySQL 5.5; supports transactions and row-level locking, and is slightly slower than MyISAM
  • ISAM: Indexed Sequential Access Method, the predecessor of MyISAM; no longer installed by default since MySQL 5.0
  • Memory: in-memory engine with extremely fast inserts, updates, and queries, but it uses memory in proportion to the data volume, and data kept only in memory can be lost
  • CSV: stores data in CSV-format files (useful for cross-platform data exchange)
  • Falcon: a newer engine with transaction support, once rumored to be a candidate replacement for InnoDB
  • Archive: for data archiving; stores data compressed and is well suited to large volumes of independent historical data, but supports only insert and query operations
  • MRG_MyISAM (MERGE): combines multiple MyISAM tables into one logical table, useful for very large data sets
  • BLACKHOLE: discards writes; reads return an empty result
  • FEDERATED: used to access tables on a remote server
  • NDB: the storage engine dedicated to MySQL Cluster

4. MySQL indexes

4.1 Index Classification

4.1.1 Classification by Usage Type

  • Primary key index: the indexed column must contain unique, non-NULL values; it is typically the primary key field
  • Normal index: the indexed column is not restricted; values may repeat or be NULL
  • Unique index: the indexed column must contain unique values, but NULL values are allowed
  • Full-text index: available on CHAR, VARCHAR, and TEXT columns; suitable only when the data volume is small or concurrency is low, otherwise prefer Elasticsearch, Lucene, and similar engines
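
A minimal sketch of these index types, using a hypothetical article table (the table and column names are made up for illustration):

CREATE TABLE article (
  id BIGINT NOT NULL AUTO_INCREMENT,
  slug VARCHAR(64) NOT NULL,
  author_id BIGINT,
  body TEXT,
  PRIMARY KEY (id),              # primary key index: unique, no NULLs
  UNIQUE KEY uk_slug (slug),     # unique index: unique values, NULLs allowed
  KEY idx_author (author_id),    # normal index: duplicates and NULLs allowed
  FULLTEXT KEY ft_body (body)    # full-text index on a TEXT column
) ENGINE=InnoDB;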

4.1.2 Partition by Number of index Columns

  • Single-column index: The index is created with only one column
  • Composite index: the index contains two or more columns. Queries must follow the leftmost-prefix rule to use it. A composite index is generally preferred over several single-column indexes (apart from the primary key index) because it uses less space; see the sketch below
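
A small sketch of the leftmost-prefix rule, assuming a hypothetical person table and a composite index on (last_name, first_name, age):

# Hypothetical composite index
ALTER TABLE person ADD KEY idx_name_age (last_name, first_name, age);

# Can use the index: the condition starts from the leftmost column
SELECT * FROM person WHERE last_name = 'Smith' AND first_name = 'Anna';

# Cannot use the index: the leftmost column last_name is missing
SELECT * FROM person WHERE first_name = 'Anna' AND age = 30;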

4.2 Index storage and features

The data in the table is stored in a disk file, and MySQL needs to read the data from disk to memory before processing the data. A hard disk is usually composed of multiple platters, and the number of platters is generally less than 5.

The working mechanism of the disk determines the speed at which data can be read. The time required for reading and writing disk information can be divided into seek time, delay time, and transmission time. The time it takes to read data from the disk is the sum of the time required by these three steps.

MySQL is essentially a piece of software, so when MySQL needs to read data, MySQL calls the operating system interface, and the operating system calls the disk driver to read the data into kernel space, and then it copies the data from kernel space to user space, and MySQL can then read the data from user space. When the operating system reads disks, the minimum unit for Linux to read disks is 4K. The minimum unit is determined by the operating system and may vary by operating system.

MySQL’s InnoDB storage engine reads data in pages, and the size is controlled by the innodb_page_size parameter, which defaults to 16K.
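
The page size can be checked directly:

# InnoDB page size, 16K by default; it can only be set before the data files are created
SHOW VARIABLES LIKE 'innodb_page_size';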

4.3 Data structure of the index

4.3.1 the Hash table

A hash table stores data as key-value pairs (HashMap in Java is a hash table structure). If a hash table were used to store table data, the key could hold the index column and the value could hold the row record or the row's disk address.

For equality queries, a hash table is very efficient, with O(1) time complexity. However, it does not support fast range queries; a range query can only be served by a full table scan.

4.3.2 binary tree

Characteristics of a binary search tree: each node has at most two children; values in the left subtree are smaller and values in the right subtree are larger; lookup complexity depends on the height of the tree; and with a monotonically increasing key, such as an auto-increment primary key, the tree can degenerate into a singly linked list.

4.3.3 Balanced binary trees

The heights of a node's left and right subtrees differ by at most 1. During insertion and deletion, the tree is kept balanced through left/right rotations, so that one subtree never becomes much taller than the other. Query performance of a balanced binary search tree is close to binary search, with a time complexity of O(log2 n).

Existing problems:

  • Lookup complexity depends on the height of the tree: the taller the tree, the more comparisons (and disk reads) a lookup needs
  • A balanced binary tree does not support fast range queries; a range query requires multiple traversals from the root node, so query efficiency is poor

4.3.4 B tree

Suppose the key is a bigint (8 bytes) and each element carries two pointers of 4 bytes each, so one element occupies 16 bytes (8 + 4*2 = 16).

Store multiple elements per node and pack as much data as possible into each node: a 16K node can then hold about 1000 elements (16K / 16 = 1000). The binary tree thus becomes a multi-way tree; by increasing the fan-out, the tree changes from tall and thin to short and fat. Indexing a million rows needs a tree only 2 levels high (1000 * 1000 = 1,000,000).

B tree is a multi-fork balanced search tree. Its main characteristics are:

  • B tree nodes store multiple elements, and each inner node has multiple forks
  • The elements in a node contain both key values and data, and the keys within a node are arranged in ascending order. That is, data is stored at every node.
  • Elements in the parent node do not appear in the child node
  • All leaf nodes are at the same level, leaf nodes have the same depth, and there is no pointer between leaf nodes

Disadvantages of the B tree:

  • B trees do not support fast range queries. After locating the start of a range, we have to go back to the root and search again for the following values, so a range query requires multiple traversals from the root and query efficiency suffers.
  • If the nodes store whole rows, the row size grows as the number of columns grows. Fewer entries then fit into one page, the tree becomes correspondingly taller, and more disk I/Os are needed.

4.3.5 B+ tree

Only leaf nodes store data, and non-leaf nodes only store key values. Leaf nodes are connected by bidirectional pointer, and the bottom leaf node forms a bidirectional ordered list.

B+ tree can guarantee fast lookup for equivalence and range query.

4.4 Index Condition Pushdown (ICP)

Index Condition Pushdown (ICP) pushes the evaluation of indexed WHERE conditions down from the Server layer to the storage engine layer. ICP can be switched on and off through the optimizer_switch parameter.

ICP's main purpose is to reduce the number of back-to-table lookups. It can be used with both the InnoDB and MyISAM engines; for InnoDB it applies only to secondary indexes.

Conclusion:

1. When ICP is not used, index conditions that meet the leftmost prefix are compared at the storage engine layer, and non-index conditions are filtered at the Server layer.

2. When ICP is used, all index-column conditions are compared at the storage engine layer, and only non-index conditions are filtered at the Server layer.

Comparing executions with and without ICP shows that ICP effectively reduces the number of back-to-table lookups and the number of records returned to the Server layer, which in turn reduces disk I/O and the interaction between the Server layer and the storage engine.
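
A hedged sketch of toggling ICP and observing the difference, assuming a hypothetical people table with a composite index on (zipcode, lastname):

# ICP can be toggled through optimizer_switch
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM people
WHERE zipcode = '95054' AND lastname LIKE '%etrunia%';
# Extra shows only "Using where": the lastname filter runs at the Server layer

SET optimizer_switch = 'index_condition_pushdown=on';
EXPLAIN SELECT * FROM people
WHERE zipcode = '95054' AND lastname LIKE '%etrunia%';
# Extra shows "Using index condition": the lastname filter is pushed down to the storage engine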

4.5 Covering Index

A covering index means that all columns required by the query are contained in the index, so the data can be returned directly from the index and no separate back-to-table lookup is performed. This reduces back-to-table lookups and disk I/O and speeds up the query.
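
A minimal sketch, assuming a hypothetical user table with a composite index idx_name_age (name, age):

# Covered: all selected columns and the WHERE column are in idx_name_age,
# so EXPLAIN shows "Using index" and no back-to-table lookup is needed
EXPLAIN SELECT name, age FROM user WHERE name = 'Tom';

# Not covered: email is not in the index, so every matching row requires a
# back-to-table lookup against the clustered index
EXPLAIN SELECT name, age, email FROM user WHERE name = 'Tom';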

4.6 Index failure analysis

  • The leftmost-prefix rule of the index is not satisfied, or operators such as <> and != are used; from that condition onward the index is no longer used
  • A function or calculation is applied to an index column
  • IS NULL / IS NOT NULL conditions cannot use the index
  • A string column is queried without quotation marks, causing an implicit type conversion
  • A LIKE pattern of the form '%xxxx%' (leading wildcard) cannot use the index
  • Prefer covering indexes and avoid SELECT *
  • Connecting conditions with OR can invalidate the index, so use OR sparingly
  • Prefer full-value matches on indexed columns; a few examples follow this list
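
A few hedged examples of the rules above, assuming a hypothetical user table with an index on name:

# Index can be used: plain equality on the indexed column
SELECT * FROM user WHERE name = 'Tom';

# Function applied to the index column: the index is not used
SELECT * FROM user WHERE UPPER(name) = 'TOM';

# Leading wildcard: the index is not used
SELECT * FROM user WHERE name LIKE '%om';

# Missing quotes on a string column forces an implicit conversion: the index is not used
SELECT * FROM user WHERE name = 123;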

5. Explain the query execution plan

5.1 select_type

  • simple

    Represents a simple SELECT query that does not require a union operation or contains no subqueries. When there is a join query, the outer query is simple and there is only one query.

  • primary

    When a SELECT requires a UNION operation or contains a subquery, the select_type of the outermost query is primary, and there is only one such query.

  • union

    UNION connects two or more SELECT queries. If the first query is a derived table (in a FROM subquery) it is marked derived; the select_type of every SELECT after the first is union.

  • dependent union

    Appears in a UNION or UNION ALL statement, like union, but this query depends on the outer query.

  • union result

    The row representing the result set of a UNION or UNION ALL; its id field is NULL because it does not itself participate in the query.

  • subquery

    A subquery that appears anywhere other than the FROM clause is marked subquery.

  • dependent subquery

    Similar to the Dependent Union, means that the query of this subquery is affected by the external table query

  • derived

    Subqueries that appear in the from clause, also called derived tables, might be called inline views or nested select in other databases

5.2 type

  • system

    The table has only one row of data or is empty.

  • const

    When a primary key or unique index is used with an equality condition so that at most one row matches, type is usually const. Other databases also call this a unique index scan.

  • eq_ref

    Used in the case of equivalent joins for multiple table associations where the two equivalent joins are primary keys or unique indexes. This type is usually used in join queries with multiple tables, indicating that each result of the former table can only match the result of one row of the later table. And the query comparison operation is usually =, the query efficiency is higher.

  • ref

    In the case of an equivalent join with multiple table associations where the equivalent join is a non-unique index or an index query using the leftmost prefix rule.

  • fulltext

    Full-text index retrieval, note that full-text index has high priority, if both full-text index and normal index exist, mysql will use full-text index regardless of cost

  • ref_or_null

    Similar to the ref method, but with the added comparison of null values. Not much is actually used.

  • unique_subquery

    Used for IN-form subqueries in the WHERE clause where the subquery returns distinct, unique values.

  • index_subquery

    Subqueries that use the in form use auxiliary indexes or lists of in constants. Subqueries may return duplicate values. You can use indexes to deduplicate subqueries.

  • range

    Index range scan, commonly seen in queries using >,<,is NULL,between,in,like, etc.

  • index_merge

    Conditions combined with AND or OR use different indexes that are then merged. Officially this is ranked after ref_or_null, but in practice, because multiple indexes have to be read, performance is often not as good as range.

  • index

    The select result column uses an index, and type is displayed as index. Full index scans, which scan an index from beginning to end, are common in queries that use index columns to process queries that do not require data files to be read, and queries that can be sorted or grouped using indexes.

  • all

    This is a full table scan of the data file, which is then filtered at the Server layer to return the required records.

5.3 extra

  • using index

    No back-to-table lookup is needed: the query result can be obtained directly from the index, i.e. a covering index is used.

    If Using Where is present, the index is being used to perform a lookup for the index key value. If not, the index is used to read data rather than perform a lookup.

  • using where

    Indicates that MySQL filters, at the Server layer, the rows returned by the storage engine; the filter condition involves a column that is not covered by the index used.

  • using index condition

    Indicates that Index Condition Pushdown (ICP) is used.

  • using filesort

    Occurs when sorting cannot be done with an index; common with ORDER BY and GROUP BY statements. It means MySQL sorts the result with an external sort instead of reading rows in index order. Any sort that MySQL cannot perform through an index is called a filesort.

  • using temporary

    MySQL needs a temporary table to hold intermediate results; this is common with ORDER BY and GROUP BY, especially when they operate on different columns.

  • distinct

    Use of distinct keyword (index field) in select section

  • no tables used

    A query with no FROM clause, or a query FROM DUAL.

    Join queries that use a NOT IN() subquery or the NOT EXISTS operator are called antijoins. A normal join queries the inner table first and then the outer table, whereas an antijoin queries the outer table first and then the inner table.

6. Introduction to Mysql Lock

6.1 Division of locks

6.1.1 By granularity

  • Global lock: Locks the entire database. This is implemented by MySQL’s SQL Layer

  • Table-level lock: locks an entire table. Implemented by MySQL's SQL layer; the main kinds are table locks and metadata locks (MDL).

  • Row-level lock: locks a single row of data and may also lock the gaps between rows. Implemented by individual storage engines such as InnoDB; InnoDB row locks are implemented by locking index entries in the leaf nodes of an index tree.

    InnoDB has three types of row-level locks in terms of locking range:

    Record Locks: Locks a Record in the index.

    Gap Locks: lock the gap between two index records, an open interval.

    Next-Key Locks: a gap lock plus a record lock on the record immediately following the gap, i.e. a left-open, right-closed interval.

    In the absence of an index, a row lock is upgraded to a table-level lock.
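
A sketch of how a next-key lock shows up under REPEATABLE READ, assuming a hypothetical table t with an index on id and existing rows id = 5 and id = 10:

# Session 1: the range scan takes next-key locks covering the gap between 5 and 10
BEGIN;
SELECT * FROM t WHERE id BETWEEN 5 AND 9 FOR UPDATE;

# Session 2: blocked, because inserting 7 falls into the locked gap
INSERT INTO t (id) VALUES (7);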

6.1.2 By Functions

  • Shared read lock: Allows one transaction to read a row, preventing other transactions from acquiring exclusive locks on the same dataset
  • Exclusive write lock: allows the transaction that acquires it to update the data, and prevents other transactions from acquiring shared read locks or exclusive write locks on the same data (plain non-locking reads are not blocked).
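
A minimal sketch of the two lock modes, using a hypothetical account table (syntax as in MySQL 5.7; MySQL 8.0 also accepts FOR SHARE):

# Shared (read) lock: other transactions may still read and take shared locks,
# but cannot take an exclusive lock on the same rows
SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;

# Exclusive (write) lock: other transactions can neither lock these rows
# (shared or exclusive) nor update them until this transaction ends
SELECT * FROM account WHERE id = 1 FOR UPDATE;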

6.1.3 By Implementation mode

  • Pessimistic locking
  • Optimistic locking

7. MySQL transactions and the underlying principles of MVCC

7.1 Four properties of transactions (ACID)

  • Atomicity

    A transaction is an atomic unit of work that is either committed or rolled back as a whole. When a transaction makes multiple changes to the database, either all of the changes take effect when the transaction commits or all of them are undone when it rolls back. In application code this means that if an exception occurs we roll back the transaction, and the database then looks as if none of the previous update operations had ever been performed.

  • Consistency

    The state of the database remains consistent at all times — after each commit or rollback, and for the duration of a transaction. If you update the relevant data across multiple tables, you will see all the old values or all the new values when querying outside the transaction, rather than a mix of old and new values. The intermediate state of the data between the start and end of the transaction is not visible to other transactions, and the atomicity of the transaction guarantees the consistency of the data.

  • Isolation

    Transactions are isolated from each other as they proceed, and they cannot interfere with each other or view each other’s uncommitted data. Transaction isolation is achieved through a locking mechanism, and experienced developers can improve performance and concurrency by tuning the isolation level so that they can ensure that transactions do not interfere with each other.

    Isolation is implemented through the locking mechanism and MVCC. MVCC (multi-version concurrency control) optimizes read/write performance: reads take no locks, and reads and writes do not conflict.

  • Durability (persistence)

    The results of a transaction are persistent and must all be written to disk after a successful transaction: once the commit operation is successful, the changes made by the transaction are not affected by power failures, system crashes, and other potential hazards. Database data is usually kept on disk, and modifications to the data involve writing operations to disk storage, with a certain amount of redundancy to prevent power failures or software crashes during write operations.

7.2 InnoDB Memory and disk structure

7.2.1 Memory Structure

Buffer Pool

The Buffer Pool is an area of main memory where InnoDB caches data pages and index pages when it accesses a table. On MySQL dedicated servers, up to 75% of the physical memory is normally allocated to the buffer pool. The size of the buffer pool can be controlled by the innodb_buffer_pool_size parameter.

InnoDB table data, whether primary key index or secondary index, is stored in the disk space as a page. When InnoDB accesses data on a page, it loads the entire page into the buffer pool and then reads and writes. InnoDB does not immediately remove the page from the buffer pool after the operation. Instead, InnoDB will cache the page so that the next time the page needs to be accessed, it will be retrieved directly from the buffer pool. This saves disk IO overhead and speeds up data access.

The size of the pages in the Buffer Pool is the same as the size of the pages in the data file. When the Buffer Pool runs out of space, the Buffer Pool uses the LRU algorithm to weed out the least recently used pages.

When MySQL initializes, the Buffer Pool is divided into several cache pages:

  • Free page list: Unused cached pages
  • Dirty page list: pages that InnoDB read from disk into the buffer pool and then modified there, so the page in the buffer pool no longer matches the page on disk; such pages are called dirty pages
  • LRU list: The LRU list holds all the data pages and index pages loaded into the Buffer Pool. According to the least recently used rule, the most recently used is at the top of the list, and the least recently used is at the bottom.

When no free pages are available in the Buffer Pool, some cached pages are evicted from the tail of the LRU list; after eviction the head of the LRU list holds the hot data. The LRU list is not a traditional LRU list: InnoDB optimised it by dividing it into a young area (5/8) and an old area (3/8). The ratio can be adjusted with the innodb_old_blocks_pct parameter, whose default value of 37 means the old area takes up 37% of the list.

Young: Stores cached pages that are used very frequently. This part of the list is also called hot data.

Old: Stores cached pages that are not used very often, so this part of the list is also called cold data.

When InnoDB reads a page into the buffer pool, it first inserts it at the head of the old area. The page is moved to the head of the young area only when it is accessed again and a certain condition is met: enough time has passed since its first access. This interval is controlled by the innodb_old_blocks_time parameter, which defaults to 1s (1000 ms). In other words, a page that is only touched within 1 second of being read in, as happens during a large scan, stays in the old area, while a page accessed again after that window is promoted to the head of the young area.

For some hot index pages in the buffer pool, InnoDB will automatically create an adaptive Hash index in the buffer pool in order to speed up access to these pages. The adaptive hash index function can be enabled using the innodb_adaptive_hash_index parameter. It is enabled by default.
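
The buffer-pool parameters mentioned above can be inspected, and the dynamic ones changed, with ordinary statements, for example:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_old_blocks_pct';       # defaults to 37 (% of the LRU list kept as the old area)
SHOW VARIABLES LIKE 'innodb_old_blocks_time';      # defaults to 1000 ms
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';  # ON by default

# innodb_buffer_pool_size can be resized online in MySQL 5.7+ (value in bytes)
SET GLOBAL innodb_buffer_pool_size = 2 * 1024 * 1024 * 1024;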

Change Buffer

The Change Buffer is a special data structure used to optimize updates to secondary index pages. When a secondary index page is not in the Buffer Pool, InnoDB temporarily caches the change in the Change Buffer and merges it into the index page later, when the page is loaded into the Buffer Pool by some other read. Changes in the Change Buffer can come from INSERT, DELETE, and UPDATE operations, and the merge reduces random I/O on secondary indexes. Using the Change Buffer can therefore significantly speed up INSERT, UPDATE, and DELETE execution.

Steps:
1. A DML operation hits a secondary index page that is not in the Buffer Pool, so the change is recorded in the Change Buffer.
2. When that index page later needs to be loaded, it is read into the Buffer Pool and the changes in the Change Buffer are merged into it.
3. The merged page is flushed to disk when the server is idle.

When is the change Buffer updated to the buffer pool index page?

A: Index pages merge updates in memory when they are loaded into the cache pool.

When will the buffer pool dirty pages be updated to the disk file?

Answer: When the redo log is full; When the database is idle, by the background thread; The database shuts down normally. The change buffer is also updated to disk in all three cases.
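
The change buffer behavior itself is configurable; a quick check of the defaults:

# Which operations are buffered: all / none / inserts / deletes / changes / purges (default: all)
SHOW VARIABLES LIKE 'innodb_change_buffering';
# Maximum share of the buffer pool the change buffer may use (default: 25%)
SHOW VARIABLES LIKE 'innodb_change_buffer_max_size';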

Log Buffer (Redo Log)

MySQL will cache updates in memory and flush dirty pages to disk only when the server is idle. The problem with this is that if the server shuts down unexpectedly or MySQL crashes before the dirty pages fall to the disk, the dirty pages will be lost. To avoid this problem, InnoDB writes all changes to the page to a log file to persist to disk, so when MySQL crashes and restarts, MySQL will use this log file to perform the recovery operation and reapply the changes to the data file, making the update operation persistent. This Log file is called the Redo Log.

By default, the physical files of the redo log are ib_logfile1 and ib_logfile2 in the database data directory. You can use parameters to control the storage location, number, and size of the log files, as shown below:

# Specifies the path where the redo log files are located; by default they are in the database data directory.
innodb_log_group_home_dir=./
# Specifies the number of files in the redo log file group. The default is 2, meaning two redo log files
# written in a loop: when one file is full, the other one is used.
innodb_log_files_in_group=2
# Size of each redo log file; the default is 48M (16777216 below sets 16M).
innodb_log_file_size=16777216

This is the write-ahead logging (WAL) technique often referred to in MySQL: the key idea is to write the log before writing the data pages to disk. MySQL has an area of memory called the Log Buffer into which the redo log is first written. The size of this buffer is controlled by innodb_log_buffer_size and defaults to 16M.

When the Log Buffer is flushed to disk is controlled by the innodb_flush_log_at_trx_commit parameter. The default value is 1, meaning the Log Buffer is written and flushed to disk as soon as a transaction commits.

Flush-to-disk rules (values of innodb_flush_log_at_trx_commit):

  • 0: Data is written from the log buffer to the log file every second and fsync is flushed to disk at the same time. Log buffer data is not written to the redo log file immediately upon each transaction commit. If MySQL crashes or the server goes down, all the data in memory will be lost, up to 1 second of transactions will be lost.
  • 1: Each transaction commit, MySQL writes data from the log buffer to the log file and fsync is flushed to disk at the same time. This mode is the system default. MySQL crashes and committed transactions are not lost. To fully comply with ACID, the default setting 1 must be used.
  • 2: Each time the transaction is committed, MySQL writes data from the log buffer to the log file. MySQL performs the fsync operation every second to synchronize the data to disk. Each time a transaction is committed, data is flushed to the operating system buffer, and the disk is considered persistent. If MySQL crashes, committed transactions will not be lost. However, if the server goes down or an unexpected power failure occurs, data in the operating system cache will be lost, so up to 1 second of transactions will be lost.

Conclusion:

Weighing safety against performance, value 2 is often used in production: data is not lost when MySQL itself restarts abnormally, and at most 1 second of transactions is lost if the server goes down unexpectedly, which is rare and usually tolerable relative to the performance gain.
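
A minimal sketch of checking and changing the setting (the variable is dynamic, so SET GLOBAL takes effect immediately):

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
# Trade a little durability for performance: flush to the OS cache at commit,
# fsync to disk roughly once per second
SET GLOBAL innodb_flush_log_at_trx_commit = 2;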

Log files are also disk files. Why not update them directly to a data file instead of updating them to the redo log first?

A: If the data we need to update is scattered in different sectors on different pages at the time of transaction commit, we need to find the corresponding track according to the disk address and then find the corresponding sector before writing the data, which usually takes 10ms. A transaction commits data that requires multiple disk I/O interactions to complete. This is random I/O, and the read and write speed is relatively slow.

The redo log files occupy a contiguous area of disk. When a transaction commits, the redo log is written sequentially from the current position onward, so it usually needs only one disk I/O. Flushing dirty pages is random I/O, while writing the log is sequential I/O. WAL therefore records the changes in the log file first and delays flushing the data pages, which improves system performance. Note that the redo log is used mainly for crash recovery; the data files on disk are still updated by flushing dirty pages from the buffer pool.

Features of the Redo Log:

  1. The Redo log is generated in the InnoDB storage engine layer and not in other storage engines.

  2. The redo log is a physical log that records changes made to the data on a data page.

  3. The size and number of redo log files are fixed. Writing starts with the first file; when one file is full, writing switches to the next, and after the last file it wraps around to the first. When the system is idle or the redo log is full, MySQL flushes the corresponding changes to the data files so that the overwritten redo log space can be reused.

bin log

Binlog records the operations (all DDL statements and DML statements) that the database performs to change, but does not include operations such as select and show. Before the transaction is committed, all uncommitted binary logs will be recorded in a cache. When the transaction is committed, the binary logs in the buffer will be directly written to the binary log file, which is mainly used to implement master-slave replication and data recovery of mysql.

Binlog is disabled by default. You are advised to enable binlog. You need to perform the following configurations to enable binlog.

# Binary logging is disabled by default; set log-bin to enable it.
# mysql-bin is the prefix of the binlog file names; the files themselves are named mysql-bin.000001, mysql-bin.000002, ...
log-bin=mysql-bin

max_binlog_size specifies the maximum size of a single binary log file. When a file exceeds this threshold, a new binary log file is created with the numeric suffix incremented by 1, for example mysql-bin.000002.

The binary log buffer is per-session and its size is determined by binlog_cache_size, 32K by default. When a thread starts a transaction, MySQL automatically allocates a cache of binlog_cache_size bytes for it.

Flush timing:

When the binlog cache is written to disk is controlled by the sync_binlog parameter, which defaults to 1, meaning the binlog is flushed to disk before the commit completes.

  • 0: disables the function of the MySQL server to synchronize binary logs to disks. Relies on the operating system to flush the binary log to disk from time to time, just like any other file. This setting provides the best performance, but because binary files are cached in the binlog_cache, there is a possibility that committed transactions will not be synchronized to disk in the event of a power failure or an operating system crash (all binlog information in the binlog_cache will be lost).

  • 1: The binary log is synchronized to disk before the transaction is committed. This is the safest setting, but it has an impact on performance due to the increased number of disk writes. In the event of a power failure or an operating system crash, all committed transactions are guaranteed to be flushed to disk, and those that are not flushed to disk are only in the ready phase (uncommitted) and will not be lost.

    Sync_binlog =1, binary logs are written to disk immediately before a transaction is committed. If the binary log has been written at this point, but the COMMIT has not occurred, and if the outage occurs, the next time the MySQL database is started, the transaction will be rolled back because the COMMIT has not occurred. However, the binary log already records the transaction information, which can cause inconsistency in the master/slave replication environment (or using the binlog for recovery). This problem can be solved by using the innodb_support_xa=1 parameter, which also ensures that the binary log and InnoDB storage engine data files are synchronized.

  • N (N>1) : For every N transactions, Mysql synchronizes the binary log to disk. In the event of a power failure or an operating system crash, it is possible that committed transactions are not synchronized to disk.
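
For reference, sync_binlog is also a dynamic variable, so it can be checked and adjusted at runtime:

SHOW VARIABLES LIKE 'sync_binlog';
# 1 = flush the binlog to disk at every commit (safest; the default since MySQL 5.7.7)
SET GLOBAL sync_binlog = 1;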

The redo log is different from the bin log

  1. The redo log is generated by the InnoDB storage engine layer, while the binlog is generated by the MySQL Server layer.

  2. Redo log space is fixed. Binlog uses the append mode. When a file reaches a certain size, it switches to the next file.

  3. The redo log is used for crash recovery. Binlog is mainly used for master-slave replication and data recovery.

Double Write (doublewrite buffer)

Why do you need a dual-write buffer?

A: The database page size is 16K while the minimum I/O unit of the operating system is 4K, so writing one dirty page from the Buffer Pool to a data file takes four OS writes. If the server loses power or crashes while a data page is being written, only part of the page (for example 8K) may reach disk; this is called a partial page write. After MySQL restarts, the damaged page cannot be repaired with the redo log alone, so the page data may be lost and the data may become inconsistent. To improve reliability, InnoDB introduced the Double Write mechanism to deal with partial write failures.

Double write buffering is a key InnoDB feature. The innodb_doublewrite parameter controls whether it is enabled, and it is on by default. With it enabled, InnoDB first writes dirty pages from the buffer pool to the doublewrite buffer (a contiguous area in the system tablespace) and only then writes them to their real positions in the data files. If MySQL crashes while writing a page to a data file, InnoDB can take the intact copy of that page from the Double Write area of the system tablespace, copy it back to the tablespace file, and then apply the redo log.
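
Whether the doublewrite buffer is active, and how much it is used, can be checked like this:

SHOW VARIABLES LIKE 'innodb_doublewrite';   # ON by default
SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%';    # pages written / write batches through the doublewrite buffer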

undo log

Atomicity and isolation of transactions are implemented by undo logs. Achieving atomicity is, in fact, achieving consistency. Transaction persistence is implemented by the Redo log.

The undo log (also called the rollback log) records the state of the data before it was changed in a transaction. If the user rolls back, the database uses the undo log to restore the data to its state before the transaction started. The redo log and undo log together are called transaction logs. The undo log is stored in the system tablespace by default, but it can also be placed in independent undo tablespaces. The relevant parameters are:

show global variables like '%undo%';
# undo log storage directory; defaults to the data directory
innodb_undo_directory=./
# Number of separate undo tablespaces, 0 by default; the files are named undo001, undo002, etc.
innodb_undo_tablespaces=0

Besides supporting rollback, the other use of the undo log is MVCC, which InnoDB implements through the undo log: when a row a user wants to read is held by another transaction, the current transaction can read an earlier version of that row from the undo log.

7.2.2 InnoDB transaction analysis

Atomicity, persistence and consistency

Atomicity, persistence, and consistency are accomplished through the redo log, undo log, Force log at Commit, and Double Write mechanisms. The redo log is used to recover data in the event of a crash, and the undo log is used to undo transactions when they are rolled back. It is also used for isolated multi-versioning. Force Log at Commit ensures that the redo Log is persisted after a transaction is committed. The Double Write mechanism is used to improve the reliability of the database and solve the problem of partial Write failure when dirty pages fall into disks.

Atomicity: redo log, undo log, Force log at Commit, and Double Write mechanism.

Persistence: redo log, Double Write.

The redo log is used for crash recovery. After the database is restarted after a database crash, restore the dirty pages in the redo log and rewrite them to disks.

During crash recovery the redo log restores committed transactions, while uncommitted transactions must be rolled back. Because the integrity and reliability of the undo log are themselves guaranteed by the redo log, redo recovery is performed first and the undo rollback afterwards.

MySQL performs checkpoints periodically, driven by time or by space: dirty pages are flushed to disk, and the redo log covering changes that have already been flushed can then be reused. During crash recovery, the data (including the undo pages) is first restored from the redo log starting at the last checkpoint; the undo log is then examined, and any transaction found to be uncommitted is rolled back using the undo log. Before a COMMIT, all redo log records of the transaction must be written; only when they are all logged does the COMMIT succeed. The dirty pages in memory are flushed later by checkpoints, and if a crash happens before that, the redo log alone is enough to recover the data.

Isolation

  • Read Uncommitted: The lowest level, which is not guaranteed in any case.

  • Read COMMITTED (RC) : Avoid dirty reads.

  • Repeatable Read (RR) : Avoids dirty read and non-repeatable read.

    (Note: InnoDB's RR also avoids phantom reads, mainly because next-key (gap) locks are only used at RR)

  • Serializable: avoids dirty reads, non-repeatable reads, and phantom reads.

    (degrades from MVCC to lock-based concurrency control)

MVCC lets the database read data without locking: ordinary SELECT statements take no locks, which improves the database's concurrency. With MVCC, the database can implement isolation levels such as READ COMMITTED and REPEATABLE READ, and a transaction can see an earlier version of the current data, which guarantees the I (isolation) in ACID.

In MVCC concurrency control, read operations can be classified into two categories: snapshot read and current read.

  • Snapshot read, which reads the visible (and possibly historical) version of the record without locking. (select)
  • The current read is the latest version of the record read, and the record returned by the current read is locked to ensure that other transactions will not concurrently modify the record.
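
A hedged illustration of the two kinds of read, using a hypothetical account table:

# Snapshot read: a plain SELECT takes no locks and reads the version visible to this transaction
SELECT balance FROM account WHERE id = 1;

# Current reads: always read the latest committed version and lock the rows
SELECT balance FROM account WHERE id = 1 LOCK IN SHARE MODE;
SELECT balance FROM account WHERE id = 1 FOR UPDATE;
# INSERT, UPDATE and DELETE are also current reads
UPDATE account SET balance = balance - 100 WHERE id = 1;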

Consistent nonlocking read refers to the way the InnoDB storage engine reads rows in the database through multi-version concurrency control (MVCC).

If a row is being read in the middle of a DELETE or UPDATE operation, the read operation does not wait for the lock to be released. Instead, InnoDB reads the latest visible snapshot of a row.

The transaction list

From start to commit, a MySQL transaction is kept in a linked list of active transactions maintained in trx_sys. A basic list structure looks like this:

ct-trx --> trx11 --> trx9 --> trx6 --> trx5 --> trx3;

The transaction list contains transactions that have not been committed. Once a transaction is committed, it is removed from the transaction list.

At RR isolation level, at the start of each transaction, all active transactions in the current system are copied to a list (readview)

At the RC isolation level, all active transactions in the current system are copied to a list at the start of each statement (readview).

Use SHOW ENGINE INNODB STATUS to see the transaction list.

ReadView

A Read View is the set of all active transactions at the moment it is created; the structure records, among other things, the maximum and minimum transaction IDs relevant to the current Read View.

# Assume this is the list of currently active transactions:
ct-trx --> trx11 --> trx9 --> trx6 --> trx5 --> trx3;
# ct-trx is the id of the current transaction; it maps to the read_view fields below
read_view->creator_trx_id = ct-trx;
read_view->up_limit_id = trx3;    # low-water mark
read_view->low_limit_id = trx11;  # high-water mark
read_view->trx_ids = [trx11, trx9, trx6, trx5, trx3];

low_limit_id is the "high-water mark". If the db_trx_id of a row version is greater than or equal to low_limit_id, that version was created by a transaction that had not yet committed when the read view was built, so it is not visible to the current transaction.

up_limit_id is the "low-water mark", that is, the smallest transaction ID in the active transaction list at that time. If the db_trx_id of the row version is smaller than up_limit_id, the transaction that created that version had already committed before the read view was built, so the version is visible to the current transaction.

If the db_trx_id of the row is between low_limit_id and up_limit_id, then look for whether the db_trx_id of the record is in the read_view-> trX_IDS list of its own transaction. If it is, then the current version of the record is not visible; otherwise, the current version of the record is visible.

At the REPEATABLE READ isolation level, the read view is created when the transaction's trx structure is set up, using the trx_assign_read_view function, and it is kept until the transaction ends. The Read View is not rebuilt for each query during the transaction, which is what makes reads repeatable.

At the READ COMMITTED isolation level, the read_view is closed after each statement and a new copy is re-created in the row_search_for_mysql function for the next one, which is why non-repeatable reads can occur.