How this article is organized: first, interview questions with a one-sentence summary of each; second, the detailed knowledge points

MySQL architecture

No. Question One-sentence answer Detailed knowledge
0 How is MySQL's architecture layered? Client;

Server layer: connector, query cache, parser, optimizer, executor;

Storage engine layer
1 What logs does MySQL have? Commonly used: redo log, undo log, and binlog

Less commonly used: error log, slow query log, general log, and relay log
1 The redo log Increases the throughput of the service

Makes the service highly available
Geek Time <02 Logging system: How is an SQL update statement executed?>
2 Redo log writing The redo log is written using a two-phase commit (2PC):

The executor calls the engine, which writes the change to the redo log

Once written, the redo log record is in the prepare state

The executor generates the binlog and writes it to disk

The redo log record changes state: prepare -> commit

3 What if a crash interrupts the redo log write? The write has three steps: 1. redo log prepare; 2. write binlog; 3. redo log commit

Crash before step 2. Restart recovery: no commit is found and the binlog has no record of the transaction, so it is rolled back. Backup and restore: the binlog does not contain the transaction, so the restored data matches the state before the crash

Crash before step 3. Restart recovery: although the commit was not performed, the prepare record and the binlog are both complete, so the transaction is committed automatically after the restart. Backup and restore: the binlog contains the transaction, so the restored data is consistent
3 The role of the binlog The binlog (archive log) is a logical log, so it can be replayed on another MySQL instance, even one that uses MyISAM.

Main functions: backup and restore, and primary/secondary replication
4 Binlog formats Three formats are available; Row is recommended

1. Row: a safe, row-based mode that records the changes made to each row

2. Statement: a mode based on SQL statements. Some functions, such as UUID() and NOW(), may cause data inconsistency or even errors during replication

3. Mixed: chooses Statement or Row mode depending on the statement
5 How the redo log and the binlog are associated The redo log and the binlog share a common data field called XID. During crash recovery the redo log is scanned sequentially.

If a redo log record has both prepare and commit, commit the transaction

If a redo log record has prepare but no commit, take its XID to the binlog to find the corresponding transaction: if the binlog contains the complete transaction, commit it; otherwise roll it back
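
The crash-recovery decision described in the table above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `RedoRecord`, `recover`, and the set of XIDs standing in for the binlog are hypothetical names invented here, not MySQL internals.

```python
from dataclasses import dataclass


@dataclass
class RedoRecord:
    """Hypothetical stand-in for one transaction's state in the redo log."""
    xid: int
    prepared: bool
    committed: bool


def recover(redo_records, binlog_xids):
    """Decide, per XID, whether recovery commits or rolls back the transaction."""
    decisions = {}
    for rec in redo_records:  # sequential scan of the redo log
        if rec.prepared and rec.committed:
            # prepare and commit both present: commit directly
            decisions[rec.xid] = "commit"
        elif rec.prepared and rec.xid in binlog_xids:
            # prepare only, but the binlog holds the complete transaction
            decisions[rec.xid] = "commit"
        else:
            # prepare only and nothing in the binlog: roll back
            decisions[rec.xid] = "rollback"
    return decisions


redo = [
    RedoRecord(xid=1, prepared=True, committed=True),
    RedoRecord(xid=2, prepared=True, committed=False),  # crashed before commit
    RedoRecord(xid=3, prepared=True, committed=False),  # crashed before binlog
]
binlog = {1, 2}  # XIDs whose binlog entry is complete
decisions = recover(redo, binlog)
```

Transaction 2 corresponds to a crash before step 3 (commit) and transaction 3 to a crash before step 2 (write binlog).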

Detailed knowledge points

MySQL's architecture is layered

  • 1. Client: our own service
  • 2. Connector: authorization and authentication, connection handling, and so on
  • 3. Query cache: on a hit, the result is returned directly from memory without going to the file system. However, the query cache is invalidated very frequently: whenever a table is updated, all cached queries on that table are cleared. Note: MySQL 8.0 removed the query cache function entirely
  • 4. Parser: somewhat like a compiler's front end, it performs lexical analysis and syntax analysis
  • 5. Optimizer: generates the execution plan, selects appropriate indexes, and so on
  • 6. Executor: calls the engine API and returns the result
  • 7. Storage engine layer: implements the MySQL storage API but does not parse SQL. Examples are InnoDB and MyISAM. Everything except the engine layer belongs to the Server layer

MySQL log comparison

binlog (binary log)
  • Role: records the changes actually made to the data
  • Content: a logical-format log. Read-only SQL is not recorded. It also records the reverse of a statement, for example an INSERT is paired with the corresponding DELETE
  • Applicable scenarios: master/slave replication, data recovery (to a point in time), and cluster synchronization

redo log
  • Role: ensures the durability of transactions
  • Content: a physical-format log that records changes to physical data pages
  • Write method: written sequentially to the physical file
  • Applicable scenario: crash recovery of transactions

undo log (rollback log)
  • Role: provides multi-version reads under concurrency control (MVCC), also known as non-locking reads
  • Content: a logical-format log
  • Applicable scenario: transaction rollback

1. The redo log

  • The redo log is a physical log that records every update issued by a client, such as UPDATE and INSERT. Changes are first written to the redo log and applied to the data files later, when the system is idle. This technique is WAL (Write-Ahead Logging).
  • Roles:
    • 1. Higher service throughput, because the log is written by sequential append, which is fast
    • 2. High availability: even if the service restarts or crashes, the data can be recovered from the redo log (crash-safe)
      • Note: the redo log is specific to InnoDB; the Server layer has no redo log. MyISAM has no crash-safe capability
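
To make the write-ahead idea concrete, here is a toy Python model. It is a sketch under stated assumptions, not how InnoDB is implemented: the class `TinyWalStore` and all of its fields are hypothetical, and plain Python lists and dicts stand in for the on-disk log, the data pages, and the in-memory buffer.

```python
class TinyWalStore:
    """Toy write-ahead-logging store (hypothetical, for illustration only)."""

    def __init__(self):
        self.redo_log = []    # "disk": append-only log, survives a crash
        self.data_pages = {}  # "disk": data pages, survive a crash
        self.checkpoint = 0   # "disk": number of log entries already applied
        self.buffer = {}      # memory: lost on a crash

    def update(self, key, value):
        # WAL: append to the log first; the data page is updated later
        self.redo_log.append((key, value))
        self.buffer[key] = value

    def read(self, key):
        if key in self.buffer:
            return self.buffer[key]
        return self.data_pages.get(key)

    def flush(self):
        # Apply the log entries written since the last checkpoint
        for key, value in self.redo_log[self.checkpoint:]:
            self.data_pages[key] = value
        self.checkpoint = len(self.redo_log)

    def crash(self):
        # Everything held only in memory disappears
        self.buffer = {}

    def recover(self):
        # Replaying the log restores updates that never reached the pages
        self.flush()


store = TinyWalStore()
store.update("id=1", "c=1")
store.flush()
store.update("id=2", "c=5")   # logged, but not yet applied to data pages
store.crash()
before_recovery = store.read("id=2")
store.recover()
after_recovery = store.read("id=2")
```

The second update survives the crash only because it was appended to the log before the crash; that is the crash-safe property described above.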

Indexes

No. Question One-sentence answer Detailed knowledge
0 What is a B+ tree? Why does MySQL use this data structure for indexes?
Advantages: fewer I/O reads (one page is read at a time) and faster range lookups (pointers link adjacent pages)
Understand the underlying principles of MySQL indexing
1 B+ trees versus other trees
2 What indexes does MySQL have? What are the usage scenarios? Primary key index

Secondary index

Clustered index

Non-clustered index

Full-text index

Unique index

Composite index

3 What is a clustered index? Why is it faster than a normal index? The data and the index are stored together in the leaves of the B+ tree, so there is no extra step of looking up the data by address
4 What is stored in the data field of a leaf node in InnoDB's B+ tree index? In MyISAM's? MyISAM stores data and indexes separately, each in its own file; this is called a non-clustered index

(.myd file: MyISAM data file;

.myi file: MyISAM index file for the table)

The leaves of the tree store the physical addresses of the corresponding rows. With this physical address, the record can be located directly in the MyISAM data file

InnoDB stores data and indexes in the same file; this is called a clustered index (.ibd file: table data + index file)

InnoDB uses a clustered index only for the primary key index tree; if the other indexes also stored the row data, they would take too much space, so their leaves store the primary key value instead

Understand the underlying principles of MySQL indexing
5 When does a MySQL index fail? 1. A composite index on (a, b) is used without following the leftmost-prefix rule

2. A LIKE pattern with a leading wildcard, for example where subject like “%game”

3. select * from A where a = a1 order by b

4. select * from A where a = a1 and b = a2

5. If a varchar column is compared with a value that is not wrapped in quotes (“”), an implicit type conversion takes place, which invalidates the index
MySQL indexes: classification, when to use them, when not to use them
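6 What is a covering index? What is a table lookup (going back to the table)? select a, b, c from t where a = 1 and b = 3 and c = 10; if a composite index contains a, b, and c, the index alone answers the query and no table lookup is needed

select * from t where primary_key = 3; a lookup by primary key reads the clustered index directly, so it needs no table lookup either. A table lookup happens when a secondary index does not contain every requested column
When does MySQL InnoDB perform a table lookup?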
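
The difference between a covering index and a table lookup can be modeled with two dictionaries. This is a hedged sketch, not InnoDB code: `ToyTable`, its column names, and the `table_lookups` counter are hypothetical, with one dict standing in for the clustered (primary key) index and another for a secondary index on (a, b, c).

```python
class ToyTable:
    """Toy model of a clustered index plus one secondary index (hypothetical)."""

    def __init__(self, rows):
        # Clustered index: primary key -> full row
        self.clustered = {row["id"]: row for row in rows}
        # Secondary index on (a, b, c): index key -> list of primary keys
        self.idx_abc = {}
        for row in rows:
            key = (row["a"], row["b"], row["c"])
            self.idx_abc.setdefault(key, []).append(row["id"])
        self.table_lookups = 0  # how many times we went back to the table

    def select_abc(self, a, b, c):
        # Covering index: every requested column is part of the index key
        return [{"a": a, "b": b, "c": c} for _ in self.idx_abc.get((a, b, c), [])]

    def select_star(self, a, b, c):
        # Column d is not in the index, so each match needs a table lookup
        result = []
        for pk in self.idx_abc.get((a, b, c), []):
            self.table_lookups += 1
            result.append(self.clustered[pk])
        return result


table = ToyTable([
    {"id": 1, "a": 1, "b": 3, "c": 10, "d": "x"},
    {"id": 2, "a": 2, "b": 3, "c": 10, "d": "y"},
])
covered = table.select_abc(1, 3, 10)
lookups_after_covered = table.table_lookups
full_rows = table.select_star(1, 3, 10)
lookups_after_star = table.table_lookups
```

The first query never touches `clustered`; the second has to follow the primary key back to the full row because it asks for a column the index does not hold.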

Detailed knowledge points

0. What is a B+ tree? Why does MySQL use this data structure for indexes?

Core idea:

  • Index files are large and stored on disk, so only a few index pages can be read into memory at a time; the goal is therefore to minimize the number of I/O reads

Advantages:

    1. All data is in the leaf nodes, so query efficiency is stable
    2. Leaf nodes that are logically adjacent are also stored close together in the index file, so they can be loaded together, which may avoid repeated disk reads
    3. Leaf nodes are linked by pointers, which is convenient for a full scan such as SELECT *; the remaining leaves can be traversed in order from any leaf node
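
The third advantage, linked leaf nodes, is what makes range queries cheap. The following Python sketch models only the leaf level; `Leaf`, `build_leaves`, and `range_scan` are hypothetical names, keys are assumed unique, and a binary search over the first key of each leaf stands in for the descent through the internal nodes of a real B+ tree.

```python
from bisect import bisect_right


class Leaf:
    """One leaf page: a sorted run of keys plus a pointer to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def build_leaves(sorted_keys, page_size):
    """Split sorted keys into leaf pages and link neighbouring pages."""
    leaves = [Leaf(sorted_keys[i:i + page_size])
              for i in range(0, len(sorted_keys), page_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(leaves, low, high):
    """Return (keys in [low, high], number of leaf pages read)."""
    if not leaves:
        return [], 0
    # Stand-in for walking the internal nodes down to the starting leaf
    first_keys = [leaf.keys[0] for leaf in leaves]
    leaf = leaves[max(bisect_right(first_keys, low) - 1, 0)]
    result, pages_read = [], 0
    while leaf is not None:
        pages_read += 1
        for key in leaf.keys:
            if key > high:
                return result, pages_read
            if key >= low:
                result.append(key)
        leaf = leaf.next  # follow the linked list instead of re-descending
    return result, pages_read


leaves = build_leaves(list(range(1, 21)), page_size=4)
keys, pages = range_scan(leaves, 6, 14)
```

Only the leaves that overlap the range are read; the scan never goes back to the root between pages.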

1. B+ trees versus other trees

  • Binary search tree: keys in the left subtree are lower than the root, and keys in the right subtree are higher than the root. Disadvantage: it can become unbalanced, in which case it degenerates into a linked list and is very inefficient
  • AVL tree: rotates automatically on insert and delete to stay balanced, that is, the heights of the left and right subtrees of every node are kept as close as possible
  • Balanced multi-way search tree (B-tree; the “-” is a hyphen, not a minus sign): key values are distributed throughout the whole tree; a node may have more than two children; the fewer keys per node, the more levels and the more I/O, so the following is required: in an m-way search tree, every node except the root has at least m/2 children (rounded up), that is, at least m/2 - 1 keys
  • B+ tree: a node may have more than two children; it does not need to rebalance very often; all data is in the leaf nodes. Rudolf Bayer, co-inventor of the B-tree, never explained what the B stands for. The most common view is that B stands for balanced.

Reference document <<So many trees, a summary article>>

Transactions

No. Question One-sentence answer Detailed knowledge
0 Transaction isolation levels In general, the higher the isolation level, the worse the performance

So most of the time we strike a balance between performance and data consistency, depending on the requirements of the business

The isolation levels below become stricter, and performance worse, from top to bottom

1. Read uncommitted, principle: returns the latest value directly, with no additional processing

2. Read committed, principle: a view, created at the start of each SQL statement

3. Repeatable read, principle: a view, a single one created when the transaction starts and used for the whole transaction

4. Serializable, principle: read and write locks; later threads have to wait
The relationship between MVCC and transaction isolation level
1 How is the isolation level of a transaction set? Set the startup parameter transaction-isolation, for example to READ-COMMITTED. Use show variables to see the current value
2 What is MVCC? How does it relate to transaction isolation levels? When querying the same record, transactions started at different times have different read views.

Multiple versions of the same record can exist in the system at once; this is the database's multi-version concurrency control (MVCC).
3 ACID properties of transactions ACID: Atomicity, Consistency, Isolation, Durability
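
The difference between a view per statement (read committed) and a single view per transaction (repeatable read) can be sketched as follows. This is a deliberately simplified model: `ToyMvccStore` and `Transaction` are hypothetical, and a single commit counter stands in for the read view, whereas a real InnoDB read view tracks the set of active transaction IDs.

```python
class ToyMvccStore:
    """Keeps every committed version of each key (hypothetical toy model)."""

    def __init__(self):
        self.commit_seq = 0  # increases by one on every committed write
        self.versions = {}   # key -> list of (commit_seq, value), oldest first

    def commit_write(self, key, value):
        self.commit_seq += 1
        self.versions.setdefault(key, []).append((self.commit_seq, value))

    def snapshot(self):
        # Simplified "view": everything committed up to now is visible
        return self.commit_seq

    def read(self, key, view):
        visible = None
        for seq, value in self.versions.get(key, []):
            if seq <= view:
                visible = value  # newest version not newer than the view
        return visible


class Transaction:
    def __init__(self, store, isolation):
        if isolation not in ("read-committed", "repeatable-read"):
            raise ValueError("unsupported isolation level in this sketch")
        self.store = store
        self.isolation = isolation
        self.view = store.snapshot()  # taken once, at transaction start

    def select(self, key):
        if self.isolation == "read-committed":
            view = self.store.snapshot()  # fresh view for every statement
        else:
            view = self.view              # same view for the whole transaction
        return self.store.read(key, view)


store = ToyMvccStore()
store.commit_write("k", 1)
rc = Transaction(store, "read-committed")
rr = Transaction(store, "repeatable-read")
store.commit_write("k", 2)  # another transaction commits a new version
rc_value = rc.select("k")
rr_value = rr.select("k")
```

Both versions of `k` exist at the same time; which one a transaction sees depends only on when its view was created.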

Everyday use

No. Question One-sentence answer Detailed knowledge
0 How to handle slow SQL
1 How to use the EXPLAIN statement
2 Database and table sharding
3 What is a trigger, and in what situations is it used?

Join statements

No. Question One-sentence answer Detailed knowledge
0 What kinds of join queries are there, and what are their usage scenarios? inner join (inner join = join);

straight_join (similar to join, but makes the left table drive the right table, which changes the order in which the optimizer would execute the join):

select * from t1 straight_join t2 on (t1.a=t2.a);

left join;

right join;

Inner joins include the implicit inner join (where a.id = b.id) and explicit inner joins (inner join, straight_join); outer joins include left join and right join

MySQL treats every query as a “join”; even a single-table query is a “join”

MySQL join (associated) queries
1 The underlying principle of JOIN The MySQL optimizer produces the execution plan, and the executor runs it. In other words, the optimizer writes the code and the executor runs it, so the underlying principle of a join query comes down to which execution plan the optimizer produces.

Driving table: when query conditions are specified, the table with fewer rows satisfying the conditions is the driving table. When no query condition is specified, the table with fewer rows is the driving table. By default the MySQL optimizer chooses the join order so that the small table drives the large table.

Join algorithms. The following execution plans exist:

Simple Nested-Loop Join: MySQL does not use it because it is too slow; it is listed here only as an introduction

For each row of the driving table t1, the driven table t2 is fully scanned once; time complexity = O(Nt1 * Nt2)

Index Nested-Loop Join: uses an index on the driven table, so each row of the driving table is matched through an index lookup rather than a full scan of the driven table

Block Nested-Loop Join: used when the driven table has no usable index

Rows of the driving table are loaded into the join buffer in blocks; the join buffer is 256K by default

The time complexity is the same as Simple Nested-Loop Join, but it takes less time because the comparisons happen in memory

Even so, Block Nested-Loop Join is still much slower than Index Nested-Loop Join

MySQL 8.0 adds Hash Join
Geek Time <<Can we use join at all?>>

MySQL join (associated) queries
2 Why do many DBAs not recommend using JOIN? If the Index Nested-Loop Join algorithm can be used, meaning an index on the driven table can be used, a join is fine;

If the Block Nested-Loop Join algorithm is used, too many rows are scanned. Joins on large tables in particular may scan the driven table many times and consume a lot of system resources, so this kind of join should be avoided.
3 What is a “small table”? Filter each of the two tables by its own conditions, then calculate the total amount of data in the fields that take part in the join. The table with the smaller amount of data is the “small table”.
4 Why should the small table drive the large table? Conclusion: with either Block Nested-Loop Join or Index Nested-Loop Join, it is more efficient for the small table to drive the large table

Suppose the small table has N rows and the large table has M rows. For Block Nested-Loop Join:

1. If the join buffer can hold the whole driving table, both tables are fully scanned once, so the total number of rows scanned is M + N, and the number of in-memory comparisons is M * N. Swapping M and N changes neither figure, so in this case the choice of driving table does not affect performance

2. If the join buffer cannot hold the whole driving table, it has to be loaded in segments. Let the driving table have N rows and need K segments to complete the algorithm, and let the driven table have M rows. K is not a constant here: the larger N is, the larger K is, so write K = λ * N, where λ is in the range (0, 1). In this case the number of rows scanned is N + λ * N * M, and the number of in-memory comparisons is N * M.

Clearly, the smaller N is, the smaller the total, so the small table should be the driving table

λ also matters as a coefficient: for a fixed N, the larger the join buffer, the fewer segments are needed and the smaller λ is, so make the join buffer as large as practical
5 – algorithm
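
The row counts in the small-table-drives-large-table argument can be checked with a small simulation of the segmented Block Nested-Loop Join. This is a sketch, not MySQL's implementation: `block_nested_loop_join` and `buffer_rows` (the number of driving-table rows that fit in the join buffer) are hypothetical names, and each table is reduced to a list of integer join keys.

```python
def block_nested_loop_join(driving, driven, buffer_rows):
    """Equality join; returns (matches, rows_scanned, comparisons)."""
    matches = []
    rows_scanned = 0
    comparisons = 0
    for start in range(0, len(driving), buffer_rows):
        # Load one segment of the driving table into the join buffer
        block = driving[start:start + buffer_rows]
        rows_scanned += len(block)
        # One full scan of the driven table per segment
        for row in driven:
            rows_scanned += 1
            for buffered in block:
                comparisons += 1  # in-memory comparison
                if buffered == row:
                    matches.append(row)
    return matches, rows_scanned, comparisons


small = list(range(100))    # N = 100 rows
large = list(range(1000))   # M = 1000 rows

# Small table drives: K = 3 segments, rows scanned = N + K * M
m1, scanned_small_drives, cmp1 = block_nested_loop_join(small, large, buffer_rows=40)
# Large table drives: K = 25 segments, rows scanned = M + K * N
m2, scanned_large_drives, cmp2 = block_nested_loop_join(large, small, buffer_rows=40)
```

With these sizes the comparison count is the same in both directions, while the number of rows scanned is lower when the small table drives. Raising `buffer_rows` reduces the number of segments, which is the effect of a larger join buffer.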