What is the index structure of InnoDB in MySQL?
InnoDB indexes in MySQL are B+ trees.
In this structure, each node of the B+ tree is a data page. What is a data page? A page is the basic unit InnoDB uses to manage storage and to move data between disk and memory (16 KB by default). The storage engine reads at least one whole page each time it reads from disk, and writes at least one whole page each time it flushes from memory to disk. InnoDB also requires that each data page be able to hold at least two records.
Within a data page, records are ordered by primary key, and the records in the page form a singly linked list.
So how do pages form a B+ tree?
Pages come in different types. The two we usually talk about are the directory entry page and the user record page.
What is the difference between them? A directory entry page stores only primary key values and, for each one, the page number of the child page it points to.
A user record page stores the actual rows, with the columns we defined for the table.
In other words, the leaf nodes of the B+ tree are user record pages, and the non-leaf nodes are directory entry pages.
Pages at the same level of the B+ tree are linked, in key order, into a doubly linked list.
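The lookup path through directory entry pages and user record pages can be sketched in a few lines of Python. This is only a toy model with a single directory page; the page numbers, rows, and the names `record_pages`, `directory_page`, and `find_row` are invented for illustration and are not InnoDB's real on-disk format.

```python
from bisect import bisect_right

# Leaf ("user record") pages: page number -> rows sorted by primary key.
# Page numbers and row contents are made up for this example.
record_pages = {
    10: [(1, "alice"), (3, "bob")],
    11: [(5, "carol"), (8, "dave")],
    12: [(12, "erin"), (20, "frank")],
}

# Root ("directory entry") page: (smallest key in child page, child page number).
directory_page = [(1, 10), (5, 11), (12, 12)]


def find_row(pk):
    """Descend from the directory page to a record page, then search that page."""
    keys = [min_key for min_key, _ in directory_page]
    # Pick the last directory entry whose minimum key is <= pk.
    slot = bisect_right(keys, pk) - 1
    if slot < 0:
        return None  # pk is smaller than every key in the tree
    page_no = directory_page[slot][1]
    for key, value in record_pages[page_no]:
        if key == pk:
            return (key, value)
    return None
```

Each step of the descent touches exactly one page, which is why the height of the tree determines the number of page reads per lookup.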
The structure just described is the clustered index, which every InnoDB table has. MySQL has other kinds of index as well, such as secondary indexes. What is their data structure?
A secondary index is also a B+ tree, but it differs from the clustered index. As mentioned, a non-leaf entry in the clustered index has just two parts: the primary key and the page number of the page at the next level. In a secondary index the number of columns in a non-leaf entry is not fixed; it depends on how many columns the index covers. Each entry holds the index columns plus the page number of the page at the next level. The leaf nodes store the index columns plus the primary key.
Therefore, when a query uses a secondary index, the index alone cannot return every column of the row. MySQL takes the primary key found in the secondary index and goes back to the table, that is, looks the row up in the clustered index, to fetch the remaining columns.
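To make the "back to the table" step concrete, here is a minimal Python sketch. The table contents, the `age` index, and the function names are hypothetical; dictionaries and sorted lists stand in for the leaf levels of the two B+ trees.

```python
# Clustered index leaf level: primary key -> full row (all columns).
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "carol", "age": 30},
}

# Secondary index on `age`, leaf level: (index column, primary key), sorted.
idx_age = sorted((row["age"], pk) for pk, row in clustered.items())


def select_all_where_age(age):
    """SELECT * ... WHERE age = ?: find keys in the secondary index, then go back to the table."""
    pks = [pk for a, pk in idx_age if a == age]
    return [clustered[pk] for pk in pks]  # one clustered-index lookup per match


def select_id_where_age(age):
    """SELECT id ... WHERE age = ?: the secondary index already holds every requested column."""
    return [pk for a, pk in idx_age if a == age]
```

The second function never touches `clustered`, which is the case where no trip back to the table is needed.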
What are the columns in EXPLAIN output? What do they mean?
There are quite a few columns. I don't remember all of them, so here is a general outline of the main ones.
The first is the id column, the ID MySQL assigns to each SELECT keyword in the statement. The second is the table column, the table that the SELECT reads from. Then there is the type column, which shows the access method used for the query, such as system, const, ref, index, ALL, and so on. Next comes possible_keys, which lists the indexes that could be used for the query. The key column shows which index is actually used; it is chosen by the cost analysis done while building the execution plan. There is also the rows column, the number of rows MySQL estimates it must examine to execute the query.
Why use a B+ tree as the index structure? What are the advantages over other data structures?
An index is usually too large to keep entirely in memory, so it is stored on disk as an index file. Looking something up in the index therefore causes disk I/O, and the most important measure of an index structure is how the number of disk I/O operations grows during a lookup.
The goal is for queries through the index to be as fast as possible.
We could certainly choose another data structure for MySQL indexes, but we have to ask whether it gives the fastest lookups and takes the least disk space.
Start with lookup speed and consider the binary search tree. Its defining property is that every key in a node's left subtree is smaller than the node's key and every key in its right subtree is larger. In the worst case, however, a binary search tree degenerates into a linked list, which is far too slow. Even when it does not, it is much taller than a B+ tree, so finding a record takes too many I/O operations.
Next, consider the balanced binary tree. It is a binary search tree in which the heights of the left and right subtrees of any node differ by at most 1. It is still a binary tree, so it is still too tall. In addition, the rotations needed to rebalance the tree after each insert or delete are time consuming.
We could also choose a red-black tree, in which the longest path from the root to a leaf is no more than twice the shortest path. But a red-black tree is still too tall compared with a B+ tree, and each node holds only one key and its data, so storing it on disk is wasteful.
So why did we end up using B+ trees instead of B trees?
B trees and B+ trees share the same basic features: each node can have many children, far more than two, so the tree is short and a lookup needs few I/O operations. The difference is that every node of a B tree stores data alongside its keys, so each node takes more space and holds fewer keys. A B+ tree stores data only in the leaf nodes; the other nodes store only keys and pointers to child pages. This takes the least space per node and maximizes the number of keys read into memory with each page.
This is why InnoDB indexes in MySQL use B+ trees.
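A rough back-of-the-envelope calculation shows how much the height differs. The numbers below are assumptions chosen only for illustration: a 16 KB page, 14 bytes per directory entry (key plus child page pointer), and 1 KB per row. Page headers and other overhead are ignored.

```python
import math

PAGE_SIZE = 16 * 1024   # InnoDB default page size in bytes
ENTRY_SIZE = 14         # assumed size of one directory entry (key + page pointer)
ROW_SIZE = 1024         # assumed average row size in bytes

fanout = PAGE_SIZE // ENTRY_SIZE        # directory entries per non-leaf page
rows_per_leaf = PAGE_SIZE // ROW_SIZE   # rows per leaf page


def bplus_capacity(height):
    """Rows reachable by a B+ tree with `height` levels (leaf level included)."""
    return fanout ** (height - 1) * rows_per_leaf


def balanced_binary_height(n):
    """Minimum height of a binary tree that holds n keys, one key per node."""
    return math.ceil(math.log2(n + 1))


rows = bplus_capacity(3)
print(fanout, rows, balanced_binary_height(rows))  # 1170 21902400 25
```

Under these assumptions a three-level B+ tree reaches about 21.9 million rows, while even a perfectly balanced binary tree holding the same number of keys needs 25 levels.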
If a query uses the LIKE keyword, how do we tell whether it will use an index?
Whether a query can use an index depends on whether the index structure supports that kind of lookup.
In general, when the % comes at the start of the LIKE pattern (for example '%abc'), the index is not used; when the % comes at the end (for example 'abc%'), the index is used.
Why? A MySQL index keeps its entries sorted by comparing the indexed value from left to right. A leading % means the beginning of the value is unknown, so the sort order cannot narrow the search. Using the index would mean scanning every index entry and then going back to the table for each match, which is no better than a full table scan.
There is an exception. If the query selects only the primary key, or only columns that are all in the index, MySQL may still choose the index based on its cost calculation even with a leading %, because every required column can be read directly from the index without going back to the table.
mysql> select * from table_name where table_name = 'index';
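The effect of the % position can be imitated with a sorted Python list standing in for the leaf entries of an index. The column values and the function names `like_prefix` and `like_suffix` are made up for this sketch; it models only the scan pattern, not MySQL's optimizer.

```python
from bisect import bisect_left

# Leaf entries of a hypothetical index on `name`, kept in sorted order.
index_names = sorted(["adam", "alan", "alice", "bob", "carol", "malice"])


def like_prefix(prefix):
    """name LIKE 'prefix%': jump to the first candidate, then read neighbours until the prefix changes."""
    i = bisect_left(index_names, prefix)
    matches, visited = [], 0
    while i < len(index_names):
        visited += 1
        if not index_names[i].startswith(prefix):
            break
        matches.append(index_names[i])
        i += 1
    return matches, visited


def like_suffix(suffix):
    """name LIKE '%suffix': sort order does not help, so every entry is checked."""
    matches = [n for n in index_names if n.endswith(suffix)]
    return matches, len(index_names)
```

With these six entries, `like_prefix("al")` returns `(['alan', 'alice'], 3)` after visiting three entries, while `like_suffix("lice")` returns `(['alice', 'malice'], 6)` and has to visit all six.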
What are the transaction isolation levels of MySQL?
MySQL has four isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE.
The main difference between them is how much protection they give against consistency anomalies. There are four such anomalies: dirty write, dirty read, non-repeatable read, and phantom read.
All four levels prevent dirty writes. They differ on the other three anomalies. Under READ UNCOMMITTED, dirty reads, non-repeatable reads, and phantom reads can all occur. Under READ COMMITTED, dirty reads cannot occur, but non-repeatable reads and phantom reads can. Under REPEATABLE READ, dirty writes, dirty reads, and non-repeatable reads cannot occur, but phantom reads can. Under SERIALIZABLE, none of the four can occur.
In general, READ UNCOMMITTED is the fastest but the riskiest, while SERIALIZABLE is the slowest but the safest.
Balancing efficiency against safety, MySQL uses REPEATABLE READ by default.
What does MySQL do to provide consistent reads (snapshot reads), or to deal with phantom reads?
What is the consistent read problem? Suppose two transactions work on the same data at the same time. The first transaction runs a query and finds 4 matching rows, and it has not yet committed. Another transaction then inserts 1 row that also matches the first transaction's condition. When the first transaction runs the same query again, it finds 5 rows. The two reads are inconsistent.
This is the classic phantom read. MySQL's default isolation level, REPEATABLE READ, rules out dirty writes, dirty reads, and non-repeatable reads, but by definition it still allows phantom reads, and the phantom read is exactly the consistency problem just described.
To solve this problem, MySQL uses MVCC (multi-version concurrency control).
So what is MVCC?
Intuitively, MVCC is what we call a version chain. How does MySQL implement it? Each record in the clustered index carries two hidden fields: the ID of the transaction that last modified it, and a roll pointer that leads to the most recent undo log entry for the record.
MySQL writes an undo log entry for every change to a record. Like the clustered index record, each undo log entry holds a pointer to the previous version and the ID of the transaction that made the change. Together, the clustered index record and its undo log entries form a linked list, and the head of the list, the clustered index record itself, is the most recent version of the row.
When a transaction reads data, MySQL builds a ReadView, a consistent view of which transactions' changes are visible. Note that under REPEATABLE READ the ReadView is built only when the transaction reads data for the first time, and that same ReadView is reused for all later reads in the transaction. When the transaction queries a record, MySQL uses transaction IDs to walk the version chain, finds the version that is visible to this ReadView, and returns that value.
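The walk along the version chain can be sketched as follows. This is a simplified model, not InnoDB's code: the class names, field names, and the exact visibility rule are assumptions for illustration, based on the usual description that a version is visible if it was written by the reading transaction itself or by a transaction that had already committed when the ReadView was built.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: str
    trx_id: int                        # transaction that wrote this version
    older: Optional["Version"] = None  # roll pointer to the previous version


@dataclass
class ReadView:
    creator_trx_id: int
    active_ids: frozenset   # transactions still uncommitted when the view was built
    next_trx_id: int        # ID the next new transaction would receive

    def can_see(self, trx_id):
        if trx_id == self.creator_trx_id:
            return True                       # our own change
        if trx_id >= self.next_trx_id:
            return False                      # started after the view was built
        return trx_id not in self.active_ids  # visible only if already committed


def snapshot_read(newest, view):
    """Walk the version chain from newest to oldest; return the first visible value."""
    version = newest
    while version is not None:
        if view.can_see(version.trx_id):
            return version.value
        version = version.older
    return None  # the row did not exist for this view
```

Because the same `ReadView` is reused for every read in the transaction, rows changed or inserted by transactions that commit later stay invisible to snapshot reads, which is how repeated queries keep returning the same result.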
What is the difference between MyISAM and InnoDB?
First, the indexing schemes differ. InnoDB has a clustered index and non-clustered (secondary) indexes. In the clustered index, non-leaf nodes store primary key values, while leaf nodes store the primary key together with all the other columns of the row. In a non-clustered index, both leaf and non-leaf nodes store the index columns. The main difference in MyISAM is that it has no clustered index. It does have a primary key index, but its leaf nodes do not hold the table rows; all rows are stored separately in a data file. Put simply, index and data are kept apart. Every MyISAM index therefore behaves like a secondary index: any lookup, including one through the primary key index, needs a second step to fetch the row from the data file.
In addition, MyISAM does not support transactions while InnoDB does, and MyISAM has only table-level locks while InnoDB has row-level locks in addition to table-level locks.
What is the difference between a clustered index and a non-clustered index in MySQL?
A clustered index, also known as the primary key index, is a B+ tree in which records and pages are ordered by primary key value. Its leaf nodes store the complete rows of the table, while its non-leaf nodes store primary key values.
Non-clustered indexes are what we call secondary indexes, composite indexes, and so on. In these indexes, records and pages are ordered by the value of the index columns. The leaf nodes of the B+ tree no longer store complete rows, only the index columns plus the primary key, while the non-leaf nodes store the index columns and the page number of the page at the next level.
What does index condition pushdown refer to?
For example, suppose we have a composite index on columns A and B, and the query has conditions on both. The condition on A can be evaluated by searching the index, but the condition on B cannot. A typical case is a LIKE condition on B with a leading %.
Without index condition pushdown, the query uses the index only for column A: it finds the index entries that satisfy the condition on A, goes back to the table for each one, and only then checks whether the row satisfies the condition on B.
With index condition pushdown, when an index entry satisfies the condition on A, MySQL does not go back to the table immediately. It first checks the condition on B against the value of B stored in the index entry. If the entry matches, MySQL goes back to the table for the full row; if not, it skips to the next entry.
This greatly reduces the I/O cost of going back to the table.
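The saving can be illustrated by counting trips back to the table with and without pushdown. The table, the composite index `idx_ab`, and the `query` function below are hypothetical; the loop is a stand-in for an index range scan and is not how MySQL is implemented.

```python
# Hypothetical table rows keyed by primary key, with a composite index on (a, b).
table = {
    1: {"id": 1, "a": "x", "b": "red apple", "c": 100},
    2: {"id": 2, "a": "x", "b": "green pear", "c": 200},
    3: {"id": 3, "a": "x", "b": "big apple", "c": 300},
    4: {"id": 4, "a": "y", "b": "one apple", "c": 400},
}
idx_ab = sorted((r["a"], r["b"], pk) for pk, r in table.items())


def query(a, b_suffix, pushdown):
    """WHERE a = ? AND b LIKE '%suffix'; returns (rows, number of back-to-table lookups)."""
    rows, lookups = [], 0
    for idx_a, idx_b, pk in idx_ab:
        if idx_a != a:
            continue  # stands in for the index range scan on column a
        if pushdown and not idx_b.endswith(b_suffix):
            continue  # condition on b checked inside the index; no table lookup
        row = table[pk]  # back to the clustered index
        lookups += 1
        if row["b"].endswith(b_suffix):
            rows.append(row)
    return rows, lookups
```

For `a = "x"` and the suffix `"apple"`, both modes return the same two rows, but the version without pushdown goes back to the table three times and the version with pushdown only twice.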